Opening for Site Reliability Engineer (SRE), Orlando, FL (Day 1 onsite)
Email: [email protected]
From: Rahman, Webster Tech Solutions Inc [email protected]
Reply to: [email protected]

Position: Site Reliability Engineer (SRE)
Location: Orlando, FL (Day 1 onsite)

Job Description: Site Reliability Engineer (SRE) or Production Reliability Engineer (PRE)

We are seeking an accomplished and driven SRE/PRE Layer 3 to join our forward-thinking team. As an SRE/PRE Layer 3, you will be a key contributor in architecting, designing, and maintaining highly reliable and scalable systems. You will collaborate with cross-functional teams to develop advanced automation, implement best practices, and drive the evolution of our infrastructure and reliability initiatives.

Responsibilities:
- Lead the design, implementation, and management of complex systems architecture that emphasizes reliability, scalability, and performance.
- Collaborate closely with engineering teams to set and uphold service-level objectives (SLOs) and work on continuous improvements to achieve these goals.
- Mentor and guide junior members of the SRE/PRE team, fostering their technical growth and professional development.
- Solve intricate technical challenges across the entire technology stack, from hardware and infrastructure to applications and databases.
- Develop and implement robust automation solutions for deployment, configuration management, and infrastructure provisioning.
- Play a pivotal role in capacity planning, performance tuning, and optimizing systems for seamless scalability.
- Drive the establishment of comprehensive monitoring, alerting, and logging strategies to ensure prompt identification and resolution of issues.
- Participate in on-call rotations and respond promptly to incidents, taking ownership of resolution and post-incident analysis.
- Continuously advance best practices and processes, promoting a culture of reliability and operational excellence.
- Collaborate with stakeholders to ensure alignment between development and operations, contributing to product evolution and enhancements.

Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience).
- 7+ years of experience in an SRE, PRE, or similar role, with a proven track record in driving system reliability and performance.
- Proficiency in programming languages such as Python, Go, or similar for automation and tool development.
- Expertise in cloud platforms (e.g., AWS, GCP, Azure) and container technologies (e.g., Kubernetes, Docker).
- Deep understanding of networking, operating systems, and distributed systems architecture.
- Experience with infrastructure as code tools (e.g., Terraform, Ansible) for provisioning and configuration management.
- Strong grasp of observability tools and practices (e.g., Prometheus, Grafana, ELK stack).
- Exceptional troubleshooting skills and the ability to diagnose complex technical issues.
- Outstanding communication skills to collaborate effectively with diverse teams.
- Proactive mindset and a focus on delivering exceptional customer experiences.
- Optional: Relevant certifications such as Certified Kubernetes Administrator, AWS DevOps Professional, or similar.

(1.) To ensure customer engagement or satisfaction and referenceability.
(2.) To plan for Program and Delivery Management and ensure that the agreed deliverables in terms of margin are met.
(3.) To anchor process improvement, mentor compliance (human error reporting), and other organizational initiatives (automation, Lean IT implementation).
(4.) To guide, manage, develop, and engage the team, thereby ensuring employee retention.
(5.) To ensure upskilling or creation of resources through internal academies or trainings and growth rotation.

Keywords: information technology, golang, Florida
Fri Jun 28 03:22:00 UTC 2024