Site Reliability Manager || Onsite, Dallas, Texas || 2nd round F2F at Dallas, Texas, USA |
Email: [email protected] |
From: Aditya Pratap Singh, Adventa tech [email protected]
Reply to: [email protected]

Job Description

Position: Manager, Site Reliability
Location: Dallas
Hybrid: 2-3 days onsite
Contract to hire
Interview process: 2 rounds; the 1st is remote, the 2nd is onsite with a coding test
IT experience: 12-15 years
3-5+ years of experience as a Lead

Job Title: Manager, Site Reliability

Job Summary:
As the Manager, Site Reliability Engineer (SRE), you will lead a team of SREs responsible for the availability, performance, and scalability of our services. You will work closely with development, operations, and product teams to build and maintain reliable systems, implement best practices, and ensure seamless deployment processes. Your leadership will be pivotal in fostering a culture of reliability and continuous improvement.

Key Responsibilities:

Team Leadership:
Manage and mentor a team of SREs, providing guidance, performance feedback, and professional development opportunities.
Foster a collaborative and inclusive team environment, encouraging innovation and knowledge sharing.

System Reliability:
Design, implement, and maintain scalable, resilient, and high-performance systems.
Develop and enforce reliability standards, best practices, and processes across the organization.
Monitor and analyze system performance and reliability metrics, identifying areas for improvement.

Incident Management:
Lead incident response efforts, ensuring timely resolution of production issues.
Conduct root cause analysis and post-mortems to prevent recurrence and improve system robustness.
Develop and maintain incident response plans, including documentation and communication protocols.

Automation and Tooling:
Drive automation initiatives to reduce manual intervention, improve efficiency, and minimize downtime.
Implement and maintain monitoring, alerting, and logging tools to ensure visibility into system health.
Develop and maintain CI/CD pipelines to streamline deployment processes.
Collaboration and Communication:
Work closely with development teams to design and implement reliable and scalable applications.
Collaborate with product teams to understand requirements and ensure reliability considerations are integrated into the development process.
Communicate effectively with stakeholders, providing regular updates on system reliability and performance.

Security and Compliance:
Ensure systems adhere to security best practices and compliance requirements.
Conduct regular security assessments and audits, implementing necessary improvements.
Stay informed about emerging security threats and technologies, adapting practices as needed.

Qualifications:

Education and Experience:
Bachelor's degree in Computer Science, Engineering, or a related field; Master's degree preferred.
7+ years of experience in Site Reliability Engineering, DevOps, or related roles.
3+ years of experience in a leadership or management position.

Technical Skills:
Proficiency in cloud platforms (AWS, Google Cloud Platform, Azure) and container orchestration (Kubernetes, Docker).
Strong scripting and programming skills (Python, Go, Bash, etc.).
Experience with infrastructure as code (Terraform, Ansible, etc.) and configuration management tools.
Knowledge of networking, security, and database management.

Soft Skills:
Excellent leadership and team management abilities.
Strong problem-solving and analytical skills.
Effective communication and interpersonal skills.
Ability to work in a fast-paced, dynamic environment and manage multiple priorities.

Keywords: continuous integration, continuous deployment, information technology, golang |
Thu Jun 27 22:54:00 UTC 2024 |