Job Details

Site Reliability Manager || Onsite, Dallas, Texas || 2nd round F2F at Dallas, Texas, USA

Email: aditya@adventatech.com

From: Aditya Pratap Singh, Adventa tech
aditya@adventatech.com

Reply to: aditya@adventatech.com

Job Description -
Position: Manager, Site Reliability
Location: Dallas
Hybrid: 2-3 days onsite
Contract to hire
Interview process: 2 rounds; the 1st is remote, the 2nd is onsite with a coding test

IT experience: 12-15 years

3-5+ years of experience as a Lead

Job Title: Manager, Site Reliability

Job Summary: As the Manager, Site Reliability Engineering (SRE), you will lead a team of SREs responsible for the availability, performance, and scalability of our services. You will work closely with development, operations, and product teams to build and maintain reliable systems, implement best practices, and ensure seamless deployment processes. Your leadership will be pivotal in fostering a culture of reliability and continuous improvement.

Key Responsibilities:
Team Leadership:
Manage and mentor a team of SREs, providing guidance, performance feedback, and professional development opportunities.
Foster a collaborative and inclusive team environment, encouraging innovation and knowledge sharing.
System Reliability:
Design, implement, and maintain scalable, resilient, and high-performance systems.
Develop and enforce reliability standards, best practices, and processes across the organization.
Monitor and analyze system performance and reliability metrics, identifying areas for improvement.
Incident Management:
Lead incident response efforts, ensuring timely resolution of production issues.
Conduct root cause analysis and post-mortems to prevent recurrence and improve system robustness.
Develop and maintain incident response plans, including documentation and communication protocols.
Automation and Tooling:
Drive automation initiatives to reduce manual intervention, improve efficiency, and minimize downtime.
Implement and maintain monitoring, alerting, and logging tools to ensure visibility into system health.
Develop and maintain CI/CD pipelines to streamline deployment processes.
Collaboration and Communication:
Work closely with development teams to design and implement reliable and scalable applications.
Collaborate with product teams to understand requirements and ensure reliability considerations are integrated into the development process.
Communicate effectively with stakeholders, providing regular updates on system reliability and performance.
Security and Compliance:
Ensure systems adhere to security best practices and compliance requirements.
Conduct regular security assessments and audits, implementing necessary improvements.
Stay informed about emerging security threats and technologies, adapting practices as needed.

Qualifications:
Education and Experience:
Bachelor's degree in Computer Science, Engineering, or a related field; Master's degree preferred.
7+ years of experience in Site Reliability Engineering, DevOps, or related roles.
3+ years of experience in a leadership or management position.
Technical Skills:
Proficiency in cloud platforms (AWS, Google Cloud Platform, Azure), containers (Docker), and container orchestration (Kubernetes).
Strong scripting and programming skills (Python, Go, Bash, etc.).
Experience with infrastructure as code (Terraform, Ansible, etc.) and configuration management tools.
Knowledge of networking, security, and database management.
Soft Skills:
Excellent leadership and team management abilities.
Strong problem-solving and analytical skills.
Effective communication and interpersonal skills.
Ability to work in a fast-paced, dynamic environment and manage multiple priorities.

Keywords: continuous integration, continuous deployment, information technology, golang

Thu Jun 27 22:54:00 UTC 2024
