Site Reliability Manager || Hybrid at Remote, Remote, USA
Email: [email protected]
Role: Site Reliability Manager

Location: Dallas (Hybrid, 2-3 days onsite)

Duration: 6+ Months

C2C

Visa: No H-1B/CPT

MOI (Mode of Interview): Video

Job Summary:
As the Manager, Site Reliability
Engineering (SRE), you will lead a team of SREs responsible for the availability,
performance, and scalability of our services. You will work closely with
development, operations, and product teams to build and maintain reliable
systems, implement best practices, and ensure seamless deployment processes.
Your leadership will be pivotal in fostering a culture of reliability and
continuous improvement.

Key Responsibilities:

Team Leadership:

Manage and mentor a team of SREs, providing guidance, performance
feedback, and professional development opportunities.

Foster a collaborative and inclusive team environment, encouraging
innovation and knowledge sharing.

System Reliability:

Design, implement, and maintain scalable, resilient, and
high-performance systems.

Develop and enforce reliability standards, best practices, and
processes across the organization.

Monitor and analyze system performance and reliability metrics,
identifying areas for improvement.

Incident Management:

Lead incident response efforts, ensuring timely resolution of
production issues.

Conduct root cause analysis and post-mortems to prevent recurrence
and improve system robustness.

Develop and maintain incident response plans, including
documentation and communication protocols.

Automation and Tooling:

Drive automation initiatives to reduce manual intervention, improve
efficiency, and minimize downtime.

Implement and maintain monitoring, alerting, and logging tools to
ensure visibility into system health.

Develop and maintain CI/CD pipelines to streamline deployment
processes.

Collaboration and Communication:

Work closely with development teams to design and implement
reliable and scalable applications.

Collaborate with product teams to understand requirements and
ensure reliability considerations are integrated into the development
process.

Communicate effectively with stakeholders, providing regular
updates on system reliability and performance.

Security and Compliance:

Ensure systems adhere to security best practices and compliance
requirements.

Conduct regular security assessments and audits, implementing
necessary improvements.

Stay informed about emerging security threats and technologies,
adapting practices as needed.

Qualifications:

Education and Experience:

Bachelor's degree in Computer Science, Engineering, or a related
field; Master's degree preferred.

7+ years of experience in Site Reliability Engineering, DevOps, or
related roles.

3+ years of experience in a leadership or management position.

Technical Skills:

Proficiency in cloud platforms (AWS, Google Cloud Platform, Azure),
containerization (Docker), and container orchestration (Kubernetes).

Strong scripting and programming skills (Python, Go, Bash, etc.).

Experience with infrastructure as code (Terraform, Ansible, etc.)
and configuration management tools.

Knowledge of networking, security, and database management.

Soft Skills:

Excellent leadership and team management abilities.

Strong problem-solving and analytical skills.

Effective communication and interpersonal skills.

Ability to work in a fast-paced, dynamic environment and manage
multiple priorities.

Regards,

Vivek Sah | Technical Recruiter

Largeton INC. | Tel: (571) 568-4156

Email: [email protected]

--

Keywords: continuous integration continuous deployment information technology golang
Mon Jul 08 21:19:00 UTC 2024
