| Need - Lead Site Reliability Engineer (SRE), Only - Remote, max $55/hr at Max, North Dakota, USA |
| Email: [email protected] |
|
http://bit.ly/4ey8w48
https://jobs.nvoids.com/job_details.jsp?id=2115796&uid=

From: bhavani, Brillius [email protected]
Reply to: [email protected]

Job Title: Lead Site Reliability Engineer (SRE), Only, PP Num Must
Location: Remote

Job Description:
We are seeking a highly experienced Lead Site Reliability Engineer (SRE) to drive reliability, performance, and uptime across our critical systems and services. In this role, you will define and implement best practices around SRE principles, incident management, and continuous improvement. You will lead a team of SREs, collaborate closely with product, development, and operations teams, and ensure our platform meets the highest standards for availability and scalability.

Key Responsibilities:
Define and lead the implementation of SRE best practices, including SLIs (Service Level Indicators), SLOs (Service Level Objectives), and error budgets, to ensure service reliability and performance.
Contribute to scalable, robust, and highly available infrastructure across on-premise, cloud, or hybrid environments (P, Kubernetes).
Oversee incident management and root cause analysis processes, driving swift resolution and fostering a blameless post-mortem culture.
Develop and refine CI/CD pipelines and automated deployment strategies to achieve faster, more reliable releases.
Implement and maintain monitoring, observability, and alerting solutions (e.g., Prometheus, Grafana, Splunk) to provide actionable insights and proactive issue detection.
Collaborate with cross-functional teams (Development, QA, Product, and Security) to ensure new features and services meet reliability and performance standards from design through production.
Mentor and coach junior engineers, promoting a culture of knowledge-sharing, continuous improvement, and technical excellence.
Continuously evaluate and adopt new tools and technologies that can enhance system reliability, scalability, and developer productivity.

Requirements and Qualifications:
Bachelor's or Master's degree in Computer Science, Engineering, or a related field, or equivalent experience.
7+ years of hands-on experience in DevOps, SRE, or systems engineering, including at least 2 years in a lead or senior-level role.
Strong knowledge of cloud platforms (P, Kubernetes) and container orchestration (Kubernetes, Docker).
Proven expertise with monitoring and observability tools (Prometheus, Grafana, Datadog, Splunk, or similar).
Understanding of distributed systems, network protocols, and microservices architecture.
Proficiency in at least one scripting or programming language (Python, scripts, etc.).
Experience implementing SLIs, SLOs, and error budgets to align engineering initiatives with business goals.
Strong problem-solving skills, with the ability to diagnose complex issues in distributed systems under time pressure.
Excellent communication and leadership skills, with experience guiding teams through critical incidents and long-term improvement initiatives.

Soft Skills:
Effective communication in both technical and non-technical contexts.
Collaborative mindset, eager to partner with cross-functional colleagues.
Ability to prioritize tasks and manage multiple objectives concurrently.
High level of accountability and sense of ownership over systems and processes.

Keywords: continuous integration, continuous deployment, quality analyst |
| 10:04 PM 27-Jan-25 |