Site Reliability Engineering, Atlanta, GA. PP is a must. Location: Atlanta, Georgia, USA
Email: [email protected]
From: Bavana, kksoftwareassociates [email protected]
Reply to: [email protected]

Hi all,

Hope you are doing well.

Must have: Google Cloud Platform (GCP) and SRE experience.

Responsibilities:
- Design, implement, and maintain highly available, scalable infrastructure and systems that meet service-level objectives (SLOs) and service-level agreements (SLAs).
- Collaborate with cross-functional teams to define reliability requirements, establish monitoring and alerting strategies, and implement proactive measures to prevent and mitigate incidents.
- Develop automation scripts and tools for deployment, configuration management, monitoring, and incident response to streamline operations and improve efficiency.
- Conduct post-incident reviews (PIRs) and root cause analyses (RCAs) to identify underlying issues, implement corrective actions, and prevent recurrence.
- Participate in capacity planning, performance tuning, and optimization efforts to ensure efficient resource utilization and cost-effectiveness.
- Monitor system performance and health metrics, analyze trends, and identify opportunities for optimization and improvement.
- Implement and maintain disaster recovery (DR) and business continuity (BC) plans, including backup and restore procedures, failover mechanisms, and redundancy strategies.
- Stay current with industry best practices, emerging technologies, and trends in site reliability engineering, and continuously look for opportunities to improve our infrastructure and processes.
- Mentor and guide junior engineers, fostering a culture of learning, collaboration, and continuous improvement.
Wed Mar 13 21:24:00 UTC 2024