Required: Sr Site Reliability Engineer (SRE) - Atlanta, GA - Onsite, at Atlanta, Georgia, USA |
Email: adilmomentousa1@gmail.com |
https://jobs.nvoids.com/job_details.jsp?id=1963254&uid=

Hi,

Momento USA is a global technology consulting, talent acquisition, and creative development firm that addresses clients' most pressing needs and challenges.

We are currently looking for a Sr Site Reliability Engineer (SRE) - Atlanta, GA - Onsite. Please let me know if you are interested.

Position: Senior Site Reliability Engineer (SRE) - Mission-Critical SaaS Cloud Products
Location: Atlanta, GA - Onsite

## Key Responsibilities

### Reliability and Performance Management

- Design, implement, and maintain highly available, scalable, and resilient cloud-native architectures for mission-critical SaaS products.
- Develop and implement SLOs, SLIs, and SLAs to measure and improve service reliability.
- Continuously optimize system performance and resource utilization across multiple cloud platforms.
- Fine-tune and optimize application performance by analyzing code, traces, and database queries.

### Incident Management and Troubleshooting

- Lead incident response efforts, effectively troubleshooting complex issues to minimize downtime and impact.
- Reduce Mean Time to Recover (MTTR) through proactive monitoring, automated alerting, and efficient problem-solving techniques.
- Conduct thorough Root Cause Analysis (RCA) for all major incidents and implement preventive measures.

### Observability and Monitoring

- Design and implement end-to-end observability solutions across our distributed systems.
- Develop and maintain comprehensive monitoring strategies using tools like ELK Stack, Prometheus, and Grafana.
- Create and optimize product status dashboards to provide real-time visibility into system health and performance.

### Automation and Infrastructure as Code (IaC)

- Implement Infrastructure as Code practices using tools like Terraform.
- Develop and maintain automated deployment pipelines and CI/CD workflows.
- Create self-healing systems and automate routine operational tasks to reduce manual intervention.
### Cloud-Agnostic Architecture

- Design and implement cloud-agnostic solutions that can operate efficiently across multiple cloud providers.
- Develop expertise in event-driven architectures and related technologies (e.g., Apache Kafka/Event Hubs, Redis, MongoDB Atlas, IoT Hub).
- Implement and manage containerized applications using Kubernetes across different cloud environments.

### Continuous Improvement

- Regularly review and refine operational practices to enhance efficiency and reliability.
- Stay updated with the latest industry trends and technologies in SRE, cloud computing, and DevOps.
- Contribute to the development of internal tools and frameworks to support SRE practices.

## Requirements

- Strong knowledge of cloud platforms (Azure) and their associated services.
- Expertise in observability tools (ELK Stack, Dynatrace, Prometheus).
- Expertise in containerization technologies such as Docker and Kubernetes.
- Understanding of event-driven architecture and database technologies (MongoDB Atlas, Azure SQL, PostgreSQL).
- Proficiency in IaC and automation tools such as Terraform and GitHub Actions.
- Proficiency in one or more programming languages: Python, .NET, or Java.
- Strong understanding of networking concepts, load balancing, and security practices.

Thanks,
Adil M
Sr. Technical Lead
Momento USA | Exceeding Customer Expectations
Email: adil@momentousa.com

--
Keywords: continuous integration, continuous deployment, database, information technology, Georgia
09:57 PM 25-Nov-24 |