SITE RELIABILITY ENGINEER (Azure) | Hybrid, 3 days a week in Chicago, Illinois, USA |
Email: [email protected] |
Primary Role & Responsibilities
SRE engineer with real interest and experience in troubleshooting Linux systems, networking, monitoring, databases, containers/Kubernetes, and cloud technologies, and with proven interest and experience in using software engineering to solve operational problems.
Comfortable writing code to automate API-driven tasks at scale; Python preferred.
Architect and implement automations to auto-remediate/self-heal issues in production.
Participate in SRE software engineering, writing code to continually reduce human intervention in operational tasks and to automate processes.
Monitor the application ecosystem, join incident bridges, and resolve issues.
Maintain a good understanding of core DevOps and SRE practices and technologies.
Participate in 24x365 on-call schedules and resolve incidents within 30 minutes.
Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.

Skills & Qualifications
Overall 10+ years of experience with DevOps and SRE practices, technologies, and industry standards to make production reliable and resilient.
Experience with core DevOps and SRE technologies such as: chaos engineering; Ansible; Docker; Kubernetes, Helm; Jenkins; IaC via Terraform; Prometheus, Grafana; ELK stack; Azure Cloud Stack; Azure DevOps.
Expert hands-on experience provisioning and deploying infrastructure in Azure Public Cloud in a large-scale enterprise environment with mission-critical applications.
Expert hands-on experience using the Azure DevOps stack to build automated CI/CD pipelines for deploying applications and infrastructure.
Very good understanding of application logs, Kubernetes events, and application and infrastructure metrics (Prometheus/Grafana/FluentD).
Experience troubleshooting distributed systems, and an understanding of the challenges of deploying applications in them and running them at scale.
Experience with Azure Public Cloud is required.
Experience with AWS, GCP, etc. is a great plus.
Experience working with applications in the financial services industry is also a plus.
Good understanding of Linux systems and Bash scripting.
A passion for collaborating cross-functionally and cross-product on outage bridges to resolve issues within 30 minutes, and for owning the RCA for those bridges.
Review recurring incidents, identify improvement and automation opportunities, and collaborate with product feature development teams.
Knowledge of BMC products like ITSM and ADE would be a great plus.
Willingness to mentor team members and help them grow.
Ability to explain technical concepts to business and technology stakeholders.
--
Keywords: continuous integration, continuous deployment, information technology
Fri Nov 15 19:19:00 UTC 2024 |