Sr. Site Reliability Engineer (10+ Years) | Hybrid, 3 Days a Week, in Chicago, Illinois, USA
Email: [email protected]
Primary Role & Responsibilities

Site reliability engineer with a real interest and experience in
troubleshooting Linux systems, networking, monitoring, databases,
containers/Kubernetes, and cloud technologies, and with proven experience in
using software engineering to solve operational problems.

Comfortable writing code to automate API-driven tasks at scale; Python is
preferred.

Architect and implement automations to auto-remediate/self-heal issues in
production.

Participate in SRE software engineering, writing code to continually reduce
human intervention in operational tasks and to automate processes.

Monitor the application ecosystem, join incident bridges, and resolve issues.

Maintain a good understanding of core DevOps and SRE practices and technologies.

Be ready to participate in 24x365 on-call schedules and resolve incidents
within 30 minutes.

Scale systems sustainably through mechanisms like automation and evolve systems
by pushing for changes that improve reliability and velocity.

Skills & Qualifications

Overall 10+ years of experience with DevOps and SRE practices, technologies,
and industry standards to make production reliable and resilient.

Experience with core DevOps and SRE technologies such as:

Chaos engineering

Ansible

Docker

Kubernetes, Helm

Jenkins

IaC via Terraform

Prometheus, Grafana

ELK stack

Azure Cloud Stack

Azure DevOps

Expert hands-on experience provisioning and deploying infrastructure in Azure
Public Cloud in a large-scale enterprise environment with mission-critical
applications.

Expert hands-on experience using the Azure DevOps stack to build automated
CI/CD pipelines for deploying applications and infrastructure.

Very good understanding of application logs, Kubernetes events, and application
and infrastructure metrics (Prometheus/Grafana/Fluentd).

You have experience in troubleshooting and understand the challenges of
deploying applications in distributed systems and running them at scale.
Experience with Azure Public Cloud is required. Experience with AWS, GCP, etc.
is a great plus. Experience working with applications in the financial services
industry is also a plus.

Good understanding of Linux systems and Bash scripting.

You have a passion for collaborating cross-functionally and cross-product on
outage bridges to resolve issues within 30 minutes, and you own the RCA for
those bridges.

Review recurring incidents and identify improvement and automation opportunities
in collaboration with product feature development teams.

Knowledge of BMC products like ITSM and ADE would be a great plus.

Willing to mentor team members and help them grow.

Ability to explain technical concepts to business and technology stakeholders.

--

Keywords: continuous integration continuous deployment information technology
Tue Oct 22 18:53:00 UTC 2024
