Urgent Req - SRE Lead (10-15 years) Chicago, IL (3 days onsite/week) Contract at Chicago, Illinois, USA
Email: [email protected]
ONLY REPLY TO [email protected] TO REVIEW PROFILE.

Hi Professional,

I hope you're doing well!

My name is Abhay and I'm an IT Recruiter at Diverse Lynx.

I have an urgent position for the following role. If interested, please share your resume at [email protected] or call me at 732-452-1006 Ext 618.

Job Title: SRE Lead (10-15 years)

Location: Chicago, IL (must be local to the Chicago area and able to travel to work in downtown Chicago 3 days a week)

Start Date: ASAP (after clearing the client interview)

JOB DESCRIPTION:

SRE Role Overview

The client is looking for a Lead Software Engineer to join our Public Sector Core Framework platform team. This individual will play an SRE (Site Reliability Engineer) role in an Azure / Kubernetes ecosystem and help enable stability for the system, leading to the continued success of Lumen.

Key Responsibilities

Strengthen the team's SRE practices, starting from service level indicator definitions, objectives, error budgets, thresholds, alerting and error management systems.

Site planning: The SRE will work with dev and testing teams to plan changes to production and other systems.

Optimizing planned outages: This includes optimizing the DevOps area and any other activity resulting in a planned outage.

Toil management: Identify areas of high toil and find solutions for improvement.

Leverage automation wherever possible to minimize workload, enhance
stability, and improve the overall functionality of the environment.

Alert management: Strengthen areas with alerting, including establishing goals, criteria, alert recalls, reset, enable/disable, and revising the error budget based on the toil undergone by teams.

Prevention of outages: Respond to non-critical alerts and work closely with development and testing teams.

Verification: Work closely with Load and Performance teams in redefining parameters like load and concurrent users.

Incident management: Chair meetings with development and operations teams in the event of an incident.

Post Incident Reviews: Derive learnings from issues and alerts along with teams, inclusive of RCAs. Work on long-term solutions, which could include changes in code, configuration, design/architecture or capacity planning.

Reporting with Reliability Metrics: This includes a set of derived metrics such as Availability, Mean Time to Restore, Mean Time Between Repairs and Probability of Failure.

Continuous improvement: Develop and maintain a backlog of SRE improvement opportunities.

With company sponsorship, undergo necessary background checks to obtain and maintain U.S. Federal Government Public Trust suitability clearance.
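
For candidates who want a concrete picture of the error budget and reliability reporting items listed above, here is a minimal Python sketch. It is illustrative only and is not the client's tooling: the function names, the 30-day window and the sample figures are assumptions, and it covers only the error budget, Availability and Mean Time to Restore.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability objective over a window.

    slo_target is a fraction such as 0.999 (hypothetical example value).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    return (budget - downtime_minutes) / budget


def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Share of total time the service was up."""
    return uptime_hours / (uptime_hours + downtime_hours)


def mean_time_to_restore(restore_durations_hours: list[float]) -> float:
    """Average time taken to restore service across incidents."""
    return sum(restore_durations_hours) / len(restore_durations_hours)


if __name__ == "__main__":
    # Assumed sample figures for illustration only.
    print(f"Budget (min):      {error_budget_minutes(0.999):.1f}")
    print(f"Budget remaining:  {budget_remaining(0.999, 10.8):.2%}")
    print(f"Availability:      {availability(719, 1):.4%}")
    print(f"MTTR (hours):      {mean_time_to_restore([0.5, 1.5]):.2f}")
```

In practice the downtime and incident durations would come from the monitoring stack (for example Prometheus and Grafana, as named in this posting) rather than hard-coded numbers.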

Requirements

Knowledgeable within the Site Reliability Engineering discipline with
a proven track record of success.

Proficient with administering Azure systems.

Proficient with Kubernetes systems. Familiarity with Podman/Docker and Helm Charts.

Proficient with Python.

Experience with GitHub.

Knowledgeable with resiliency / reliability design patterns.

To be a good fit, the candidate will have experience in:

Prometheus

AKS Monitoring

Grafana

Automation

Best Regards,

Abhay Singh

IT Recruiter

Diverse Lynx LLC.

Email: [email protected]

URL: http://www.diverselynx.com

LinkedIn ID: https://www.linkedin.com/in/abhaysingh-chauhan/

Diverse Lynx LLC | 300 Alexander Park | Suite #200 | Princeton, NJ 08540

Tue May 14 20:56:00 UTC 2024


