Job Details

Site Reliability Engineer at Remote, Remote, USA

Email: vandana@ccitinc.com

From:

vandana,

CCIT

vandana@ccitinc.com

Reply to: vandana@ccitinc.com

Hi,

Hope you are doing well! Please have a look at the requirement below and let me know if you are interested. If you are comfortable with the required skills, kindly share your updated resume and let me know the best time for a quick discussion.

Local IL candidates only, with a local driver's license (DL).

Site Reliability Engineer

Location - Onsite 3 days a week in Riverwoods, IL (Chicago area)

Duration - Long Term Contract

Expert Application Engineer (SRE)

Job Description:

As an Application Reliability Engineer, you'll tap into your passion for finding and fixing inefficiencies to solve our reliability and performance issues. In our Agile environment, you'll focus on availability, latency, performance, efficiency, change and problem management, monitoring, emergency response, and capacity planning of our services. Your projects will deliver enhanced infrastructure, development, and deployment automation.

At a minimum, here's what we need: 8+ years of experience in Information Technology, (Software) Engineering, or a related field.

Responsibilities:

Analyse, design, program, test, and deploy new user stories and features with high quality (security, reliability, operations) to production

Achieves team commitments (and influences others to do the same) by using informal leadership and highly developed communication skills

Has oversight of design decisions and guides the team to achieve key results for the products assigned to them

Remediates issues using engineering principles and creates proactive design solutions for potential failures

Work with a team of site reliability engineers that is responsible for building the continuous reliability mindset, shepherding problem management, and driving key site reliability engineering practices into the organization.

Design and drive monitoring, alerting, and ticket reporting strategies to measure SLA, SLO, MTTI, MTTR, etc., and align with management expectations to minimize production downtime.

Guide site reliability automation to help eliminate manual toil and create a self-healing capability

Participate in selection of appropriate automation tools, defining technology, quality, experience and implementation standards and practices within own technical domain.

Fosters a culture of excellence and continuous learning within the chapter. Establishes and tracks appropriate OKRs to ensure outcomes are met.

Creates solutions addressing high impact technology and business priorities

Competent in multiple contexts, such as programming languages, security, automation, testing, infrastructure, and performance, and is the go-to person for many people (inside and outside of their team)

Proactively identifies and mitigates issues based on intuition and experience in multiple domains

Must Have Skills:

Experienced with AWS Cloud

Experienced in building and managing OCP (OpenShift Container Platform) clusters and deploying applications into OCP

Experience with SRE design to address reliability and resiliency, targeting five-nines (99.999%) availability

Experience in managing caching solutions like Hazelcast, GemFire, or Terracotta

Experience in setting up and managing Kafka

High level of familiarity with the Linux command line and scripting

Extremely comfortable with production environments, firewalls, and networking

Strong experience in deploying, observing, alerting, logging, and monitoring systems (Splunk, Datadog, AppDynamics, Instana) with a mindset towards predictive analysis.

Working knowledge of automation tools such as Ansible, Terraform, or Chef

Experience in performing RCA, Disaster Recovery activities, and Chaos Engineering

Good to have Skills:

Experience working in the payments industry (highly preferred)

Deep knowledge and understanding of emerging trends in the SRE field.

Experience developing in Java (or other similar languages)

Studied architectural patterns at scale, including thoughtfully designed APIs, repeatable delivery pipelines, and efficient computer engineering principles.

Working knowledge of messaging services like RabbitMQ, SQS, Kafka

Strong experience with Continuous Integration and Continuous Delivery models, including Blue/Green and/or Canary release models

Tools & Technologies:

OpenShift Container Platform

Splunk, Datadog, AppDynamics, Instana

Hazelcast

Ansible, Terraform, or Chef

RabbitMQ, SQS, Kafka

Linux VMs, shell scripting

AWS Cloud

PostgreSQL database

Thanks & Regards,

Vandana

Technical Recruiter | vandana@ccitinc.com

CCIT, Inc | Empowering Enterprise eBusiness

www.ccitinc.com | 115 N Center St, Suite 202, Northville, MI - 48167

Keywords: golang Illinois Michigan

Thu Oct 17 22:06:00 UTC 2024


