Urgent, direct client: Site Reliability Engineer at Remote, Remote, USA
Email: [email protected]
Hi professionals,

Good morning. I hope you are all doing well.

Below is the job description for an urgent, direct-client Site Reliability Engineer position.

Job Title: Site Reliability Engineer (Azure)

Required Experience: 10+ years (Mandatory)

Job Type: Contract to Hire

Location: Mundelein, IL | Hybrid

Position Description:

The ecommerce Platform Operations team is responsible for the stability, reliability, release and deployment of our B2B & B2C ecommerce platforms. The team's primary function is to increase the efficiency of the organization through well-designed automation and infrastructure. As a Site Reliability Engineer, you will work closely with various infrastructure & application development teams to increase stability and reliability via the enablement of various Telemetry concepts. You will also be responsible for effective operations of the ecommerce platform via efficient automation & execution of operational processes. If you're someone who doesn't mind participating in on-call support, and enjoys troubleshooting production issues and implementing remediation, this position is for you!

Responsibilities:

Monitoring and maintaining the Development, Testing/QA, Staging and Production environments

Mitigating production performance issues effectively by taking responsibility for seeing those issues through to resolution, with the goal of automating to prevent recurrence

Configure monitors, alerts, and Service Level Indicators (SLIs) using various Telemetry technologies.

Create business-friendly dashboards to monitor the health of various production systems

Collaborate with teams within IT to implement cloud and/or hybrid systems that support the business goals

Monitor cloud-based systems and components for availability, performance, reliability, security, efficiency, and ability to meet non-functional requirements and service level agreements.

Work with Infrastructure as Code pipelines to automate the deployment of Cloud resources

Serve as a liaison between the application and Cloud teams, providing guidance to application teams on application container/pod deployments

Investigate, troubleshoot and resolve any issues that impact the customer

Work to improve performance and reliability as the platform scales, driving continuous improvement through operational metrics.

Scale Cloud operations through best practices as applicable for configuration management, resource allocation, optimizing performance and capacity, compliance with security policies and requirements, and ensuring service-level agreements are met

Work with the Azure cloud engineering team to operationalize the client's cloud vision.

Azure Platform. Understanding of Microsoft Azure Cloud platform with emphasis on Azure Infrastructure solutions including IaaS and PaaS based environments, and Azure based application monitoring and management

Technical Dialog. Lead technical sessions making use of whiteboards or other resources to drive solution discussions leveraging published solution architectures for common infrastructure implementations.

Enable proactive monitoring & alerting using Splunk log aggregation.

Prepare applications to work on Kubernetes, Docker, and other hosted systems

Work on automation using scripting and be able to integrate different tools.

Troubleshoot and help resolve telemetry system and software defects. Perform incident/disruption management and conduct root-cause analysis (RCA).

Work successfully within an Agile environment partnering with the Scrum Master

Document the work done and mentor our full-time employees (FTEs).

Required skills:

Expert-level experience operating the ATG Commerce ecommerce platform, or building custom Java / Java EE customer-facing solutions in an Azure Cloud environment (AKS).

3+ years of Azure experience

Hands-on experience with containerization, Kubernetes, and microservices.

Experience with Cloud infrastructure and application monitoring following methodologies such as RED or USE.

Familiarity with APM monitoring tools such as Splunk APM, AppDynamics and/or Azure Application Insights

Familiarity with infrastructure monitoring tools such as Grafana, Prometheus, Azure Monitor, Log Analytics (KQL queries)

Experience with log collection tools and analysis, as well as infrastructure performance and optimization practices

Experience with DevOps automation platforms such as Jenkins, Artifactory, ACR, and/or Azure DevOps

Experience with CI/CD provisioning and managing Azure infrastructure

Participate in after-hours on-call rotation and after-hours maintenance window activities as needed

Experience performing Root Cause Analysis (RCA) for application and infrastructure related issues

A solid grasp of various performance monitoring methodologies, as well as 2+ years of hands-on experience configuring monitoring tools such as Azure Application Insights, New Relic, and Splunk, are required. Strong experience with other telemetry tools, including AppDynamics, ExtraHop, vSphere, SolarWinds Orion, SAM, etc., will be considered.

The top candidate will have experience with, or a thorough understanding of, incident workflows (preferably using New Relic). Must have experience enriching alerts for faster root-cause detection and incident resolution.

Must have experience configuring monitors for business transactions, service endpoints, etc., as well as setting up health rules for triggering alerts.

Detailed knowledge of relational databases (e.g., MS SQL, MySQL) or NoSQL databases such as Cosmos DB. Must be able to construct SQL queries and configure them with telemetry.

Strong scripting skills (Bash, Python, shell).

Self-starter with the ability to quickly learn new tools and tool features. Must be able to handle multiple tasks and priorities within a fast-paced work environment

Must be highly motivated and dependable with excellent communication skills.

A bachelor's degree in computer science, or another four-year degree in a relevant field, is required

Preferred Skills:

Experience using Terraform to perform infrastructure as code

Deep working knowledge of Azure networking, Application Gateway, APIM, IAM policy, and network security.

Able to deploy and manage Azure storage.

Experience with Azure Active Directory management and design is a plus

Production support experience with E-commerce websites.

Experience with tracking, measuring, and reporting KPIs like MTBI, MTRS, MTTD, etc.

--

Thanks & Regards 

Akhil Reddy

Administrative Assistant

Keshav Consulting Solutions, LLC

Phone: 919-439-7374

Email: [email protected]

Address: 5470 McGinnis Village Place, Suite 102, Alpharetta, GA 30005

Website: www.keshavconsulting.com

Thu Jun 27 21:44:00 UTC 2024
