
Site Reliability Engineer (AWS) || Alpharetta, GA || NO H1B at Alpharetta, Georgia, USA
Email: [email protected]
From:

Shane,

Abdisolutions

[email protected]

Reply to:   [email protected]

Job Description

Job Title: SRE

Location: Alpharetta, GA or Remote

Visa: No H1B or OPT

Location: Hybrid Alpharetta preferred, but will consider remote candidates

Role Definition

This is a developed professional role for an AWS-focused SRE. Individuals are responsible for basic reliability and toil-reduction projects. At this level, SREs can observe the performance of a system and configure proactive alerting to protect service levels. They are ready to join the on-call rotation, can participate in disaster recovery tests in production environments, and may train new team members.

Scope and Key Responsibilities

Creates monitoring queries and establishes service level baselines

Supports senior engineers during incidents

Makes contributions during post-mortems and RCAs

Participates in disaster recovery testing

Implements automation and executes code in production environments

Contributes to SRE knowledge documentation

TOP 3 must-have skills:

a.) AWS

b.) Windows/Linux

c.) Troubleshooting

Technical Skills

Observability: Level 3

Able to create proactive alert rules that detect conditions that are urgent and actionable, so that alerts page support teams before users are impacted.

Can create and configure browser agents to monitor performance of apps including user satisfaction, JavaScript errors, session performance, and core web vitals.

Can create complex synthetic transactions that include scripts to simulate user flows and functionality from the browser or API endpoints.

Able to create advanced Application Performance Monitoring (APM) and Browser distributed traces that give insight into application performance.

Able to recommend and create Service Level Objectives using the four Golden Signals: latency, traffic, errors, and saturation.

Incident Management: Level 3

Has the ability to create and/or present RCAs including the executive summary, timeline, detailed impact statement, follow-on actions, and residual risks.

Can lead scenario modelling exercises and the creation of workflows that are triggered by an SLO breach.

Able to participate in the on-call rotation and provide on-call support for other SREs.

Can write advanced automation scripts for incident response including failovers and rollbacks.

Design for Reliability: Level 3

Can make theoretical performance (latency, traffic) and capacity recommendations based on customer demand and growth estimates.

Has good knowledge of DevOps practices including monitoring, virtual networks, cloud storage, containers and orchestration, CI/CD, configuration management, and securing cloud applications

Disaster Recovery: Level 3

Capable of participating in on-call support to assist in recovery from Major Incidents (for production environments).

Can test system and component failover within and between geographic regions (for production environments)

Able to automate the recovery of systems and components using Infrastructure-as-Code and Configuration Management scripts.

Platforms and Automation: Level 3

Able to identify opportunities to improve the developer experience by leveraging observability tools, paved road components, shared services, and self-service portals.

Able to improve software delivery performance by recommending and/or implementing automated build and release processes and removing manual tasks

Able to maintain and secure cloud environments without degrading software delivery performance.

Reliability Culture: Level 3

Can contribute to SRE knowledge base articles and training material.

Able to analyze toil by looking at ticket trends and recommend focus areas for the team.

Can independently work on small toil elimination projects.

Behavioral Competencies

Collaboration and Teamwork

Customer & External Focus

Solves Problems and Analyses Issues

Learning Agility

Keywords: continuous integration, continuous deployment, information technology, Georgia
Wed May 15 00:46:00 UTC 2024
