Site Reliability Engineer (AWS) || Alpharetta, GA || NO H1B at Alpharetta, Georgia, USA |
Email: [email protected] |
From: Shane, Abdisolutions [email protected]
Reply to: [email protected]

Job Description

Job Title: SRE
Location: Alpharetta, GA or Remote (hybrid in Alpharetta preferred, but will consider remote candidates)
Visa: No H1B and OPT

Role Definition

This is a developed professional role for an AWS-focused SRE. Individuals are responsible for basic reliability and toil reduction projects. At this level, SREs can observe the performance of a system and configure proactive alerting to protect service levels. SREs are ready to join the on-call rotation. They can participate in disaster recovery tests in production environments. They may train new team members.

Scope and Key Responsibilities

- Creates monitoring queries and establishes service level baselines
- Supports senior engineers during incidents
- Makes contributions during post-mortems and RCAs
- Participates in disaster recovery testing
- Implements automation and executes code in production environments
- Contributes to SRE knowledge documentation

TOP 3 must-have skills:
a.) AWS
b.) Windows/Linux
c.) Troubleshooting

Technical Skills

Observability: Level 3
- Able to create proactive alert rules that detect urgent, actionable conditions, so that alerts page support teams before users are impacted.
- Can create and configure browser agents to monitor the performance of apps, including user satisfaction, JavaScript errors, session performance, and core web vitals.
- Can create complex synthetic transactions that include scripts to simulate user flows and functionality from the browser or API endpoints.
- Able to create advanced Application Performance Monitoring (APM) and browser distributed traces that give insight into application performance.
- Able to recommend and create Service Level Objectives using the Golden Signals: latency, traffic, errors, and saturation.

Incident Management: Level 3
- Able to create and/or present RCAs, including the executive summary, timeline, detailed impact statement, follow-on actions, and residual risks.
- Can lead scenario modelling exercises and the creation of workflows that are triggered by an SLO breach.
- Able to participate in the on-call rotation and provide on-call support for other SREs.
- Can write advanced automation scripts for incident response, including failovers and rollbacks.

Design for Reliability: Level 3
- Can make theoretical performance (latency, traffic) and capacity recommendations based on customer demand and growth estimates.
- Has good knowledge of DevOps practices, including monitoring, virtual networks, cloud storage, containers and orchestration, CI/CD, configuration management, and securing cloud applications.

Disaster Recovery: Level 3
- Capable of participating on-call to assist in the recovery of Major Incidents (for production environments).
- Can test system and component failover within and between geographic regions (for production environments).
- Able to automate the recovery of systems and components using Infrastructure-as-Code and Configuration Management scripts.

Platforms and Automation: Level 3
- Able to identify opportunities to improve the developer experience by leveraging observability tools, paved road components, shared services, and self-service portals.
- Able to improve software delivery performance by recommending and/or implementing automated build and release processes and removing manual tasks.
- Able to maintain and secure cloud environments in a way that doesn't impact software delivery performance.

Reliability Culture: Level 3
- Can contribute to SRE knowledge base articles and training material.
- Able to analyze toil by looking at ticket trends and can recommend focus areas for the team.
- Can independently work on small toil elimination projects.

Behavioral Competencies

- Collaboration and Teamwork
- Customer & External Focus
- Solves Problems and Analyses Issues
- Learning Agility
Wed May 15 00:46:00 UTC 2024 |