Urgent: Site Reliability Engineer, Hybrid in Bentonville, AR (Need local candidate) at Bentonville, Arkansas, USA |
Email: [email protected] |
From: Deepali Jha, SibiTalent [email protected]
Reply to: [email protected]

Hello,

I hope you are doing well! My name is Deepali Jha and I am a Staffing Specialist at SibiTalent. I am reaching out to you about an exciting job opportunity with one of our clients.

Title: Site Reliability Engineer
Location: Hybrid in Bentonville, AR
Visa: No H-1B/CPT
Duration: 6+ Months
Note: Need local candidate

We are seeking a skilled Site Reliability Engineer (SRE) to join our team, ensuring the stability, scalability, and reliability of our systems and applications. The ideal candidate will have a blend of software engineering and operations expertise and will be dedicated to automating infrastructure, managing complex distributed systems, and enhancing the overall performance and reliability of our platforms.

Day-to-Day Responsibilities:

1. Incident Management and Troubleshooting
* Proactively monitor systems using tools like Splunk, Prometheus, and Grafana, and respond to incidents swiftly.
* Conduct thorough root cause analysis (RCA) and root cause corrective action (RCCA) to identify, resolve, and prevent recurrence of incidents.

2. Data-Driven Analysis and Reporting
* Utilize SQL, BigQuery, and data analytics skills to generate reports, track metrics, and drive data-informed decisions to improve system reliability.
* Conduct regular performance reviews and create reports on system health and incident trends for continuous improvement.

3. System Reliability and Optimization
* Design, implement, and maintain reliable systems and applications, ensuring optimal performance and minimizing downtime.
* Develop and enforce SLOs, SLIs, and SLAs to meet availability targets for critical systems.

Key Responsibilities:

1. System Reliability and Optimization
* Design, implement, and maintain reliable systems and applications, ensuring optimal performance and minimizing downtime.
* Develop and enforce SLOs, SLIs, and SLAs to meet availability targets for critical systems.

2. Incident Management and Troubleshooting
* Proactively monitor systems using tools like Splunk, Prometheus, and Grafana, and respond to incidents swiftly.
* Conduct thorough root cause analysis (RCA) and root cause corrective action (RCCA) to identify, resolve, and prevent recurrence of incidents.

3. Automation and Infrastructure as Code (IaC)
* Automate repetitive tasks and processes to increase efficiency, reduce manual intervention, and enhance system reliability.

4. Capacity Planning and Scalability
* Monitor system performance and plan for capacity and scalability, ensuring resources are effectively managed as the organization grows.
* Analyze system metrics to detect trends and anticipate future scaling needs.

5. Disaster Recovery and Business Continuity
* Create and test disaster recovery plans, coordinating backup and restoration processes to minimize downtime.
* Work with stakeholders to ensure systems are prepared for unexpected events, with clear, documented procedures.

6. Collaboration and Documentation
* Collaborate with development teams to align reliability with application design, deployment, and maintenance.
* Document key processes, troubleshooting guidelines, and best practices to support knowledge sharing and onboarding.

7. Data-Driven Analysis and Reporting
* Utilize SQL, BigQuery, and data analytics skills to generate reports, track metrics, and drive data-informed decisions to improve system reliability.
* Conduct regular performance reviews and create reports on system health and incident trends for continuous improvement.

Preferred Qualifications:
* Experience with cloud platforms (AWS, GCP, or Azure)
* Proficiency with monitoring and observability tools
* Strong analytical skills and experience with automation tools and scripting languages (Python, Bash, etc.)

Thanks & Regards,
Deepali Jha
Sr. Technical Recruiter
E-Mail: [email protected]
Website: www.sibitalent.com

Keywords: access management Arkansas |
Thu Nov 21 04:12:00 UTC 2024 |