Job Opening: Senior Site Reliability Engineer, Atlanta, Georgia, USA (Local Candidates Required)
Email: [email protected]
From: Rakesh Sharma, Datum Software ([email protected])
Reply to: [email protected]

Job Details:
Senior Site Reliability Engineer
Long-term contract
Atlanta, GA

Qualifications:
- Manage and optimize data streaming and API components in OpenShift (on-premises) and AWS.
- Review and optimize application APIs and processes to improve response times across components.
- Automate testing processes, including data quality checks, production delivery, and deployment to production environments.
- Develop integrations between on-premises applications, AWS, and third-party tools (ServiceNow, VersionOne, Sumo).
- Collaborate with teams to define Service Level Indicators (SLIs) and Service Level Objectives (SLOs).
- Lead performance monitoring and troubleshooting of platform applications, identifying root causes and documenting solutions.
- Evolve the cloud infrastructure for the application suite by experimenting with new technologies and building prototypes to assess their benefits.
- Design and develop CI/CD pipelines to deploy application artifacts, including APIs and data processing jobs.
- Configure and implement monitoring and alerting metrics so that support teams can detect issues proactively.
- Maintain data integrity and access control using AWS security tools and services such as HSM and IAM.
- Develop and monitor AWS billing tools, generate cost reports, and implement cost optimization strategies.
- Work with security architects to design and implement data security tools, encryption, and key management.
- Address security vulnerabilities identified by audits and the wider security community, and develop solutions that let support teams regularly scan for and resolve issues.
- Monitor and analyze platform capacity and performance, collaborating with architecture teams to design elastic infrastructure that can absorb irregular traffic bursts.
- Contribute to the design and implementation of backup strategies for service restoration and disaster recovery.
- Provide continuous input to architecture, infrastructure, and application teams to improve design, performance, and security.
- BS in Computer Science or a related technical field, or equivalent practical experience.

Desired Skillset:
- Strong expertise in AWS cloud platforms.
- Proficiency with automation, scripting, and monitoring tools, including OpenShift, CloudFormation, Terraform, Ansible, Shell, and Python.
- In-depth knowledge of infrastructure layers such as the Linux OS, virtualization platforms, software-defined networking, load balancers, firewalls, API tools, monitoring tools, and storage/backup strategies.
- Extensive experience with enterprise systems and mission-critical application operations, including issue resolution.
- Experience automating and operationalizing development/QA using CI/CD tools such as GitLab, GitHub, Jenkins, Maven, Gradle, and Nexus.
- Working experience in software release management.

Minimum Experience:
- 3+ years in DevOps or SysOps engineering, focused on major cloud platforms (preferably AWS).
- 2+ years of application development, including data streaming and deployment of high-availability, critical application components.
- 1+ year in a Site Reliability Engineering (SRE) role (preferred).
- 7+ years of overall professional experience.
Mon Sep 23 23:01:00 UTC 2024