Site Reliability Engineer - API Enablement W2 at Remote, Remote, USA |
Email: shivani.sharma@adventatech.com |
UID: 0be54aa95edb43119fc90d6a9740ebe3
From: shivani sharma, adventatech shivani.sharma@adventatech.com
Reply to: shivani.sharma@adventatech.com

Job Description - Site Reliability Engineer - API Enablement W2

Location: Onsite hybrid, 3 days a week, in Hartford, CT; Charlotte, NC; Chicago, IL; or Columbus, OH (no remote)

Must Have Skills:
- Site Reliability Engineering (SRE), including demonstrated ability to create dashboards in Splunk, Dynatrace, or similar tools, set up monitoring, and apply SRE best practices
- API Solutions (API Connect, Apigee, Azure API Management, or similar), API delivery, API Management
- Full stack software engineering, including DevOps, Performance Testing, Automation tools and scripts

Job Description:
We are seeking a highly skilled and experienced API Enablement SRE Senior Staff Engineer to join our team. The ideal candidate will have a strong background in managing and optimizing complex systems, ensuring their reliability, scalability, and performance. This role focuses on enhancing our API Management Platforms and integrating SRE best practices.

Key Responsibilities:

API Platform and Enablement Team:
- Design, implement, and maintain reliable and scalable SRE practices for API Management Platforms.
- Strong knowledge and experience in API solutions, platforms, API delivery, and API management.
- Strengthen the maturity of SRE practices by building on and executing improvements to observability, resiliency, and stability.
- Assess ecosystem changes to determine risk, impact, and checkout needs for API and Integration Platforms.
- Proactively consider SRE improvements and GenAI opportunities, create solutions, and successfully execute them.
- Create self-service capabilities to enable API provider teams to easily integrate with SRE API best practices.

Incident Management and On-Call Rotation:
- Lead incident management, structured triage, and analysis, including the creation and management of incident runbooks.
- Participate in on-call rotations for incidents and changes, including evenings and weekends.
- Conduct problem analysis, remediation, and continuous improvement to enhance system reliability.

Views, Dashboards, and Unified Views:
- Implement and maintain observability and monitoring solutions, including Splunk and Dynatrace.
- Create unified views, dashboards, and visualizations to provide a single pane of glass and information radiators for system health and performance.
- Create unified views that can be shared across stakeholders to quickly align on the root cause of an issue.

Resiliency and Strengthening SRE Maturity:
- Design, implement, and maintain reliable and scalable systems and infrastructure.
- Lead the team in SRE and proactive risk mitigation, including resiliency and disaster recovery exercises, change management, and upgrades and patches.
- Level up SRE maturity and demonstrate it through the achievement of KPIs and operational metrics.

Performance and Automation:
- Monitor and optimize the performance, availability, and reliability of systems and applications.
- Develop and maintain automation tools and scripts to streamline operations and improve efficiency.

Risk Management and Metrics:
- Define, operationalize, and integrate SRE-related KPIs, metrics, and ideas into day-to-day activities.
- Proactively manage risks, including assessing findings, planning remediation, and executing to bring risks to prompt closure.

Qualifications:
- Strong knowledge and experience in API solutions, platforms, API delivery, and API management.
- Knowledge and skills in API Platforms (e.g., API Connect, Apigee, AWS API Gateway) and API Management.
- 5+ years of experience in site reliability engineering or a related field.
- Expertise in SRE best practices, including incident management, resiliency, monitoring, detection, diagnosis, remediation, and prevention.
- Demonstrated experience in being on call and resolving incidents, including incident management and root cause analysis.
- Experience with large-scale distributed systems.
- Knowledge of CI/CD pipelines and DevOps practices.
- Experience with cloud platforms (e.g., AWS, Azure, GCP).
- Strong knowledge of system design, development, and management.
- Full stack software engineering skill set, including front-end, back-end, and database development.
- Proficiency in scripting languages (e.g., Python, Bash) and automation tools (e.g., Ansible, Terraform).
- Familiarity with monitoring and observability tools (e.g., Splunk, Dynatrace).
- Demonstrated ability to mature SRE practices and strengthen stability through proven KPIs and metrics.
- Excellent documentation, communication, problem-solving, and collaboration skills.
- Experience with GenAI and innovation, and a commitment to continuous improvement.

Keywords: continuous integration, continuous deployment, information technology, W2, Connecticut, Illinois, North Carolina, Ohio

https://jobs.nvoids.com/job_details.jsp?id=2266393 |
01:53 AM 19-Mar-25 |