Senior SRE Lead: Manhattan, NY - Day 1 onsite at Day, New York, USA |
Email: [email protected] |
From: Upama, CBS [email protected]
Reply to: [email protected]

Senior SRE Lead: Manhattan, NY - Day 1 onsite

Job Description:
The Application Infrastructure (AI) SRE Ops & Support department is seeking a Site Reliability Engineer to drive the reliability engineering, operations, and customer support services for the client's suite of IT Service Management (ITSM) products. AI SRE & Ops Support is a cornerstone of the Application Infrastructure organization in the client's Technology Division.

Responsibilities include:
- Building and maintaining front-to-back knowledge of Application Infrastructure's IT Service Management products, and then specializing in one or two of them
- Maximizing the availability and performance of supported systems through optimized and automated plant management, ongoing problem management, and architecture reviews with dev-side peers
- Reducing the cost of support (hours of effort) through the elimination of operational issues, optimization and automation of tasks, development of operational tools, and driving client self-service to minimize constraints
- Identifying and prioritizing technical debt that is impacting client developer productivity, reliability, or the efficiency of the ops team
- Complex troubleshooting in a Linux environment
- Consulting with clients (the Firm's internal development community, IT service practitioners) to maximize their productivity, including troubleshooting the issues they have using the department's products
- Minimizing the escalation rate to the dev-side product delivery team members to ensure the department has the greatest possible flow of feature delivery
- Being operationally responsive, including sharing on-call rotation with the rest of the global team (with a time-off-in-lieu system)

Required Qualifications / Skills:
- 10+ years required
- Standard RPE and excellent communication skills, both written and verbal
- Strong Linux skills
- Experience with Python for task automation
- Good communication skills
- Experience with incident management processes
- On-call support is required
- Strong Linux troubleshooting skills
- Task automation experience in any programming language
- Practical experience of at least one pillar of observability (metrics, logs, or traces)
- Working knowledge in at least ONE of the following areas:
  - SQL
  - REST services (API)
  - Load balancing and networking
  - Performance troubleshooting and resolution
- Confident collaboration skills

Desired Skills:
- Python development for task automation
- Experience with site reliability engineering practices, like service level objectives (SLOs), error budgets, blameless postmortems, and toil reduction
- Prior experience creating operational dashboards (Splunk, Grafana, etc.)
- Experience administering and/or supporting ServiceNow

SRE, Standard RPE + excellent communication skills, both written and verbal. Please create RR etc.

Keywords: artificial intelligence information technology procedural language New York |
[email protected] |
Fri Aug 04 18:40:00 UTC 2023 |