Senior SRE Lead: Manhattan, NY - Day 1 onsite, New York, USA
Email: [email protected]
From: Upama, CBS ([email protected])

Reply to: [email protected]

Job Description:

The Application Infrastructure (AI) SRE Ops & Support department is seeking a Site Reliability Engineer to drive reliability engineering, operations, and customer support services for the client's suite of IT Service Management (ITSM) products. AI SRE & Ops Support is a cornerstone of the Application Infrastructure organization in the client's Technology Division.

Responsibilities include:

Building and maintaining front-to-back knowledge of Application Infrastructure's IT Service Management products, then specializing in one or two of them

Maximizing the availability and performance of supported systems through optimized and automated plant management, ongoing problem management, and architecture reviews with dev-side peers

Reducing the cost of support (hours of effort) by eliminating operational issues, optimizing and automating tasks, developing operational tools, and driving client self-service to minimize constraints

Identifying and prioritizing technical debt that impacts client developer productivity, reliability, or the efficiency of the ops team

Complex troubleshooting in a Linux environment

Consulting with clients (the Firm's internal development community and IT service practitioners) to maximize their productivity, including troubleshooting issues they encounter with the department's products

Minimizing the escalation rate to the dev-side product delivery team members to ensure the department has the greatest possible flow of feature delivery

Being operationally responsive, including sharing on-call rotation with the rest of the global team (with a time-off in lieu system)

Required Qualifications / Skills:

10+ years of experience required

Standard RPE and excellent written and verbal communication skills

Strong Linux skills

Experience with Python for task automation

Good communication skills

Experience with incident management processes

On-call support is required

Strong Linux troubleshooting skills

Task automation experience in any programming language

Practical experience of at least one pillar of observability (metrics, logs or traces)

Working knowledge of at least ONE of the following areas:

   SQL

   REST services (API)

   Load balancing and networking

   Performance troubleshooting and resolution

   Confident collaboration skills

Desired Skills:

Python development for task automation

Experience with site reliability engineering practices, such as service level objectives (SLOs), error budgets, blameless postmortems, and toil reduction

Prior experience creating operational dashboards (Splunk, Grafana, etc.)

Experience administering and/or supporting ServiceNow. SRE, Standard RPE, and excellent written and verbal communication skills. Please create RR, etc.

Fri Aug 04 18:40:00 UTC 2023


Location: New York