Job Details

Urgently Required: SRE Support with Artificial Intelligence. Location: Remote, USA

From:

Sarvan Kumar,

Rivago infotech

sarvan@rivagoinfotech.com

Reply to: sarvan@rivagoinfotech.com

Role: SRE Support with AI

Location: Remote

Duration: Long term Project

Act as production gatekeeper for all changes (product and infrastructure)

Perform detailed deep dives (root cause analysis) on recurring system issues and work with the engineering team on permanent solutions

Provide Tier 2 application/platform support for Optum AI applications

Participate in periodic on-call rotations and be available outside normal business hours, on evenings and weekends, during critical production releases or issue escalation periods

The Site Reliability Engineer (SRE) is responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning

This role will be a member of a team that focuses on DevOps, DevSecOps and SRE for the Optum AI Organization

The role drives continuous improvement in delivery of resilient, scalable, performant, secure, and high-quality cloud-native services

Collaborating with SecOps and development teams, the SRE identifies cross-team issues that create operational risk across the organization and resolves them with a mixture of engineering, troubleshooting expertise, and general operational guidance

Proactively drive improvement of enterprise cloud capabilities while creating best practices and tools to empower developers to create, deploy, and operationally support services

As a key contributor in the organization, this role is responsible for working with the Principal SRE and guiding junior team members in DevOps culture, highly scalable architectures, and lean development using agile practices

Educate yourself and others on anything that helps service teams build, test, deploy, and run more reliable services more quickly and easily

Plan, design, deploy, and operate Site Reliability Engineering capabilities for cloud products & services

Recognize and address sub-standard performance based on key performance indicators (KPIs)

Build monitoring that alerts on symptoms rather than outages

Continuously build, automate, and improve upon capabilities that are secure, scalable, performant, and resilient

Work closely with Infrastructure, Network, Security, Architecture, and Development teams to build high-performing, scalable, and secure cloud environments (Azure/AWS/GCP)

Define needs by documenting processes, including research, planning, and writing supporting documentation

Participate in regulatory and compliance activities as necessary

Remediate security vulnerabilities discovered in non-production and production scans

Participate in new vendor/product/service onboarding and assess partner technical readiness (for example, Azure AI Studio, the Azure model catalog, and AWS SageMaker)

Develop or maintain dashboards for operational analysis and status reports

Perform operational readiness testing for every release package to proactively detect performance degradation across all components of a critical asset (for example, Portal, Workspace creation, Project creation, Model Inference, and API response times)

Keywords: artificial intelligence

Tue May 28 21:21:00 UTC 2024
