Site Reliability Engineering (SRE) with Java - Austin, TX (Onsite) - Only Locals at Austin, Texas, USA
Email: [email protected]
http://bit.ly/4ey8w48
https://jobs.nvoids.com/job_details.jsp?id=2206600&uid=

From:

Manohar Reddy,

Procorp Systems Inc

[email protected]

Reply to: [email protected]

Site Reliability Engineering (SRE) with Java

Scope: support, DevOps, infrastructure as code, developer support for our users, SRE, etc.

Location: Austin, TX

Tools & Technologies Required

Python, Java, AWS, Kubernetes, Jenkins, Docker, Splunk

Job Description:

Design, implement, and maintain highly available and scalable distributed systems.

Develop automation tools and scripts using Java, Python, or other relevant technologies to improve system reliability and efficiency.

Monitor, troubleshoot, and resolve production incidents, ensuring system uptime and performance.

Optimize infrastructure by implementing best practices in observability, logging, and monitoring (Prometheus, Grafana, ELK, etc.).

Collaborate with development teams to enhance CI/CD pipelines, automate deployments, and improve software delivery processes.

Ensure security, compliance, and infrastructure best practices across cloud and on-prem environments.

Conduct root cause analysis (RCA) for incidents and drive long-term improvements.

Improve system resilience through capacity planning, performance tuning, and failure recovery strategies.

Additional responsibilities

Ensure all application components run smoothly in the Kubernetes and AWS environments.

Support the components (patches, upgrades, issues, configurations) on the application platform.

Manage CI/CD pipelines for the application tools and components.

Automate tasks to improve efficiency and reduce effort.

Create and publish comprehensive dashboards for observability.

Configure and monitor health checks.

Provision users.

Monitor and remediate alerts.

Alert the application team in the event of any potential issues related to infrastructure or components.

Create and update runbooks for standardized operations.

Acquire knowledge about the application platform (architecture, design, usage, typical problems faced by users, and their resolution) to reduce dependency on the application team for resolving support issues.

Track and report the cost of AWS and other resources weekly.

Respond to users on application communication channels (Slack and support email group) and provide appropriate solutions.

Thanks & Regards

Manohar Reddy

Senior Technical Recruiter

Procorp Systems Inc

2222 W Spring Creek Pkwy, STE 202, Plano, Texas 75023

E-mail:

[email protected]

LinkedIn:

https://www.linkedin.com/in/manohar-reddy-nandammagari-07543b119/

03:24 AM 26-Feb-25

