Site Reliability Engineering (SRE), Location: Austin, TX - Looking for Ex-Apple - Austin, Texas, USA
Email: [email protected]
http://bit.ly/4ey8w48
https://jobs.nvoids.com/job_details.jsp?id=2325949&uid=

From: Manohar Reddy, Procorp Systems Inc ([email protected])
Reply to: [email protected]

Site Reliability Engineering (SRE)
Location: Austin, TX
Looking for ex-Apple candidates

Tools & Technologies Required:
Python, Java, AWS, Kubernetes, Jenkins, Docker, Splunk, Golang

Responsibilities:
- Design, implement, and maintain highly available and scalable distributed systems.
- Develop automation tools and scripts using Java, Python, or other relevant technologies to improve system reliability and efficiency.
- Ensure the uptime, reliability, and performance of production systems.
- Automate operational processes and eliminate manual intervention.
- Collaborate with developers to build scalable and resilient infrastructure.
- Monitor and troubleshoot systems, identifying and resolving issues proactively.
- Implement and maintain monitoring, logging, and alerting systems.
- Participate in the on-call rotation for production incident response.
- Monitor, troubleshoot, and resolve production incidents, ensuring system uptime and performance.
- Optimize infrastructure by implementing best practices in observability, logging, and monitoring (Prometheus, Grafana, ELK, etc.).
- Collaborate with development teams to enhance CI/CD pipelines, automate deployments, and improve software delivery processes.
- Ensure security, compliance, and infrastructure best practices across cloud and on-prem environments.
- Conduct root cause analysis (RCA) for incidents and drive long-term improvements.
- Improve system resilience through capacity planning, performance tuning, and failure recovery strategies.

Additional responsibilities:
- Ensure all application components run smoothly in the Kubernetes and AWS environments.
- Support the components on the application platform (patches, upgrades, issues, configurations).
- Manage CI/CD pipelines for the application tools and components.
- Automate tasks to improve efficiency and reduce effort.
- Create and publish comprehensive dashboards for observability.
- Configure and monitor health checks.
- Handle user provisioning.

Required skills:
- Experience with cloud platforms (AWS, Google Cloud Platform, Azure)
- Proficiency in programming/scripting languages (Python, Go, etc.)
- Strong knowledge of Linux/Unix systems and networking
- Familiarity with containerization and orchestration tools (Docker, Kubernetes)
- Solid understanding of CI/CD, automation, and infrastructure-as-code principles
- Strong problem-solving and troubleshooting skills

Monitoring & Remediation of Alerts:
- Alert the application team in the event of any potential issues related to infrastructure or components.
- Create and update runbooks for standardized operations.
- Acquire knowledge of the application platform (architecture, design, usage, typical problems faced by users, and their resolution) to reduce dependency on the application team for resolving support issues.
- Track and report the cost of AWS and other resources weekly.
- Respond to users on the application communication channels (Slack and the support email group) and provide appropriate solutions.

Thanks & Regards,
Manohar Reddy
Senior Technical Recruiter
Procorp Systems Inc
2222 W Spring Creek Pkwy, STE 202, Plano, Texas 75023
E-mail: [email protected]
LinkedIn: https://www.linkedin.com/in/manohar-reddy-nandammagari-07543b119/
12:53 AM 09-Apr-25