Site Reliability Engineer (13+ years of experience required) at Plano, Texas, USA |
Email: [email protected] |
From: Chandra, RHG [email protected]
Reply to: [email protected]

Hi All,

Hope you are doing well.

Title: Site Reliability Engineer
Location: Plano, TX
Duration: Contract

Job Description:
The Site Reliability Engineer is a fundamental part of the Site Reliability Engineering team. Site Reliability Engineering is accountable for the availability, reliability, and performance of the services and platforms in a highly transactional 24x7 environment.

Mandatory skills:
- Help build a Site Reliability Engineering culture by sharing your best practices, approaches, documentation, and code with other engineering teams.
- Apply automation and software to any tasks or parts of the system that would benefit from it or are performed manually.
- Troubleshoot complicated issues involving OS, networking, and databases in a cloud-based SaaS or on-premises environment; handle live production incidents; debug and troubleshoot application and infrastructure issues; follow and implement SRE best practices.
- Monitor application performance, take steps to improve overall application performance and stability, and follow through with implementation.

What you will do:
- Monitor application performance, take steps to improve overall application performance and stability, and follow through with implementation.
- Apply automation and software to any tasks or parts of the system that would benefit from it or are performed manually.
- Troubleshoot issues involving OS, networking, and databases in a cloud-based or on-premises environment; handle live production incidents; debug and troubleshoot application and infrastructure issues; follow and implement SRE best practices.
- Coordinate with product owners and business representatives to define Service Level Objectives and error budgets for key functionalities of the projects.
- Participate in design reviews of software and components with build teams to ensure that they are built right.
- Review products prior to production deployments to validate compliance with Service Level Objectives.
- Conduct system analysis and configuration management, and develop improvements for system software performance, availability, and reliability.
- Work closely with software engineers and QA to ensure the system meets non-functional requirements such as performance, security, and availability.
- Document system knowledge as it is acquired over time, create runbooks, and ensure critical system information is readily available to those who need it.
- Maintain and monitor deployment of the servers, Docker containers, databases, and general backend infrastructure.
- Participate in production feedback sessions and problem management calls to identify opportunities for product improvement.

What you'll bring:
- Bachelor's degree in Computer Science or a related field, or an equivalent combination of education and experience.
- 5+ years of experience in a full-stack application support/SRE role.
- Experience in JavaScript, TypeScript, and web development technologies.
- Proficiency in scripting languages such as PowerShell and/or Python.
- Experience troubleshooting complex incidents in applications built on the AWS stack.
- Experience in conducting design reviews of software components and leading performance, capacity, and chaos experiments.
- Extensive experience with observability platforms (Datadog) is required.
- Experience with built-in browser-side diagnostic tools is expected.
- Knowledge of DevOps methodologies and the tools involved, such as CI/CD concepts, CI/CD tools (Jenkins, CodePipeline, etc.), and automation and configuration tools (Puppet, Ansible, etc.), is a plus.
- Hands-on experience with the AWS public cloud is a must; project implementation experience on public cloud is a plus.
- Ability and willingness to adapt to new application stacks and new technology concepts as the business evolves over time.
- Excellent communication skills, both verbal and written.
- Ability to collaborate with local and remote teams in different time zones.
- Ability to present and lead technical discussions with product, cloud COE, security, and other support teams.

Regards,
Chandra
Fri Aug 16 01:36:00 UTC 2024 |