Site Reliability Engineer (SRE) at Remote, USA |
Email: benchsales11@googlegroups.com |
Role: Site Reliability Engineer (SRE)
Experience: 12+ Years
Location: Remote
Must-Have Skills: Rancher, CI/CD, GitLab, Java, Python, Observability, Dynatrace monitoring, Oracle, Kafka, Neo4j.

Job Description: Site Reliability Engineer (SRE)

Role Summary:
We are seeking a dynamic Site Reliability Engineer (SRE) who will play a crucial role in ensuring the stability, performance, and security of our systems. The ideal candidate will have a robust background in infrastructure management, software development, and monitoring, with expertise in tools such as Rancher, GitLab CI/CD, and Dynatrace. This role requires a deep understanding of programming languages such as Java and Python, as well as experience with data platforms such as Oracle, Kafka, and Neo4j.

Key Responsibilities:
1. System Reliability & Automation:
o Enhance system reliability by automating manual processes, optimizing system performance, and implementing robust CI/CD pipelines using GitLab.
o Deploy, manage, and scale containerized applications with Rancher, ensuring efficient resource utilization and high availability.
2. Development & Coding:
o Collaborate with development teams to design and implement software that is scalable, resilient, and secure.
o Write and maintain code in Java and Python to automate tasks, improve monitoring, and optimize performance.
o Conduct code reviews, provide feedback, and ensure adherence to best practices in software development.
3. Monitoring & Observability:
o Implement and manage monitoring solutions using Dynatrace to gain insights into application performance and system health.
o Develop and maintain observability dashboards that provide real-time visibility into system performance and business metrics.
o Investigate and resolve production issues, perform root cause analysis, and implement fixes to prevent recurrence.
4. Database & Data Stream Management:
o Ensure the reliability, performance, and scalability of Oracle databases, Kafka streams, and Neo4j graph databases.
o Work closely with data engineering teams to manage data pipelines and ensure data integrity and availability.
5. Collaboration & Incident Management:
o Collaborate with cross-functional teams, including DevOps, software engineering, and QA, to ensure seamless integration and deployment of new features.
o Lead incident response efforts, ensuring rapid recovery and minimizing downtime.
o Participate in on-call rotations to provide support for production systems, ensuring high availability.

Required Skills & Qualifications:
Education:
o Bachelor's degree in Computer Science, Engineering, or a related field.
Experience:
o Around 5 years of experience in a Site Reliability Engineer, DevOps, or similar role.
o Proven experience with Rancher for container management and GitLab for CI/CD.
o Strong coding skills in Java and Python, with experience in writing automation scripts and tools.
o In-depth knowledge of observability and monitoring tools, particularly Dynatrace.
o Hands-on experience with Oracle, Kafka, and Neo4j, with a focus on performance tuning and reliability.
Skills:
o Strong problem-solving and debugging skills.
o Excellent communication skills, with the ability to work effectively in a team-oriented environment.
o Ability to interact with various teams and customers on a daily basis.
o Experience with incident management and root cause analysis.
o Ability to work in a fast-paced environment and adapt to changing requirements.

Preferred Qualifications:
o Experience with cloud platforms such as AWS, Azure, or Google Cloud Platform.
o Familiarity with infrastructure as code (IaC) tools such as Terraform or Ansible.
o Understanding of Agile and DevOps methodologies.

Keywords: continuous integration, continuous deployment, quality analyst, information technology, Site Reliability Engineer (SRE) |
Tue Sep 17 00:51:00 UTC 2024 |