Site Reliability Engineer (SRE) at Remote, USA
Email: [email protected]
Role: Site Reliability Engineer (SRE)

Experience: 12+ Years

Location: Remote

Must Have skills:

Rancher, CI/CD, GitLab, Java, Python, Observability, Dynatrace monitoring, Oracle, Kafka, Neo4j.

Job Description: Site Reliability Engineer (SRE)

Role Summary: We are seeking a dynamic Site Reliability Engineer (SRE) who will play a crucial role in ensuring the stability, performance, and security of our systems. The ideal candidate will have a strong background in infrastructure management, software development, and monitoring, with expertise in tools such as Rancher, GitLab CI/CD, and Dynatrace for observability. This role requires a deep understanding of programming languages such as Java and Python, as well as experience with data platforms such as Oracle databases, Kafka event streams, and Neo4j graph databases.

Key Responsibilities:

1. System Reliability & Automation:

o Enhance system reliability by automating manual processes, optimizing system performance, and implementing robust CI/CD pipelines using GitLab.

o Deploy, manage, and scale containerized applications with Rancher, ensuring efficient resource utilization and high availability.

2. Development & Coding:

o Collaborate with development teams to design and implement software that is scalable, resilient, and secure.

o Write and maintain code in Java and Python to automate tasks, improve monitoring, and optimize performance.

o Conduct code reviews, provide feedback, and ensure adherence to best practices in software development.

3. Monitoring & Observability:

o Implement and manage monitoring solutions using Dynatrace to gain insights into application performance and system health.

o Develop and maintain observability dashboards that provide real-time visibility into system performance and business metrics.

o Investigate and resolve production issues, perform root cause analysis, and implement fixes to prevent recurrence.

4. Database & Data Stream Management:

o Ensure the reliability, performance, and scalability of Oracle databases, Kafka streams, and Neo4j graph databases.

o Work closely with data engineering teams to manage data pipelines and ensure data integrity and availability.

5. Collaboration & Incident Management:

o Collaborate with cross-functional teams including DevOps, software engineering, and QA to ensure seamless integration and deployment of new features.

o Lead incident response efforts, ensuring rapid recovery and minimizing downtime.

o Participate in on-call rotations to provide support for production systems, ensuring high availability.

Required Skills & Qualifications:

Education: Bachelor's degree in Computer Science, Engineering, or a related field.

Experience:

o Around 5 years of experience in a Site Reliability Engineer, DevOps, or similar role.

o Proven experience with Rancher for container management and GitLab for CI/CD.

o Strong coding skills in Java and Python, with experience in writing automation scripts and tools.

o In-depth knowledge of observability and monitoring tools, particularly Dynatrace.

o Hands-on experience with Oracle, Kafka, and Neo4j, with a focus on performance tuning and reliability.

Skills:

o Strong problem-solving and debugging skills.

o Excellent communication skills with the ability to work effectively in a team-oriented environment.

o Ability to interact with various teams and customers on a daily basis.

o Experience with incident management and root cause analysis.

o Ability to work in a fast-paced environment and adapt to changing requirements.

Preferred Qualifications:

o Experience with cloud platforms such as AWS, Azure, or Google Cloud Platform.

o Familiarity with infrastructure as code (IaC) tools like Terraform or Ansible.

o Understanding of Agile and DevOps methodologies.

--

Tue Sep 17 00:51:00 UTC 2024