Hybrid Role :: Site Reliability Engineer - San Jose, CA - Locals Only at Remote, Remote, USA

http://bit.ly/4ey8w48
https://jobs.nvoids.com/job_details.jsp?id=1605468&uid=

From:

Upen,

SUS Infotech Inc

[email protected]

Reply to: [email protected]

The Terms:

6-month temporary contract (with possible extension/conversion)

Hybrid (3 days in office, 2 remote) in San Jose, CA

About the Role:

We are seeking a skilled and experienced Site Reliability Engineer (SRE) to join our team. The primary focus of this role is to develop and maintain a comprehensive observability solution for our Kubernetes-based applications. The ideal candidate will be proficient in using various monitoring and logging tools to ensure the reliability and scalability of our services.

Key Responsibilities:

Design and Implementation
: Develop and implement observability solutions for Kubernetes-based applications using Fluent Bit, CloudWatch, Stackdriver, Grafana Loki, Grafana Tempo, Prometheus, Envoy health probes, OpenTelemetry, and Argo CD.

Monitoring and Logging
: Configure and maintain logging pipelines using Fluent Bit to collect, process, and route logs for storage and analysis.

Metrics and Tracing
: Set up Prometheus for metrics collection and Grafana Tempo for distributed tracing. Integrate both with Grafana, via OpenTelemetry, for real-time monitoring and alerting.

Telemetry
: Use OpenTelemetry to instrument applications for better traceability and observability.

CI/CD
: Use Argo CD for continuous deployment, and integrate the observability tools into the CI/CD pipeline so that the observability suite is deployed through it.

Observability Optimization
: Analyze and optimize the performance of the observability stack to ensure minimal overhead and maximum efficiency.

Troubleshooting
: Proactively identify and resolve issues related to the observability infrastructure. Collaborate with development and operations teams to troubleshoot and resolve incidents.

Documentation and Training
: Document observability processes and best practices. Provide training and support to other team members on observability tools and techniques.

Required Skills and Qualifications:

Experience

: Proven experience as an SRE or in a similar role, with a strong focus on observability in Kubernetes environments, supporting applications running on EKS in AWS.

Technologies

: Hands-on experience with Fluent Bit, CloudWatch, Stackdriver, Grafana Loki, Grafana Tempo, Prometheus, Envoy health probes, OpenTelemetry, and Argo CD.

Kubernetes

: In-depth knowledge of Kubernetes and container orchestration.

Scripting and Automation

: Proficiency in scripting languages such as Python, Bash, or similar for automation tasks.

Monitoring and Logging

: Strong understanding of monitoring, logging, and tracing concepts and best practices.

Problem Solving

: Excellent analytical and problem-solving skills.

Collaboration

: Strong communication skills and the ability to work effectively in a team environment.

Continuous Improvement

: A proactive attitude towards identifying opportunities for improvement and implementing solutions.

Preferred Qualifications:

Certifications

: Relevant certifications such as Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD).

Cloud Platforms

: Experience with cloud platforms such as AWS and EKS.

DevOps Practices
: Familiarity with DevOps practices and tools.

Intake Notes:

Just to point out, he said that they are looking for observability in a Kubernetes framework. So, I am looking for skills/experience related to monitoring, tracking, and analyzing the performance of the applications and systems running in Kubernetes environments.

They do NOT want a DevOps person

Leading an SRE transformation and implementing an industry-standard SRE support model. Embedded SRE operation, looking to jump-start with contract resources to meet some deliverables. Kubernetes applications need to be instrumented into a new dev observability platform. Not looking for anything too boutique; off-the-shelf skills are fine.

Technologies: the ones listed are what they currently use; they are open to additional skills. IC III level: someone who has done it before. Claritive configuration of the role. The mindset is key.

Working alongside established SRE. Communication is important, but tech stack is principal.

Manager does not want a DevOps Engineer that does CI/CD pipelines

Must have EKS (Elastic Kubernetes Service) and AWS

Focusing more on complete observability


08:36 PM 29-Jul-24


Location: San Jose, California