SRE or DevOps, local to San Jose, California only, USC or GC, at California, Maryland, USA |
Email: [email protected] |
From: Badal kanojia, Stellentit [email protected]
Reply to: [email protected]

SRE Product Reliability
San Jose, California
Phone + Skype

Job Description:

Key Responsibilities:
Design and Implementation: Develop and implement observability solutions for Kubernetes-based applications using Fluent Bit, CloudWatch, Stackdriver, Grafana Loki, Grafana Tempo, Prometheus, Envoy Health Probes, OpenTelemetry, and ArgoCD.
Monitoring and Logging: Configure and maintain logging pipelines using Fluent Bit to collect, process, and route logs for storage and analysis.
Metrics and Tracing: Set up Prometheus for metrics collection and Grafana Tempo for distributed tracing. Integrate these with Grafana for real-time monitoring and alerting via OpenTelemetry.
Telemetry: Use OpenTelemetry to instrument applications for better traceability and observability.
CI/CD: Use ArgoCD for continuous deployment, and integrate the observability tools into the CI/CD pipeline so that the observability suite is deployed through it.
Observability Optimization: Analyze and optimize the performance of the observability stack to ensure minimal overhead and maximum efficiency.
Troubleshooting: Proactively identify and resolve issues related to the observability infrastructure. Collaborate with development and operations teams to troubleshoot and resolve incidents.
Documentation and Training: Document observability processes and best practices. Provide training and support to other team members on the observability tools and techniques.

Required Skills and Qualifications:
Experience: Proven experience as an SRE or in a similar role, with a strong focus on observability in Kubernetes environments supporting applications on EKS in AWS.
Technologies: Hands-on experience with Fluent Bit, CloudWatch, Stackdriver, Grafana Loki, Grafana Tempo, Prometheus, Envoy Health Probes, OpenTelemetry, and ArgoCD.
Kubernetes: In-depth knowledge of Kubernetes and container orchestration.
Scripting and Automation: Proficiency in scripting languages such as Python, Bash, or similar for automation tasks.
Monitoring and Logging: Strong understanding of monitoring, logging, and tracing concepts and best practices.
Problem Solving: Excellent analytical and problem-solving skills.
Collaboration: Strong communication skills and the ability to work effectively in a team environment.
Continuous Improvement: A proactive attitude towards identifying opportunities for improvement and implementing solutions.

Preferred Qualifications:
Certifications: Relevant certifications such as Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD).
Cloud Platforms: Experience with cloud platforms such as AWS and EKS.
DevOps Practices: Familiarity with DevOps practices and tools.

Keywords: continuous integration, continuous deployment |
Fri Jul 26 22:38:00 UTC 2024 |