Principal DevOps or SRE Engineer || Location: Remote, USA || Visa: No H1B & NO CPT |
Email: [email protected] |
From: Gagan Deshwal, Code Infotek [email protected]
Reply to: [email protected]

Role: Principal DevOps/SRE Engineer, Application-Centric Observability
Location: Remote
Visa: No H1B & NO CPT

Responsibilities:

Design and Implement Observability Framework: Develop and implement an end-to-end observability framework that extends beyond infrastructure to focus on application-specific metrics. Ensure comprehensive visibility into the performance of key business applications.

Datadog Integration and Enhancement: Leverage Datadog to instrument application-level monitoring, integrating golden signals (SLI/SLOs) for performance, availability, and reliability.

Develop SLI/SLO Blueprints: Create and maintain SLI/SLO blueprints for key business applications, defining and measuring golden signals (latency, traffic, errors, saturation) to ensure optimal system health.

System Performance Optimization: Proactively monitor and assess application performance, identifying areas for improvement. Collaborate with development and SRE teams to implement performance optimization measures.

Dashboard and Visualization: Develop centralized dashboards with drill-down capabilities, providing real-time visibility into the health of applications and enabling quick identification of performance issues.

Business Journey Mapping: Work closely with business and engineering teams to map out critical business journeys and ensure that observability systems capture relevant metrics for each journey.

Gap Analysis and Continuous Improvement: Perform baseline measurements, identify gaps in existing monitoring systems, and work to close those gaps by integrating additional telemetry data.

Incident Response and Alerting: Define and implement alerting mechanisms based on SLI/SLO thresholds. Ensure the observability system can trigger appropriate alerts and escalations in case of performance degradation.

Collaboration with Development Teams: Work alongside development and data engineering teams to embed observability practices into the SDLC, ensuring that monitoring is an integral part of the application architecture from the ground up.

Knowledge Sharing: Provide training and guidance to teams on best practices for application observability, ensuring consistent adoption of tools and methodologies across the organization.

Qualifications:

11-15 years of hands-on experience in DevOps/SRE, with a strong focus on observability for large-scale, high-performance applications.

Expertise in using and enhancing observability tools like Datadog, including deep experience with metrics collection, alerting, and dashboard creation.

Proven ability to create and implement SLI/SLO frameworks to track application performance, availability, and reliability.

Strong understanding of monitoring application health across various services, containers, and microservices architectures.

Experience in business journey mapping and ensuring observability captures relevant metrics at every stage of the user experience.

Expertise in root cause analysis and providing insights into system performance through observability data.

Proficiency in programming/scripting languages (e.g., Python, Bash) for automation and tool integration.

Proven track record of driving performance improvements and maintaining system health through proactive monitoring and alerting.

Thanks & Regards...
Gagan Deshwal
Technical Recruiter
Email: [email protected]
Tue Oct 22 19:56:00 UTC 2024 |