Principal DevOps or SRE Engineer || Location: Remote, USA || Visa: No H1B & No CPT
Email: [email protected]
From:

Gagan Deshwal,

Code Infotek

[email protected]

Reply to: [email protected]

Role: Principal DevOps/SRE Engineer, Application-Centric Observability

Location: Remote

Visa: No H1B & No CPT

Responsibilities:

Design and Implement Observability Framework:
Develop and implement an end-to-end observability framework that extends beyond infrastructure to focus on application-specific metrics. Ensure comprehensive visibility into the performance of key business applications.

Datadog Integration and Enhancement:
Use Datadog to instrument application-level monitoring, integrating golden signals and SLIs/SLOs for performance, availability, and reliability.

Develop SLI/SLO Blueprints:
Create and maintain SLI/SLO blueprints for key business applications, defining and measuring golden signals (latency, traffic, errors, saturation) to ensure optimal system health.

System Performance Optimization:
Proactively monitor and assess application performance, identifying areas for improvement. Collaborate with development and SRE teams to implement performance optimization measures.

Dashboard and Visualization:
Develop centralized dashboards with drill-down capabilities, providing real-time visibility into the health of applications and enabling quick identification of performance issues.

Business Journey Mapping:
Work closely with business and engineering teams to map out critical business journeys and ensure that observability systems capture relevant metrics for each journey.

Gap Analysis and Continuous Improvement:
Perform baseline measurements, identify gaps in existing monitoring systems, and work to close those gaps by integrating additional telemetry data.

Incident Response and Alerting:
Define and implement alerting mechanisms based on SLI/SLO thresholds. Ensure the observability system can trigger appropriate alerts and escalations in case of performance degradation.

Collaboration with Development Teams:
Work alongside development and data engineering teams to embed observability practices into the SDLC, ensuring that monitoring is an integral part of the application architecture from the ground up.

Knowledge Sharing:
Provide training and guidance to teams on best practices for application observability, ensuring consistent adoption of tools and methodologies across the organization.

Qualifications:

11-15 years of hands-on experience in DevOps/SRE, with a strong focus on observability for large-scale, high-performance applications.

Expertise in using and enhancing observability tools like Datadog, including deep experience with metrics collection, alerting, and dashboard creation.

Proven ability to create and implement SLI/SLO frameworks to track application performance, availability, and reliability.

Strong understanding of monitoring application health across various services, containers, and microservices architectures.

Experience in business journey mapping and ensuring observability captures relevant metrics at every stage of the user experience.

Expertise in root cause analysis and providing insights into system performance through observability data.

Proficiency in programming/scripting languages (e.g., Python, Bash) for automation and tool integration.

Proven track record of driving performance improvements and maintaining system health through proactive monitoring and alerting.

Thanks & Regards...

Gagan Deshwal

Technical Recruiter

Email:

[email protected]

Tue Oct 22 19:56:00 UTC 2024
