Priorty at Remote, Remote, USA
Email: [email protected]
Site Reliability Engineer

Seattle, WA (Hybrid)

A day in the life
Collaborate with cross-functional teams to implement monitoring, logging, and tracing solutions that provide actionable insights and enable efficient troubleshooting and root cause analysis
Identify opportunities for improvement and organize the efforts and team members needed to address them
Work closely with technical and business partners to integrate monitoring into their products using Datadog
Build SLO and status dashboards for digital teams on Datadog
Ensure your team delivers expertise and support that helps product teams increase the reliability of our systems
Collaborate with other technical teams to align on best practices and standards
Balance incoming support requests with internal investment work, using data to inform your decisions
Identify gaps in our observability tooling and infrastructure, then recommend and implement appropriate solutions to enhance our monitoring capabilities
Drive automation efforts to ensure efficient collection, storage, and analysis of observability data, leveraging tools such as Datadog and Splunk as well as distributed tracing frameworks
Participate in incident response, post-incident root cause analysis, and problem management activities, providing expertise and recommendations to improve system reliability and prevent future incidents
Stay updated with the latest industry trends and advancements in observability and SRE practices, and drive the adoption of new tools and methodologies to enhance our observability capabilities
Mentor and guide junior team members, sharing your knowledge and expertise to foster a culture of learning and continuous improvement within the SRE Observability and Foundations team

Qualifications
Bachelor's degree in computer science/engineering or equivalent
5-8+ years of experience in software engineering or SRE roles, with a specific focus on observability
Familiarity with logging and monitoring solutions, log aggregation platforms, and distributed tracing frameworks
Experience in formulating and applying Service Level Objectives (SLOs)
Strong analytical and problem-solving skills, with a focus on root cause analysis and troubleshooting complex issues
Excellent collaboration and communication skills, with the ability to work effectively in cross-functional teams
Proven experience in driving automation initiatives and improving system reliability through observability practices
Relevant certifications such as Terraform Associate Certification and Certified Kubernetes Administrator

Bonus
Expertise in monitoring tools such as Datadog and Splunk
E-commerce experience preferred
Product ownership experience

Must haves
Acknowledges the presence of choice in every moment and takes personal responsibility for their life.
Possesses an entrepreneurial spirit and continuously innovates to achieve great results.
Communicates with honesty and kindness, and creates the space for others to do the same.
Leads with courage, knowing the possibility of greatness is bigger than the fear of failure.
Fosters connection by putting people first and building trusting relationships.
Integrates fun and joy as a way of being and working, aka doesn't take themselves too seriously.

--

Keywords: information technology Washington
Wed Aug 14 01:02:00 UTC 2024


Location: Washington