Site Reliability Engineer (Remote, USA)
Email: [email protected]
Hi,

Hope you are doing great today.

Please find the requirement below. If you are comfortable with it,
please share your updated resume and contact details ASAP.

Job Title: Site Reliability Engineer

Location: Remote

Duration: 12+ Months

Visa: No H1B, OPT, or CPT

Mode of Interview: Zoom

Exp: 6+ Years

Requirement:

5 years minimum experience

Experience in Azure

Testing, tuning, and monitoring, specifically performance testing

Monitoring tools: Splunk, Dynatrace

Experience in Kubernetes

Non-functional testing

Experience with observability tools

Job Description

The Site Reliability Engineer (5+ years of experience) is responsible
for supporting reliability-driven development and operations through the
enablement and enhancement of tools, processes, best practices,
frameworks, training, and technology

SREs will be part of the development lifecycle, with a focus on
shift-left phases including requirements, design/architecture, coding,
and testing. They make sure performance, resiliency, reliability,
scalability, and availability requirements are collected, factored into
the design, developed, and tested before release to production, and they
provide support and guidance for post-production operations

Facilitate Non-Functional Requirement and SLO collection and
documentation

Review architecture and design for reliability and performance, and
identify gaps to be added to the backlog

Conduct Single User profiling for resource
(CPU, Memory, IO, Network) usage optimization 

Create automation framework/dashboard for faster root cause analysis of
issues arising from non-functional testing

Be the escalation point for issues that the performance testing team
can't figure out or needs more support on

Identify and configure Observability tools
for the given application 

Develop dashboards for SLO/SLIs through
configured observability tools 

Create alerts with thresholds for application/infra issues through
configured observability tools

Enable proactive and predictive monitoring
and alerting 

Cross-train developers in the use of observability tools (Splunk,
Dynatrace, Grafana) for issue resolution and application tracing

Develop automated framework (dashboards and reports) for observability,
including automated reports for SLOs and error budgets

Develop automated dashboards/alerts for
new functionality/modules 

Develop or support self-healing
solutions/framework for repeated prod issues 

Facilitate blameless RCA with the team for production issues to develop
recommendations for improvement

Serve as the escalation point for resolving complex issues encountered
in production

Establish a list of common stability/resilience recommendations each
team should have in place for all deployments, assist teams with
adopting them for existing applications, and ensure new development
includes the recommendations

In liaison with Technical Architects, develop an automated framework for
a production readiness checklist to ensure deployments are configured
for stability best practices that reduce risk and that error budgets are
in a state to accept the risk of upcoming changes

Facilitate integrating non-functional testing into the CI/CD pipeline

Recommend best deployment strategy and
tools 

Take ownership and proactively identify issues and opportunities to
improve the systems they are involved in, and act accordingly by doing
the work and/or providing a plan to be executed

Qualifications

Proficient in SRE Principles and
Practices  

Hands-on SRE experience, with the influencing skills to propose
solutions and make decisions

Proficient in observability tools
including Splunk, Dynatrace, Prometheus/Grafana  

Proficient in Kubernetes
technology  

Proficient in problem identification and
solving  

Experience in software, infrastructure, or platform engineering with a
combination of the following:

Scalability, resiliency, and reliability analysis of application
design/architecture

Performance Testing/Tuning/Monitoring,
maximizing system uptime and availability, ensuring functional and
performance SLAs 

Experience in Agile methodologies and processes

Experience representing technical viewpoints to diverse audiences and
making prudent and timely technical risk decisions

Experience developing
automation  

Experience in understanding end-to-end application component
architecture

Amit Vikal (AV)

Thoth IT LLC

--

Keywords: information technology
Wed Sep 20 02:10:00 UTC 2023
