Site Reliability Engineer | Remote, USA

Email: [email protected]

Hi,

I hope you are doing well.

Please review the requirement below and, if you are interested,
share your updated resume and contact details ASAP.

Job Title: Site Reliability Engineer

Location: Remote

Duration: 12+ Months

Visa: No H1B, OPT, or CPT

Mode of Interview (MOI): Zoom

Experience: 6+ Years

Requirements:

5 years minimum experience

Experience in Azure

Testing, tuning, and monitoring, specifically
performance testing

Monitoring tools: Splunk, Dynatrace

Experience in Kubernetes

Non-functional testing

Experience with observability tools

Job Description

The Site Reliability Engineer (5+ years of
experience) is responsible for supporting reliability-driven development
and operations through enablement and enhancement of tools, processes, best
practices, frameworks, training, and technology

SREs will be part of the development lifecycle,
with a focus on shift-left phases (requirements,
design/architecture, coding, and testing), to make sure performance,
resiliency, reliability, scalability, and availability requirements are collected,
factored into the design, developed, and tested before release to
production, and will provide support and guidance for post-production operations

Facilitate Non-Functional Requirement
& SLO Collection and Documentation

Review architecture and design for
reliability and performance; identify gaps to be added to the backlog

Conduct single-user profiling to optimize resource
(CPU, memory, I/O, network) usage

Create an automation framework/dashboard for
faster root cause analysis of issues
arising from non-functional testing

Be the escalation point for issues that the
performance testing team cannot resolve or needs more
support with

Identify and configure Observability tools
for the given application

Develop dashboards for SLO/SLIs through
configured observability tools

Create alerts with thresholds for
application/infrastructure issues through configured observability
tools

Enable proactive and predictive monitoring
and alerting

Cross-train developers in the use of
observability tools (Splunk, Dynatrace, Grafana) for issue resolution and
application tracing

Develop an automated framework (dashboards
and reports) for observability, including
automated reports for SLOs and error budgets

Develop automated dashboards/alerts for
new functionality/modules

Develop or support self-healing
solutions/frameworks for recurring production issues

Facilitate blameless RCA with the team for
production issues to develop recommendations
for improvement

Serve as the escalation point (escalation path) for
complex issues encountered in production

Establish a list of common stability/resilience
recommendations each team should have in
place for all deployments; assist teams with adopting them for existing
applications and
ensure new development includes the recommendations

In liaison with Technical Architects,
develop an automated framework for a production readiness
checklist to ensure that deployments are configured for stability best
practices that reduce risk
and that error budgets are in a state to accept the risk of upcoming
changes

Facilitate integrating non-functional
testing into the CI/CD pipeline

Recommend the best deployment strategy and
tools

Take ownership and proactively identify
issues and opportunities to improve the systems
you are involved in, and act accordingly by doing the work and/or
providing a plan to be
executed

Qualifications

Proficient in SRE Principles and
Practices

Hands-on SRE experience, with the influencing
skills to provide solutions and make decisions

Proficient in observability tools
including Splunk, Dynatrace, Prometheus/Grafana

Proficient in Kubernetes
technology

Proficient in problem identification and
solving

Experience in software, infrastructure, or
platform engineering with a combination of the following:

o Scalability, resiliency, and reliability analysis of application
design/architecture

Performance Testing/Tuning/Monitoring,
maximizing system uptime and availability, ensuring functional and
performance SLAs

Experience in Agile methodologies and
processes

o Experience representing technical viewpoints to diverse
audiences and making prudent and timely technical risk decisions

Experience developing
automation

Experience in understanding end to end
application component architecture

Amit Vikal (AV)

Thoth IT LLC

--

Keywords: information technology

Wed Sep 20 02:10:00 UTC 2023
