SRE Engineer - Remote (Site Reliability Engineer) at Remote, USA

Email: [email protected]

Role: SRE Engineer

Location: Remote

Job Type: Contract

Exp: 6+ Years

Client: E&Y

Requirement:

        5 years minimum experience

        Experience in Azure

        Testing, tuning, and monitoring, specifically performance testing

        Monitoring tools: Splunk, Dynatrace

        Experience in Kubernetes

        Non-functional testing

        Experience with observability tools

Job Description

        The Site Reliability Engineer (5+ years of experience) is responsible for supporting reliability-driven development and operations through enablement and enhancement of tools, processes, best practices, frameworks, training, and technology

        SREs will be part of the development lifecycle, with more focus on shift-left phases (Requirements, Design/Architecture, Coding, and Testing), to make sure performance, resiliency, reliability, scalability, and availability requirements are collected, factored into the design, developed, and tested before the application goes into production, and will provide support/guidance for post-production operations

        Facilitate Non-Functional Requirement & SLO Collection and Documentation

        Review Architecture and Design for Reliability and Performance, and identify gaps to be added to the backlog

        Conduct Single User profiling for resource (CPU, Memory, IO, Network) usage optimization

        Create automation framework/dashboard for faster root cause analysis of issues arising from Non-functional testing

        Be the escalation point for issues that the Performance testing team can't figure out or needs more support on

        Identify and configure Observability tools for the given application

        Develop dashboards for SLO/SLIs through configured observability tools

        Create alerts with thresholds for application/Infra issues through configured observability tools

        Enable proactive and predictive monitoring and alerting

        Cross-train developers in the use of Observability Tools (Splunk, Dynatrace, Grafana) for issue resolution and application tracing

        Develop automated framework (dashboards and reports) for observability, including automated reports for SLOs and Error Budgets

        Develop automated dashboards/alerts for new functionality/modules

        Develop or support self-healing solutions/framework for repeated prod issues

        Facilitate blameless RCA with the team for Production Issues to develop recommendations for improvement

        Act as the escalation point (escalation path) for resolution of complex issues encountered in Production

        Establish a list of common Stability/Resilience recommendations each team should have in place for all deployments, assist teams with adopting them for existing applications, and ensure new development includes the recommendations

        In liaison with Technical Architects, develop automated framework for production readiness checklist to ensure deployments are configured for stability best practices that reduce risk and that error budgets are in a state to accept the risk of upcoming changes

        Facilitate integrating Non-Functional testing into CI/CD pipeline

        Recommend best deployment strategy and tools

        Take ownership and proactively identify issues and opportunities to improve the systems they are involved in, and act accordingly by doing the work and/or providing a plan to be executed

Qualifications

        Proficient in SRE Principles and Practices

        Hands-on SRE experience, with influencing skills in providing solutions and making decisions

        Proficient in observability tools including Splunk, Dynatrace, Prometheus/Grafana

        Proficient in Kubernetes technology

        Proficient in problem identification and solving

        Experience in software, infrastructure, or platform engineering with a combination of the following:

                o Scalability, resiliency, reliability analysis of application design/architecture

                o Performance Testing/Tuning/Monitoring, maximizing system uptime and availability, ensuring functional and performance SLAs

        Experience in Agile Methodologies and processes

        Experience representing technical viewpoints to diverse audiences and making prudent & timely technical risk decisions

        Experience developing automation

        Experience in understanding end-to-end application component architecture

--

Thanks and Regards,

Praveen J

Email Address - Praveen.j@adepttechservices.com

Talent Acquisition Specialist

http://adepttechservices.com

11340 Lakefield Dr., Suite 200, Johns Creek, GA 30097

Ph: (678)-785-3

--

Keywords: information technology Georgia


Tue Sep 19 23:16:00 UTC 2023
