SRE Engineer - Remote (Site Reliability Engineer) at Remote, USA
Email: [email protected]
Role: SRE Engineer

Location: Remote

Job Type: Contract

Exp: 6+ Years

Client: E&Y

Requirements:

        5 years minimum experience

        Experience in Azure

        Testing, tuning, and monitoring, specifically performance testing

        Monitoring tools: Splunk, Dynatrace

        Experience in Kubernetes

        Non-functional testing

        Experience with observability tools

Job Description

        A Site Reliability Engineer (5+ years of experience) is responsible for supporting reliability-driven development and operations through enablement and enhancement of tools, processes, best practices, frameworks, training, and technology

        SREs will be part of the development lifecycle, with a focus on shift-left phases including Requirements, Design/Architecture, Coding, and Testing, to make sure performance, resiliency, reliability, scalability, and availability requirements are collected, factored into the design, developed, and tested before the application goes into production, and will provide support/guidance for post-production operations

        Facilitate Non-Functional Requirement & SLO Collection and Documentation 

        Review architecture and design for reliability and performance, and identify gaps to be added to the backlog

        Conduct Single User profiling for resource (CPU, Memory, IO, Network) usage optimization 

        Create an automation framework/dashboard for faster root cause analysis of issues arising from Non-functional testing

        Be the escalation point for issues that the Performance testing team can't resolve or needs more support with

        Identify and configure Observability tools for the given application 

        Develop dashboards for SLOs/SLIs through configured observability tools

        Create alerts with thresholds for application/Infra issues through configured observability tools

        Enable proactive and predictive monitoring and alerting 

        Cross-train developers in the use of observability tools (Splunk, Dynatrace, Grafana) for issue resolution and application tracing

        Develop an automated framework (dashboards and reports) for observability, including automated reports for SLOs and Error Budgets

        Develop automated dashboards/alerts for new functionality/modules 

        Develop or support self-healing solutions/framework for repeated prod issues 

        Facilitate blameless RCA with the team for Production issues to develop recommendations for improvement

        Act as the escalation point (escalation path) for resolving complex issues encountered in Production

        Establish a list of common Stability/Resilience recommendations each team should have in place for all deployments, assist teams with adopting them for existing applications, and ensure new development includes the recommendations

        In liaison with Technical Architects, develop an automated framework for a production readiness checklist to ensure that deployments are configured for stability best practices that reduce risk and that error budgets are in a state to accept the risk of upcoming changes

        Facilitate integrating Non-Functional testing into the CI/CD pipeline

        Recommend best deployment strategy and tools 

        Take ownership and proactively identify issues and opportunities to improve the systems you are involved in, and act accordingly by doing the work and/or providing a plan to be executed

Qualifications

        Proficient in SRE Principles and Practices  

        Hands-on SRE experience, with the influencing skills to provide solutions and make decisions

        Proficient in observability tools including Splunk, Dynatrace, Prometheus/Grafana  

        Proficient in Kubernetes technology  

        Proficient in problem identification and solving  

        Experience in software, infrastructure, or platform engineering with a combination of the following:

                o Scalability, resiliency, reliability analysis of application design/architecture

                o Performance Testing/Tuning/Monitoring, maximizing system uptime and availability, ensuring functional and performance SLAs

        Experience in Agile Methodologies and processes

        Experience representing technical viewpoints to diverse audiences and making prudent & timely technical risk decisions

        Experience developing automation  

        Experience in understanding end to end application component architecture

--

Thanks and Regards,
Praveen J

Email Address - [email protected]

Talent Acquisition Specialist

http://adepttechservices.com

11340 Lakefield Dr., Suite 200, Johns Creek, GA 30097

Ph: (678)-785-3

--

Keywords: information technology Georgia
Tue Sep 19 23:16:00 UTC 2023
