Remote: DevOps/SRE Architect Manager (Remote, USA)
Email: [email protected]
From:

Shubham Jaiswal,

KPG99,INC

[email protected]

Reply to:   [email protected]

SRE Architect Manager

Remote

USC/GC only

We are seeking a highly skilled and experienced Site Reliability Engineer (SRE) Architect to join our dynamic team.

The ideal candidate will play a critical role in ensuring the reliability, scalability, and performance of our systems and applications.

The qualified candidate must demonstrate strong communication skills to collaborate with and influence many stakeholders across the organization and possess a deep technical background across technology stacks, including applications, data and messaging frameworks, and infrastructure components.

This individual should also demonstrate exceptional leadership skills in leading a technical team, recruiting team members, and growing the organization.

Responsibilities:

Design, implement, and maintain scalable monitoring and APM solutions (Dynatrace, Datadog, or New Relic) to ensure the reliability and performance of our systems.

Demonstrate a bias for action by identifying inefficiencies and proposing solutions, working both independently and collaboratively with the team.

Develop and maintain automated alerting and incident response processes to proactively identify and address potential issues.

Collaborate with cross-functional teams to define and implement best practices for monitoring, logging, observability, and incident management.

Drive continuous improvement initiatives to enhance system reliability, scalability, and performance.

Automate infrastructure provisioning, configuration, and deployment processes using Terraform and other infrastructure-as-code tools.

Work closely with development teams to integrate monitoring and observability into the CI/CD pipeline.

Provide guidance and mentorship to junior team members, fostering a culture of continuous learning and growth.

Contribute to the SRE group in various technology domains and SRE practices, such as observability framework, resiliency, DevSecOps, etc.

Set up and enhance SRE best practices in areas such as observability, automation, and resiliency.

Be accountable for the delivery and performance of SRE Teams.

Evangelize the SRE practice across organizational boundaries.

Recommend relevant, implementable technologies that not only represent state-of-the-art SRE practices and trends but also benefit the overall application modernization journey.

Architect new platforms/libraries/toolchains/APIs to enable broad-scope SRE adoption.

Drive the broader developer community to adopt the SRE best practices.

Develop playbooks on various SRE, DevOps, and related topics.

Required Skills & Qualifications:

Bachelor's degree in computer science, information technology, or a related field, or equivalent professional experience. A master's degree is preferred.

5+ years of professional Site Reliability and DevOps experience.

10+ years of total IT experience building complex systems.

In-depth familiarity with SRE terminologies, including Service Level Objectives (SLOs), Service Level Indicators (SLIs), error budgets, incident management, postmortem analysis, Recovery Time Objectives (RTOs), and Recovery Point Objectives (RPOs).

Ability to identify organization-wide gaps in the SRE domain and propose implementable solutions that contribute to the transformation of the organization.

Ability to build and lead high-performance SRE teams to consistently achieve business results.

Expertise with monitoring, APM, and alerting tools like Splunk, Dynatrace, Grafana, Datadog, New Relic, etc.

Experience with one or more high-level languages such as Python, Go, Java, JavaScript, C#, Ruby, or PHP.

Experience with CI/CD tools such as Jenkins, Azure DevOps (ADO), GitHub Actions, GitLab, etc.

Demonstrated expertise in cloud platforms like Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform (GCP).

Proficiency in containerization and orchestration technologies such as Docker, Kubernetes, or OpenShift.

Experience with Infrastructure as Code (IaC) and configuration management tools like Terraform, Ansible, Chef, Puppet, or Salt.

Exposure to OpenTelemetry (OTel) and other open-source telemetry frameworks and tools.

Experience with tools like ServiceNow, PagerDuty, xMatters, etc.

Strong communication skills and ability to partner across organizations.

Ability to create technical documents on SRE best practices and processes.

Certifications:

At least one certification in a public cloud (AWS, Azure, or GCP) is mandatory.

Certification in one of the tools like Splunk, Dynatrace, or Datadog is preferred.

Certification in automation tools like Terraform, Ansible, etc. is preferred.

Thanks and Regards,

Shubham Jaiswal
| Sr. Technical Recruiter | KPG99,INC | MBE Certified Firm

Direct: 609-722-8051 | [email protected] | www.kpgtech.com

3240 E State St EXT, Hamilton, NJ 08619

LinkedIn: https://www.linkedin.com/in/shubham-jaiswal-b1b02a20b

Note:
KPG99 does not endorse unsolicited email. If you do not want to receive our emails, please reply with "Remove" in the subject line, and I will ensure that you are not contacted again. I apologize for any inconvenience.

Fri May 10 00:26:00 UTC 2024
