Job Details

Site Reliability Engineering (SRE) Subject Matter Expert (SME) - CT, NJ, GA, VA, RI at Remote, Remote, USA

https://jobs.nvoids.com/job_details.jsp?id=595038&uid=
From: Pradeep, Shrive Technologies (pradeep@shrivetechnologies.com)

Reply to: pradeep@shrivetechnologies.com

Job Description: Site Reliability Engineering (SRE) Subject Matter Expert (SME)

Location (hybrid, any of the following):
Farmington Ave, Hartford, CT
Miltec Ter, Chantilly, VA
Mansell Rd, Alpharetta, GA
Campus Dr, Florham Park, NJ
Woonsocket, RI

Job Summary:

We are seeking a seasoned Site Reliability Engineering (SRE) Subject Matter Expert (SME) with deep expertise in Google Cloud Platform (GCP) to join our dynamic team. The ideal candidate will play a pivotal role in ensuring the reliability, scalability, and performance of our cloud-based applications and services. As an SRE SME, you will provide technical leadership, mentorship, and strategic guidance to our SRE and engineering teams.

Responsibilities:
Collaborate with cross-functional teams to design, implement, and maintain highly available, scalable, and fault-tolerant systems on GCP.
Define and monitor Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to ensure service reliability. Manage error budgets and drive continuous improvements.
Lead incident response and post-incident reviews, identifying root causes and implementing corrective actions to prevent future occurrences.
Design and implement automation solutions, infrastructure as code (IaC), and tools to streamline operational processes and ensure efficient management of GCP resources.
Perform capacity planning and scaling to handle growth and traffic fluctuations, making effective use of GCP's autoscaling features.
Set up comprehensive monitoring and alerting systems, define alerting thresholds, and ensure timely incident escalation and resolution.
Develop and maintain disaster recovery plans and procedures to ensure data integrity and business continuity in case of disruptions.
Collaborate with security teams to implement GCP security best practices, ensure compliance with relevant regulations, and address security vulnerabilities.
Optimize system performance by analyzing resource utilization, identifying bottlenecks, and implementing performance enhancements.
Foster a culture of continuous improvement by participating in retrospectives, sharing lessons learned, and promoting best practices among the team.
Provide technical leadership and mentorship to junior engineers, helping them enhance their GCP and SRE skills.
Stay up to date with the latest developments in GCP services, SRE methodologies, and cloud-native technologies.
Collaborate with GCP representatives as needed to address technical challenges, explore new features, and ensure alignment with GCP best practices.

Qualifications:
Bachelor's degree in Computer Science, Engineering, or a related field. Master's degree preferred.
5-10 years of experience in Site Reliability Engineering with a strong focus on Google Cloud Platform (GCP).
Proven experience in designing, deploying, and maintaining production-grade applications on GCP.
In-depth knowledge of GCP services, including Compute Engine, Kubernetes Engine, Cloud Storage, Cloud Networking, and more.
Strong understanding of SRE principles, methodologies, and best practices.
Proficiency in automation and scripting using tools such as Terraform, Ansible, or equivalent.
Experience with monitoring and observability tools like Prometheus, Grafana, Stackdriver, or similar.
Excellent problem-solving skills and the ability to diagnose and resolve complex technical issues.
Strong communication skills and the ability to collaborate effectively with cross-functional teams.
Relevant GCP certifications (e.g., Professional Cloud Architect, Professional DevOps Engineer) are a plus.
Prior experience in leading incident response and post-incident reviews is highly desirable.
Demonstrated leadership skills and the ability to provide technical guidance and mentorship.

Keywords: Connecticut Georgia New Jersey Virginia

02:07 AM 01-Sep-23
