Job Details

Site Reliability Engineer (SRE) | Must Have 10+ yrs exp. | Hybrid | at Remote, Remote, USA

Email: [email protected]

Passport number is required.

Site Reliability Engineer (SRE)

Location:
Boston, MA (hybrid for local candidates; non-local candidates will have to relocate to Boston, MA and can then work hybrid)

Job Description:
Site Reliability Engineer (SRE) - Datadog, Cloud, Python, PowerShell, Ansible

(10+ years of experience)

Summary:

We are looking for an experienced Site Reliability Engineer (SRE) with expertise in cloud technologies, Python programming, PowerShell, and Ansible. As an SRE, you will be responsible for ensuring the reliability, availability, and performance of our systems and infrastructure. You will collaborate with cross-functional teams to design and implement automation, monitor system health, and proactively identify and resolve issues.

Responsibilities:

1. Design, build, and maintain highly available and scalable infrastructure on cloud platforms such as AWS, Azure, or GCP.

2. Develop and maintain automation scripts and tools using Python, PowerShell, and Ansible for deployment, configuration management, and system monitoring.

3. Collaborate with development teams to ensure the deployment of reliable and efficient applications and services.

4. Implement and improve monitoring and alerting systems to identify and address performance bottlenecks, availability issues, and capacity constraints.

5. Troubleshoot and resolve complex infrastructure issues, including performance optimization, network connectivity, and security concerns.

6. Perform regular system performance analysis and capacity planning to ensure scalability and efficiency of the infrastructure.

7. Design and implement disaster recovery strategies and ensure business continuity.

8. Collaborate with security teams to ensure compliance with security policies and industry best practices.

9. Continuously evaluate and adopt new technologies and tools to improve system reliability, performance, and operational efficiency.

10. Participate in on-call rotations and respond to incidents to minimize downtime and impact on system availability.

11. Document system configurations, processes, and troubleshooting procedures.

12. Mentor and provide guidance to junior members of the team.

Requirements:

1. Bachelor's or Master's degree in Computer Science, Engineering, or a related field.

2. 7-10 years of experience working as a Site Reliability Engineer or in a similar role.

3. Strong experience with cloud platforms such as AWS, Azure, or GCP, including infrastructure provisioning, networking, and security.

4. Proficiency in programming languages such as Python and PowerShell for automation, scripting, and infrastructure management.

5. Extensive experience with configuration management tools like Ansible for provisioning and managing infrastructure as code.

6. Solid understanding of DevOps principles and practices, including CI/CD pipelines and version control systems.

7. Strong knowledge of containerization technologies like Docker and container orchestration platforms like Kubernetes.

8. Experience with monitoring and log aggregation tools such as Prometheus, Grafana, ELK Stack, or Splunk.

9. Deep understanding of networking concepts, including TCP/IP, DNS, load balancing, and firewalls.

10. Familiarity with database technologies like MySQL, PostgreSQL, or MongoDB.

11. Strong problem-solving skills and the ability to troubleshoot complex issues in a distributed, large-scale production environment.

12. Excellent communication and collaboration skills, with the ability to work effectively in cross-functional teams.

13. Experience with infrastructure-as-code tools like Terraform is a plus.

14. Relevant certifications such as AWS Certified DevOps Engineer, Azure Administrator, or Certified Kubernetes Administrator (CKA) are a plus.

Rajesh Potlapelli

PH: 9724400051

https://www.linkedin.com/in/rajesh-potlapelli/

https://www.linkedin.com/groups/9142054/

rajesh_potlapelli@aesincus.com / directclientsrequirement@gmail.com

--

Keywords: continuous integration continuous deployment information technology Massachusetts

Thu Sep 21 19:31:00 UTC 2023

Location: Boston, Massachusetts