Site Reliability Engineer (SRE) | Boston, MA (hybrid for local candidates; non-local candidates will have to relocate to Boston, MA, and can then work hybrid) at Boston, Massachusetts, USA
Email: [email protected]
From:

Irfan Shaik,

Agile Enterprise Solutions Inc.

[email protected]

Reply to:   [email protected]

Hello,

ROLE: Site Reliability Engineer (SRE)

Location:
Boston, MA (hybrid for local candidates; non-local candidates will have to relocate to Boston, MA, and can then work hybrid)

Job Description: Site Reliability Engineer (SRE) - DataDog, Cloud, Python, PowerShell, Ansible (10+ years of experience)

Summary:

We are looking for an experienced Site Reliability Engineer (SRE) with expertise in cloud technologies, Python programming, PowerShell, and Ansible. As an SRE, you will be responsible for ensuring the reliability, availability, and performance of our systems and infrastructure. You will collaborate with cross-functional teams to design and implement automation, monitor system health, and proactively identify and resolve issues.

Responsibilities:

1. Design, build, and maintain highly available and scalable infrastructure on cloud platforms such as AWS, Azure, or GCP.

2. Develop and maintain automation scripts and tools using Python, PowerShell, and Ansible for deployment, configuration management, and system monitoring.

3. Collaborate with development teams to ensure the deployment of reliable and efficient applications and services.

4. Implement and improve monitoring and alerting systems to identify and address performance bottlenecks, availability issues, and capacity constraints.

5. Troubleshoot and resolve complex infrastructure issues, including performance optimization, network connectivity, and security concerns.

6. Perform regular system performance analysis and capacity planning to ensure scalability and efficiency of the infrastructure.

7. Design and implement disaster recovery strategies and ensure business continuity.

8. Collaborate with security teams to ensure compliance with security policies and industry best practices.

9. Continuously evaluate and adopt new technologies and tools to improve system reliability, performance, and operational efficiency.

10. Participate in on-call rotations and respond to incidents to minimize downtime and impact on system availability.

11. Document system configurations, processes, and troubleshooting procedures.

12. Mentor and provide guidance to junior members of the team.

Requirements:

1. Bachelor's or Master's degree in Computer Science, Engineering, or a related field.

2. 7-10 years of experience working as a Site Reliability Engineer or in a similar role.

3. Strong experience with cloud platforms such as AWS, Azure, or GCP, including infrastructure provisioning, networking, and security.

4. Proficiency in programming languages such as Python and PowerShell for automation, scripting, and infrastructure management.

5. Extensive experience with configuration management tools like Ansible for provisioning and managing infrastructure as code.

6. Solid understanding of DevOps principles and practices, including CI/CD pipelines and version control systems.

7. Strong knowledge of containerization technologies like Docker and container orchestration platforms like Kubernetes.

8. Experience with monitoring and log aggregation tools such as Prometheus, Grafana, ELK Stack, or Splunk.

9. Deep understanding of networking concepts, including TCP/IP, DNS, load balancing, and firewalls.

10. Familiarity with database technologies like MySQL, PostgreSQL, or MongoDB.

11. Strong problem-solving skills and the ability to troubleshoot complex issues in a distributed, large-scale production environment.

12. Excellent communication and collaboration skills, with the ability to work effectively in cross-functional teams.

13. Experience with infrastructure-as-code tools like Terraform is a plus.

14. Relevant certifications such as AWS Certified DevOps Engineer, Azure Administrator, or Certified Kubernetes Administrator (CKA) are a plus.   

Thanks & Regards,

Irfan Shaik

P: 972-440-0069

Cell No: 647-375-2228

Agile Enterprise Solutions Inc.

7460 Warren Pkwy, Suite 100, Frisco, TX 75034

Keywords: continuous integration continuous deployment information technology Massachusetts Texas
Fri Oct 06 01:29:00 UTC 2023
