Site Reliability Engineer (SRE) Lead | Boston, MA | Hybrid
Email: [email protected]
ROLE:
Site Reliability Engineer (SRE) Lead

Location:
Boston, MA (hybrid). Local candidates work hybrid; non-local candidates must relocate to Boston, MA and then work hybrid.

Job Description: Site Reliability Engineer (SRE) - Datadog, Cloud, Python, PowerShell, Ansible (10+ years experience)

Summary:

We are looking for an experienced Site Reliability Engineer (SRE) with expertise in cloud technologies, Python programming, PowerShell, and Ansible. As an SRE, you will be responsible for ensuring the reliability, availability, and performance of our systems and infrastructure. You will collaborate with cross-functional teams to design and implement automation, monitor system health, and proactively identify and resolve issues.

Responsibilities:

1. Design, build, and maintain highly available and scalable infrastructure on cloud platforms such as AWS, Azure, or GCP.

2. Develop and maintain automation scripts and tools using Python, PowerShell, and Ansible for deployment, configuration management, and system monitoring.

3. Collaborate with development teams to ensure the deployment of reliable and efficient applications and services.

4. Implement and improve monitoring and alerting systems to identify and address performance bottlenecks, availability issues, and capacity constraints.

5. Troubleshoot and resolve complex infrastructure issues, including performance optimization, network connectivity, and security concerns.

6. Perform regular system performance analysis and capacity planning to ensure scalability and efficiency of the infrastructure.

7. Design and implement disaster recovery strategies and ensure business continuity.

8. Collaborate with security teams to ensure compliance with security policies and industry best practices.

9. Continuously evaluate and adopt new technologies and tools to improve system reliability, performance, and operational efficiency.

10. Participate in on-call rotations and respond to incidents to minimize downtime and impact on system availability.

11. Document system configurations, processes, and troubleshooting procedures.

12. Mentor and provide guidance to junior members of the team.

Requirements:

1. Bachelor's or Master's degree in Computer Science, Engineering, or a related field.

2. 7-10 years of experience working as a Site Reliability Engineer or in a similar role.

3. Strong experience with cloud platforms such as AWS, Azure, or GCP, including infrastructure provisioning, networking, and security.

4. Proficiency in programming languages such as Python and PowerShell for automation, scripting, and infrastructure management.

5. Extensive experience with configuration management tools like Ansible for provisioning and managing infrastructure as code.

6. Solid understanding of DevOps principles and practices, including CI/CD pipelines and version control systems.

7. Strong knowledge of containerization technologies like Docker and container orchestration platforms like Kubernetes.

8. Experience with monitoring and log aggregation tools such as Prometheus, Grafana, ELK Stack, or Splunk.

9. Deep understanding of networking concepts, including TCP/IP, DNS, load balancing, and firewalls.

10. Familiarity with database technologies like MySQL, PostgreSQL, or MongoDB.

11. Strong problem-solving skills and the ability to troubleshoot complex issues in a distributed, large-scale production environment.

12. Excellent communication and collaboration skills, with the ability to work effectively in cross-functional teams.

13. Experience with infrastructure-as-code tools like Terraform is a plus.

14. Relevant certifications such as AWS Certified DevOps Engineer, Azure Administrator, or Certified Kubernetes Administrator (CKA) are a plus.

Thanks & Regards,

Irfan Shaik

Phone: 972-440-0069
Cell: 647-375-2228

Agile Enterprise Solutions Inc.
7460 Warren Pkwy, Suite 100, Frisco, TX 75034

Email: [email protected]
Website: www.aesinc.us.com

Fri Oct 06 00:04:00 UTC 2023
