
Reliability and Monitoring Engineer (SRE), Plano, TX (Onsite) at Plano, Texas, USA
Email: [email protected]
From:

Sai Kirubha,

XForia Inc

[email protected]

Reply to: [email protected]

Hi,

Greetings from XForia.

Please review the job description below and share your updated resume.

Title: Reliability and Monitoring Engineer (SRE)

Location: Plano, TX (Onsite)

Duration: 12-month contract

Must have experience with Datadog and AWS.

Roles & Responsibilities:

Responsible for ensuring the availability, performance, and reliability of our cloud-based infrastructure and services. The primary focus of this role is designing, implementing, and managing robust monitoring and alerting systems that proactively identify issues and enable timely incident response. The engineer will work closely with the engineering platform and development teams to optimize services and maintain service uptime.

Develop and maintain comprehensive monitoring solutions for cloud-based services and applications.

Configure monitoring tools and systems to collect relevant metrics, logs, and traces.

Create custom monitoring dashboards and reports, using Datadog or other tools, to provide real-time insights into system performance and health.

Continuously monitor the cloud infrastructure's performance and capacity, anticipating and addressing potential scalability issues.

Proactively suggest and implement improvements to enhance the system's reliability, resilience, and fault tolerance.

Work on automating tasks to streamline operational processes and reduce manual intervention.

Collaborate with cross-functional teams to investigate and resolve critical incidents, ensuring minimal impact on end-users.

Work with the Problem Management team to complete post-mortem analyses of incidents, identify root causes, and implement preventive measures.

Experience & Qualifications:

Minimum of 10 years of experience required.

3+ years of experience working with cloud platforms and services (AWS, Azure, GCP, etc.) in a production environment.

Solid understanding of monitoring and logging tools, such as Prometheus, Grafana, ELK stack, Splunk, etc.

Experience with infrastructure as code (IaC) tools, like Terraform, CloudFormation, or Ansible.

Strong scripting and automation skills (e.g., Python, Bash) to facilitate operational tasks.

Knowledge of containerization technologies (Docker, Kubernetes) and microservices architecture.

Familiarity with DevOps practices and Agile methodologies.

Thanks & Regards,

Sai Kirubha

Technical Recruiter

99300 Wade Boulevard, Suite 220, Frisco TX 75035

214-271-8948 | [email protected] | www.xforia.com

Keywords: Texas
Tue Oct 10 21:51:00 UTC 2023


