Job Details

Need Site Reliability Engineer in Philadelphia, PA

Email: [email protected]

Hi,

Hope you are doing great today.

Let me know if you are interested in the requirement below.

Please send a suitable resume, along with your contact details and current location, as soon as possible to:

[email protected]

Job Title: Site Reliability Engineer

Location: Philadelphia, PA (Onsite)

Duration: 12+ month contract

Job Description:

1. Observability and Monitoring:

o Develop and implement robust observability strategies, including logging, metrics, and tracing, to gain deep insight into the performance and health of our systems.

o Collaborate with cross-functional teams to establish and enforce best practices for instrumentation, logging, and monitoring throughout the software development lifecycle.

2. Site Reliability Engineering:

o Lead initiatives to improve the reliability, availability, and scalability of our applications and infrastructure.

o Collaborate with development teams to design and implement systems that are resilient to failures and capable of quick recovery.

o Drive the adoption of SRE principles and practices across the organization.

3. Incident Management:

o Develop and refine incident response processes, ensuring timely detection, analysis, and resolution of incidents.

o Collaborate with teams to conduct post-incident reviews, identify root causes, and implement preventive measures.

4. Automation and Tooling:

o Build and maintain automation tools for deployment, monitoring, and incident response to streamline operational processes.

o Evaluate and integrate third-party tools to enhance observability and SRE capabilities.

5. Collaboration and Leadership:

o Provide technical leadership and mentorship to the engineering team.

o Collaborate with product managers, architects, and other stakeholders to align observability and SRE initiatives with business goals.

Qualifications:

o Bachelor's or higher degree in Computer Science, Software Engineering, or a related field.

o Extensive experience in software engineering with a focus on observability, monitoring, and SRE.

o Strong expertise in designing and implementing distributed systems for high availability and reliability.

o Proficiency in APM (application performance monitoring), RUM (real user monitoring), synthetic monitoring, event correlation, and alert and incident management is required (e.g., OpenTelemetry (OTel), Jaeger, Kloudfuse, ServiceNow).

o Proficiency in one or more programming languages (e.g., Java, Python, Go).

o Experience with cloud platforms (e.g., AWS, Azure, GCP) and container orchestration (e.g., Kubernetes).

o In-depth knowledge of observability tools and frameworks (e.g., Prometheus, Grafana, ELK stack, Datadog, Aternity) and incident management processes.

o In-depth knowledge of ML and AI techniques (e.g., anomaly detection, outlier detection, AIOps, LLMs).

o Excellent communication and collaboration skills.

o Demonstrated ability to lead technical initiatives and mentor team members.

Preferred Qualifications:

o Certifications in relevant areas, such as AWS Certified DevOps Engineer, Certified Kubernetes Administrator (CKA), or equivalent.

o Previous experience in a leadership or management role.

o Familiarity with Infrastructure as Code (IaC) tools such as Terraform, Packer, and Crossplane.

--

Thanks & Regards

Rajashekar Soma

PH: +1 972-440-0066

Agile Enterprise Solutions Inc.

7460 Warren Pkwy, Suite 100, Frisco, TX 75034.

Email: [email protected]

Website: www.aesinc.us.com

Thu Dec 14 14:30:00 UTC 2023
