Job Details

SRE/incident management || Philly, PA Day 1 Onsite at Day, New York, USA

Email: [email protected]

Greetings from Tanisha Systems!

Hope you are doing great today!

My name is Muskan, and I am writing to you about the job described below,
which I have with one of our clients. I would appreciate it if you could let
me know whether you are interested in pursuing it, or whether you know
anyone who might be a good fit.

Responsibilities:

Observability and Monitoring:

Develop and implement
robust observability strategies, including logging, metrics, and tracing,
to gain deep insights into the performance and health of our systems.

Collaborate with cross-functional teams to establish and
enforce best practices for instrumentation, logging, and monitoring throughout
the software development lifecycle.

Site Reliability Engineering:

Lead initiatives to
improve the reliability, availability, and scalability of our
applications and infrastructure.

Collaborate with development teams to design and implement
systems that are resilient to failures and capable of quick recovery.

Drive the adoption of SRE principles and practices across
the organization.

Incident Management:

Develop and refine
incident response processes, ensuring timely detection, analysis, and
resolution of incidents.

Collaborate with teams to conduct post-incident reviews,
identify root causes, and implement preventive measures.

Automation and Tooling:

Build and maintain
automation tools for deployment, monitoring, and incident response to
streamline operational processes.

Evaluate and integrate third-party tools to enhance
observability and SRE capabilities.

Collaboration and Leadership:

Provide technical
leadership and mentorship to the engineering team.

Collaborate with product managers, architects, and other
stakeholders to align observability and SRE initiatives with business goals.

Qualifications:

Bachelor's or
higher degree in Computer Science, Software Engineering, or a related
field.

Extensive experience in software engineering with a focus
on observability, monitoring, and SRE.

Strong expertise in designing and implementing distributed
systems for high availability and reliability.

Proficiency in APM (application performance monitoring), RUM (real user
monitoring), synthetics, correlation, and alert and incident management is
required (e.g., OTEL, Jaeger, Kloudfuse, ServiceNow).

Proficiency in one or more programming languages (e.g.,
Java, Python, Go).

Experience
with cloud platforms (e.g., AWS, Azure, GCP) and container orchestration (e.g.,
Kubernetes).

In-depth knowledge of observability tools and frameworks
(e.g., Prometheus, Grafana, ELK stack, Datadog, Aternity) and incident
management processes.

In-depth knowledge of ML and AI techniques (e.g., anomaly detection,
outlier detection, AIOps, LLMs).

Excellent communication and collaboration skills.

Demonstrated ability to lead technical initiatives and
mentor team members.

Preferred Qualifications:

Certifications in relevant areas such as AWS Certified
DevOps Engineer, Certified Kubernetes Administrator (CKA), or equivalent.

Previous experience in a leadership or management role.

Familiarity with Infrastructure as Code (IaC) tools such as
Terraform, Packer, and Crossplane.

Thanks and regards,

Muskan Shukla

Desk: 212-729-6543 Ext 635

Tanisha Systems Inc.

99 Wood Ave South, Suite # 308, Iselin, NJ 08830

Email: Muskan.shukla@tanishasystems.com

Muskan Shukla | LinkedIn

--

Keywords: cprogramm artificial intelligence machine learning access management information technology golang New Jersey

Thu Dec 14 23:14:00 UTC 2023
