
Open Telemetry SME || Remote OK || C2C at Remote, Remote, USA
Email: [email protected]
Hi, hope you are doing well! Please find below the job description detailing the preferred specifications our client is seeking for this position.

Title - Open Telemetry SME

Location: Remote OK; Reston, VA preferred (local candidates must work onsite twice a week)

Visa: No H-1B

Job Description:

The client is seeking an experienced monitoring tools and OpenTelemetry Subject Matter Expert (SME) to design, implement, and optimize monitoring solutions and to leverage OpenTelemetry to enhance observability within the Enterprise Command Center (ECC).

The SME should collaborate with the Incident Management team to troubleshoot and resolve incidents.

Key Job Functions:

Lead the design and implementation of monitoring solutions using industry-standard tools such as Splunk.

Customize monitoring configurations to align with organizational requirements.

Implement and integrate OpenTelemetry across applications and services for enhanced observability.

Optimize monitoring solutions for efficiency and accuracy, ensuring minimal impact on system performance.

Design and implement application and infrastructure performance monitoring in an AWS Cloud environment.

Create monitors and dashboards to track application and infrastructure performance.

Perform deep statistical analysis of performance data to help identify capacity and performance bottlenecks.

Configure alerting mechanisms within monitoring tools to proactively identify and address potential issues.

Develop comprehensive documentation for monitoring tool configurations, OpenTelemetry implementations, and best practices.

Provide training to incident management teams on using monitoring tools and interpreting OpenTelemetry data effectively.

Set up monitoring dashboards for incident detection and alerting.

Perform end-to-end analysis of transactions in an observability environment.

Troubleshoot incidents and quickly identify root causes using wire data analytics, application performance management (APM), and event correlation monitoring tools.

Diagnose and resolve incidents by providing factual data from the various monitoring and instrumentation systems.

Job Requirements:

A good understanding of IT cloud infrastructure, including AWS Cloud, middleware, database, storage, and/or network infrastructure.

Strong understanding of IT infrastructure, networking, security concepts, and application architecture.

Hands-on experience with OpenTelemetry instrumentation and telemetry data collection.

Proven experience as a Splunk SME with in-depth knowledge of Splunk architecture and components.

Excellent troubleshooting and problem-solving skills.

Strong documentation skills and attention to detail.

Proactive monitoring of hardware, software, and environmental alerts or malfunctions.

Analyze dashboards and monitoring tools to look for trends and patterns in application/infrastructure health and performance.

Monitor applications and infrastructure using tools such as Splunk, Dynatrace, Catchpoint, Moogsoft, xMatters, SignalFx, SolarWinds, ExtraHop, etc.

Expert understanding of microservice-based applications deployed in the cloud using Lambda, ECS Fargate, etc.

Proficiency in AWS services such as IAM, IAM roles, security groups, EC2, S3, Lambda, ALB, ECS, etc.

Experience working with AWS services such as ELB, RDS, Redshift, DynamoDB, Aurora, Route 53, Lambda, S3, Batch, CloudWatch, CloudTrail, WAF, etc.

Hands-on experience with transaction-level monitoring using Dynatrace and Splunk.

Create Splunk search queries and dashboards.

Be the SME in recognizing and onboarding new data sources into Splunk and other tools, analyzing the data for anomalies and trends, and building dashboards that highlight key trends in the data.

Implement best-in-class engineering strategies to support a distributed, clustered Splunk environment consisting of search heads, indexers, forwarders, and the Splunk Enterprise Security (ES) app, spanning security, performance, engineering, and operational roles.

Use the open-source observability framework OpenTelemetry to instrument, generate, collect, and export telemetry data such as traces, metrics, and logs to help analyze application performance and behavior.

Use distributed tracing in an end-to-end visibility environment consisting of microservices, containers, and serverless workloads such as AWS Lambda.

Work closely with application teams and business stakeholders to troubleshoot issues and aid in incident triage.

Influence other technical teams on incident calls and articulate troubleshooting steps effectively.

Follow up on items that could negatively impact production operations, assist with postmortem activities, and support efforts toward operational improvement.

Strong relationship management skills and the aptitude to multitask and work well in a high-stress environment, both within teams and independently.

Thanks and regards,

Sumit Shahi

Technical Recruiter

Cloud Space LLC

Email: [email protected]

Website: www.cloudspacetek.com

Address: 1909 J N, Pease Place, Suite 201, Charlotte, NC 28262

--

Tue Feb 13 21:47:00 UTC 2024
