
Reliability Engineer with Strong Datadog Experience | Contract | Remote at Strong, Arkansas, USA
Email: [email protected]
No H-1B

Position:

Reliability Engineer (Datadog)

Location:

REMOTE

Duration:

6 months with high possibility of extension

Background check:

Yes

Interview Process/# of Rounds:

2 Rounds

Manager Notes:

MUST have strong working knowledge of and
experience with Datadog.

Job Description:

Responsibilities

Duties include:

Develop and maintain comprehensive monitoring solutions in Datadog
for cloud-based and on-premises services and applications.

Configure monitoring tools and systems to collect relevant
metrics, logs, and traces.

Create custom monitoring dashboards and reports using Datadog
to provide real-time insights into system performance and health.

Continuously monitor the infrastructure's performance and
capacity, anticipate and address potential scalability issues, and create
monitors with targeted notifications.

Understand how to install the Agent on Linux and
Windows and configure the YAML file to monitor the systems.

Aggregate and visualize data in the Datadog application.

Familiarity with the Datadog API and how to write custom monitors
and alerts.

Familiarity with networking, in order to work with the Network team to
set up dashboards and alerts.

Proactively suggest and implement improvements to enhance the
system's reliability, resilience, and fault tolerance.

Work on automating tasks to streamline operational processes
and reduce manual intervention.

Collaborate with cross-functional teams to investigate and
resolve critical incidents, ensuring minimal impact on end-users.

Work with the Problem Management team to complete post-mortem
analysis of incidents to identify root causes and implement preventive
measures.

Understand the overall architecture of our systems to
identify gaps in monitoring and troubleshoot issues.

Configure and maintain custom dashboards
and alerts in various monitoring tools.

Create custom reports and deliver report
presentations to various stakeholders.

Work with YAML, JSON, Python, and shell scripting.

Develop metrics for both the business and
technical teams to determine the health of systems.

Provide on-call support as needed.

Lead and coordinate performance
engineering for medium to large initiatives.

Collect and document expected system
performance and operational characteristics.

Collect and/or prepare test data for test
execution.

Develop and execute performance tests including
load, stress, endurance, fail-over and interoperability.

Conduct technical analysis of performance
test results and production systems, and provide recommendations on
performance tuning, systems, and infrastructure. Identify, report, and
review defects in assessing system performance and stability.

Define the strategy for enabling
performance diagnostics and monitoring using an Application Performance
Management (APM) tool, other monitoring tools, and diagnostic techniques.

Collaborate with developers to promote
the concept of performance engineering during all phases of the SDLC to
detect and correct performance issues earlier in the lifecycle.

Lead peer reviews to ensure the
completeness of all test assets created.

Resolve performance and stability issues in
the performance test environment.

Develop performance engineering work plan
structure and project schedule.

Review architectural design for performance
risks and potential issues.

Prepare capacity analysis when applicable.

Minimum Requirements:

Requires a BA/BS degree in Information
Technology, Computer Science, or a related field of study and a minimum of 7
years of performance engineering and performance testing experience; or any
combination of education and experience that would provide an equivalent
background.

Preferred Skills, Capabilities and Experiences:

Experience managing performance engineering
efforts for an application strongly preferred.

Proficiency with the following tools is
preferred (Splunk, Datadog, Dynatrace, among others).

Knowledge of developing scripts for
monitoring (PowerShell, Python and Shell scripting).

5 years of Splunk programming proficiency
is highly preferred.

5-6 years of experience with .NET and Java
applications and application monitoring tools such as AppDynamics or Datadog
is highly preferred.

Proficiency in performance tuning is
preferred.

Good understanding of the UI, middleware,
and backend databases.

--

Tue Feb 06 20:23:00 UTC 2024
