Urgent: MONITORING and ALERTING ENGINEER, Skype + F2F at Fort Worth, Texas, USA
Email: [email protected]
http://bit.ly/4ey8w48
https://jobs.nvoids.com/job_details.jsp?id=2085783&uid=

From: Ramashankar, vyzeinc, [email protected]

Reply to: [email protected]

Job Description:
Recruiter's Note: Reference #755484
Intake call notes:
Need Dynatrace and CloudWatch with AWS specifically; these are the most critical skills
Need a monitoring/alerting engineer
Does not need just a Dynatrace user, but rather someone who can work with the application team (more of an engineering role)
Identifying threshold issues, memory issues, user issues, etc.
Change management and problem management experience are necessary
This is a forward-facing role
Application performance management engineer would be ideal
A minimum of 4-5 years' experience would be ideal (but open to less experience for the right candidate)
Schedule is mostly M-F, 7:45/8 to 4:45/5 or so; after-hours/weekend work on a Sat or Sun may be required periodically
Familiarity with Dynatrace DDU (Davis data units) and DEM (Digital Experience Monitoring) licensing required
Needs to know what session replay is
A Zoom virtual interview is required, followed by an onsite in-person interview

Title: MONITORING and ALERTING ENGINEER
Location: Fort Worth, TX, Hybrid (4 days onsite)
Duration: 6+ months
MOI: 1 Zoom, final onsite
or candidates only
Reason for opening - backfill
4 days/wk onsite required
Some weekend availability may be required from time to time.
7-day/24-hour operating environment. After-hours support flexibility is REQUIRED.
MONITORING and ALERTING ENGINEER
Manager wants Dynatrace expertise, not just a user
CloudWatch
Datadog or Splunk
ITIL: change management or incident management experience is desirable
A Monitoring and Alerting Engineer is a specialized IT professional responsible for the design, implementation, and management of monitoring and alerting systems for an organization's IT infrastructure. Their primary goal is to ensure the continuous availability, reliability, and performance of critical systems and applications. By leveraging various monitoring tools and technologies, they proactively identify and address potential issues before they impact business operations.
Key Responsibilities
- System Monitoring: Implement and maintain monitoring solutions to track the performance, health, and availability of IT systems, applications, and networks.
- Alert Management: Configure and manage alert mechanisms to ensure timely notifications of any anomalies, failures, or performance degradations.
- Incident Response: Collaborate with support and operations teams to analyze, resolve, and lead event resolution processes during incidents and outages.
- Root Cause Analysis: Conduct thorough investigations to determine the root cause of incidents and implement corrective actions to prevent recurrence.
- Optimization: Identify opportunities for system optimization and performance improvements through data analysis and trend identification.
- Tool Evaluation and Integration: Evaluate, recommend, and integrate new monitoring and alerting tools and technologies to enhance the organization's monitoring capabilities.
- Documentation and Reporting: Develop and maintain comprehensive documentation, including monitoring configurations, incident reports, and performance metrics.
- Collaboration and Communication: Work closely with various IT teams, including application, infrastructure, and DevOps teams, to ensure seamless operations and effective communication during incidents.
Skills and Qualifications
- Proficiency in monitoring and alerting tools (e.g., Dynatrace, Datadog, CloudWatch, Splunk).
- Strong understanding of IT infrastructure, including servers, networks, databases, and cloud environments.
- Some experience with incident, problem, and change management processes is a plus.
- Ability to analyze complex systems and identify performance bottlenecks.
- Excellent troubleshooting and problem-solving skills.
- Effective communication and collaboration skills.
- Familiarity with ITIL best practices and service management frameworks.
Performance of Duties
- Operate in a 7-day/24-hour environment with after-hours support flexibility.
- Collaborate with internal teams and suppliers to resolve and lead event resolution across all mission-critical IT and Telecom service levels.
- Protect business system availability through integrated incident, problem, and change management.
- Monitor systems for faults and optimization opportunities.
- Assist the major incident response team and escalate critical events.
- Evaluate and improve monitoring/alerting tools and processes.
- Conduct technical root cause analysis and engage with management teams for internal issues.
- Identify potential business-impacting events and manage incident processes.
- Provide expert guidance during reviews and debriefs.
- Analyze problem trends and monitor tools to identify chronic activity.
- Communicate effectively with senior management.
Qualifications
- Experience with Dynatrace, AppMon, Zabbix, SCOM, Datadog, CloudWatch, X-Ray, and Splunk.
- Self-motivated and able to work in a 7x24 environment.
- Experience managing critical system outages and interacting at all organizational levels.
- On-call support availability.
Preferred Qualifications
- B.S. degree in Computer Science, Information Systems, or Engineering.
- Technical expertise in distributed systems/administration and general scripting/programming (Python, Node.js, Ruby, Perl, Bash/sh).
- Excellent writing and communication skills.
- ServiceNow experience.

08:09 PM 16-Jan-25

