
Urgently Hiring: Senior Engineer for Incident Triage & Monitoring, Hybrid (Reston, VA) at Reston, Virginia, USA
Email: [email protected]
From:

Priyabrata Pradhan,

Vyzeinc

[email protected]

Reply to:   [email protected]

Job Description -

Position: Senior Engineer for Incident Triage & Monitoring

Location: Reston, VA or Plano, TX, Hybrid

Visa: U.S. Citizen, Green Card, GC EAD, H4 EAD

Duration: 12+ months

Responsibilities:

Must have an AWS Professional-level certification (NOT Associate)

Triage/Incident Management

Hands-on AWS experience, 5+ years, preferably as an architect

AWS certifications required.

Monitoring tools: Splunk, Dynatrace, OpenTelemetry

Solution delivery and design

ECC mainly does incident management, but it is changing its model to also do triage engineering. A lot of apps are moving to AWS, so they need strong AWS experience as an engineer or an architect, with at least 6 years of AWS experience. Hands-on experience is very important. In the future state, ECC will have hands-on functions: look, observe, and analyze, hands-on with AWS. The person should be using monitoring tools like Splunk, Dynatrace, OpenTelemetry, etc., and will triage incidents with the help of those tools. Later in the year, they will also be delivering solutions. In the future state, these teams will be combined and will deploy solutions; to do that, they need to be well-versed in AWS. For the incident management piece, they can train the person: the first few weeks will include incident management training. Ask cloud-related questions before sending candidates. Reston-based, but willing to be flexible on location.

Sr. Engineer or AWS Solutions Architect could work. They don't want developers; they need experience on the infrastructure side: an admin who has touched all of the AWS infrastructure (storage, EC2, etc.), not necessarily Kubernetes.

For example, they should be able to tell whether anything is dropping before it hits the application, and where it is dropping. Jack of all trades, master of none.

Integration, web applications, etc.: all of that has to be understood by the candidate. They should be able to guide the team and say "look here" or "look there", for example, to check whether the right permissions are in place and whether they are attached to the EC2 instances.

Do they have application knowledge? What services do they have experience with? They may be an expert in some things but not all. Let them write a paragraph or two about what they have done with AWS. Not everything on a resume is always accurate, so ask them to write what they have done with AWS.

The interview is scenario-based, to understand how they troubleshoot and resolve issues. The questions are real-life scenario questions.

Job Description:

IT Analyst III Specialized (ECC Level 2 Senior Engineer for Incident Triage & Monitoring)

In this incident management function, manage incidents to resolution in a 24/7/365 environment using the client's incident management processes, effectively guide incident and triage calls from a technical perspective, share technical details obtained from monitoring tools and dashboards to aid troubleshooting, outline details of resolution activities, recommend and implement improved processes, provide timely status updates to stakeholders, assist with postmortem-related activities, and support various efforts related to operational improvements.

Manage efforts to maintain applications in production, including troubleshooting stoppages, repairing bugs, documenting application performance, and coordinating with technology infrastructure management.

Key Job Functions:

Manage IT production incidents to resolution in a 24/7/365 environment using the client's incident management processes, and communicate incident status, impact, and resolution actions.

Effectively lead and guide incident triage calls from a technical perspective, analyzing different components of the infrastructure and application environment using a variety of monitoring tools and processes.

Troubleshoot incidents and quickly identify root causes using operations, wire data analytics, application performance management (APM), and event correlation monitoring tools.

Perform analysis of data, evaluating multiple application protocols including web, database, storage, and supporting infrastructure such as AWS, UNIX, DNS, LDAP, SSL, SMTP, and FTP.

Troubleshoot and resolve incidents on the AWS cloud infrastructure.

Hands-on experience managing and monitoring applications deployed on Amazon Web Services (AWS) using services such as EC2, ELB, RDS, Redshift, DynamoDB, Aurora, Route 53, ECS, Lambda, S3, Batch, CloudWatch, CloudTrail, WAF, etc.

Experience with building tools for monitoring and troubleshooting of system resources in an AWS environment.

Ability to triage AWS related incidents using monitoring tools on AWS Cloud.

Experience with performance engineering of AWS Cloud applications.

Hands-on experience with transaction-level monitoring using Dynatrace, OpenTelemetry, and Splunk.

Ability to perform transaction-level monitoring and troubleshooting on the AWS cloud platform.

Eyes-on-glass monitoring of the health of applications as well as the underlying infrastructure.

Monitoring experience with tools like ExtraHop, SolarWinds, the Netcool suite, Catchpoint, and Moogsoft.

Ability to analyze dashboards and reporting/monitoring tools to look at trends and patterns in application health and performance.

Proactively look for hardware, software, and environmental alerts or malfunctions.

Influence other technical teams on the calls and articulate troubleshooting steps effectively.

Lead required technical follow-up calls for critical incidents.

Assist with documentation of Root Cause Analysis (RCA) or Correction of Errors (COE) and data quality for all ECC communicated incidents.

Ensure appropriate functional and management escalation takes place as per the standards and procedures.

Follow up on items that could potentially negatively impact production operations, assist with postmortem related activities, and support various efforts related to operational improvements.

Based on recommendations from management, implement new and improved processes, change processes, perform new tasks, create reports and address ad-hoc requests.

Participate in on-call rotation.

Ability to work any shift as needed, including weekends and night shifts.

Ability to report incident details and metrics to senior leadership.

Education:

Bachelor's Degree or equivalent required.

Minimum Experience:

6 plus years of related experience.

Specialized Knowledge & Skills:

6 plus years of working experience with different IT infrastructure components such as AWS, Unix/Linux servers, Wintel servers, networks, firewalls, routers, load balancers, VPN, Apache, WebLogic, LDAP, Active Directory, Exchange, Oracle/MS SQL databases, SAN, virtualization, email systems, enterprise monitoring, and access management solutions for single sign-on.

Subject matter expertise is not required, but experience with at least eight of the above is preferred.

Senior level hands-on working experience with Amazon Web Services (AWS).

Proven methodical approach to problem identification, monitoring, problem solving and resolution.

Ability to analyze different components of the infrastructure and application environments during Incident triage calls.

Aptitude to influence other technical teams on the incident calls and articulate troubleshooting steps effectively.

Experience and confidence working with all levels of management; excellent written and verbal skills.

Able to quickly and concisely communicate with senior management on technical issues in non-technical terms, and to run large conference calls during incidents with a wide range of personnel and management levels.

Strong relationship management skills and aptitude to multi-task and work well in a high stress environment, both within teams and independently.

AWS Solutions Architect Associate or higher certification

Preferred Qualifications:

Understanding of tools like CloudFormation or Terraform

Management and troubleshooting of middleware products in UNIX and Linux environments.

Knowledge of Service Oriented Architecture (SOA), Java etc.

Understanding of Azure or Google Cloud.

Prior client or Financial industry experience.

Thanks & Regards

Priyabrata Pradhan

Email: [email protected]

Mon Mar 18 23:36:00 UTC 2024


Location: Reston, Virginia