Hybrid: Site Reliability Engineer (No H1B) at Remote, USA
Email: [email protected]
Client: Walmart

Title: Site Reliability Engineer role with Azure and Splunk

Location: Sunnyvale, CA

Duration: 6+ months

Visa: No H1B

MOI (mode of interview): Skype

LinkedIn profile with a profile picture required. 2 candidates only.

This is a Site Reliability Engineer role for Sam's Cash Application team.

Roles and responsibilities include:

Production Ticket Handling and Troubleshooting:
Requires knowledge of: strong analytical and problem-solving skills; root cause analysis (RCA); root cause corrective action (RCCA).

Guides team members in RCA and RCCA to identify the origins of defects and performance gaps and to prevent them. Analyzes complex problems involving multiple parties, networks, hardware, software, and cloud computing technologies.

Assesses immediate restoration versus root
cause based on consequences and resource requirements. Analyzes the issues
and plans a series of steps to enhance an application's availability and
reliability, potentially including reconfiguration, integration, removal,
or the addition of application components. Analyzes trends to proactively
prevent incidents and provide historical summary reports.

Disaster Recovery Planning:
Requires knowledge of: disaster recovery procedures and processes; enterprise disaster recovery systems.

Coordinates partial and full tests of contingency and disaster recovery plans. Creates and maintains data center contingency documents and action plans. Defines and documents contingency and disaster recovery procedures. Leads the identification of critical functions for the assigned area of responsibility. Creates and tests plans for operating in a remote backup environment. Coordinates the day-to-day activities of control measures used in recovery plans.

Monitoring and Alerting:
Requires knowledge of: monitoring and alerting tools (Splunk, Prometheus, Grafana); monitoring metrics and key performance indicators (for example, availability, MTBF, MTTR); SLIs and SLOs (for example, request latency, availability, error rates, saturation); distributed tracing; alerting logic.

Establishes metrics to monitor network, software, and system performance. Establishes SLOs/SLAs to define the availability goals of systems and services. Sets alerting priorities by identifying the most important systems based on criticality. Oversees daily system monitoring, including verifying the integrity and availability of all hardware and services, reviewing system and application logs, and verifying the completion of scheduled jobs.

Leads end-to-end audits of monitors and alarms
based on subsystem knowledge. Provides proactive updates to executive
leadership on potential customer-impacting issues. Analyzes systems and
makes recommendations to prevent possible incidents using knowledge of
complex and company-wide systems.

Data Reporting and Metrics:

Requires advanced SQL skills to pull complex data reports from multiple sources; familiarity with Databricks or GCP BigQuery; and the ability to write advanced Splunk queries that join multiple indices to stitch data together. Uses a data-driven decision-making process to analyze the impact of production issues and prioritize them.

Top 3 Required Skills:

1. Strong technical analytical and problem-solving skills; experience triaging and troubleshooting production issues.

2. Monitoring and alerting skills (Splunk, Prometheus, Grafana).

3. Data reporting and metrics skills (SQL, Python, PySpark, Databricks).

--

Mon Sep 09 22:59:00 UTC 2024
