DevOps / SRE || GC-EAD OR H4-EAD || Need Local DL || Sunnyvale CA (Hybrid) at Sunnyvale, California, USA
Email: [email protected]
Role: DevOps / SRE

Location: Sunnyvale CA (Hybrid)

Visa: GC-EAD, H4-EAD

LINKEDIN MUST BE LOCAL

Top 3 Skills Needed or Required

Strong technical, analytical, and problem-solving skills; experience triaging and troubleshooting production issues; monitoring and alerting skills (Splunk, Prometheus, Grafana); data reporting and metrics skills (SQL, Python, PySpark, Databricks).

This person will work as a Site Reliability Engineer. The primary responsibility will be production support work:

Production Ticket Handling and Troubleshooting: Requires knowledge of: Strong analytical and problem-solving skills; Root cause analysis (RCA); Root cause corrective action (RCCA). To guide team members in RCA and RCCA to identify the origins of defects and performance gaps and to prevent them. Analyzes complex problems involving multiple parties, networks, hardware, software, and cloud computing technologies.

Assesses immediate restoration versus root cause based on consequences and resource requirements. Analyzes the issues and plans a series of steps to enhance an application's availability and reliability, potentially including reconfiguration, integration, removal, or the addition of application components. Analyzes trends to proactively prevent incidents and provide historical summary reports.

Disaster Recovery Planning: Requires knowledge of: Disaster recovery procedures and processes; Enterprise disaster recovery systems. To coordinate partial and full tests of contingency and disaster recovery plans. Creates and maintains data center contingency documents and action plans. Defines and documents contingency and disaster recovery procedures. Leads the identification of critical functions for assigned area of responsibility. Creates and tests plans for operating in a remote back-up environment. Coordinates the day-to-day activities of control measures used in recovery plans.

Monitoring and Alerting: Requires knowledge of: Monitoring and alerting tools (Splunk, Prometheus, Grafana); Monitoring metrics and key performance indicators (for example, availability, MTBF, MTTR); SLIs and SLOs (for example, request latency, availability, error rates, saturation); Distributed tracing; Alerting logic.

To establish metrics to monitor network, software, or system performance. Establishes SLOs/SLAs to determine availability goals of systems/services. Sets alerting priorities by identifying the most important systems based on criticality. Oversees daily system monitoring, including verifying the integrity and availability of all hardware and services, reviewing system and application logs, and verifying the completion of scheduled jobs.

Leads end-to-end audits of monitors and alarms based on subsystem knowledge. Provides proactive updates to executive leadership on potential customer-impacting issues. Analyzes systems and makes recommendations to prevent possible incidents using knowledge of complex and company-wide systems.

Data Reporting and Metrics

Advanced SQL skills to pull complex data reports from multiple sources; familiarity with Databricks or GCP BigQuery; ability to write advanced Splunk queries that join multiple indexes to stitch data together; use of a data-driven decision-making process to analyze the impact of production issues and prioritize them.

Kind Regards,

Krish Agrani | Absolute IT | IT Recruiter

116 Village Blvd, Suite 200, Princeton, New Jersey 08540


Keywords: information technology green card California
Tue Sep 10 00:10:00 UTC 2024
