Job Details

SRE Lead with Python (TX) at Dallas, Texas, USA

From: Chandra N, Siri Info
Email: chandra.n@siriinfo.com
Reply to: chandra.n@siriinfo.com

Role name: Lead

Role Description:
- Establish performance baselines and capacity thresholds, correlate events, and define monitoring/alerting criteria
- Develop automated solutions to address potential problems before they result in a service interruption
- Provide impact assessment and mitigation plan for changes going into the production environment
- Investigate root cause of severe and systemic outages, identify corrective actions and apply across the enterprise
- Develop availability measures that align with consumer experience to accurately assess the usability of crucial services
- Build capacity models to baseline transactional load compared to resource performance and leverage data to predict overall system capacity while automating load placement to avoid outages
- Identify thresholds for all critical links in the data path to quickly isolate where imbalances may result in potential outages
- Analyze failure points in services to model risk level and resolution steps if failure occurs
- Assist in driving architecture enhancements into the system to mitigate potential failure points
- Programmatically monitor for and remediate configuration drift of critical devices
- Develop response plans to potential failure points and evaluate effectiveness during planned tests
- Perform comprehensive operational health checks of all services to identify areas of concern and track activities to drive improvements at all levels of the architecture
- Provide technical coaching and direction to more junior teammates

Competencies: Digital: Python, Digital: Node.js, Digital: DevOps, Core Java, Unix/Linux Basics and Commands

Experience (Years): 10 and above

Essential Skills: identical to the Role Description list above

Desirable Skills: identical to the Role Description list above

Country: United States

Branch | City | Location: TCS - Dallas, TX | Richardson | Richardson, TX

Keywords: javascript Texas

Thu Aug 01 23:37:00 UTC 2024


Location: Dallas, Texas