
Hiring!!! - Site Reliability Engineer (SRE) - Houston, TX (Onsite) at Houston, Texas, USA
Email: [email protected]
Hi,

Hope you're doing well.

Please review the position below and send me your updated resume to expedite
the process.

Role: Site Reliability Engineer (SRE)

Location: Houston, TX (Onsite)

Duration: 12+ Months

Job Description

We are seeking a Site Reliability Engineer (SRE) to join
our growing Digital IT team focused on the Integrated Production Surveillance
& Optimization (IPS&O) function. The SRE will support the reliability
of critical Digital IT/OT applications. This transformative role involves
automating IT infrastructure tasks and driving SRE best practices, tools, and
processes. The ideal candidate should exhibit a growth mindset and
proactively monitor and respond to incidents to ensure an optimal user experience.

The candidate must have senior-level experience
deploying and supporting applications on OpenShift/Kubernetes container
platforms.

They are looking for a Kubernetes DevOps expert who
also has expert knowledge of OCP (Red Hat's OpenShift Container Platform). Let's
find experts who are certified in Kubernetes, OCP, or both.

The successful candidate will possess a strong
developer background as well as interpersonal skills needed to communicate
design requirements and objectives while providing thought leadership to peers
and leadership.

Candidates should be self-motivated and collaborative
IT professionals with a strong background in software development, systems
administration and IT automation.

Responsibilities:

Maintain survivability and reliability of critical IT/OT
resources.

Write and build CI/CD pipelines and build/release
processes for IT/OT workflow applications.

Mentor the IT/OT DevOps team in the best
practices associated with CI/CD deployments using Azure DevOps (ADO) and Git.

Perform periodic load and scalability testing to
establish baselines, detect drift, and inform capacity planning.

Conduct weekly operational state reviews covering
performance trends, anomalies, errors, and other availability events with SREs,
product owners, and development teams.

Participate in quarterly business and operational
reviews aligning on roadmaps, development velocity, efficiency, growth trends,
etc.

Plan and execute periodic Disaster Recovery exercises
including both tabletop and simulated failures (fault injection).

Required Qualifications

Candidates must have a bachelor's degree and 8 years
of IT experience.

Senior-level experience with OCP and Kubernetes.

Familiarity with continuous integration/deployment
processes and tools such as IDEs (Eclipse), source code management
(Git/Stash), ADO Pipelines, Maven, Nexus artifacts, etc.

Strong understanding of SRE practices: incident
response, change/release management, capacity planning, infrastructure
automation, elastic environments, chaos engineering and blameless postmortems.

Expertise in application performance monitoring,
observability, and proactive alert correlation, including monitoring containers
and failure-based alerting.

Scripting experience in languages such as Python and Bash.

Experience deploying applications on OCP in both
public and private clouds.

Excellent written and oral communication skills.

Demonstrated ability to communicate technical issues to a
nontechnical audience.

Demonstrated ability to communicate on a technical
level to a technical audience.

Strong interpersonal skills, adaptable and able to
learn quickly.

Requires limited supervision and has excellent time
management skills.

Self-motivated self-starter.

Ability to work and interact with others in a
structured/team environment.

Experience with at least one technology in each of the
tech stack categories below:

Monitoring and Logging Tool(s): AppDynamics, Splunk,
ELK Stack, Datadog, Prometheus, AWS CloudWatch/X-Ray, Grafana

Programming: C# .NET, PowerShell, Python, YAML

Containers: Docker, Helm charts

OS: Linux RHEL, Ubuntu, CentOS

Code Repos: Azure Repos, GitHub

Infrastructure as code: Terraform, Ansible

Automation Tools: Jenkins, Chef, Puppet

Agile: JIRA, SAFe

Desired Qualifications

Experience in cloud/virtual technologies and
management (VMware, AWS, Azure, etc.).

Knowledge, skills and abilities to support web server
technologies (Apache, Nginx, IIS).

Knowledge, skills and abilities to automate the
creation of Platform as a Service (PaaS) infrastructure using industry
standard tools such as Ansible and Chef.

Familiarity with Industrial Control System (ICS)
security architecture (Purdue model).

--

Wed Oct 04 18:39:00 UTC 2023
