Job Details

Urgent requirement :: SRE (Site Reliability Engineer) :: Fort Mill, SC (onsite)

http://bit.ly/4ey8w48
https://jobs.nvoids.com/job_details.jsp?id=1699471&uid=

From: Nirbhay Singh, Appian Infotech Inc

Hi,

We have an urgent requirement for an SRE (Site Reliability Engineer).

Job Title: SRE (Site Reliability Engineer)

Location: Fort Mill, SC (Onsite)

Job Type: Long-term project

Job Description:

Tech Skills:

.NET, SQL, React

Dynatrace, SolarWinds DPA

AWS Cloud

Splunk, Elastic Stack

Python, Scripting Languages, Ansible Tower, Terraform

Proficiency in Core SRE Principles: Expertise in essential SRE concepts such as Critical User Journeys (CUJ), Service Level Objectives (SLO), Service Level Indicators (SLI), and error budgeting based on non-functional requirements (NFRs), and the ability to apply these principles effectively to ensure service reliability, meet business objectives, and drive continuous improvement initiatives.

Experience Reducing Toil: Identifying manual and repetitive tasks within the Software Development Life Cycle (SDLC) or IT operations and implementing automation to reduce toil. Ability to streamline processes, enhance productivity, and free up resources for more strategic initiatives through automation and process improvement.

Comprehensive CI/CD Proficiency: Strong understanding of CI/CD practices, with robust knowledge of Git, GitHub Actions, and GitHub workflows. Familiarity with other tools such as Jenkins would be advantageous.

Engage in and improve the whole life cycle of application and cloud services, from inception and design through deployment, operation, and refinement.

Design, develop, ship, and champion software and systems that increase product reliability and organizational efficiency.

Lead development and tracking of SRE error budgets.

Lead development of the SRE dashboard.

Lead root cause investigations.

Proactively identify system anomalies.

Recognize automation opportunities.

Plug into the software release cycle. Work closely with developers to ensure software releases are well designed, planned, implemented, released, and monitored.

Automate time-consuming and manual processes.

Assess the current SRE solution and define the SRE approach for products.

Work with applications development teams on designing, implementing, and improving SRE practices.

Cloud Platform Expertise: Experience with AWS, including hands-on experience with key cloud services such as logging and monitoring.

Strong Knowledge of IaC: Expertise in Infrastructure as Code (IaC) and a strong command of Terraform for provisioning and managing cloud infrastructure.

Proficiency in Container Orchestration: Hands-on experience in creating and managing Docker images, ensuring optimal performance and security. Proficiency in the Kubernetes platform, including the ability to effectively manage containerized applications, scale resources as needed, and troubleshoot issues in production environments.

Monitoring and Observability: Experience with monitoring tools such as Prometheus, Grafana, and the ELK Stack. Able to set up and configure monitoring solutions, use metrics for performance optimization, and troubleshoot issues effectively.

Strong understanding of cloud platforms like AWS and infrastructure automation tools.

Proven ability to design and implement monitoring solutions that ensure system uptime and performance.

Experience with AIOps principles and automation best practices.

Excellent communication, collaboration, and problem-solving skills.

Responsibilities:

Design and implement a comprehensive monitoring strategy for cloud infrastructure and applications.

Leverage industry-leading tools like Dynatrace, Splunk, and Elastic Stack for real-time monitoring and troubleshooting.

Develop and configure health probes and insightful alerts to proactively identify and address potential issues.

Champion the adoption and implementation of an AIOps platform for automated incident resolution and self-healing infrastructure.

Collaborate with development teams to translate operational insights into actionable requirements for high-quality software releases.

Design and execute reliability tests to ensure system stability and production readiness.

Maintain a deep understanding of cloud platforms like AWS and utilize infrastructure automation tools like Terraform and Ansible Tower.

Regards

Nirbhay Singh

Appian Infotech Inc

Contact No: 276 910 0146 Ext. 128

Email: nirbhay.s@appianinfotech.com

LinkedIn: https://www.linkedin.com/in/n-k-singh-430076245/

01:18 AM 28-Aug-24

Location: Fort Mill, South Carolina