Senior SRE Engineer Lead - Fort Mill, SC (Onsite) Contract. at Remote, Remote, USA |
Email: [email protected] |
From: Raj, Softcom Systems Inc [email protected]
Reply to: [email protected]

Job Description:

Responsibilities:
Design and architect highly available, scalable, and reliable systems in collaboration with cross-functional teams.
Lead incident response during system outages or performance degradations, coordinating with other teams to diagnose issues quickly and implement solutions.
Develop and refine incident management processes.
Mentor and guide team members and stakeholders across the organization to help them develop technical skills and expertise.
Share best practices, provide constructive feedback, and foster a culture of continuous learning and improvement.
Drive automation initiatives to streamline deployment, configuration, monitoring, and maintenance processes.
Develop automation tools and frameworks to increase operational efficiency, reduce manual intervention, and improve reliability.
Stay current with industry trends, emerging technologies, and best practices in site reliability engineering.
Evaluate new tools, technologies, and methodologies to enhance system reliability, scalability, and security, and implement them as appropriate within the organization.

Skills:
Bachelor's degree (or equivalent) in computer science or a related discipline.
Strong understanding of system architecture principles, including designing scalable, fault-tolerant, and highly available systems.
Advanced experience with containerization technologies such as Docker and container orchestration tools like Kubernetes to manage and scale containerized applications.
Expertise in automation and Infrastructure as Code (IaC) to automate deployment, configuration, and management of infrastructure resources using tools like Terraform, Ansible, or Puppet.
Expertise in implementing monitoring and alerting platforms using tools like Prometheus, Grafana, or the ELK stack.
Expertise in facilitating the adoption of observability platforms for logging, metrics, and application performance monitoring.
Strong scripting and programming skills in languages such as Python, Go, Ruby, PowerShell, or shell scripting.
Knowledge of database technologies and experience with monitoring and alerting on issues are highly sought.
Demonstrated ability to respond promptly to incidents, coordinate with cross-functional teams, and lead incident response efforts to resolve issues quickly and minimize downtime.
Strong communication and collaboration skills to work closely with all stakeholders.
Demonstrated ability to communicate technical concepts clearly.
Proficiency in establishing SLOs and in identifying and creating SLIs and error budgets.

Experience:
Proficient with APMs like Dynatrace, New Relic, Datadog, and AppDynamics.
Building and managing metrics tools like Prometheus, Grafana, or Datadog.
Proficient with the deployment, management, configuration, and adoption of ServiceNow and ServiceNow modules.
Proficient in building, maintaining, and enhancing delivery pipelines with CI/CD tools like Jenkins, GitLab CI/CD, CircleCI, Travis CI, and GitHub Actions.

Keywords: continuous integration, continuous deployment, golang |
Thu Aug 22 23:35:00 UTC 2024 |