Site Reliability Engineer (SRE) at Southfield, Michigan, USA |
Email: [email protected] |
From: Ron, Involgix [email protected]
Reply to: [email protected]

Site Reliability Engineer (SRE)
Southfield, MI

Job Description:
We count on our site reliability engineer (SRE) to empower users with a rich feature set, high availability, and a stellar level of performance as they pursue their missions. As we expand customer deployments, we're seeking an experienced SRE to deliver insights from massive-scale data in real time. Specifically, we're searching for someone who has fresh ideas and a unique viewpoint, and who enjoys collaborating with a cross-functional team to develop real-world solutions and positive user experiences for every interaction.

Objectives of this role:
Run the production environment by monitoring availability and taking a holistic view of system health.
Build software and systems to manage platform infrastructure and applications.
Improve the reliability, quality, and time-to-market of our suite of software solutions.
Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement.
Provide primary operational support and engineering for multiple large-scale distributed software applications.

Responsibilities:
At a day-to-day level, SREs focus on automation, monitoring, incident resolution, and culture.
Bring a love of SRE, open source, self-service tools, and microservices.
Gather and analyze metrics from operating systems and applications to assist in performance tuning and fault finding.
Partner with development teams to improve services through rigorous testing and release procedures.
Participate in system design consulting, platform management, and capacity planning.
Create sustainable systems and services through automation and uplifts.
Balance feature development speed and reliability with well-defined service-level objectives.
After incidents, document the actions taken so they can be turned into automated solutions for future incident response.
Monitor infrastructure using SRE tools and suggest new tools as necessary.
Build monitoring alerts and incident response processes.
Improve operational processes and team practices.
Code infrastructure automation across the CI/CD pipeline.
As the solution scales, ensure reliability by designing, building, and maintaining the core infrastructure.
Demonstrate strong programming skills and thorough knowledge of systems.
Bring about cultural shifts to provide a foundation for process changes.
Experience with AWS multi-region/multi-AZ deployed systems, auto scaling of EC2 instances, CloudFormation, ELBs, VPCs, CloudWatch, SNS, SQS, S3, Route53, RDS, IAM roles, security groups, blue/green deployments, and A/B testing.

Required skills and qualifications:
Bachelor's degree (or equivalent) in computer science or a related discipline.
Comfortable with large-scale production systems and technologies, for example load balancing, monitoring, distributed systems, and configuration management.
Strong coding skills in at least one programming language, and a desire to pick up more.
Familiarity with and enthusiasm for software engineering best practices such as testing, continuous integration, and continuous delivery.
Exposure to cloud platforms, Amazon Web Services (AWS), and APIs.
The ability to thrive in a rapidly evolving, globally distributed environment.
Strong security mindset.
Proactive approach to identifying problems, performance bottlenecks, and areas for improvement.
Solid understanding of fundamental technologies like TCP/IP and HTTP.
Strong working knowledge of Linux systems and applications.
Experience with automation tooling such as Chef, Docker, and AWS.
Experience with JavaScript frameworks (AngularJS, ReactJS, NodeJS) and with cloud automation/orchestration technologies.
Ability and willingness to collaborate.
Strong problem-solving skills and the ability to think under pressure.
Strong analytical and management skills.
Communication and documentation skills.

Preferred skills and qualifications:
Previous success in technical engineering.
Coding experience beyond simple scripts.

Keywords: continuous integration, continuous deployment, JavaScript, S3, Arizona, Michigan |
Mon Sep 25 23:44:00 UTC 2023 |