Lead Java Developer at Remote, Remote, USA |
Email: [email protected] |
Lead Java Developer
Location: O'Fallon, MO
Day one onsite
USC, H4, L2

Job Summary: L6 - B3 Lead Full Stack Engineer
We are looking for a Software Engineer for Service Reliability Engineering with a strong focus on automation. The ideal candidate will have experience in automating complex infrastructures, optimizing CI/CD pipelines, and incorporating AI and machine learning models to enhance service reliability, incident response, and infrastructure management. You will work alongside development, operations, and AI teams to build resilient, scalable, and automated solutions.

Key Responsibilities:
Automate Infrastructure & Operations: Develop and implement automation strategies to manage large-scale infrastructure (e.g., provisioning, configuration management, patch management). Build and maintain Infrastructure-as-Code (IaC) solutions.
AI-Driven Monitoring & Incident Response: Integrate AI and machine learning models into monitoring systems to predict potential failures and optimize response times. Use AI tools and techniques to improve anomaly detection, system health predictions, and proactive incident resolution.
CI/CD Pipeline Management: Automate the CI/CD processes using tools such as Jenkins, Bitbucket Pipelines, GitLab CI, or similar. Incorporate AI/ML into CI/CD workflows to optimize build/test times and enhance code quality predictions. Collaborate with the development team to enhance and optimize deployment pipelines.
AI-Powered Optimization: Utilize AI to perform predictive scaling, system optimization, and capacity planning. Implement self-healing capabilities through AI-based predictive analysis and automation tools.
Monitoring & Alerting Automation: Automate monitoring and alerting solutions to detect anomalies, failures, and capacity issues early. Implement observability tools like Prometheus, Grafana, and Dynatrace for efficient system monitoring.
Reliability & Scalability: Design and build self-healing, scalable systems that reduce manual intervention. Perform capacity planning and optimize system performance through automation.
Incident Management & Response: Build automated runbooks and workflows to address incidents quickly. Set up automated playbooks for incident detection, troubleshooting, and remediation.
Security & Compliance Automation: Implement automated security checks and audits within the CI/CD pipeline. Automate compliance reports, vulnerability scans, and patches.

Required Skills & Qualifications:
Technical Expertise: Hands-on experience with on-premises machines and cloud platforms like PCF, AWS, and Azure. Proficiency in programming languages such as Java, Python, and Bash for scripting automation tasks. Strong knowledge of CI/CD tools (e.g., Jenkins, Bitbucket, GitLab) and version control systems. Ability to integrate machine learning models into infrastructure for automation and predictive monitoring.
Infrastructure Automation: Expertise in containerization and orchestration tools (e.g., Docker, Kubernetes).
Monitoring & Observability: Familiarity with monitoring tools like Prometheus, Grafana, Dynatrace, and Splunk, and with alerting frameworks.
Reliability Engineering: Experience with building and automating scalable, reliable, and self-healing systems. Strong troubleshooting skills.
F5 Knowledge (good to have, not a mandatory requirement): Understanding of F5 BIG-IP, including LTM (Local Traffic Manager), GTM (Global Traffic Manager), and iRules scripting. Understanding of load balancing strategies, SSL termination, and traffic management for high-availability systems.
Collaboration & Communication: Excellent communication and collaboration skills to work cross-functionally with development, operations, and QA teams.

Preferred Qualifications:
Familiarity with Agile and DevOps practices.
Experience with automation in large-scale distributed systems.
Experience working with both microservices and monolithic architectures.
Familiarity with AI/ML-driven infrastructure optimization.

Soft Skills:
Problem-solving mindset and analytical thinking.
Ability to thrive in a fast-paced and high-pressure environment.
Team player with excellent collaboration skills.

--
Keywords: continuous integration continuous deployment quality analyst artificial intelligence machine learning information technology ffive Missouri Lead Java Developer [email protected] |
Mon Oct 07 21:51:00 UTC 2024 |