Lead SRE - Irving, TX at Irving, Texas, USA
Email: [email protected]
From: Amy Jones, BLACKAPPLE SOLUTIONS [email protected]
Reply to: [email protected]
Job role: Lead SRE
Location: Irving, TX
Note: Internal clarification. Don't post this job outside.

What we are looking for: Candidates should be based in Texas; our office is near the DFW Metro Area in Irving, TX. It would also be nice to find candidates who have previously worked at financial services companies (Chase, USAA, BofA).

Job Description:
In this role, you will:
- Lead complex technology initiatives, including companywide initiatives with broad impact.
- Act as a key participant in developing standards and companywide best practices for engineering complex, large-scale technology solutions across technology engineering disciplines.
- Design, code, test, debug, and document projects and programs.
- Review and analyze complex, large-scale technology solutions for tactical and strategic business objectives, the enterprise technology environment, and technical challenges that require in-depth evaluation of multiple factors, including intangible or unprecedented technical factors.
- Make decisions in developing standards and companywide best practices for engineering and technology solutions, which requires an understanding of industry best practices and new technologies, and influence and lead the technology team to meet deliverables and drive new initiatives.
- Collaborate and consult with key technical experts, the senior technology team, and external industry groups to resolve complex technical issues and achieve goals.
- Independently troubleshoot and analyze production job failures related to data, network file delivery, and server and application issues, and provide recovery solutions.
- Participate in root cause analysis and preventive actions to avoid recurring incidents.
- Participate in the buildout of automation to prevent problem recurrence, with the goal of automating the response to all non-exceptional service conditions.
- Apply your software engineering and systems engineering background to ensure that applications onboarded to SRE are available, have full-stack observability, are integrated with CI/CD, and are always on, by introducing continuous improvement through code and automation, continuous testing (performance, functional), and operational insight through analytics.
- Assess the availability of critical business flows, identify service level objectives and indicators, and conduct destructive and resiliency testing to reach 99.995% availability for the firm's critical products and services, leading to improved customer experience and satisfaction.
- Develop original and/or complex code, provide coding guidance and reviews, and create documentation.
- Introduce enterprise capabilities, tools, and innovation to improve availability in a multi-cloud ecosystem by evolving observability, monitoring, logging, CI/CD integration, continuous testing (performance, functional), continuous improvement, and the standardization/automation of key SRE metrics and IT Service Operations processes.
- Evolve continuous inspection capabilities for code quality to identify problems before they manifest in production.
- Introduce and expand AIOps and robotic process automation (RPA) to solve complex operational and systemic issues and to improve the availability of products to customers.
- Share support responsibilities for critical applications: identify systemic issues, conduct blameless postmortems and root cause analysis, and introduce strategic solutions in code that solve the problem and eliminate repeat issues.
- Be willing to work non-standard business hours on an on-call basis in a 24x7x365 environment.
- Lead projects or teams, or serve as a peer mentor.

Required Qualifications:
- 5+ years of software engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
- 5+ years of troubleshooting and systems administration experience across multiple OS platforms: Solaris, AIX, PKS, Kubernetes, OpenShift, Linux, Windows, VMware
- 3+ years of experience with web platforms: Java, Apache, Tomcat, WebLogic, Oracle
- 2+ years of experience with database technologies: basic SQL, Cassandra DB, Oracle, PostgreSQL
- 2+ years of experience with observability tools: Traffic Manager, Message Processor, AppDynamics, Filebeat, Basemon, etc.
- 2+ years of experience using logging/monitoring tools: ELK, Filebeat, Splunk, Netcool, SiteScope, Kafka

Desired Qualifications:
- 5+ years of software development experience with languages and technologies such as Perl, Python, Java, JavaScript, Ruby, JSON, Angular, NodeJS
- 2+ years of experience with automation scripting: Bash, Shell, Ansible, Terraform, Azure DevOps
- 1+ year of experience with cloud technologies: PCF, Azure, AWS, GCP, etc.
- 2+ years of Incident Management System experience
- 2+ years of experience with Agile Scrum (Daily Standup, Sprint Planning, and Sprint Retrospective meetings)
- 2+ years of experience using JIRA
- 2+ years of experience with data services platforms: Big Data, Data Lake, Hadoop, Spark
- 1+ year of experience with AIOps tools: BigPanda, Moogsoft
- Experience with one or more CI/CD pipeline tools (GitHub, Jenkins) and automation tools: Gradle, Maven, Git, Ansible, Puppet
- Experience with one or more observability/monitoring tools: Elastic, Kibana, Grafana, AppDynamics, Kafka, BigPanda, Splunk
- Experience with one or more data/data structure technologies: Kafka, Apache Airflow, Logstash, Spark, Oracle, SQL, Mongo, Hadoop, Cloudera, AWS EMR, S3
- Knowledge of one or more additional capabilities: UiPath, robotic process automation, and capacity management
- An industry-standard certification
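For reference, the 99.995% availability target in the responsibilities corresponds to a very small downtime budget. The Python sketch below is not part of the original posting; the function name `allowed_downtime_minutes` is hypothetical, and it assumes a 365-day year and a 30-day month while ignoring leap years and calendar effects.

```python
# Sketch: downtime ("error budget") implied by an availability SLO.
# Assumptions (not from the posting): 365-day year, 30-day month.

MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(slo: float, window_days: float) -> float:
    """Return the minutes of unavailability permitted by `slo` over `window_days`."""
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be a fraction in (0, 1]")
    # The unavailable fraction (1 - SLO) applied to the whole window.
    return (1.0 - slo) * window_days * MINUTES_PER_DAY


if __name__ == "__main__":
    target = 0.99995  # the 99.995% availability target
    for label, days in (("year (365 d)", 365), ("month (30 d)", 30)):
        print(f"{label}: {allowed_downtime_minutes(target, days):.2f} min")
```

Under these assumptions, 99.995% availability allows roughly 26.3 minutes of downtime per year, or about 2.2 minutes per 30-day month.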
Fri Oct 04 20:13:00 UTC 2024