Mahidhar - DevOps/SRE
[email protected]
Location: Austin, Texas, USA
Relocation: Open
Visa: H1B
SUMMARY:
Around 10 years of experience in the IT industry supporting DevOps platforms in Windows and Linux environments.
Worked on public, private, and hybrid cloud environments to meet client requirements.
Hands-on experience with AWS services including Lambda, RDS, CloudFront, WAF, SAM, CloudFormation, AWS DevOps pipelines, EKS, ECS, EC2, S3, and CloudWatch.
Expertise in Azure, including VMs, VNets, Azure Monitor, Microsoft Defender for Cloud, ARM, Azure Active Directory, Management Groups, and Storage accounts.
Experience with Google Cloud components, Google Container Builder, GCP client libraries, and Cloud SDKs.
Proficient with container systems like Docker and container orchestration platforms like ECS and Kubernetes; worked with Terraform.
Implemented a production-ready, load-balanced, highly available, fault-tolerant Kubernetes infrastructure.
Designed and developed cloud service projects and supported IaaS, PaaS, and SaaS based cloud products in AWS, Azure, and GCP.
Proficient in essential DevOps tools including Terraform, Docker, Kubernetes, Vagrant, Git, GitHub, SVN, Ant, Maven, Jenkins, JUnit, Selenium, Bamboo, Hudson, Chef, Ansible, Puppet, and Nagios.
Experienced with principles and best practices of SCM in Agile, Scrum, and Waterfall methodologies.
Experience working with Azure services, AKS, Azure databases, Azure Security, and Azure Networking.
Worked across multiple cloud providers (AWS, Azure, and GCP) and automated CI/CD DevOps pipelines on each.
Stored secured data in HashiCorp Vault, which secures, stores, and tightly controls access to tokens and passwords used by the overall platform; the deployment started in the AWS cloud and currently integrates with services such as AWS IAM, Amazon DynamoDB, and Amazon SNS.
Implemented Elasticsearch, Logstash, and Kibana (ELK) for log analytics, full-text search, and application monitoring, integrated with AWS Lambda and CloudWatch.
Integrated Kubernetes with HashiCorp Vault to inject configurations at runtime for each service using init containers, config sidecars, and persistent volume sharing between app and config containers.
Onboarded and supported multiple apps on enterprise monitoring platforms like New Relic, Splunk, Datadog, PagerDuty, and ELK.
Implemented system design architecture across multiple cloud providers to meet SLIs, SLOs, and SLAs for reliable systems.
Worked in an incident management process to reduce MTTD, MTTM, and MTTR, using AIOps capabilities to resolve incidents faster.
Automated the installation and configuration of WebSphere, WebLogic, Apache Tomcat, and JBoss using Ansible.
Expertise in automating build and deployment processes using Python and shell scripts, with a focus on DevOps tooling in AWS, Azure, and GCP.
Managed and monitored Kubernetes clusters using Prometheus as a data aggregator and Grafana as a data visualization platform.
Performed regular security audits and worked closely with the security team to remediate vulnerabilities and ensure compliance with industry standards.
Conducted regular disaster recovery drills to validate the effectiveness of recovery plans in AWS and Azure.
Experienced in using bug tracking systems like JIRA, Remedy, and IBM ClearQuest.
Knowledge of Oracle Cloud Infrastructure Compute services, OCI database, OCI storage services, OCI security, and IAM.
Technical skills:
Cloud Stack: Amazon Web Services, Microsoft Azure, Google Cloud
Containers: Kubernetes, Docker, OpenShift
DevOps Tools: Chef, Ansible, Terraform, Kibana, Prometheus, Grafana, ELK, Vault, Consul, Datadog, New Relic
DevSecOps Tools: Nexus RM, SonarQube, Snyk, Chef InSpec
Operating Systems: Linux (CentOS, Ubuntu), Microsoft Windows Server
Other Technologies: JIRA, Splunk, WebSphere, Nginx, New Relic, Nagios, PagerDuty, JFrog, SonarQube
Programming Languages: Python, Java, Ruby
CI, Test & Build Systems: Ant, Maven, Jenkins, Git, SVN, Gradle
Databases & Servers: MS SQL Server, Oracle, MySQL, Apache Tomcat

Professional Experience:

T-Mobile - Bellevue, WA
Aug 2021 - Present
Sr. SRE/DevOps
Responsibilities:
Responsible for managing the overall infrastructure and operations of T-Mobile's customer support application.
Developed reliability engineering tools to enhance the uptime of customer-facing applications.
Created SRE tools, including a Service Health Dashboard, SLO metrics, and an SLI metrics converter, using Python, Ruby, and Go.
Used SLO and service health development tools to onboard and support many application teams in meeting reliability standards.
Built and rolled out an SLO worksheet model to developers for better SLI event breakdown, range and frequency of events, and error budgets.
Developed Helm charts to deploy Kubernetes services for various Java and Python-based application releases.
Implemented SRE best practices and supported them across different services in T-Mobile.
Engineered Kubernetes CRUD operations and Operators to manage observability requirements for applications deployed on Kubernetes.
Onboarded several apps from T-Mobile data centers to AWS Cloud and helped developers create infrastructure and data CI/CD pipelines.
Developed AWS CloudFormation templates to create custom-sized VPCs, subnets, EC2 instances, and security groups, and implemented CloudWatch alarms and CloudTrail logging within the templates to provide comprehensive infrastructure monitoring and auditing.
Managed a single monitoring system that consolidates metrics and alerts, built on Datadog and New Relic for real-time monitoring and analytics.
Automated infrastructure setup on AWS and GCP using Terraform, creating reusable modules for common tasks such as setting up monitoring and database services to make provisioning reproducible and efficient.
Participated in a day-and-night on-call rotation and contributed to postmortems and runbooks shared across the organization.
Designed a DevSecOps process architecture and developed a production readiness roadmap to minimize toil during software releases.
Collaborated with developers in creating automated pipelines for build, test, release, code fixes, and monitoring.
Rewrote an application that used hard-to-manage client-side templates to use MVC3 server-side templates, improving application speed and load times.
Experience in writing HTTP RESTful web services and SOAP APIs in Golang.
Maintained and contributed to Ansible playbooks and AWX Tower for application deployments, OS patching, certificate management, etc.
Set up monitoring and incident management using APM, infrastructure monitoring, and alerting with AppDynamics, Splunk, and PagerDuty.
Contributed to CI/CD pipeline automation using DevSecOps best practices and resolved failures during deployments.
Maintained an in-house ticketing system using a Python/Django backend with a Django REST Framework based API and AngularJS for the web frontend.
Supported and administered T-Mobile's perimeter security platform (Shield) with WAF, access proxy, and load balancing.
Developed web applications using Python 3.5, Django 1.9, Flask, MongoDB, JavaScript, AJAX, HTML, XML, and template languages.
Automated repetitive manual tasks like OS patching and SSL and client certificate rotation with Ansible and Python.
Created and rolled out SRE runbooks for new feature support and incident management.

PayPal - Austin, TX
Sep 2020 - Jul 2021
Sr. SRE/DevOps
Responsibilities:
Developed One Pipeline (CI/CD) to automate and enforce CI/CD concepts across all teams and applications and reduce manual effort.
Onboarded all the customers in CapitalOne to the Enterprise One Monitoring platform and troubleshot applications with issues.
Worked with Docker, Kubernetes, Terraform, Jenkins, and New Relic to develop infrastructure monitoring and deployment pipelines.
Automated the deployment process for containerized applications using AWS ECS, ECR, EKS, and AWS DevOps processes.
Designed and implemented Azure Resource Manager (ARM) templates for Infrastructure as Code, automating the provisioning of virtual networks, VMs, and storage resources.
Optimized Azure infrastructure for cost efficiency by implementing auto-scaling and right-sizing strategies.
Wrote unit tests and integration tests for the MVC controllers and views to achieve good code coverage.
Implemented SRE principles to bring stability and reliability to the systems, and automated to eliminate manual toil.
Implemented network infrastructure automation, service mesh, and service discovery for cloud and Kubernetes using HashiCorp Consul.
Accomplished instant anomaly detection, correlated alerts and events, automatic root cause analysis, and incident management.
Experienced in developing infrastructure using CloudFormation templates in AWS and Terraform.
Automated cloud security management across different cloud providers using HashiCorp Vault.
Developed and maintained an advanced platform to manage the lifecycle of Kubernetes clusters.
Handled L2 production incidents, joining calls to resolve incidents and support the applications.
Performed network automation across cloud providers using service mesh and service discovery from HashiCorp Consul.
Architected and designed highly available, fault-tolerant, and cost-effective systems using ASGs, ELB, and DR across multiple cloud providers.

Texas Department of Transportation (TxDOT) - Austin, TX
Apr 2018 - Aug 2020
Cloud Engineer
Responsibilities:
Responsible for implementing AWS solutions and setting up the cloud infrastructure with services including EC2, S3, VPC, ELB, AMI, EBS, RDS, Auto Scaling, Route53, Subnets, NACLs, CloudFront, CloudFormation, CloudWatch, CloudTrail, SQS, and SNS.
Implemented cloud Infrastructure as a Service (IaaS) automation across the AWS public cloud, using Terraform to provision infrastructure across AWS workloads.
Optimized volumes and EC2 instances, created multiple VPC instances, and created alarms and notifications for EC2 instances using CloudWatch.
Monitored and alerted on production and corporate servers (EC2 metrics) and storage (S3 buckets) using AWS CloudWatch.
Configured Identity and Access Management (IAM) groups and users for improved login authentication using ADFS.
Designed and implemented AWS CloudFormation templates to automate the provisioning of AWS resources, ensuring consistency and compliance across multiple environments and reducing infrastructure setup time.
Collected inventory using AWS SSM and configured automated maintenance tasks such as the Configure AWS Packages and Run PowerShell Script SSM Document tasks.
Worked with Docker and Kubernetes on multiple cloud providers, from helping developers build and containerize their applications (CI/CD) to deploying them on public or private cloud.
Administered and engineered Jenkins to manage the weekly build, test, and deploy chain as a CI/CD process, using SVN/Git with a Dev/Test/Prod branching model for weekly releases.
Participated in an on-call rotation to respond to critical incidents, ensuring rapid resolution and minimizing service disruptions.
Installed, configured, and maintained web servers like HTTP Web Server, Apache Web Server, and WebSphere Application Server on Red Hat Linux.
Assisted developers with cloud resource provisioning, configuration, and troubleshooting, ensuring a smooth deployment process.
Worked on Active Directory tasks to create OUs, join servers to the domain, and perform related tasks to maintain server authentication.

Anthen Inc - Norfolk, VA
May 2016 - Mar 2018
Infra/DevSecOps Engineer
Responsibilities:
Researched and developed new technologies based on client project requirements to implement DevSecOps concepts in CI/CD.
Achieved CMS DevSecOps governance, guidance, and best practices for DevSecOps teams operating in CMS cloud environments.
Provided advanced engineering support to production support teams for complex application performance and infrastructure issues.
Leveraged pre-built infrastructure provisioning, application deployment, and solution configuration processes.
Supported security and compliance, including static code analysis, open-source code analysis, DISA STIG, and CMS Acceptable Risk Safeguards.
Performed DevSecOps server hardening and automated vulnerability scanning using Jenkins, Ansible, Chef InSpec, SonarQube, Nessus, Nexus RM, and Heimdall.
Used Chef InSpec for automated testing of infrastructure and applications as code.
Worked with the Heimdall web-based visualization server for viewing InSpec results, evaluations, and profiles.
Automated builds, sped up releases, and captured success metrics across DevSecOps pipelines using Nexus Repository Manager.
Ensured recovery of systems from infrastructure or service failures to mitigate disruptions.
Worked in fully automated multi-cloud environments across GCP, AWS, and Azure.
Assisted with software integration, including turning software builds into RPM packages.
Troubleshot end-user application problems.

Global Data Mart Systems (India) Pvt. Ltd - Hyderabad, India
June 2013 - July 2015
Systems Engineer
Responsibilities:
Automated the installation and configuration of web application servers like WebSphere and Apache Tomcat using Ansible.
Supported Linux/Unix operating systems and components.
Diagnosed and resolved problems associated with DNS, DHCP, VPN, NFS, and Apache.
Configured and administered standard UNIX services like SSH, LDAP, SSL, NFS, sudo, and FTP.
Experience in developing Splunk queries and dashboards targeted at understanding application performance and capacity analysis.
Provided Tier 2 and Tier 3 support for key transaction metrics to customers by joining calls and resolving issues.
Participated in migration from a Windows Server 2003 domain to a Windows Server 2008 domain.
Participated in a 24/7 on-call rotation, providing level 3 escalations and project work.
Added users to the domain using Active Directory in a Windows Server 2003 and 2008 environment.

Certification:
AWS Certified Solutions Architect - Associate

Education:
Master's in Information Systems Security