Nikhil Talusani
AWS Cloud Engineer
Phone: +1 (469) 898-4464
Email: [email protected]
Location: Plano, Texas, USA
Relocation: Yes
Visa: H1B
Professional Summary:
- 9 years of IT industry experience as a Data Engineer across domains including Retail/Pharmaceutical, Logistics, and Healthcare.
- Extensive experience in designing, architecting, and implementing scalable cloud-based web applications on AWS.
- Proficient in the AWS Cloud Platform, including Glue Catalog and services such as EC2, S3, VPC, ELB, DynamoDB, Redshift, CloudWatch, and CloudFormation.
- Created and maintained reporting and analytics infrastructure for internal business clients using AWS services such as Athena, Redshift, Redshift Spectrum, EMR, and QuickSight.
- Hands-on experience with AWS services such as Elastic MapReduce (EMR), Redshift, and EC2 for data processing.
- Utilized AWS Lambda to develop APIs for server management and code execution in AWS.
- Expertise in Amazon EMR, Spark, Kinesis, S3, Boto3, Elastic Beanstalk, ECS, CloudWatch, Lambda, ELB, VPC, ElastiCache, DynamoDB, Redshift, RDS, Athena, Zeppelin, and Airflow.
- Built advanced data analytics solutions using Python libraries such as Pandas and NumPy.
- Proficient in Spark for optimizing and improving the performance of existing algorithms in Hadoop, utilizing SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Extensive experience with Avro and Parquet files, converting data and parsing semi-structured JSON data to Parquet using DataFrames in PySpark (see the sketch after this summary).
- Skilled in developing SQL for relational databases such as Oracle and SQL Server, supporting data warehousing and data integration solutions.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Expertise in Glue Catalog and in data extraction, transformation, and loading (ETL) using tools such as SQL Server Integration Services (SSIS), Informatica PowerCenter, and Data Transformation Services (DTS), including dynamic package creation for incremental loads.
- Experience in deploying SSIS packages from development to production servers using package configurations.
- Involved in software development, data warehousing, analytics, and data engineering projects using Hadoop, MapReduce, Hive, and other open-source tools.
- Experience in setting up Puppet masters and Puppet agents for managing enterprise policies and configurations.
- Experience in configuring and integrating servers across environments to automatically provision and create new machines using configuration management/provisioning tools such as Ansible; integrated Jenkins with Ansible and designed and developed Jenkins build deployments.
- Experience in Continuous Integration (CI) and Continuous Deployment (CD) methodologies using Jenkins.
- Created pipelines, data flows, and complex transformations using EMR and PySpark with S3 buckets in AWS.
- Designed and built star-schema dimensions and cubes using SQL Server Analysis Services.
- Provided production support for applications, troubleshooting issues, proposing solutions, and developing tests and fixes.
- Created Hive, SQL, and HBase tables to load structured, semi-structured, and unstructured data from various sources.
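The JSON-to-Parquet conversion called out in the summary follows a standard PySpark pattern. The sketch below is illustrative only: the bucket paths, partition column, and cleanup steps are assumptions rather than details taken from the projects described in this resume.

```python
from pyspark.sql import SparkSession

# Illustrative only: paths and the partition column are assumptions, not project details.
spark = SparkSession.builder.appName("json-to-parquet").getOrCreate()

# Read semi-structured JSON into a DataFrame; Spark infers a schema from the records.
raw_df = spark.read.json("s3://example-bucket/landing/events/")

# Light cleanup before persisting: drop fully-null rows and de-duplicate.
clean_df = raw_df.dropna(how="all").dropDuplicates()

# Write the result as partitioned Parquet for downstream Athena / Redshift Spectrum queries.
(clean_df.write
    .mode("overwrite")
    .partitionBy("event_date")   # assumed partition column
    .parquet("s3://example-bucket/curated/events/"))
```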
Technical Skills:
Big Data Ecosystem: HDFS, YARN, MapReduce, Spark, Kafka, Hive, Airflow, Impala, Sqoop, HBase, Flume, Oozie, Zookeeper
Hadoop Distributions: Cloudera, Hortonworks, Apache
Cloud Environments: AWS, Glue Catalog, EMR, EC2, S3, AWS Redshift, Athena, Lambda
Operating Systems: Linux, Windows
Languages: Python, SQL, Scala, Java
Databases: Oracle 12c/11g/10g, SQL Server, MySQL, DynamoDB, MS Access, Teradata
ETL Tools: Informatica PowerCenter, SSIS, Matillion
Reporting & Development Tools: Eclipse, IntelliJ IDEA, Visual Studio Code, SSRS, Jupyter Notebook, Tableau, Power BI
Development/Build Tools: Maven, Gradle, Docker, Puppet, Jenkins, Kubernetes
Repositories: GitHub, SVN
Scripting Languages: Bash/Shell scripting, Linux/Unix
Methodology: Agile, Waterfall

Professional Experience:

COX Communications, Remote | November 2021 to Present
Senior Data Engineer
Description: Cox Communications, Inc. (also known as Cox Cable and formerly Cox Broadcasting Corporation, Dimension Cable Services, and Times-Mirror Cable) is an American provider of digital cable television, telecommunications, and home automation services. It is the third-largest cable television provider in the United States, serving approximately 6.5 million customers, including 2.9 million digital cable subscribers, 3.5 million Internet subscribers, and almost 3.2 million digital telephone subscribers, making it the seventh-largest telephone carrier in the country.
Responsibilities:
- Designed and implemented data pipelines to collect, process, and analyze large-scale data sets from various sources (Teradata, Oracle, SQL, DB2, structured files, JSON files).
- Developed a deep understanding of data sources, implemented data standards, maintained data quality, and managed master data.
- Extracted customer data from diverse sources (Excel, databases, log data) and created a data lake in HDFS.
- Pre-processed raw data, populated staging tables, and ingested refined data into the data lake using Python, shell scripts, Hive, and Spark.
- Used the Informatica PowerCenter ETL tool to design, develop, and deploy ETL workflows for sources including AWS S3, Redshift, Snowflake, Oracle, and SQL Server.
- Managed and maintained end-to-end ETL pipelines from AWS S3 to DynamoDB.
- Worked with AWS services including Glue Catalog, EC2, ELB, IAM, VPC, CloudFormation, Security Groups, and Auto Scaling.
- Implemented Ansible playbooks for EC2 instances, creating AMIs, snapshots, and EBS volumes.
- Created a Lambda deployment function to trigger the Auto Scaling Group (ASG) and route traffic to servers.
- Set up CloudWatch alarms for instance monitoring and used them in Auto Scaling.
- Troubleshot server alerts through CloudWatch and CloudTrail.
- Developed Ansible playbooks with AWS modules and deployed them in Jenkins for automated builds.
- Built a Continuous Integration environment (Jenkins, Sonar, and Nexus) and a Continuous Delivery environment (Puppet, Yum, rsync).
- Developed Spark Streaming jobs in Scala to consume data from Kafka topics, perform transformations, and insert the results into HBase (see the sketch after this section).
- Provisioned Kubernetes clusters on EC2 instances using Docker and Terraform to create Dev/Test/Prod environments on AWS.
- Implemented Continuous Integration and Continuous Delivery using Jenkins.
- Integrated version control tools such as GitHub with the build process in Jenkins.
- Extensively worked on Ansible deployments, creating playbooks with multiple roles, tasks, and loops.
- Provided critical analysis, solved issues, and recommended improvements in communication.
- Demonstrated strong problem-solving skills and a passion for making a positive impact.
- Quick learner, proficient in adopting new and existing technologies under pressure.
Environment: AWS EC2, Glue Catalog, ELB, S3, IAM, Lambda, VPC, Terraform, Docker, Snowflake, Teradata, Oracle, Ansible, Kafka, Kubernetes, GitHub, Jenkins.
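A minimal sketch of the Kafka streaming pattern described in the Cox section. The production jobs were written in Scala and wrote to HBase; this illustration uses PySpark Structured Streaming instead, with an assumed topic name and message schema and a Parquet sink standing in for HBase, and it assumes the spark-sql-kafka connector is available.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Assumed message schema; the real topics and fields are project-specific.
schema = StructType([
    StructField("customer_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

# Consume from a Kafka topic (broker address and topic name are placeholders).
events = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "customer-events")
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*"))

# The original job inserted into HBase; a Parquet sink stands in for it here.
query = (events.writeStream
    .format("parquet")
    .option("path", "s3://example-bucket/streams/customer-events/")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/customer-events/")
    .start())
query.awaitTermination()
```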
Apple, San Bruno, CA | August 2020 to September 2021
AWS Cloud Engineer
Description: Apple Inc. is an American multinational technology company headquartered in Cupertino, California. Apple is the world's largest technology company by revenue, with US$394.3 billion in 2022 revenue. As of March 2023, Apple is the world's biggest company by market capitalization.
Responsibilities:
- Managed hybrid cloud environments (AWS) and led a large-scale services migration from on-premises SQL to AWS RDS by proposing optimal services and architecture.
- Collaborated with the team to establish connectivity (VPN, Direct Connect, peering connections) between on-premises and cloud environments.
- Set up VPCs with the necessary security groups, NACLs, public and private subnets, NAT, and Internet Gateways.
- Created IAM roles and groups using AWS Identity and Access Management (IAM) for users and resources.
- Worked with multiple instances, Elastic/Network/Application Load Balancers, security groups, auto scaling, and AMIs to design cost-effective, highly available, and fault-tolerant systems.
- Configured and created Glue Catalog, AWS ECS, IAM, ELB, Security Groups, Amazon RDS, auto scaling, web app servers, VPC, subnets, NAT, IGW, routing, and other resources.
- Developed RESTful API services using Spring Boot to upload data from local storage to AWS S3, list S3 objects, and perform file manipulation operations (see the sketch after this section).
- Used AWS Glue for data transformation, validation, and cleansing.
- Configured and maintained an AWS Lambda function triggered by Jenkins builds, storing artifacts in AWS S3 for team access.
- Utilized Route 53 for traffic routing between regions and ensured alignment with cloud governance standards and security requirements through Security Groups and Network ACLs.
- Installed applications on EC2 clusters using S3 buckets, CloudFront with ALB, and Web Application Firewall with Web ACLs.
- Set up SNS and SQS for notifications and queueing.
- Configured Auto Scaling groups to adjust replicas and increase high availability based on workloads.
- Conducted POCs on AWS ECS, Fargate (serverless), EKS, and DynamoDB for on-premises service migration planning.
- Created monitors, alarms, notifications, and logs for Lambda functions, Glue jobs, and EC2 hosts.
- Troubleshot issues with cloud services, vendors, and open-source projects, and resolved network and performance-related issues during rotational on-call support.
- Used AWS S3 for snapshot storage and configured lifecycle policies for application and database logs.
- Worked with the RDS SQL database and the DynamoDB NoSQL database, coordinating with the respective teams for issue resolution.
- Analyzed weekly instance usage reports to select appropriate instance types based on network I/O, CPU utilization, and RAM.
- Streamlined build processes and release management through continuous integration and deployment pipelines in collaboration with software development teams.
- Developed CI/CD pipelines for multiple environments, including the AWS cloud, using Jenkins.
- Performed automation using Puppet configuration management, automating applications end to end.
- Leveraged Infrastructure as Code and automation tools such as Terraform and Ansible to create scalable and repeatable infrastructure and provide guardrails for developers.
- Set up Kubernetes (k8s) clusters for running microservices and wrote Helm charts and Kubernetes YAML files for microservice deployment.
- Created custom images and Docker Compose files for building containerized and multi-container applications.
- Utilized GitHub for version control and followed branching strategies.
- Proficient in using JIRA for ticket creation, workflows, and pulling reports from the dashboard.
- Configured AWS Multi-Factor Authentication in IAM to implement two-step authentication using Google Authenticator and AWS Virtual MFA.
- Responsible for infrastructure migration to updated versions for major releases.
Environment: AWS IAM, ELB, CloudFront, RDS, ECS, EC2, S3, VPC, Terraform, Ansible, DynamoDB, Docker, JIRA, Kubernetes, GitHub, Jenkins.
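A minimal sketch of the S3 operations behind the Apple-section items on uploading files, listing objects, and lifecycle policies. The production service was a Spring Boot REST API; this illustration uses boto3 in Python, and the bucket name, prefixes, and retention period are placeholders.

```python
import boto3

# Illustrative bucket and prefixes; the real buckets are not listed in this resume.
s3 = boto3.client("s3")
BUCKET = "example-app-bucket"

# Upload a local file to S3 (the Spring Boot API wrapped calls equivalent to this one).
s3.upload_file("reports/daily.csv", BUCKET, "uploads/reports/daily.csv")

# List objects under a prefix.
resp = s3.list_objects_v2(Bucket=BUCKET, Prefix="uploads/reports/")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])

# Lifecycle policy for log storage: expire objects under logs/ after an assumed 90 days.
s3.put_bucket_lifecycle_configuration(
    Bucket=BUCKET,
    LifecycleConfiguration={
        "Rules": [{
            "ID": "expire-old-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Expiration": {"Days": 90},
        }]
    },
)
```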
HP, Houston, TX | November 2019 to July 2020
Associate Data Engineer
Description: HP Inc. is an American multinational information technology company headquartered in Palo Alto, California, that develops personal computers, printers and related supplies, as well as 3D printing solutions.
Responsibilities:
- Installed and configured Apache Airflow for the S3 bucket and Snowflake data warehouse and created DAGs to run the Airflow workloads (see the sketch after this section).
- Designed both 3NF data models for ODS/OLTP systems and dimensional data models using star and snowflake schemas.
- Worked on Snowflake schemas and data warehousing and processed batch and streaming data load pipelines using Snowpipe and Matillion from the Confidential data lake AWS S3 bucket.
- Evaluated the suitability of Hadoop and its ecosystem for the project and implemented and validated various proof-of-concept (POC) applications to benefit from the Big Data Hadoop initiative.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java and Scala for data cleaning and preprocessing.
- Worked on NoSQL databases including HBase and MongoDB.
- Configured a MySQL database to store Hive metadata.
- Utilized the Apache Hadoop environment from Hortonworks; deployed and administered Splunk and the Hortonworks distribution.
- Analyzed the Hadoop cluster and different big data analytics tools including Pig, HBase, NoSQL databases, and Sqoop.
- Extracted files from MongoDB through Sqoop and placed them in HDFS for processing.
- Created a Kafka producer API to send live-stream data into various Kafka topics.
- Developed Spark Streaming applications to consume data from Kafka topics and insert the processed streams back into Kafka.
- Responsible for designing ETL data pipelines for effective data ingestion from existing data management platforms into the enterprise data lake.
- Designed the ETL process by creating high-level design documents covering the logical data flows, source data extraction, database staging and extract creation, source archival, job scheduling, and error handling.
- Involved in creating Hive tables, loading them with data, and writing Hive queries, which invoke MapReduce jobs in the backend.
- Utilized the Apache Hadoop environment from Cloudera; monitored and debugged Spark jobs running on a Spark cluster using Cloudera Manager.
- Collected data with Spark Streaming from an AWS S3 bucket in near real time, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS.
- Responsible for developing a Spark Cassandra connector to load data from flat files into Cassandra for analysis.
Environment: AWS, Apache Hive, Scala, Cassandra, Cloudera, Apache Sqoop, Apache Spark, Kafka, HBase, MongoDB, Splunk, Matillion, Airflow.
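A minimal sketch of an Airflow DAG in the spirit of the S3-to-Snowflake loads described in the HP section. The connection details, external stage, table, file format, and schedule are assumptions; the actual DAGs, Snowpipe, and Matillion configurations are not reproduced here.

```python
from datetime import datetime

import snowflake.connector
from airflow import DAG
from airflow.operators.python import PythonOperator


def load_s3_to_snowflake():
    # Placeholder credentials and objects: an external stage pointing at the S3
    # data lake is assumed to exist; none of these names come from the project.
    conn = snowflake.connector.connect(
        account="example_account",
        user="example_user",
        password="***",
        warehouse="LOAD_WH",
        database="ANALYTICS",
        schema="STAGING",
    )
    try:
        conn.cursor().execute(
            "COPY INTO STAGING.ORDERS FROM @S3_DATA_LAKE_STAGE/orders/ "
            "FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = ',' SKIP_HEADER = 1)"
        )
    finally:
        conn.close()


with DAG(
    dag_id="s3_to_snowflake_daily",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # A single task is enough for the sketch; real DAGs would chain extract,
    # load, and validation steps.
    PythonOperator(task_id="copy_orders", python_callable=load_s3_to_snowflake)
```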
Biztime IT Solutions Pvt Ltd, India | December 2016 to August 2018
Data Engineer
Description: Biztime IT is a leading emerging-technology company that enables and delivers IT strategy and enterprise architecture capability and designs and develops large-scale enterprise solutions. Its services include strategic consulting, systems design, custom application development, systems integration, and operations management, delivering solutions that help companies better manage their revenue streams and resources through improved productivity and cost-effective, scalable business solutions.
Responsibilities:
- Developed automated regression scripts in Python to validate ETL processes across multiple databases such as AWS Redshift, Oracle, MongoDB, T-SQL, and SQL Server.
- Worked on Apache NiFi: executed Spark and Sqoop scripts through NiFi, created scatter-and-gather patterns, ingested data from Postgres into HDFS, fetched Hive metadata and stored it in HDFS, and created a custom NiFi processor for filtering text from FlowFiles.
- Involved in running Hive scripts through Hive, Impala, and Hive on Spark, and some through Spark SQL.
- Involved in performance tuning of Hive from design, storage, and query perspectives.
- Developed a Flume ETL job handling data from an HTTP source with an HDFS sink.
- Collected JSON data from the HTTP source and developed Spark APIs to perform inserts and updates in Hive tables.
- Wrote Spark SQL scripts to run over imported data and existing RDDs and applied Spark practices such as partitioning, caching, and checkpointing.
- Developed code from scratch in Scala according to the technical requirements.
- Loaded data into Hive from source CSV files using Spark.
- Experience working with Scala alongside MapReduce.
- Created a virtual data lake using AWS Redshift and S3 to query large amounts of data stored on S3.
- Transferred data from AWS S3 to AWS Redshift using Informatica.
- Delivery experience on major Hadoop ecosystem components such as Hive, Spark, Kafka, Elasticsearch, and HBase, with monitoring through Cloudera Manager, and loaded disparate data sets from different sources into the Hadoop environment using Spark.
- Implemented static partitioning, dynamic partitioning, and bucketing in Hive using internal and external tables.
- Developed RESTful API services using Spring Boot to upload data from local storage to AWS S3, list S3 objects, and perform file manipulation operations.
- Developed shell scripts to pull data from third-party systems into the Hadoop file system.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig.
- Ingested gigabytes of clickstream data daily from external servers such as FTP servers and S3 buckets using custom input adapters.
- Created Sqoop scripts to import/export user profile data from RDBMS to the S3 data lake.
Environment: AWS S3, Redshift, Oracle, MongoDB, T-SQL, Python, Hive, Scala, RESTful API, NiFi, Impala, MapReduce, Cloudera Manager.

Bharti-AXA Life Insurance, Hyderabad, India | June 2013 to November 2016
Application Development Analyst
Description: Bharti-AXA Life Insurance is part of one of India's leading business groups, with interests in telecom, agricultural business, and financial services. Presently, the joint venture has a 51% stake from Bharti Group and a 49% stake from AXA, in line with the regulatory framework on ownership in the insurance sector.
Responsibilities:
- Wrote stored procedures and packages per business requirements and scheduled jobs for data checks.
- Developed complex queries to generate monthly and weekly reports, extracting data for visualization in QlikView.
- Designed and developed specific databases for the collection, tracking, and reporting of data.
- Designed, coded, tested, and debugged custom queries using Microsoft T-SQL and SQL Reporting Services.
- Conducted research to collect and assemble data for databases and was responsible for the design and development of relational databases for collecting data.
- Wrote complex SQL queries and scripts to provide input to the reporting tools.
- Worked on the data ingestion framework to load data from the edge node into the appropriate directories in HDFS.
- Gained a detailed understanding of NoSQL databases with MongoDB and the concept of key-value pair storage.
- Created a Hive script to parse JSON data and added a custom UDF to handle conversion of some hexadecimal fields to ASCII (see the sketch after this section).
- Developed MapReduce programs to cleanse the data in HDFS, making it suitable for ingestion into the Hive schema for analysis, and to perform business-specific transformations such as conversion of data fields, validation of data, and other business logic.
Environment: T-SQL, AWS, MongoDB, Hive, NoSQL, HDFS, MapReduce
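A minimal sketch of the hexadecimal-to-ASCII conversion mentioned in the Bharti-AXA section. The original was a custom Hive UDF; this illustration expresses the same idea as a PySpark UDF, with the input path, table, and column names assumed for the example.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("hex-to-ascii-sketch").enableHiveSupport().getOrCreate()


@udf(returnType=StringType())
def hex_to_ascii(value):
    # Convert a hex-encoded string (e.g. "48656c6c6f") to its ASCII text ("Hello").
    if value is None:
        return None
    try:
        return bytes.fromhex(value).decode("ascii", errors="replace")
    except ValueError:
        return None  # leave malformed values as nulls rather than failing the job


# Assumed source path, table, and column names; the real Hive tables are not listed here.
df = spark.read.json("s3://example-bucket/raw/policies/")
decoded = df.withColumn("policy_note", hex_to_ascii(col("policy_note_hex")))
decoded.write.mode("overwrite").saveAsTable("analytics.policies_decoded")
```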