
Shivam Mudgil - Data Engineer
[email protected]
Phone: 7043275448
Location: Remote, USA
Relocation: Yes
Visa: H1B
Professional Summary:
7 years of hands-on experience as a Data Engineer with a strong focus on Azure cloud technologies.
Proficient in SQL, Spark, Databricks, Python, Java, Scala, and scripting languages.
Proven track record of designing and building robust ETL pipelines to process, transform, and load data efficiently.
Experienced in optimizing data workflows and ensuring data quality and accuracy.
Familiarity with Azure data services, including Azure Data Factory, Azure Data Lake Storage, and Azure SQL Database.
Strong problem-solving skills and the ability to work collaboratively in cross-functional teams.
Excellent communication and interpersonal skills, with the ability to convey complex technical concepts to non-technical stakeholders.
Overall experience as a Big Data Engineer and ETL developer comprises the design, development, and implementation of data models for enterprise-level applications.
Experience in using Hadoop ecosystem components like Hadoop, Hive, Pig, Sqoop, HBase, Cassandra, Spark, Spark Streaming, Spark SQL, Oozie, TIBCO, ZooKeeper, Kafka, Flume, the MapReduce framework, YARN, Scala, and Hue.
Experienced in the end-to-end Software Development Life Cycle in Agile environments using Scrum methodologies.
Proven track record of successfully delivering ETL solutions using Matillion in a production environment.
Extensive hands-on experience with Matillion ETL, including the design and implementation of Matillion Orchestration jobs.
Extensive experience working with NoSQL databases and their integration, including DynamoDB, Cosmos DB, MongoDB, Cassandra, and HBase.
Experienced with systems processing massive amounts of data in highly distributed mode on Cloudera and Hortonworks Hadoop distributions and on Amazon AWS.
Good knowledge of the architecture and components of Python, and efficient in working with Spark Core, Spark SQL, and Spark Streaming.
Experience in creating separate virtual data warehouses with different size classes in Snowflake on AWS.
Experience with data transformations utilizing SnowSQL and Python in Snowflake.
Experienced with data lakes and business intelligence tools in Azure.
Experienced with Spark DataFrames, Spark SQL, and the RDD API of Spark for performing various data transformations and building datasets.
Proficient in creating visually appealing and informative dashboards using Grafana for data visualization.
Experienced with AWS services such as EC2, S3, Redshift, CloudWatch, Elasticsearch, IAM, ELK, ELB, EKS, ECS, SNS, Elastic MapReduce (EMR), and EBS, and in accessing instance metadata.
Experienced with Sqoop for data ingestion, Hive and Spark for data processing, and Oozie for designing complex workflows in the Hadoop framework.
Strong experience in writing scripts using Python API, PySpark API and Spark API for analyzing the data.
Excellent exposure to Data Visualization with Tableau, Power BI, Seaborn, Matplotlib and ggplot2.
Experience with data analytics services such as Athena, Glue Data Catalog, and QuickSight.
Expertise in deploying cloud-based services with Amazon Web Services (Databases, Migration, Compute, IAM, Storage, Analytics, Network & Content Delivery, Lambda and Application Integration).
Proficient in data visualization tools such as Tableau and Power BI; big data tools such as Hadoop HDFS, Spark, and MapReduce; MySQL, Oracle SQL, and Redshift SQL; and Microsoft Excel (VLOOKUP, pivot tables).
Experienced in using Databricks with Azure Data Factory (ADF) to compute large volumes of data.
Experienced in developing JSON Scripts for deploying the Pipeline in Azure Data Factory (ADF) that process the data.
Expertise in programming tools like Python, Scala, SQL, R, SAS and complete maintenance in Software Development Life Cycle (SDLC) like Agile/SCRUM environment.
Experience in Microsoft Azure/Cloud Services like SQL Data Warehouse, Azure SQL Server, Azure Databricks, Azure Data Lake, Azure Blob Storage, Azure Data Factory.
Hands on experience in App Development using Hadoop, RDBMS, and Linux shell scripting.
Experienced in various data analysis and modeling packages such as NumPy, SciPy, Pandas, Beautiful Soup, scikit-learn, Matplotlib, and Seaborn in Python, and dplyr, tidyr, and ggplot2 in R.
Expert in data Extraction, Transformation, and Loading (ETL) using SQL Server Integration Services (SSIS), DTS, Bulk Insert, and BCP from sources like Oracle, Excel, CSV, and XML.
Experience writing deployment scripts using Docker, SQLCMD, RS, and DTUTIL, and preparing release documents for the release management team.
Experience with data visualization and dashboard tools like MicroStrategy, Tableau, Pentaho Report Designer, Cognos, Business Intelligence, and Platfora.
Experience administering and maintaining source control systems, including branching and merging strategies with solutions such as GIT (Bitbucket/Gitlab) or Subversion.
Experience in deploying streaming Maven-built Cloud Dataflow jobs.
Experience in GCP Dataproc, Dataflow, Pub/Sub, GCS, Cloud Functions, BigQuery, Stackdriver Cloud Logging, IAM, and Data Studio for reporting.
Experience in Unix shell scripting for ETL job-run automation and DataStage engine administration.
Experience in providing highly available and fault tolerant applications utilizing orchestration technologies like Kubernetes on Google Cloud Platform.

Technical Skills


Big Data Ecosystem: Hadoop, Spark, MapReduce, YARN, Hive, SparkSQL, Pig, Sqoop, HBase, Flume, Oozie, ZooKeeper, Avro, Parquet, Maven, Snappy, StreamSets SDC
Hadoop Architecture: HDFS, Job Tracker, Task Tracker, Name Node, Data Node, MapReduce
Hadoop Distributions: Cloudera, MapR, Hortonworks
NoSQL Databases: Cassandra, MongoDB, HBase
Languages: Java, Python, Scala, SQL
Databases: SQL Server, MySQL, PostgreSQL, Oracle
ETL/BI: Talend, Tableau
Operating Systems: UNIX, Linux, Windows variants
AWS Services: EC2, EMR, S3, KMS, IAM, Redshift, Lambda, Athena
PROFESSIONAL EXPERIENCE:

Credit One, Las Vegas, NV. Jan 2023 - Current
Role: Data Engineer

Responsibilities:
Developed and deployed an Application Platform Monitoring tool for observability, analyzing distributed metrics from pods and Kubernetes clusters on the OpenShift platform using YAML files, Helm charts, and a Python agent.
Collaborated with cross-functional teams, including DevOps, software engineers, and data scientists, to build scalable and reliable data infrastructure, services, and applications.
Developed and maintained agile data engineering processes, such as continuous integration and delivery (CI/CD), to ensure that data processing pipelines are efficient, reliable, and scalable.
Utilized Airflow to schedule and manage data ingestion workflows, including dynamic DAG generation, ensuring robust and scalable data processing (see the dynamic DAG sketch after this list).
Created dashboards in Apache Superset to monitor Airflow DAG metrics and provide performance insights into the platform.
Proficient in using Jira for Agile project management, sprint planning, JQL, and bug and issue tracking.
Designed and implemented a comprehensive data ingestion framework capable of processing varied source data feeds, including zipped, encrypted, CSV, JSON, delimited, fixed-width, and EBCDIC files.
Established file validation mechanisms, ensuring data integrity through file name format checks, readability assessments, delimiter count validation, and header validation against schema versions.
Led the design and development of end-to-end ETL pipelines, ensuring the efficient extraction, transformation, and loading of data from various sources into Azure data storage solutions.
Utilized SQL, Spark, and Databricks to process large datasets, improving data processing speed.
Collaborated closely with data scientists and analysts to understand data requirements and optimize data structures for analytical purposes.
Implemented data validation and quality checks to maintain data integrity throughout the pipeline.
Developed custom data processing solutions in Python, Java, and Scala to address specific project needs, resulting in improvement in data processing efficiency.
Worked with Azure Data Factory to automate and schedule data workflows, reducing manual intervention and enhancing data pipeline reliability.
Conducted performance tuning and optimization of data pipelines to meet SLAs and minimize resource utilization.
Ensured compliance with data security and privacy regulations by implementing data encryption and access controls.
Provided documentation and training to support knowledge sharing among team members.
Contributed to the development of best practices and standards for data engineering within the organization.
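The dynamic DAG generation mentioned above can be illustrated with a minimal sketch (assumes Airflow 2.x; the feed names, schedule, and ingestion logic are hypothetical, not taken from the project):

    import pendulum
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # Hypothetical source feeds; in practice these would come from a config file or metadata table.
    SOURCES = ["transactions", "customers", "payments"]

    def ingest(source_name, **context):
        # Placeholder for the real ingestion step (e.g. copy from landing zone to the lake).
        print(f"Ingesting feed: {source_name}")

    for source in SOURCES:
        dag_id = f"ingest_{source}"
        with DAG(
            dag_id=dag_id,
            schedule_interval="@daily",
            start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
            catchup=False,
        ) as dag:
            PythonOperator(
                task_id="ingest",
                python_callable=ingest,
                op_kwargs={"source_name": source},
            )
        # Register each generated DAG at module level so the scheduler discovers it.
        globals()[dag_id] = dag

One DAG file then produces one scheduled pipeline per feed, which keeps ingestion workflows uniform as new feeds are added.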

Capital One, Plano, TX. Oct. 2019 to Dec 2022
Role: Data Engineer

Responsibilities:
Worked in a Fast-paced Startup environment with complete life cycle of product development (SDLC) and Agile/Scrum.
Worked with the AWS stack: S3, EC2, Snowball, EMR, Athena, Glue, Redshift, DynamoDB, RDS, Aurora, IAM, Firehose, and Lambda.
Implemented Hadoop jobs on an EMR cluster, running several Spark, Hive, and MapReduce jobs to process data for recommendation engines, transactional fraud analytics, and behavioral insights.
Worked on the data pipeline handling SQL and Python script transformation loads into Redshift for the incremental load process, with Talend ETL used to pick up files from S3 and deliver them to targets.
Wrote Python scripts to build ETL pipelines and Directed Acyclic Graph (DAG) workflows using Airflow and Apache NiFi.
Involved in working with Java, J2EE, React, and NoSQL, Database and web technologies.
Processed data from different sources into AWS Redshift using EMR, Spark, and Python.
Worked on ETL jobs through Spark with SQL, Hive, Streaming & Kudu Contexts.
Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
Wrote Spark Scala functions for mining data to provide real-time insights and reports.
Working knowledge of Spark RDD, Data Frame API, Data set API, Data Source API, Spark SQL and Spark Streaming.
Worked on validation of data transformations and perform end-to-end data validation for ETL and BI systems.
Worked in AWS environment for development and deployment of Custom Hadoop Applications.
Worked on migrating Hive & MapReduce jobs to EMR and Qubole with automating the workflows using Airflow.
Implemented Spark using Python and Scala, along with Spark SQL, for faster testing and processing of data (see the PySpark sketch after this section).
Worked on ETL jobs with Hadoop technologies and tools like Hive, Sqoop and Oozie to extract records from different databases into the HDFS.
Worked with the team to build data pipelines and a UI for the website's data analysis modules using AWS and Git.
Wrote several MapReduce jobs using PySpark and NumPy, and used Jenkins for continuous integration.
Involved in generating data extracts in Tableau by connecting to the view using Tableau MySQL connector.
Worked with Kubernetes.
Involved in the development and execution of ETL related functionality, performance and integration test cases and documentation.
Involved in writing Spark Scala scripts and UDFs to perform transformations on large datasets.
Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
Involved in working with Linux environment and user groups.
Environment: Agile, Hadoop, Python, Oozie, Pig, Hive, Sqoop, Map Reduce, Jenkins, Tableau, GIT, AWS, EC2, S3, RedShift, RDS, Route53, EMR, Elastic Search, IAM, Scala, Mongo DB, Cassandra, ETL, Linux.
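A minimal sketch of the kind of PySpark and Spark SQL processing described in this section (the bucket, columns, and aggregation are illustrative assumptions, not details from the actual pipelines):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("transactions-etl").getOrCreate()

    # Read raw transaction data from S3 (illustrative path).
    raw = spark.read.option("header", True).csv("s3://example-bucket/raw/transactions/")

    # Basic cleansing with DataFrame/Spark SQL functions.
    clean = (
        raw.dropDuplicates(["transaction_id"])
           .withColumn("amount", F.col("amount").cast("double"))
           .filter(F.col("amount").isNotNull())
    )

    # Aggregate per account and day for downstream analytics.
    daily_totals = clean.groupBy("account_id", "transaction_date").agg(
        F.sum("amount").alias("daily_amount"),
        F.count("*").alias("txn_count"),
    )

    # Write curated output back to S3 as Parquet, e.g. for a Redshift COPY or Spectrum query.
    daily_totals.write.mode("overwrite").parquet("s3://example-bucket/curated/daily_totals/")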


Acer Inc., Dallas, TX. June 2018 - Nov 2018
Role: Data Engineer

Responsibilities:
Followed Agile methodologies and implemented them on various projects by setting up Sprint for every two weeks and daily stand-up meetings.
Developed the code for Importing and exporting data into HDFS and Hive using Sqoop.
Worked on AWS and big data technologies like HDFS, Hive, Sqoop, EMR, TIBCO, Spark, Redshift, EC2, and Data Pipeline.
Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and PIG to pre-process the data.
Wrote Python scripts to process semi-structured data in formats like JSON.
Performed Data Analysis, Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export through Python.
Created HLD and LLD Specification document for the design preparation for ETL and Data Architect team.
Implemented Hadoop jobs on an EMR cluster, running several Spark, Hive, and MapReduce jobs to process data for recommendation engines, transactional fraud analytics, and behavioral insights.
Worked on Tableau software for the reporting needs.
Worked on Analyzing and Developing Complex SQL queries, Stored Procedures, ETL Mapping for application development.
Configured AWS Identity and Access Management (IAM) Groups and Users for improved login authentication.
Processed data from different sources into Hive targets using Python and Spark.
Conducted systems design, feasibility, and cost studies and recommended cost-effective cloud solutions such as Amazon Web Services (AWS).
Performed record joins using Hive and Spark using the Data sets and pushed the Tables to Apache Kudu.
Developed a Python script to integrate DDL changes between the on-prem Talend warehouse and Snowflake.
Involved in moving data from HDFS to AWS Simple Storage Service (S3) and extensively worked with S3 bucket in AWS.
Involved in working with NoSQL Databases like MongoDB.
Developed Spark streaming pipeline in Python to parse JSON data and to store in Hive tables.
Developed Spark scripts by using Python and Shell scripting commands as per the requirement.
Design and development of ETL mappings in Informatica / Ab Initio GDE to process data ingestion in Hadoop HDFS, Hive, Teradata and Oracle database.
Developed a Spark application to filter JSON source data in an AWS S3 location and store it in HDFS with partitions, and used Spark to extract the schema of the JSON files (see the PySpark sketch after this section).
Worked on continuous integration of the application using Jenkins.
Implemented schema extraction for Parquet and Avro file formats in Hive/MongoDB.

Environment: Agile, Hadoop, Oozie, Tibco, Pig, Hive, Map Reduce, Jenkins, Python, Tableau, Scala, NoSQL, MongoDB, AWS, EC2, S3, RedShift, RDS, ETL, Route53, EMR, Elastic Search, Cloud Watch, IAM.
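An illustrative sketch of the JSON filtering and schema-extraction step referenced in this section (bucket, field names, and partition key are assumptions made for the example):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("json-filter").getOrCreate()

    # Read semi-structured JSON from S3; Spark infers the schema on read.
    events = spark.read.json("s3a://example-bucket/incoming/events/")

    # Inspect the inferred schema (useful when validating new feeds).
    events.printSchema()

    # Filter out incomplete records and keep only the fields needed downstream.
    filtered = events.filter(F.col("event_type").isNotNull()).select(
        "event_id", "event_type", "event_date", "payload"
    )

    # Store in HDFS, partitioned by date for efficient downstream reads.
    filtered.write.mode("append").partitionBy("event_date").parquet("hdfs:///data/curated/events/")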

Parr Infotech Jan. 2017 to July 2017
Role: Data Engineer

Responsibilities:
Worked on loading disparate data sets coming from different sources into the BDPaaS (Hadoop) environment using Sqoop.
Defined job flows using Oozie to schedule and manage Apache Hadoop jobs as directed acyclic graphs.
Involved in file movements between HDFS and AWS S3 and extensively worked with S3 buckets in AWS.
Imported data from SQL databases into HDFS and Hive for analytical purposes.
Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
Worked on scalable distributed data system using Hadoop ecosystem in AWS EMR and MapR (MapR data platform).
Created ETL mappings and enhanced existing mappings to facilitate data loads into the system.
Developed DAGs for the Airflow scheduler using Python, including creating DAG runs and task instances.
Developed Scala scripts, UDFs using both Data frames/SQL/Data sets and RDD/MapReduce in Spark for Data Aggregation, queries and writing data back into OLTP system through Sqoop.
Worked with PySpark to leverage Spark libraries through Python scripting for data analysis.
Worked with Python and Docker (container technology) to automate DevOps workflows with Ansible and Docker.
Used Scala to write the code for all Spark use cases, gained extensive experience with Scala for data analytics on the Spark cluster, and performed map-side joins on RDDs.
Worked on creating various repositories and version control using GIT.
Worked on delivering major Hadoop ecosystem Components such as Pig, Hive, Spark.
Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes.
Involved in working with NoSQL databases like MongoDB, HBase and Cassandra.
Used AWS EMR as the data processing platform and worked with AWS S3 and Snowflake as data storage platforms.
Developed job schedulers and shell scripts for ETL job automation in a UNIX environment.
Developed Python code for tasks, dependencies, and time sensors for each job's workflow management and automation using Apache Airflow.
Worked on creating few Tableau dashboard reports, Heat map charts and supported numerous dashboards, pie charts and heat map charts that were built on Teradata database.
Populated HBase tables and queried HBase using the Hive shell.
Created functional and technical ETL mapping specification documents for data mappings.
Performed operations on AWS using EC2 instances, S3 storage, performed RDS, Lambda, analytical Redshift operations.
Developed and supported ETL transformation mappings and enhanced existing mappings to facilitate data loads into the DWH.
Worked on parsing the data from S3 through the Python API calls through the Amazon API Gateway generating Batch Source for processing.
Performed Data Ingestion using Sqoop, Used Hive QL & Spark SQL for data processing and scheduled the complex workflows using Oozie.
Created logical and physical data models within a multilayer process-zone data architecture using star/snowflake dimensional modeling in the Teradata database and in the digital ecosystem on HDP Hadoop (HDFS/Hive).
Used the Python API to develop Kafka producers and consumers for writing Avro schemas (see the producer/consumer sketch after this section).

Environment: Scrum, Hadoop, Oozie, Pig, Hive, Tibco, Map Reduce, Python, Scala, Docker, Tableau, GIT, NoSQL, MongoDB, HBase and Cassandra, AWS, EC2, S3, RedShift, RDS, Route53, EMR, Elastic Search, Cloud Watch, IAM, UNIX, ETL.
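A simplified sketch of a Kafka producer and consumer in Python, in the spirit of the bullet above (uses the kafka-python client with JSON serialization for brevity; the Avro setup described would instead use a schema-registry-aware Avro serializer, and the broker address and topic name are hypothetical):

    import json
    from kafka import KafkaProducer, KafkaConsumer

    # Producer: serialize dicts to JSON here for simplicity; Avro would use a schema-registry serializer.
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda record: json.dumps(record).encode("utf-8"),
    )
    producer.send("customer-events", {"customer_id": 42, "action": "signup"})
    producer.flush()

    # Consumer: read the same topic back and deserialize.
    consumer = KafkaConsumer(
        "customer-events",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    )
    for message in consumer:
        print(message.value)
        break  # stop after one message in this illustration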

Maruti Suzuki India Limited. July 2015 to Dec. 2016
Role: Data Engineer

Responsibilities:
Worked on analyzing Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase database and SQOOP.
Followed Waterfall methodology in developing the project.
Worked on building data processing pipelines based on Spark and different AWS services like S3, EC2, EMR, SNS, SQS, Lambda, Redshift, Data Pipeline, Athena, AWS Glue, S3 Glacier, CloudWatch, CloudFormation, IAM, AWS Single Sign-On, Key Management Service, AWS Transfer for SFTP, VPC, SES, CodeCommit, and CodeBuild.
Developed PySpark code for AWS Glue jobs and for EMR (see the Glue job skeleton after this section).
Responsible for writing Hive queries for analyzing data in the Hive warehouse using HiveQL.
Imported Data from AWS S3 into Spark RDD and performed transformation and actions on RDDs.
Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
Used Scala for its strong concurrency support, which plays a key role in parallelizing processing of large data sets.
Configured spark streaming to receive real time data from the Apache Flume and store the stream data using Scala to Azure Table.
Performed data gathering, data cleaning, and data wrangling using Python and R.
Performed various data validation jobs in the backend through Hive and HBase.
Built clusters in the AWS environment using EMR with S3, EC2, and Redshift.
Involved in utilizing Kubernetes and Docker for the runtime environment for the CI/CD system to build, test, and deploy.
Developed Apache Spark applications by using Scala for data processing from various streaming sources.
Optimized existing Scala code and improved the cluster performance.
Performed streaming data processing of EDI feeds as a result of chassis activity using Kafka, Spark streaming, Cassandra, Hadoop etc.
Implemented Tableau Server user access control for various dashboard requirements.
Worked on custom loaders and storage classes in Pig to handle several data formats like JSON, XML, and CSV, and generated bags for processing in Pig.
Worked on creating source to target data (DEM) mapping document with transformation logic for ETL build and data validation.
Worked on creating end-to-end solution for ETL transformation jobs that involve writing Informatica workflows and mappings.
Developed Python, SQL, Spark Streaming using PySpark and Scala scripts.
Developed ReciPy, a Python text interpreter module combining NLP, pos-tagging, and search techniques.
Involved in Developing a Restful service using Python Flask framework.
Worked on analyzing and understanding the ETL workflows that have been developed.
Involved in writing Python modules to view and connect to the Apache Cassandra instance.
Worked with JSON, CSV, Sequential and Text file formats.
Performed data analysis and data mining from various source systems using SQL, captured metadata details, created data cleansing rules, and unwound the logic from stored procedures, ETL mappings, SAS datasets, etc.
Populated HBase tables and queried HBase using the Hive shell.

Environment: Waterfall, AWS, EC2, EMR, Redshift, Hadoop, Oozie, Pig, Hive, Map Reduce, CVS, Docker, Python, Scala, Cassandra, Cloud Watch, SQL, Spark, Pyspark, Tableau, JSON, CSV, ETL.
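A minimal skeleton of an AWS Glue PySpark job of the kind described above (the Glue Data Catalog database, table, and output path are hypothetical placeholders):

    import sys
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    # Standard Glue boilerplate: resolve the job name passed in by the Glue runtime.
    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    sc = SparkContext()
    glue_context = GlueContext(sc)
    spark = glue_context.spark_session
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Illustrative transformation: read a catalog table, drop rows with null keys, write Parquet to S3.
    source = glue_context.create_dynamic_frame.from_catalog(
        database="example_db", table_name="raw_orders"
    )
    orders = source.toDF().dropna(subset=["order_id"])
    orders.write.mode("overwrite").parquet("s3://example-bucket/curated/orders/")

    job.commit()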