
Arpit (Monu) Jain
Senior Azure Data Engineer / Lead
Phone: 2018446579
Email: [email protected]
Location: NYC, New York, USA
Relocation: NY, MI / Remote
Visa: TN


Skills
Big Data Developer
Data Engineer
Hive Data Analyst
Python Developer
GCP Data Engineer
Scala Developer
Azure Data Architect
Azure Data Engineer
AWS Data Engineer

Data Engineer Summary (Spark/Hive/HDFS)
Senior Big Data developer (Data Engineer) with over 12 years of experience and an outstanding record in Spark and open-source technologies.
Deep understanding of the different phases of data projects: sourcing, ingestion, profiling, cleaning, storage, processing, warehousing, reporting, and visualization.
Built a messaging system using Kafka for sending acknowledgment (ACK) messages.
Used Spark 1.6, Spark 2.0, and PySpark with Scala, Python, and Java.
Extensively used Hive and HDFS storage concepts for data storage and analytics.
Used Impala, Hive, and Spark SQL for data extraction and profiling (see the sketch after this list).
Chose file formats and compression codecs based on the nature of each dataset.
Used Sqoop (a MapReduce-based tool) for ingesting data into HDFS and exporting it to RDBMS.
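
A minimal sketch of the Hive/Spark SQL profiling mentioned above, assuming a Hive-enabled Spark session; the raw_db.customers table and customer_id column are hypothetical names used only for illustration.

    from pyspark.sql import SparkSession, functions as F

    # Hive-enabled session; table and column names below are placeholders.
    spark = (SparkSession.builder
             .appName("profiling")
             .enableHiveSupport()
             .getOrCreate())

    df = spark.table("raw_db.customers")

    print("rows:", df.count())
    # Null count per column.
    df.select([F.sum(F.col(c).isNull().cast("int")).alias(c) for c in df.columns]).show()
    # Distinct count of the key column.
    df.select(F.countDistinct("customer_id").alias("distinct_ids")).show()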
Cloud Summary
Expertise with Microsoft Azure for building data pipelines, integrating with on-premises data, and running computation on Azure Databricks.
Working closely with GCP, reading BigQuery tables from PySpark code and optimizing queries (see the sketch after this list).
Created Dataproc clusters for client-mode and cluster-mode execution.
Microsoft Certified: Azure Data Engineer (Jan 2021).
Delivered multiple projects building enterprise data lakes on HDP and Azure.
Used AWS EMR and the Glue Data Catalog to write PySpark pipelines processing data in different file formats.
Used AWS S3 for data storage and EC2 instances as the file landing zone.
Used Azure Data Factory V2, Azure Storage, Azure Data Lake Gen2, and Azure Databricks for data pipelines, storage, and computation.
Experience with AWS Lambda, AWS EMR, AWS Redshift, and S3 for data storage, computation, and analytics.
Snowflake SnowPro Core certified (July 2021).
Currently exploring GCP (Google Cloud Platform): migrating code and data from Netezza/SQL Server to BigQuery, and moving computation from DataStage to PySpark on Dataproc clusters.
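
A minimal sketch of the BigQuery-reader pattern mentioned above, assuming the spark-bigquery connector is available on the cluster (as on Dataproc); the project, dataset, and table names are placeholders.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("bq-read").getOrCreate()

    # Read a BigQuery table through the spark-bigquery connector.
    df = (spark.read
          .format("bigquery")
          .option("table", "my_project.my_dataset.my_table")
          .load())

    # Query it with Spark SQL as usual.
    df.createOrReplaceTempView("src")
    spark.sql("SELECT COUNT(*) FROM src").show()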

Data Reporting / Interface
Used Datameer for data profiling and viewing Parquet data.
Used DataStage for data transformation and for building pipelines from source RDBMS to the HDFS layer.
Used Dremio with Tableau to connect to HDFS/Hive and visualize reports in Tableau; Dremio acted as the query-optimizer interface.
Programming Languages (Scala/Python/Java)
Expertise in PySpark with the pandas library for data transformation and analysis (see the sketch after this list).
Played the data engineer role on multiple projects, using Spark with Scala for data pipelines.
Also used Spark 1.6 with core Java for data transformation, including XML parsing with Java libraries.
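
A minimal sketch of PySpark working with pandas, assuming Spark 3.x with pyarrow installed; the column names and tax rate are invented for illustration.

    import pandas as pd
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf

    spark = SparkSession.builder.appName("pandas-demo").getOrCreate()
    df = spark.createDataFrame([(1, 100.0), (2, 250.0)], ["id", "amount"])

    # Vectorized transformation: Spark hands pandas Series to the UDF in batches.
    @pandas_udf("double")
    def with_tax(amount: pd.Series) -> pd.Series:
        return amount * 1.13

    df.withColumn("amount_with_tax", with_tax("amount")).show()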
Soft Skills
Recognized for inspiring management team members to excel and encouraging creative work environments.
Proven success in leadership, operational excellence, and organizational development, with a keen understanding of data-driven business.
Act as a translator, converting business requirements into technical designs and then into code.
Tools
Used Maven, GitHub, and Jenkins for builds, source control, and deployment.
Hands-on with the Eclipse, Jupyter Notebook, PyCharm, and VS Code IDEs for development.
Used SQL Assistant for running ad hoc Hive queries for data profiling and data viewing.

Work History
2022-01 - Current
Azure Data Engineer / Lead
CTS / Air CA, NYC, NY / Toronto
Analyzing existing DataStage pipelines and migrating them to PySpark running on Azure Databricks.
Writing common routines in PySpark to build a robust ETL tool in ADF/Databricks.
Migrated Netezza data to Azure Synapse.
Used Azure Synapse as the data warehouse and Dataproc clusters as the computation engine.
Developing stored procedures in SQL Server and calling them from PySpark code for complex data transformations.
Writing custom functions to replicate DataStage functionality in PySpark (a sketch follows this list).
Developed complex SQL queries to execute in Synapse.
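
A sketch of what such a custom routine might look like; this particular helper (trim a column, then default its nulls) is a hypothetical example, not the actual project code.

    from pyspark.sql import DataFrame, functions as F

    # Replicates a DataStage-style "trim then default nulls" column transform.
    def null_to_value(df: DataFrame, column: str, default: str) -> DataFrame:
        return df.withColumn(
            column, F.coalesce(F.trim(F.col(column)), F.lit(default))
        )

    # Usage (column and default are illustrative):
    # df = null_to_value(df, "customer_name", "UNKNOWN")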


2021-03 - 2021-12
Data Engineer / Lead
Cognizant / Huntington Bank, Walgreens, NYC, NY, USA
Working on code migration from SSIS to PySpark on Hortonworks.
Technology conversion from T-SQL to Snowflake (a connector sketch follows this list).
Actively participating in PySpark code design to meet client requirements.
Performance-tuning Spark and Snowflake code to match the performance of the existing SSIS and SQL jobs.
Performing deep analysis of the data, understanding it, and implementing that understanding in code.
Worked on code conversion from DataStage to PySpark jobs in Azure Databricks.
Used Azure Data Lake for storage and Databricks for computation.
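
A minimal sketch of writing a converted result to Snowflake from Spark, assuming the spark-snowflake connector jars are on the cluster; every connection option value and the table/column names are placeholders.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sf-write").getOrCreate()
    df = spark.createDataFrame([(1, "A")], ["id", "segment"])  # stand-in output

    SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"
    sf_options = {
        "sfURL": "myaccount.snowflakecomputing.com",
        "sfUser": "etl_user",
        "sfPassword": "***",
        "sfDatabase": "ANALYTICS",
        "sfSchema": "PUBLIC",
        "sfWarehouse": "ETL_WH",
    }

    # Overwrite the target table with the converted data.
    (df.write
       .format(SNOWFLAKE_SOURCE_NAME)
       .options(**sf_options)
       .option("dbtable", "CUSTOMER_DIM")
       .mode("overwrite")
       .save())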


2020-12 - 2021-03
Data Engineer
Tech Mahindra / ArcelorMittal, Montreal, CA
Built a data ingestion tool using Azure Event Hubs and Databricks (a streaming sketch follows this list).
Used ADLS Gen2 and Azure Synapse for storage and data warehousing.
Loaded streaming data into the warehouse in a fact/dimension schema using PySpark code.
Used Azure Storage for control tables and log data.
Used PySpark for data cleaning and transformation.
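
One way such an ingestion can be wired up (not necessarily the approach used here): Event Hubs exposes a Kafka-compatible endpoint, so Structured Streaming can consume it with Spark's built-in Kafka source. The namespace, topic, paths, and connection string below are placeholders.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()  # predefined on Databricks

    conn = "Endpoint=sb://mynamespace.servicebus.windows.net/;..."  # placeholder
    stream = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers",
                      "mynamespace.servicebus.windows.net:9093")
              .option("subscribe", "ingest-events")
              .option("kafka.security.protocol", "SASL_SSL")
              .option("kafka.sasl.mechanism", "PLAIN")
              .option("kafka.sasl.jaas.config",
                      'org.apache.kafka.common.security.plain.PlainLoginModule '
                      f'required username="$ConnectionString" password="{conn}";')
              .load())

    # Land the raw events; downstream jobs shape them into fact/dimension tables.
    (stream.selectExpr("CAST(value AS STRING) AS body")
           .writeStream
           .format("parquet")
           .option("checkpointLocation", "/tmp/chk/ingest")
           .option("path", "/tmp/raw/ingest")
           .start())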

2020-01 - 2020-12
Data Engineer
Tech Mahindra / Rogers, Toronto, CA
Built a data ingestion tool using Azure Data Factory.
Used Azure Databricks notebooks for data computation and transformation.
Used Azure Storage for control tables and log data.
Migrated large volumes of data from on-premises HDP to Azure Data Lake.
Implemented data migration from HDP 2.6 to the Azure platform.
Built an optimal storage design to balance storage cost and compute performance (a sketch follows this list).
Used Spark Datasets, Scala, shell scripting, and Hive extensively on this project.
Built Data Factory pipelines for migration from Oracle/HDP to Azure.
Used GitHub, Maven, Jenkins, Eclipse, and PyCharm for builds, code management, and deployment.
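
A minimal sketch of the storage-layout idea: date-partitioned, Snappy-compressed Parquet prunes well at query time and compresses well at rest. The ADLS account, container, and column names are placeholders.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("layout-demo").getOrCreate()
    df = spark.createDataFrame([("2020-06-01", "evt1")], ["ingest_date", "payload"])

    # Partition by ingest date so queries scan only the dates they need.
    (df.write
       .partitionBy("ingest_date")
       .option("compression", "snappy")
       .mode("append")
       .parquet("abfss://lake@mystorageacct.dfs.core.windows.net/curated/events"))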

2018-01 - 2019-12
Data Engineer
Tech Mahindra / RBC (Finance/IT), Toronto, CA
Delivered a Spark SQL-based solution for operations over SWIFT messages.
Used Impala on the Cloudera platform for data cleaning, data profiling, and extracting data for the business.
Delivered Python and PySpark code that automated Excel work the team had previously done by hand with Excel tools.
Built a self-serve data layer for the data analytics and data science teams using Dremio/Presto.
Delivered a presentation comparing different file formats and their performance in big data workloads.
Provided an AWS cloud solution for the client's big data migration from on-premises to S3.
Moved all computation to AWS EMR and built the data warehouse in AWS Redshift.
Used Datameer for profiling, querying, and viewing data.
2016-07 - 2017-12
Big Data Developer
Tech Mahindra / RBC Insurance, Toronto, CA
Completed a data migration project from scratch through delivery and served as technical lead for the data lake project.
Used shell scripting for scheduling, Spark with Scala for data transformation and migration, and Hive for viewing and analysis.
Migrated 60+ source systems into 3 downstream systems and stored all data in the data lake for future reporting and analysis needs.
Managed 15+ Spark developers on this project and was also responsible for integration.
Used Sqoop jobs (MapReduce-based) for importing and exporting data between Oracle and HDFS.
Built a messaging system using Kafka for sending ACK messages to downstream applications (a sketch follows).
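
A minimal sketch of producing such an ACK message, using the kafka-python client as a stand-in; the broker, topic, and payload fields are illustrative.

    import json
    from kafka import KafkaProducer  # kafka-python

    producer = KafkaProducer(
        bootstrap_servers="broker:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    # Send an acknowledgment to the downstream topic and block until delivered.
    ack = {"batch_id": "2016-07-01-001", "status": "LOADED"}
    producer.send("downstream-ack", ack).get(timeout=30)
    producer.flush()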
2014-10 - 2016-06
Big Data Developer
Tech Mahindra / NCB Jeddah, Mumbai, India
Served as data lake data engineer for National Commercial Bank.
Responsible for choosing storage formats and writing ad hoc queries and Hive routines for analysis.
Used data stored in the data lake to generate daily/monthly/yearly extracts for vendors.
Wrote Sqoop jobs, using HDFS as the storage layer.
Used Hive queries for data analysis.
Used the Hortonworks Hadoop distribution.




2010-03 - 2014-10
Developer
Tech Mahindra, Mumbai, India
Served as a developer on banking products from the start of my IT career, including 3 years on the SBI production troubleshooting team.
Used COBOL, PL/SQL, and shell scripting to meet various banking clients' requirements.
Implemented the SWIFT payment system on multiple banking projects.