Arpit - Sr. Azure Data Engineer
[email protected] | 201 897 5246
Location: New City, New York, USA
Relocation:
Visa:
****************
Data Engineer Summary (Spark/Hive/HDFS)
- Senior Big Data developer (Data Engineer) with over 12 years of experience and an outstanding record in Spark and open-source technologies.
- Deep understanding of the different phases of data projects: source, ingestion, profiling, cleaning, storage, processing, warehousing, reporting, and visualization.
- Built a messaging system using Kafka for sending acknowledgement messages (see the sketch after the Tools section).
- Used Spark 1.6, Spark 2.0, and PySpark, with Scala, Python, and Java.
- Extensively used Hive and HDFS storage concepts for data storage and analytics.
- Used Impala, Hive, and Spark SQL for data extraction and profiling.
- Chose file formats and compression codecs based on the nature of each dataset.
- Used Sqoop (built on MapReduce) for data ingestion into HDFS and export back to RDBMS.

Cloud Summary
- Expertise with Microsoft Azure for building data pipelines, integrating with on-premises data, and running computation on Azure Databricks (see the ADLS sketch after the Tools section).
- Working closely with GCP: reading BigQuery tables from PySpark code and optimizing queries; creating Dataproc clusters and executing jobs in client and cluster mode (see the BigQuery sketch after the Tools section).
- Microsoft Certified Azure Data Engineer (Jan 2021).
- Delivered multiple projects building enterprise data lakes over HDP and Azure.
- Used AWS EMR and the Glue Data Catalog to write PySpark pipelines that process data in different file formats.
- Used AWS S3 for data storage and EC2 instances as the file landing zone.
- Used Azure Data Factory V2, Azure Storage, Azure Data Lake Gen2, and Azure Databricks for building data pipelines, storage, and computation.
- Experience with AWS Lambda, AWS EMR, AWS Redshift, and S3 for data storage, computation, and analytics.
- Snowflake SnowPro Core certified (July 2021).
- Currently exploring GCP: migrating code and data from Netezza/SQL Server to BigQuery, and migrating computation from DataStage to Dataproc clusters with PySpark.

Data Reporting/Interface
- Used Datameer for data profiling and viewing Parquet data.
- Used DataStage for data transformation and building pipelines from source RDBMS to the HDFS layer.
- Used Dremio + Tableau to connect to HDFS/Hive and visualize reports in Tableau; Dremio acts as a query-optimization interface.

Programming Languages (Scala/Python/Java)
- Expertise in PySpark with the Pandas library for data transformation and analysis.
- Played the data engineer role on multiple projects, using Spark + Scala for data pipelines.
- Also used Spark 1.6 + core Java for data transformation.
- Used XML parsing with Java libraries.

Soft Skills
- Recognized for inspiring management team members to excel and encouraging creative work environments.
- Proven success in leadership, operational excellence, and organizational development, with a keen understanding of data-driven business.
- Act as a translator between business and engineering: take business requirements, convert them to technical design, and then to code.

Tools
- Maven, GitHub, and Jenkins used for code builds, repository management, and deployment.
- Hands-on with Eclipse, Jupyter Notebook, PyCharm, and VS Code for code development.
- SQL Assistant used for running ad-hoc Hive queries for data profiling and data viewing.
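Illustrative sketch of the Kafka acknowledgement messaging mentioned in the Data Engineer Summary. This is a minimal example, not code from any specific project; the broker address, topic name, and payload fields are hypothetical, and it assumes the kafka-python client.

    import json
    from kafka import KafkaProducer

    # Producer configured with acks="all" so a message is only treated as
    # sent once all in-sync replicas have confirmed it.
    producer = KafkaProducer(
        bootstrap_servers=["broker1:9092"],   # hypothetical broker address
        acks="all",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    # Hypothetical acknowledgement payload for a completed file load.
    ack = {"file": "orders_20210101.csv", "status": "LOADED", "rows": 125000}
    producer.send("ingest-ack", ack)          # hypothetical topic name
    producer.flush()                          # block until delivery completes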
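Illustrative sketch of reading a BigQuery table from PySpark, as described in the Cloud Summary. It assumes the spark-bigquery connector jar is available on the Dataproc cluster; the project, dataset, table, and column names are hypothetical.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("bq-reader").getOrCreate()

    # Read a BigQuery table through the spark-bigquery connector.
    # Selecting only the needed columns and filtering early lets the
    # connector push pruning and predicates down to BigQuery, which
    # reduces the bytes scanned (the main query-cost lever).
    orders = (
        spark.read.format("bigquery")
        .option("table", "my-project.sales.orders")   # hypothetical table
        .load()
        .select("order_id", "order_date", "amount")
        .where("order_date >= '2021-01-01'")
    )
    orders.show(10)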
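Illustrative sketch of an Azure Databricks step reading from and writing to Azure Data Lake Storage Gen2 over abfss, as in the Azure pipelines above. The storage account, containers, paths, and key column are hypothetical, and it assumes the cluster is already authorized to the storage account (for example via a service principal).

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical ADLS Gen2 paths: raw landing zone in, curated zone out.
    raw_path = "abfss://raw@mydatalake.dfs.core.windows.net/landing/orders/"
    curated_path = "abfss://curated@mydatalake.dfs.core.windows.net/orders/"

    # Read raw Parquet files, de-duplicate on a hypothetical key column,
    # and overwrite the curated copy.
    df = spark.read.parquet(raw_path)
    (
        df.dropDuplicates(["order_id"])
          .write.mode("overwrite")
          .parquet(curated_path)
    )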