
Upendra G - Data Engineer
[email protected]
Contact: (567) 243-8896
Location: Alice, Texas, USA
Relocation: Yes, anywhere in the USA
Visa: OPT
Professional Summary
Five+ years of professional experience in big data development, primarily using the Hadoop and Spark ecosystems.
Experience in the design, development, and implementation of big data applications using Hadoop ecosystem frameworks and tools such as HDFS, YARN, Pig, Hive, Sqoop, Spark, Scala, Storm, HBase, Kafka, Flume, NiFi, Impala, Oozie, Zookeeper, and Airflow.
Expertise in development using Python, Scala, and Java.
Extensive experience with Spark, performing ETL using Spark Core and Spark SQL and real-time data processing using Spark Streaming (a brief sketch follows this summary).
Experience working with very large databases (VLDB) with high scalability/availability and implementing best practices in SQL Server (SSIS, SSAS, SSRS), PostgreSQL, MySQL, and Oracle databases.
Expertise in working with the RDD, DataFrame, and Dataset APIs.
Good understanding of distributed systems architecture and the design principles behind parallel computing.
Expertise in developing production-ready Spark applications utilizing the Spark RDD, DataFrame, Spark SQL, Spark ML, and Spark Streaming APIs.
Proficient in building and implementing data engineering pipelines using Azure services such as Azure Data Factory, Azure Data Lake Storage, Azure Databricks, and Azure Synapse Analytics.
Strong experience troubleshooting failures in Spark applications and fine-tuning Spark applications and Hive queries for better performance.
Established expertise in developing SQL (DDL, DML) and vendor-specific data programming languages such as PL/SQL and Transact-SQL.
Experience understanding specifications for data warehouse ETL processes and interacting with designers and end users to gather informational requirements.
Extensively worked with Kafka as a messaging service for real-time data pipelines.
Experience writing UDFs in Java and integrating them with Hive.
Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
Demonstrated experience in delivering data and analytics solutions leveraging AWS, Azure, or similar cloud data lake platforms.
Experience with various file formats such as CSV, JSON, XML, ORC, Avro, and Parquet, including their compression options.
Worked with various compression techniques like BZIP2, GZIP, Snappy, and LZO.
Streamed data from various sources, both cloud (AWS, Azure) and on-premises, using tools like Spark.
Good exposure to star and snowflake schemas and data modeling, with work on different data warehouse projects.
Strong understanding of the AWS product and service suite, primarily EC2, S3, VPC, Lambda, Redshift, Redshift Spectrum, Athena, EMR (Hadoop), and related monitoring services and their applicable use cases.
Expertise in writing scripts in SQL and HQL for analytics applications in RDBMS and Hive.
Expertise in working with Hive optimization techniques like partitioning, bucketing, vectorization, map-side joins, bucket-map joins, skew joins, and creating indexes.
Experience in working with Flume and NiFi for loading log files into Hadoop.
Experience in working with NoSQL databases like HBase and Cassandra.
Experienced in creating shell scripts to push data loads from various sources on the edge nodes onto HDFS. Good experience implementing and orchestrating data pipelines using Oozie and Airflow.
Experience working with various build and automation tools like Git, Maven, SVN, and Jenkins.
Experienced in performing code reviews and closely involved in smoke testing and retrospective sessions.
Experience in working with various SDLC methodologies like Waterfall, Agile Scrum, and TDD for developing and delivering applications.
Strong troubleshooting and production support skills and interaction abilities with end users.
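
Illustrative sketch of the Spark Core / Spark SQL ETL pattern referenced above. This is a minimal example, not code from any of the projects below; the file paths, dataset, and column names are hypothetical placeholders.

    # Minimal PySpark batch ETL sketch: read CSV, transform with Spark SQL, write Parquet.
    # File paths and column names are illustrative placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("batch-etl-sketch").getOrCreate()

    # Extract: read raw CSV files with a header row and inferred schema.
    orders = spark.read.option("header", "true").option("inferSchema", "true").csv("/data/raw/orders/")

    # Transform: cleanse and aggregate using the DataFrame API and Spark SQL.
    orders = orders.withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd")).dropna(subset=["order_id"])
    orders.createOrReplaceTempView("orders")
    daily_revenue = spark.sql("""
        SELECT order_date, SUM(amount) AS total_amount, COUNT(*) AS order_count
        FROM orders
        GROUP BY order_date
    """)

    # Load: write the aggregate as Snappy-compressed Parquet, partitioned by date.
    daily_revenue.write.mode("overwrite").option("compression", "snappy") \
        .partitionBy("order_date").parquet("/data/curated/daily_revenue/")

    spark.stop()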

Technical Skills:

Hadoop/Big Data Technologies: HDFS, Apache NiFi, MapReduce, Sqoop, Flume, Pig, Hive, Oozie, Impala, Zookeeper, Ambari, Storm, Redshift, Spark, and Kafka
Programming and Scripting: Python, Spark, Kafka, Scala, Java, SQL, JavaScript, Shell Scripting, HiveQL
Cloud: AWS, Azure
Java Technologies: Java, J2EE, JDBC
Databases: Oracle, MySQL, MS SQL Server, Vertica, Teradata
Analytics Tools: Tableau, Microsoft SSIS, SSAS, and SSRS
Operating Systems: Linux, Unix, Windows 8, Windows 7, Windows Server 2008/2003

Professional Experience:

GEHA, Lee's Summit, MO
Data Engineer Sep 2021 to Present

Responsibilities:
Work closely with Business Analysts and Product Owner to understand the requirements.
Developed applications using Spark to implement various aggregation and transformation functions with Spark RDDs and Spark SQL.
Used joins in Spark to combine smaller datasets with larger datasets without shuffling data across nodes.
Attended requirement calls and worked with Business Analyst and Solution Architects to understand the requirements.
Created pipelines in Azure using ADF to get data from different source systems and transform it using various activities.
Designed and developed batch-processing and real-time processing solutions using ADF, Databricks clusters, and Stream Analytics.
Experience in designing, developing, and implementing ETL pipelines using Azure Databricks.
Ingested a huge volume and variety of data from disparate source systems into Azure Data Lake Storage Gen2 using Azure Data Factory V2.
Created reusable pipelines in Data Factory to extract, transform, and load data into Azure SQL DB and SQL Data Warehouse.
Implemented both ETL and ELT architectures in Azure using Data Factory, Databricks, SQL DB, and SQL Data Warehouse.
Proficiency in using Apache Spark and PySpark to process large datasets, including data ingestion, transformation, and aggregation.
Proficiency in using Delta Lake with various data formats, including Parquet, Avro, JSON, and CSV, and experience reading and writing data from/to Delta tables using Databricks notebooks and Spark SQL (a brief sketch follows this list).
Experience in using Databricks Delta Lake, a scalable and performant storage layer for Delta tables, which provides ACID transactions, schema enforcement, and time travel capabilities.
Created and provisioned multiple Databricks clusters needed for batch and continuous streaming data processing, and installed the required libraries on the clusters.
Experienced in developing an audit, balance, and control framework using SQL DB audit tables to control the ingestion, transformation, and load process in Azure.
Solid experience in data warehousing best practices, working with metadata and repositories, and experience within a disciplined lifecycle methodology.
Managed Databricks notebooks and Delta Lake with Python and Spark SQL.
Developed and executed migration strategies to move workloads from on-premises or other cloud platforms to Azure, leveraging OCI for supporting components.
Used Azure Logic Apps to develop workflows which can send alerts/notifications on different jobs in Azure.
Used Azure DevOps to build and release different versions of code in different environments.
Well-versed in Azure authentication mechanisms such as service principals, managed identities, and Key Vault.
Created External tables in Azure SQL Database for data visualization and reporting purposes.
Worked with complex SQL views, Stored Procedures, Triggers, and packages in large databases from various servers.
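
Illustrative sketch of the Databricks / Delta Lake read-write pattern referenced above. The ADLS paths, table name, and columns are hypothetical, and a Databricks runtime (with the spark session and Delta support preconfigured) is assumed.

    # Minimal Delta Lake sketch for a Databricks notebook (paths/columns are placeholders).
    # Assumes a Databricks runtime where `spark` and Delta Lake support are preconfigured.
    from pyspark.sql import functions as F

    raw_path = "abfss://raw@examplestorage.dfs.core.windows.net/claims/"          # hypothetical ADLS Gen2 path
    delta_path = "abfss://curated@examplestorage.dfs.core.windows.net/claims_delta/"

    # Ingest Parquet landed by ADF and append it to a Delta table.
    claims = spark.read.parquet(raw_path)
    claims = claims.withColumn("load_ts", F.current_timestamp())
    claims.write.format("delta").mode("append").save(delta_path)

    # Register the table so it can also be queried with Spark SQL.
    spark.sql(f"CREATE TABLE IF NOT EXISTS curated_claims USING DELTA LOCATION '{delta_path}'")
    spark.sql("SELECT COUNT(*) AS row_count FROM curated_claims").show()

    # Time travel: read the table as of an earlier version.
    previous = spark.read.format("delta").option("versionAsOf", 0).load(delta_path)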

Environment: Python, Azure Data Factory (ADF), Azure Data Lake Store (ADLS), Azure Data Lake Analytics, Azure Databricks, Azure Machine Learning, Azure Synapse Analytics, Azure Purview, Azure Logic Apps, Power BI, SQL Server, Oracle, PySpark, Spark-SQL.

Deloitte, Austin, TX
(Worked through Deloitte India for Austin Client)
Data Engineer Aug 2019 to Aug 2021

Responsibilities:
Participated in requirement gathering meetings with regulators, reviewed and gathered the requirements.
Developed data engineering pipelines using PySpark/Python in Azure Databricks, leveraging services like Azure Functions, Azure Data Lake Storage, and Azure SQL Database.
Expert in building Databricks notebooks, extracting data from various source systems like DB2 and Teradata, and performing data cleansing, data wrangling, ETL processing, and loading to Azure SQL Database (SQL Server); a brief sketch follows this list.
Designed and implemented data models for efficient data storage, retrieval, and analysis in a data warehousing environment, utilizing Azure Synapse Analytics (formerly Azure SQL Data Warehouse).
Performed ETL operations in Azure Databricks by connecting to different relational database source systems using job connectors.
Developed Azure Data Factory (ADF) pipelines to orchestrate and manage data movement and transformation activities within Azure services.
Utilized Python scripts to create a customized read/write utility function for Snowflake, facilitating data transfer from Azure Data Lake Storage to Azure Synapse Analytics.
Implemented ETL Processes in Azure Data Factory to migrate Campaign data from external sources like Azure Data Lake Storage, Parquet, and Text Files into Azure Synapse Analytics.
Demonstrated expertise in SQL, proficient in writing complex queries, optimizing SQL performance, and working with databases like Azure SQL Database and Azure Synapse Analytics.
Wrote complex SQL, T-SQL, Procedures, Functions, and Packages to validate data and testing processes in Azure SQL Database and Azure Synapse Analytics.
Streamlined data integration processes by utilizing Azure Data Factory's intuitive interface and automation features, reducing development time, and improving overall efficiency in data movement tasks.
Refactored complex workflows into Azure Databricks notebooks with PySpark and Pandas, developed data quality rules and end-to-end transformation logic, and created Azure Data Factory pipelines for data flow orchestration.
Designed and created automated applications for building reports, dashboards, and visualization solutions using Azure Synapse Analytics, Power BI, and Tableau.
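
Illustrative sketch of the Databricks extract-cleanse-load pattern referenced above. The storage account, JDBC URL, table names, and credential handling are hypothetical placeholders; in practice secrets would come from an Azure Key Vault-backed secret scope.

    # Minimal sketch: read source data from ADLS Gen2, cleanse it, and load Azure SQL Database via JDBC.
    # All names, paths, and credentials below are illustrative placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("cleanse-and-load-sketch").getOrCreate()

    source = spark.read.option("header", "true").csv(
        "abfss://landing@examplestorage.dfs.core.windows.net/customers/")

    # Basic cleansing/wrangling: trim strings, standardize nulls, drop duplicates.
    cleansed = (source
                .withColumn("customer_name", F.trim(F.col("customer_name")))
                .replace("N/A", None)
                .dropDuplicates(["customer_id"]))

    # Load into Azure SQL Database over JDBC (the SQL Server JDBC driver ships with Databricks runtimes).
    jdbc_url = "jdbc:sqlserver://example-server.database.windows.net:1433;database=exampledb"
    (cleansed.write.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", "dbo.customers_stage")
        .option("user", "etl_user")          # in practice, pulled from a Key Vault-backed secret scope
        .option("password", "<secret>")
        .mode("overwrite")
        .save())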

Environment: Azure Synapse Analytics, Azure Data Factory (ADF), Azure Databricks, Azure Data Lake Storage, Azure Functions, Azure SQL Database, Azure Key Vault. Other technologies: Python, PySpark, Spark-SQL, T-SQL, JSON, Unix Shell Scripting.



Bank of America, Charlotte, NC
(Worked through L&T from India)
Junior Data Engineer Jul 2018 to Jul 2019

Responsibilities:
Worked on requirements gathering, analysis, and design of the systems. Actively involved in designing the Hadoop ecosystem pipeline.
Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
Involved in designing Kafka for a multi-data-center cluster and monitoring it.
Responsible for ingesting real-time data from source systems into Kafka clusters (a streaming sketch follows this list).
Worked with Spark techniques like refreshing tables, handling parallelism, and modifying Spark defaults for performance tuning.
Implemented Spark RDD transformations to map business logic and applied actions on top of the transformations.
Performed SQL Joins among Hive tables to get input for Spark batch process.
Worked with the data science team to build statistical models with Spark MLlib and PySpark.
Designed column families in Cassandra, ingested data from RDBMS sources, performed data transformations, and then exported the transformed data to Cassandra as per the business requirements.
Used Sqoop import functionality to load historical data from RDBMS to HDFS.
Designed workflows and coordinators in Oozie to automate and parallelize Hive jobs on the Apache Hadoop environment from Hortonworks (HDP 2.2).
Configured Hive bolts and wrote data to Hive in Hortonworks as part of a POC.
Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
Developed Python scripts to start and end jobs smoothly for a UC4 workflow.
Developed Oozie workflow for scheduling & orchestrating the ETL process.
Created Data Pipelines as per the business requirements and scheduled it using Oozie Coordinators.
Wrote Python scripts to parse XML documents and load the data into a database (a parsing sketch follows this list).
Worked extensively on Apache NiFi to build NiFi flows for the existing Oozie jobs to handle incremental loads, full loads, and semi-structured data, to pull data from REST APIs into Hadoop, and to automate all the NiFi flows to run incrementally.
Created NiFi flows to trigger Spark jobs and used PutEmail processors to get notifications of any failures.
Developed shell scripts to periodically perform incremental imports of data from third-party APIs into Amazon AWS.
Worked extensively with importing metadata into Hive using Scala and migrated existing tables and applications to work on Hive and AWS cloud.
Developed the batch scripts to fetch the data from AWS S3 storage and do required transformations in Scala using Spark framework.
Used version control tools like GitHub to share code among team members.
Involved in daily SCRUM meetings to discuss the development/progress and was active in making scrum meetings more productive.
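
Illustrative sketch of the Kafka-to-Spark streaming ingestion referenced above. The project code was written in Scala; this PySpark equivalent is for illustration only, with hypothetical brokers, topic, and paths, and it assumes the Spark-Kafka connector package is available on the cluster.

    # Illustrative PySpark equivalent of the Scala streaming job: consume a Kafka topic with
    # Structured Streaming and land the events on HDFS. Brokers, topic, and paths are placeholders.
    # Assumes the spark-sql-kafka connector package is on the cluster classpath.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

    events = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
              .option("subscribe", "transactions")
              .option("startingOffsets", "latest")
              .load())

    # Kafka delivers key/value as binary; cast the value to string before parsing downstream.
    parsed = events.select(F.col("value").cast("string").alias("payload"),
                           F.col("timestamp"))

    query = (parsed.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/streams/transactions/")
             .option("checkpointLocation", "hdfs:///checkpoints/transactions/")
             .outputMode("append")
             .start())

    query.awaitTermination()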
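
Illustrative sketch of the XML parse-and-load pattern referenced above. The XML layout, table schema, and the use of SQLite as a stand-in for the project's actual database are assumptions for illustration.

    # Minimal sketch of parsing an XML document and loading rows into a database.
    # The element/attribute names and the SQLite stand-in are illustrative assumptions.
    import sqlite3
    import xml.etree.ElementTree as ET

    def load_orders(xml_path: str, db_path: str) -> int:
        tree = ET.parse(xml_path)
        rows = []
        for order in tree.getroot().findall("order"):
            rows.append((order.get("id"),
                         order.findtext("customer"),
                         float(order.findtext("amount", default="0"))))

        conn = sqlite3.connect(db_path)
        with conn:
            conn.execute("CREATE TABLE IF NOT EXISTS orders (id TEXT, customer TEXT, amount REAL)")
            conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        conn.close()
        return len(rows)

    if __name__ == "__main__":
        print(load_orders("orders.xml", "orders.db"))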

Environment: Python, Spark, Scala, Kafka, HBase, NiFi, MySQL, Oracle 12c, Linux, Oozie, Sqoop, Shell Scripting, AWS.