
KUMAR GUDE
Azure / Big Data Engineer
Phone: +14699278106  Email: [email protected]
Location: Irving, Texas, USA
Relocation:
Visa: GC
PROFESSIONAL SUMMARY

11+ years of experience in Data Engineering using Kafka, Hadoop, Snowflake, Informatica, and the Azure Cloud Platform.
Experience implementing large Lambda architectures using Azure Data platform capabilities such as Azure Data Lake, Azure Data Factory, HDInsight, Azure SQL Server, Azure ML, and Power BI. Hands-on experience with Azure Cloud Services (PaaS & IaaS), Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure Monitoring, Key Vault, and Azure Data Lake.
Extensive experience designing cloud-based solutions in Azure, including creating Azure SQL databases, setting up Elastic Pool jobs, and designing tabular models in Azure Analysis Services.
Developed and deployed numerous data pipelines using Azure Data Factory.
Extensive experience creating pipeline jobs and schedule triggers in Azure Data Factory.
Expertise in Azure Data Platform, including Azure Synapse, Data Factory, SQL, and ADLS.
Utilized Delta Lake, Delta Tables, Delta Live Tables, data catalogs, and the Delta Lake API to implement data pipelines.
Expertise in Azure Data Factory (ADF) for managing and orchestrating data workflows within the Azure ecosystem.
Developed Power BI reports and effective dashboards after gathering and translating end-user requirements.
Used Azure Data Factory, a cloud-based data integration service, to create, schedule, and manage pipelines that move and transform data from various sources to destinations.
Worked on Snowflake schema, data modeling and elements, source-to-target mappings, interface matrix, and design elements.
Orchestrated data integration pipelines in ADF using various activities such as Get Metadata, Lookup, ForEach, Wait, Execute Pipeline, Set Variable, Filter, Until, etc.
Developed and maintained multiple Power BI dashboards/reports and content packs.
Experience in building near-real-time ETL pipelines using Snowflake, Kafka and PySpark.
Implemented Spark performance tuning, Spark SQL, and Spark Streaming in Big Data and Azure Databricks environments.
Proficient in scripting languages such as Python, PySpark and Scala, enabling seamless integration of custom functionalities into data pipelines.
Hands-On working experience with a diverse range of file formats, including CSV, JSON, Parquet, and Avro, to efficiently store, process, and exchange data within data engineering pipelines and analytics workflows.
Highly skilled in utilizing Hadoop, HDFS, MapReduce, Hive, and Spark SQL for efficient ETL tasks, real-time data processing, and analytics.
Created Power BI visualizations and dashboards as per requirements.
Well-versed in both Map Reduce 1 (Job Tracker) and Map Reduce 2 (YARN) setups.
Experience developing Spark applications in Python (PySpark) in distributed environments to load large numbers of CSV files with differing schemas into Hive ORC tables (a minimal sketch of this pattern appears at the end of this summary).
Configured and managed Zookeeper to ensure efficient coordination and synchronization of distributed data processing systems.
Highly skilled in AWS, Snowflake Database, Python, Oracle, Exadata, Informatica, SQL, PL/SQL, bash scripting, Hadoop, Hive, Databricks.
Developed data integration solutions using Palantir AIP to ingest, transform, and analyze large volumes of structured and unstructured data from various sources.
Implemented Azure Data Factory (ADF) extensively for ingesting data from different source systems, both relational and unstructured, to meet business functional requirements.
Provided technical expertise and support to troubleshoot and resolve issues related to data ingestion, processing, and analysis in Palantir AIP environments.
Proficient in using SnowSQL for complex data manipulation tasks and developing efficient data pipelines.
Experienced in partitioning strategies and multi-cluster warehouses in Snowflake to ensure optimal query performance and scalability.
Experienced in developing and designing dashboards using Power BI.
Proficient in utilizing virtual warehouses, caching, and Snowpipe for real-time data ingestion and processing in Snowflake.
Strong knowledge of Snowflake's time travel feature for auditing and analyzing historical data.
Have extensive experience in creating views on the tables for downstream teams.
Demonstrated ability to design and implement data integration strategies between Snowflake and external systems, leveraging technologies such as Apache Airflow or custom-built orchestration frameworks to ensure seamless data movement and synchronization.
Designed and developed batch-processing and real-time processing solutions using ADF, Databricks clusters, and Stream Analytics.
Experience in data analysis, data modeling, and implementation of enterprise-class systems spanning Big Data, data integration, and object-oriented programming.
Deployed and tested (CI/CD) developed code using Visual Studio Team Services (VSTS), Jenkins & Jules.
Implemented data visualization using Power BI, Tableau, and DAX.
Collaborated seamlessly with data analysts and stakeholders to implement well-aligned data models, structures, and designs.
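
Illustrative sketch (not the original project code): a minimal PySpark job showing the CSV-to-Hive-ORC loading pattern referenced in this summary; the file paths, table names, and options are hypothetical placeholders.

# Minimal PySpark sketch: load CSV feeds with differing schemas into Hive ORC tables.
# Paths and table names below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("csv-to-hive-orc")
    .enableHiveSupport()              # required to write managed Hive tables
    .getOrCreate()
)

# Each source feed has its own layout, so schemas are inferred per feed.
feeds = {
    "/data/landing/customers/*.csv": "analytics.customers_orc",
    "/data/landing/orders/*.csv": "analytics.orders_orc",
}

for path, table in feeds.items():
    df = (
        spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv(path)
        .withColumn("load_ts", F.current_timestamp())
    )
    # Persist as an ORC-backed Hive table; overwrite keeps reloads idempotent.
    df.write.format("orc").mode("overwrite").saveAsTable(table)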
TECHNICAL SKILLS


Azure: Azure Blob Storage, ADLS Gen 2, Azure Delta Lake, Azure Data Factory, Airflow, Azure Databricks, Logic Apps, Function Apps, Snowflake, Azure DevOps, Azure Synapse Analytics
Big Data Technologies: MapReduce, Hive, Tez, Python, PySpark, Scala, Kafka, Spark Streaming, Oozie, Sqoop, Zookeeper
Hadoop Distributions: Cloudera, Hortonworks
Languages: SQL, PL/SQL, Python, HiveQL, Scala
SDLC: Agile, Waterfall
Operating Systems: Windows (XP/7/8/10), Unix, Linux, Ubuntu, CentOS
Version Control: Git, GitHub, Bitbucket
IDE & Build Tools: Eclipse, Visual Studio
Databases: MS SQL Server 2012/2014/2016, Azure SQL DB, Azure Synapse, MS Excel, MS Access, Oracle 11g/12c, Cosmos DB
Data Visualization: Tableau, Power BI
EDUCATION
Bachelor's in Computer Science and Engineering from Saveetha School of Engineering

WORK EXPERIENCE

Azure Data Engineer
Client: Citi Bank July 2022 to Present
Responsibilities:
Implemented end-to-end data pipelines using Azure Data Factory to extract, transform, and load (ETL) data from diverse sources into Snowflake.
Designed and implemented data processing workflows using Azure Databricks, leveraging Spark for large-scale data transformations.
Automated jobs using different triggers such as Event, Schedule, and Tumbling Window triggers in ADF.
Migrated the on-premises enterprise data warehouse to a cloud-based Snowflake data warehousing solution and enhanced the data architecture to use Snowflake as a single data platform for all analytical purposes.
Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
Pulled data into Power BI from various sources such as SQL Server, Excel, Oracle, SQL Azure, etc.
Developed the robust and scalable ETL Pipelines in Azure Data Factory (ADF) using Linked Services, Datasets, and Pipeline to extract, transform and load data from different sources like Azure SQL, Blob storage, and Azure SQL DW.
Designed and configured Azure Cloud relational servers and databases, analyzing current and future business requirements.
Implemented Azure and self-hosted integration runtimes in ADF.
Developed data pipelines to migrate data from on-premises sources (MySQL, Cassandra) to Blob Storage using Azure Data Factory and Python.
Validated results and created business reports using Power BI reporting services.
Verified schema changes in source files and checked for duplicate files in the source location; created a query parser script in Python.
Set up a Snowflake stage and Snowpipe for continuous loading of data from S3 buckets into landing tables (see the sketch at the end of this section's responsibilities).
Worked on PowerShell scripts to automate Azure creation of resource groups, web applications, Azure Storage blobs and tables, and firewall rules.
Worked on migration of data from on-prem SQL Server to cloud databases (Azure Synapse Analytics (DW) & Azure SQL DB).
Configured Input & Output bindings of Azure Function with Azure Cosmos DB collection to read and write data from the container whenever the function executes.
Designed and deployed data pipelines using Data Lake, Databricks, and Apache Airflow.
Developed Spark applications in Azure Databricks using PySpark and Spark-SQL for data extraction, transformation, and aggregation from multiple file formats for analyzing & transforming the data to uncover insights into customer usage patterns.
Migrated on-premises data (Oracle/Teradata) to Azure Data Lake Store (ADLS) using Azure Data Factory (ADF v1/v2).
Ingested data in mini-batches and performed RDD transformations on those mini-batches using Spark Streaming for streaming analytics in Databricks.
Created and provisioned different Databricks clusters needed for batch and continuous streaming data processing and installed the required libraries for the clusters.
Developed Snowflake procedures to perform transformations, load data into target tables, and purge stage tables.
Designed and optimized data models and schemas within Palantir AIP to support complex analytical queries and reporting requirements.
Documented technical designs, configurations, and best practices for Palantir AIP implementations to facilitate knowledge sharing and onboarding of new team members.
Created several Databricks Spark jobs with PySpark to perform several tables-to-table operations.
Developed custom monitoring and alerting solutions using Azure Monitor and Snowflake Query Performance Monitoring (QPM) for proactive identification and resolution of performance issues.
Integrated Snowflake with Power BI and Azure Analysis Services for creating interactive dashboards and reports, enabling self-service analytics for business users.
Created pipelines in Azure using ADF to get data from different source systems and transform the data using various activities.
Optimized data pipelines and Spark jobs in Azure Databricks for improved performance, including tuning of Spark configurations, caching, and leveraging data partitioning techniques.
Implemented data cataloging and data lineage solutions using tools like Azure Purview and Apache Atlas to provide a comprehensive understanding of data assets and their relationships.
Responsible for end-to-end deployment of the project that involved Data Analysis, Data Pipelining, Data Modelling, Data Reporting, and Data documentation as per the business needs.
Experienced in leveraging data visualization tools like Power BI for the end customer.
Performance analysis and fixing issues for Spark Jobs to optimize the execution time to reduce the cost of execution resources.
Created views in Snowflake to support reporting requirements.
Collaborated with cross-functional teams including data scientists, data analysts, and business stakeholders to understand data requirements and deliver scalable and reliable data solutions.
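
Illustrative sketch (not the original project code): setting up a Snowflake external stage and Snowpipe for continuous loading from S3, issued through the Python Snowflake connector; the connection details, storage integration, and object names are hypothetical placeholders.

# Minimal sketch: external stage over S3 plus a Snowpipe that auto-ingests new
# files into a landing table. All names and credentials are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",
    user="etl_user",
    password="***",
    warehouse="LOAD_WH",
    database="RAW",
    schema="LANDING",
)

ddl_statements = [
    # External stage pointing at the S3 prefix that receives source files.
    """
    CREATE OR REPLACE STAGE landing_stage
      URL = 's3://example-bucket/incoming/'
      STORAGE_INTEGRATION = s3_int
      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
    """,
    # Snowpipe that continuously copies staged files into the landing table.
    """
    CREATE OR REPLACE PIPE landing_pipe AUTO_INGEST = TRUE AS
      COPY INTO LANDING.CUSTOMER_LANDING
      FROM @landing_stage
      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
    """,
]

cur = conn.cursor()
try:
    for stmt in ddl_statements:
        cur.execute(stmt)
finally:
    cur.close()
    conn.close()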

Environment: Azure HDInsight, Databricks, Data Lake, Cosmos DB, MySQL, Azure SQL, Spark SQL, Snowflake, Cassandra, Power BI, Blob Storage, Data Factory, Python, PySpark.

British American Tobacco
Azure Data Engineer Sep 2019 to June 2022
Responsibilities
Implemented end-to-end data pipelines using Azure Data Factory to extract, transform, and load (ETL) data from diverse sources into Azure Synapse Analytics.
Designed and implemented data processing workflows using Azure Databricks, leveraging Spark for large-scale data transformations.
Built scalable and optimized schemas, tables, and views to support complex analytics queries and reporting requirements.
Performed Snowflake administration tasks such as resource monitors, access control, monitoring credits and data usage, and setting up virtual warehouses, stages, schemas, tables, file formats, and users.
Developed data ingestion pipelines using Azure Event Hubs and Azure Functions to enable real-time data streaming into Azure Synapse Analytics.
Leveraged Azure Data Lake Storage as a data lake for storing raw and processed data, implementing data partitioning and data retention strategies (a minimal partitioned-write sketch appears at the end of this section's responsibilities).
Utilized Azure Blob Storage for efficient storage and retrieval of data files, implementing compression and encryption techniques to optimize storage costs and data security.
Integrated Azure Data Factory with Azure Logic Apps for orchestrating complex data workflows and triggering actions based on specific events.
Implemented data governance practices and data quality checks using Azure Data Factory and Azure Synapse Analytics, ensuring data accuracy and consistency.
Implemented data replication and synchronization strategies between Azure Synapse Analytics and other data platforms using Azure Data Factory and Change Data Capture techniques.
Developed and deployed Azure Functions for data preprocessing, data enrichment, and data validation tasks in data pipelines.
Worked with JIRA to report on projects and create subtasks for development, QA, and partner validation.
Experience in full breadth of Agile ceremonies, from daily stand-ups to internationally coordinated PI Planning.
Developed interactive Power BI reports and dashboards based on business requirements.
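
Illustrative sketch (not the original project code): landing curated data in ADLS Gen2 with date-based partitioning as described above; the storage account, container names, and columns are hypothetical, and cluster authentication to ADLS is assumed to be configured separately.

# Minimal PySpark sketch: write processed data to ADLS Gen2 partitioned by load date,
# so downstream queries can prune partitions and retention jobs can drop date folders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("curated-zone-writer").getOrCreate()

raw_path = "abfss://raw@exampleadls.dfs.core.windows.net/sales/"
curated_path = "abfss://curated@exampleadls.dfs.core.windows.net/sales/"

df = (
    spark.read.parquet(raw_path)
    .withColumn("load_date", F.to_date("event_ts"))
)

(
    df.write
    .mode("overwrite")
    .partitionBy("load_date")
    .parquet(curated_path)
)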


Environment: Azure Blob Storage, ADLS Gen2, Databricks, Azure Event Hubs, Azure Function Apps, Azure Data Factory, Azure Synapse Analytics, Python, SQL, Azure SQL, JIRA, Power BI.



Big Data Engineer
Client: WellCare, Tampa, Florida April 2016 to Aug 2019
Responsibilities
Spearheaded the integration of diverse healthcare data sources using Hadoop and Spark, ensuring seamless data flow and consolidation for improved analytics.
Handled importing data from various sources, performed transformations using Hive and MapReduce, and loaded data into HDFS, Apache Hive, and HBase, working with Apache Spark, Zookeeper, and Kafka.
Experienced in data warehousing, requirement-driven data modeling techniques, and scalable database programming.
Extensively worked on combiners, partitioning, and the distributed cache to improve the performance of MapReduce jobs.
Researched and troubleshot data quality issues, provided fixes, and proposed both short-term and long-term solutions by monitoring and improving front-end performance.
Designed and implemented scalable data architectures to accommodate the growing volumes of healthcare data, optimizing storage and processing capabilities.
Developed and maintained scalable data pipelines and built out new API integrations to support continuing increases in data volume and complexity.
Implemented performance tuning strategies with Spark and SQL to enhance data processing efficiency, enabling faster analytics and reporting for WellCare Health Plans.
Ensured robust data security measures and compliance with healthcare regulations, safeguarding sensitive patient information and maintaining the integrity of healthcare data.
Collaborated with healthcare professionals and data scientists to understand data requirements, providing technical solutions that align with WellCare's healthcare analytics goals.
Implemented cost-effective storage solutions within the Hadoop ecosystem, optimizing data storage and retrieval costs for WellCare Health Plans.
Analyzed the SQL scripts and designed the solution to implement using PySpark.
Extracted data from MySQL and other source systems into HDFS using Kafka (a minimal streaming sketch appears at the end of this section's responsibilities).
Applied performance tuning techniques to Spark and SQL jobs, ensuring optimal processing of healthcare data and timely delivery of insights.
Established robust data backup and recovery mechanisms within the data infrastructure, ensuring the availability and reliability of healthcare data.
Stayed abreast of emerging technologies in big data and healthcare analytics, proactively evaluating and integrating new tools to enhance WellCare's data processing capabilities.
Utilized Jira as a central platform for task management and collaboration within the data engineering team for all phases of development projects. This included engaging with external users on both business and technical requirements.
Engaged in sprint planning sessions within the Agile framework, breaking down larger data engineering projects into manageable tasks. Utilized Jira to organize and prioritize these tasks, fostering an iterative development process that aligns with the dynamic needs of WellCare Health Plans.
Utilized version control systems like Bitbucket for efficient code management and collaborated effectively with cross-functional teams using collaborative tools.
Collaborated with the data engineering team for all phases of larger and more complex development projects and engaged with external users on business and technical requirements.
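
Illustrative sketch (not the original project code): streaming records from a Kafka topic into HDFS with Spark Structured Streaming, along the lines of the Kafka-to-HDFS ingestion above; broker addresses, the topic, and HDFS paths are hypothetical, and the Spark-Kafka connector package is assumed to be available on the cluster.

# Minimal sketch: Kafka -> HDFS with Spark Structured Streaming.
# Brokers, topic, and paths below are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
    .option("subscribe", "member-claims")
    .option("startingOffsets", "latest")
    .load()
    # Kafka delivers key/value as binary; cast to strings before persisting.
    .select(
        F.col("key").cast("string").alias("key"),
        F.col("value").cast("string").alias("payload"),
        F.col("timestamp"),
    )
)

query = (
    events.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/raw/claims/")
    .option("checkpointLocation", "hdfs:///checkpoints/claims/")
    .outputMode("append")
    .start()
)
query.awaitTermination()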
Environment: Hadoop, MapReduce, Hive, HDFS, Kafka, Zookeeper, Python, Spark, PySpark, SQL, MySQL, MS SQL Server 2012/2014/2016/2017, Agile, Jira, Bitbucket.

Chevron Corporation, Santa Rosa, NM Oct 2013 to Mar 2016
Big Data Engineer
Responsibilities
Utilized Sqoop to import data from MySQL to Hadoop Distributed File System (HDFS) on a regular basis, ensuring seamless data integration.
Created and optimized data pipelines using Python and Shell scripting to ingest, transform, and store data from various sources.
Proactively identified and resolved issues related to data processing, storage, and performance to ensure smooth and uninterrupted data workflows.
Performed aggregations on massive datasets using Apache Spark, storing processed data in the Hive warehouse for subsequent analysis (a minimal aggregation sketch follows this section's responsibilities).
Created shell scripts for preprocessing and transforming raw data into structured formats suitable for analysis.
Utilized Python libraries including Pandas, NumPy, and Spark for data transformations, cleaning, and aggregation.
Worked extensively with Data Lakes and big data ecosystems (Hadoop, Spark, Hortonworks, Cloudera) for efficient data processing.
Stored raw and processed data in Hadoop Distributed File System (HDFS), organizing it effectively for streamlined data processing.
Built HBase tables by leveraging HBase integration with Hive, facilitating efficient storage and retrieval of data.
Used Unix shell scripting to schedule and manage data import/export jobs.
Applied Kafka and Spark Streaming for processing streaming data, enabling real-time data analysis and insights generation.
Designed and implemented a data pipeline using Kafka, Spark, and Hive, ensuring seamless data ingestion, transformation, and analysis.
Implemented CI/CD pipelines for building and deploying projects in the Hadoop environment, ensuring streamlined development processes.
Used JIRA for managing project workflows, tracking issues, and collaborating effectively with cross-functional teams.
Worked with Spark using Python (PySpark) and Spark SQL for faster data testing and processing, enabling efficient data analysis and insights generation.
Utilized Zookeeper for coordination, synchronization, and serialization of servers within clusters. Employed Oozie workflow engine for job scheduling, ensuring efficient and reliable distributed data processing.
Used Unix shell scripting to manage and maintain big data clusters, including provisioning, scaling, and configuration management.
Leveraged Git as a version control tool to maintain code repositories, ensuring efficient collaboration, version tracking, and code management.
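
Illustrative sketch (not the original project code): aggregating a large dataset with Spark and persisting the result to the Hive warehouse, as described above; table and column names are hypothetical placeholders.

# Minimal PySpark sketch: daily aggregates written to the Hive warehouse.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("daily-site-aggregates")
    .enableHiveSupport()
    .getOrCreate()
)

readings = spark.table("raw.sensor_readings")      # placeholder source table

daily = (
    readings
    .groupBy("site_id", F.to_date("reading_ts").alias("reading_date"))
    .agg(
        F.avg("value").alias("avg_value"),
        F.max("value").alias("max_value"),
        F.count("*").alias("reading_count"),
    )
)

# Persist aggregates in the Hive warehouse for subsequent analysis.
daily.write.mode("overwrite").saveAsTable("analytics.daily_site_aggregates")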

Environment: Hadoop, HDFS, Hive, Python, SQL, Kafka, HBase, Spark, PySpark, Zookeeper, Hortonworks, Cloudera, Unix, JIRA, Git, GitHub.


Wipro, Hyderabad
Client: Apollo Hospitals Nov 2012 - Sep 2013
Hadoop Developer

Responsibilities
Designed and implemented Hadoop-based data warehouses for Apollo Health Care, enabling efficient storage and retrieval of patient records and medical data.
Implemented cost-effective storage solutions within the Hadoop ecosystem, optimizing data storage and retrieval costs for Apollo Health Care.
Established robust data backup and recovery mechanisms within the Hadoop infrastructure, ensuring the availability and reliability of healthcare data for Apollo Health Care.
Performance tuning experience with Spark, MapReduce, and SQL jobs (a small tuning sketch follows this list).
Successfully scaled Hadoop clusters at Apollo Health Care to handle increasing volumes of healthcare data, ensuring optimal performance and responsiveness.
Implemented performance tuning strategies to enhance data processing and analytics capabilities.
Experience using source code and version control systems such as SVN, Git, Bitbucket, etc.
Collaborated with cross-functional teams to understand their requirements and provided effective technical solutions.
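
Illustrative sketch (not the original project code): a few common Spark tuning levers referenced above, namely shuffle-partition sizing, caching a reused dataset, and broadcasting a small dimension table; table names are hypothetical placeholders.

# Minimal PySpark sketch of routine performance-tuning levers.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("tuning-example")
    .enableHiveSupport()
    .getOrCreate()
)

# Right-size shuffle parallelism for the job instead of the default 200.
spark.conf.set("spark.sql.shuffle.partitions", "64")

visits = spark.table("clinical.patient_visits")     # large fact table (placeholder)
facilities = spark.table("reference.facilities")    # small lookup table (placeholder)

# Cache the fact table because several downstream aggregations reuse it.
visits.cache()

# Broadcasting the small dimension avoids shuffling the large side of the join.
enriched = visits.join(F.broadcast(facilities), on="facility_id", how="left")

summary = enriched.groupBy("facility_name").agg(F.count("*").alias("visit_count"))
summary.write.mode("overwrite").saveAsTable("analytics.visits_by_facility")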

Environment: Apache Spark, Apache Hadoop, Hive, SQL, MS SQL Server 2012, Shell Scripting, Git, GitHub.