Pavani - Azure Data Engineer
[email protected]
Location: Austin, Texas, USA
Relocation: Texas
Visa: GC
SUMMARY

10+ years of experience interacting with business users to analyze business processes and requirements, transforming those requirements into data warehouse designs, and documenting and rolling out deliverables.
Experience creating Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple source systems (different file formats and databases) to uncover insights into customer usage patterns; a PySpark sketch appears at the end of this summary.
Created Logic App workflows to read data from SharePoint and store it in a blob container.
Experience with code analysis and code management in Azure Databricks.
Experience creating secrets and accessing Azure Key Vault for database and SharePoint credentials.
Created pipelines, datasets, and linked services in Azure Data Factory.
Automated deployments using Data Factory's integration with Azure DevOps Pipelines.
Implemented Master Data Management (MDM) solutions using Databricks to consolidate and cleanse disparate data sources, improving data accuracy and consistency.
Created and optimized Databricks notebooks for data profiling, quality assessment, and validation of master data, enhancing data governance practices.
Ensured data consistency and integrity for Master Data Management by implementing Databricks Delta, improving data reliability through ACID transactions and scalable metadata handling; a Delta MERGE sketch appears at the end of this summary.
Configured and scheduled data pipelines using Databricks Workflows.
Experience in physical and logical data modeling.
Experience creating stored procedures and SQL queries.
Expert-level skills in Extract, Transform, and Load (ETL) tools for data warehousing.
In-depth experience in data warehouse ETL architecture and ETL development using Azure Data Factory.
Experience with all stages of the project development life cycle across numerous platforms in a variety of industries.
Knowledge of different schemas (star and snowflake) to fit reporting, query, and business analysis requirements.
Experience with production support for ETL processes and databases.
Experience with UNIX shell scripting, cron, FTP, and file management in various UNIX environments.
Experience creating functional and technical specifications for ETL processes and designing ETL architecture.
Hands-on experience writing, testing, and implementing cursors, procedures, functions, triggers, and packages at the database level using PL/SQL.
Extensively used PL/SQL bulk-load options, collections, PL/SQL tables, VARRAYs, and REF CURSORs.
Experience analyzing data using HiveQL and PySpark programs.
Completed comprehensive training in Azure Synapse Analytics, gaining expertise in data integration, data warehousing, and big data analytics.
Learned to design and implement complex ETL pipelines using Azure Synapse Data Flows and Azure Data Factory integration.
Explored how to connect Synapse with other Azure services like Azure Data Lake, Azure Blob Storage, and Azure Databricks.
Collected, cleaned, and transformed raw data into structured datasets, ensuring data analysts had accurate and ready-to-use data for analysis.
Proven ability to quickly learn and apply new technologies that translate requirements into client-focused solutions.
Excellent analytical, problem-solving, and communication skills.
Ability to multi-task in a fast-paced environment and to work independently or collaboratively.
Highly experienced and skilled Agile developer with a strong record of excellent teamwork and successful coding project management.
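
As a concrete illustration of the PySpark work above, a minimal sketch of a multi-source extraction and aggregation job; the file paths, column names, and usage metric are hypothetical:

    # Hypothetical sketch: extract from two source formats, join, and
    # aggregate to surface customer usage patterns. Paths and column
    # names are illustrative.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("usage-patterns").getOrCreate()

    # Extract: one CSV source and one JSON source
    customers = spark.read.option("header", True).csv("/mnt/raw/customers.csv")
    events = spark.read.json("/mnt/raw/usage_events.json")

    # Transform: join the sources and aggregate usage per customer segment
    usage = (
        events.join(customers, on="customer_id", how="inner")
              .groupBy("segment")
              .agg(
                  F.countDistinct("customer_id").alias("active_customers"),
                  F.avg("session_minutes").alias("avg_session_minutes"),
              )
    )

    # Load: persist the curated result
    usage.write.mode("overwrite").parquet("/mnt/curated/usage_by_segment")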
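
Similarly, a minimal sketch of the Delta-based MDM upsert pattern, assuming the delta-spark package is installed; the table paths and business key are hypothetical:

    # Hypothetical sketch: MDM-style upsert into a master Delta table.
    # Paths and the business key are illustrative.
    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("mdm-merge").getOrCreate()

    updates = spark.read.parquet("/mnt/cleansed/customer_updates")
    master = DeltaTable.forPath(spark, "/mnt/master/customers")

    # MERGE is an ACID upsert: update matching master records,
    # insert the rest
    (master.alias("m")
           .merge(updates.alias("u"), "m.customer_id = u.customer_id")
           .whenMatchedUpdateAll()
           .whenNotMatchedInsertAll()
           .execute())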


TECHNICAL SKILLS

Cloud: Azure (Data Factory, Data Lake, Databricks, Logic Apps, ARM, DevOps, Azure SQL)
Automation Tools: Azure Logic Apps
Big Data: PySpark, Hive
Code Repository Tools: Azure DevOps
Databases: SQL Server Management Studio 18, Oracle 11g/10g, SQL Server 2000/2005
Database Tools: SQL Navigator, TOAD, ERwin
ETL: Azure Data Factory (V2), Azure Databricks
Languages: Python, Pandas, SQL, PL/SQL, UNIX Shell Script, Perl, C, C++
Operating Systems: Windows 10, UNIX, Linux


EDUCATION: Bachelor's in Information Technology from JNTUK


WORK EXPERIENCE

Fleetcor Technologies, Atlanta, GA June 2023 - Present
Azure Data Engineer
Responsibilities:
Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data from sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, and to write data back to those sources.
Ingested data from sources such as the Salesforce API using Azure Data Factory pipelines.
Collected, cleaned, and provided the data required for business analysis.
Developed and deployed pipelines for various API calls.
Responsible for coordinating production migration activities and validating post-production implementation.
Created PySpark DataFrames in Azure Databricks to read data from Data Lake or Blob Storage and used Spark SQL for transformations; a Spark SQL sketch follows this list.
Worked on a cloud POC to select the optimal cloud vendor against a set of defined success criteria.
Designed, developed, and implemented performant ETL pipelines using PySpark and Azure Data Factory.
Integrated data storage solutions with Spark, especially Azure Data Lake Storage and Blob Storage.
Performance tuning of Hive and Spark jobs.
Developed Hive scripts from Teradata SQL scripts to process data in Hadoop.
Created Hive tables to store the processed results and wrote Hive scripts to transform and aggregate the disparate data.
Worked on production support for monitoring/troubleshooting/bugfixes of data pipelines.
Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance; a Hive DDL sketch follows this list.
Analyzed large structured, unstructured, and semi-structured data sets using Hive queries.
Worked with structured data in Hive to improve performance through advanced techniques such as bucketing, partitioning, and optimizing self-joins.
Wrote and used complex data types for storing and retrieving data with HQL in Hive.
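
A minimal sketch of the Data Lake read and Spark SQL transformation described in this list; the mount path, view, and column names are hypothetical:

    # Hypothetical sketch: read from a Data Lake mount and transform
    # through a Spark SQL temp view. Names are illustrative.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("lake-read").getOrCreate()

    orders = spark.read.parquet("/mnt/datalake/raw/orders")
    orders.createOrReplaceTempView("orders")

    daily = spark.sql("""
        SELECT order_date, SUM(amount) AS total_amount
        FROM orders
        GROUP BY order_date
    """)
    daily.show()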
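
And a minimal sketch of the managed/external Hive table design with partitioning and bucketing; database, table, and column names are hypothetical:

    # Hypothetical sketch: external table partitioned by load date, and a
    # managed table bucketed on the join key to speed up self-joins.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("hive-tables")
        .enableHiveSupport()
        .getOrCreate()
    )

    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS raw_db.transactions (
            txn_id STRING,
            customer_id STRING,
            amount DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        STORED AS PARQUET
        LOCATION '/mnt/raw/transactions'
    """)

    spark.sql("""
        CREATE TABLE IF NOT EXISTS curated_db.transactions_bucketed (
            txn_id STRING,
            customer_id STRING,
            amount DOUBLE
        )
        CLUSTERED BY (customer_id) INTO 32 BUCKETS
        STORED AS ORC
    """)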

Environment: Azure Blob Storage, Azure Data Factory (V2), Azure Data Lake Gen2, Azure SQL, Spark 3.0, Databricks Workflows, Hive.


Humana, Louisville, KY Jan 2023 - June 2023
Azure Site Reliability Engineer
Responsibilities:
Respond quickly to incidents, minimize downtime, and perform root cause analysis.
Establish and maintain an on-call rotation to handle emergencies.
Establish metrics, logs, and traces to gain insights into system behavior.
Work closely with developers to ensure that new features meet reliability standards.
Participate in design reviews to anticipate potential operational issues.
Responsible for coordinating production migration activities and validating post-production implementation for Producer Services.

Environment: Azure Blob Storage, Service Bus, ServiceNow, Azure SQL, Azure DevOps.

Mosaic, Tampa, FL Oct 2020 - Dec 2022
Azure Data Engineer
Description: The MIC uses ADF and Azure Logic Apps as the primary interface, providing an online platform to securely connect to a variety of data sources and bring data from those sources into the MIC for batch data processing.
Responsibilities:
Provided a user interface built with Azure Databricks widgets to create, update, search, retrieve, and display vendor data.
Processed campaign data using a main notebook that invokes sub-notebooks, reading from a Delta table.
Developed PySpark scripts to ingest data from JSON/CSV files into Delta Lake; a sketch follows this list.
Worked on troubleshooting and performance tuning of Spark jobs.
Deployed Azure Databricks notebooks to Azure DevOps Git repos and created build and release pipelines for CI/CD.
Executed ADF jobs using triggers set to run on specific schedules.
Monitored MIC jobs for interface failures, such as source access failures, and troubleshot root causes.
Restarted jobs manually depending on the cause of the failure.
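
A minimal sketch of the JSON/CSV-to-Delta ingestion described in this list, as it might look in a Databricks notebook; the paths are hypothetical and the two sources are assumed to share a schema:

    # Hypothetical sketch: ingest JSON and CSV landing files into a
    # Delta table. Paths are illustrative; schemas are assumed to match.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("delta-ingest").getOrCreate()

    json_df = spark.read.json("/mnt/landing/campaigns/json/")
    csv_df = spark.read.option("header", True).csv("/mnt/landing/campaigns/csv/")

    # Align by column name and union the two sources
    combined = json_df.unionByName(csv_df)

    # Append into Delta; the transaction log provides ACID guarantees
    (combined.write
             .format("delta")
             .mode("append")
             .save("/mnt/delta/campaigns"))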

Environment: Azure Logic App, Azure Blob Storage, Azure Data Factory (V2), Azure Data Lake Gen2, Azure SQL, Azure DevOps.


Comcast, Sterling, VA Apr 2019 - Sep 2020
Data Engineer
Description: Focus is on improving customer satisfaction and business outcomes by analyzing and processing web analytics UI logs to find where customers and agents face issues.
Responsibilities:
Used GET and POST requests to retrieve JSON data from Elasticsearch.
Parsed nested JSON using Pandas, retrieved data from Elasticsearch via GET and POST requests, and validated results against Kibana dashboards; an Elasticsearch sketch follows this list.
Parsed nested JSON documents using Python 3 and loaded the data to S3, then to ThoughtSpot.
Converted data frames from wide to long format and vice versa using Pandas; a reshaping sketch follows this list.
Used NumPy and Pandas functions for mathematical operations on arrays.
Communicated with different teams to understand and report UI errors.
Created worksheets and pinboards in ThoughtSpot.
Responsible for processing and analyzing UI logs using Pandas and Python.
Performance-tuned long-running Python scripts.
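
A minimal sketch of the Elasticsearch-to-Pandas flow in this list; the host, index, and field names are hypothetical:

    # Hypothetical sketch: query Elasticsearch with a POST request and
    # flatten the nested JSON hits into a Pandas DataFrame.
    import requests
    import pandas as pd

    ES_URL = "http://localhost:9200/ui-logs/_search"

    query = {
        "size": 1000,
        "query": {"match": {"event.type": "error"}},
    }

    resp = requests.post(ES_URL, json=query, timeout=30)
    resp.raise_for_status()
    hits = resp.json()["hits"]["hits"]

    # json_normalize turns nested fields like _source.event.type
    # into dotted column names
    df = pd.json_normalize(hits)
    print(df.head())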
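
And a minimal sketch of the wide-to-long reshaping with Pandas; the columns are hypothetical:

    # Hypothetical sketch: reshape error counts from wide to long and back.
    import pandas as pd

    wide = pd.DataFrame({
        "page": ["home", "billing"],
        "errors_mon": [12, 7],
        "errors_tue": [9, 11],
    })

    # Wide to long: one row per (page, day)
    long_df = wide.melt(id_vars="page", var_name="day", value_name="errors")

    # Long back to wide
    wide_again = (long_df.pivot(index="page", columns="day", values="errors")
                         .reset_index())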

Environment: Jupyter, Elasticsearch 7.9.1, Kibana, Python 3, Pandas, AWS S3, ThoughtSpot.


Nokia, San Jose, CA Sep 2017 - Mar 2019
Azure Data Engineer
Description: This project focuses on a single data warehouse that consolidates all the data sources the NSW functional groups use to compute and standardize KPIs. Azure Data Factory (V2) and PySpark on Databricks are used as ETL tools to increase the speed of information access.
Responsibilities:
Involved in creating specifications for ETL processes; finalized requirements and prepared specification documents.
Created and installed packages on Databricks clusters.
Used Azure Databricks notebooks to clean and transform data and load it into Azure SQL staging tables using the JDBC connector; a sketch follows this list.
Designed and developed Data Factory pipelines to transform Excel, SharePoint List, JSON, and Avro data and load it into Azure SQL Server.
Built PySpark scripts to transform data and load it into Azure SQL Server staging tables.
Created secret scopes and access tokens on Databricks so PySpark scripts could connect to Key Vault.
Created ODBC/JDBC connections so PySpark scripts could connect to Azure SQL Database.
Experience creating and loading data into Hive tables with appropriate static and dynamic partitions for efficiency.
Implemented frameworks for data quality analysis, data validation, and data profiling using big data technologies, PySpark, and Pandas with Azure SQL Server.
Experience managing Azure Blob Storage, Azure Data Lake Storage (ADLS), and Data Lake Analytics, and integrating them with other Azure services.
Implemented ad-hoc analysis solutions using Azure Data Lake Analytics/Store and HDInsight/Databricks.
Developed ETL jobs using PySpark to migrate data from MySQL Server to Azure SQL DB.
Created reusable ADF pipelines to call REST APIs.
Responsible for unit testing and for creating detailed unit test documents with all possible test cases/scripts.
Used Azure DevOps for scheduling ADF jobs and Logic Apps for scheduling ADF pipelines.
Worked with structured and unstructured data in PySpark.
Developed PySpark scripts to integrate data flows between SharePoint, SharePoint Lists, and Azure SQL Server.
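
A minimal sketch of the secret-scope lookup and JDBC load described in this list; the scope, secret, server, and table names are hypothetical, and dbutils/spark are the globals available inside a Databricks notebook:

    # Hypothetical Databricks-notebook sketch: read credentials from a
    # Key Vault-backed secret scope and load a DataFrame into an Azure
    # SQL staging table over JDBC. Names are illustrative.
    jdbc_url = (
        "jdbc:sqlserver://myserver.database.windows.net:1433;"
        "database=staging_db"
    )

    user = dbutils.secrets.get(scope="kv-scope", key="sql-user")
    password = dbutils.secrets.get(scope="kv-scope", key="sql-password")

    df = spark.read.parquet("/mnt/curated/usage_by_segment")

    (df.write
       .format("jdbc")
       .option("url", jdbc_url)
       .option("dbtable", "stg.usage_by_segment")
       .option("user", user)
       .option("password", password)
       .mode("overwrite")
       .save())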

Environment: Azure Logic App, Azure Blob Storage, Azure Databricks, Azure Data Factory (V2), Azure Data Lake Gen2, Azure DevOps, Azure SQL.