
Druvana Battula
Azure Snowflake Data Engineer
Email: druvana.4@gmail.com / rio.roy@coderversion.com | Mobile: +1 (860) 499-3013
Location: USA | Relocation: Yes | Visa: Green Card

PROFESSIONAL SUMMARY:
10+ years of experience in the IT industry contributing to and implementing Data Integration, Data Transformation, and Data Consolidation procedures; currently working as an Azure Data Engineer.
Experience in data conversion and migration from legacy systems to modern cloud-based platforms like Salesforce, Snowflake, and Azure Synapse.
Adept at using modern data tools and cloud platforms such as Azure to improve business processes.
Proficient with Python, SQL, and R Programming.
Strong knowledge and experience in Azure Data Factory, Azure Synapse Analytics, Azure Databricks for batch data processing.
Strong knowledge and experience in Azure Stream Analytics, Azure Databricks for real-time data processing.
Worked with real-time data streaming platforms such as Apache Kafka and Azure Event Hubs.
Good experience working with Azure Data Lake Storage Gen2, Azure BLOB Storage, Azure Data Lake Analytics.
Good experience working with Relational Databases such as MS SQL Server 2016, MySQL, PostgreSQL, Azure SQL Database.
Hands-on experience in writing T-SQL queries, procedures, views, indexes and worked on SQL query optimization and tuning.
Good Experience working with Data Warehouses such as Snowflake, Dedicated SQL Pool, Amazon Redshift.
Strong knowledge and hands-on experience working with Serverless and Dedicated SQL Pool within the Azure Synapse Analytics.
Involved in creating security groups in Azure Active Directory.
Worked with Azure Key Vault secrets and certificates, ensuring security in line with organizational standards.
Worked with Azure Logic Apps to automate workflows connecting various services and systems.
Understand and use Microsoft Azure services such as file storage, databases, incremental loads, and multi-dependency trigger-file pipelines.
Migrated data from on-premises systems to the cloud; loaded data from Snowflake, AWS S3, and REST APIs; and handled data quality checks, SCD Type 1 and Type 2 with Data Flows, external tables, Synapse notebooks, and logging web services for fast big data processing (a representative Snowflake load pattern is sketched at the end of this summary).
Set up integrations from data sources such as file storage, RDBMS, spreadsheets, S3 buckets, and data lakes.
Designed ETL workflows to combine structured and semi-structured data across Azure, AWS, and GCP platforms.
Built scalable data pipelines using Azure Data Factory, Kafka, and DBT to ensure smooth data ingestion across multiple cloud environments.
Experienced in orchestrating and automating complex ETL workflows using Apache Airflow, improving efficiency and monitoring of data pipelines.
Implemented pre-processing and transformations using Azure Databricks, Azure Data Factory (Dataflows).
Experience in working with Databricks and different components in Spark, Git, SQL and HDFS.
Experience with the Git version control system and the web-based platforms GitHub and Azure DevOps.
In depth knowledge of Software Development Life Cycle (SDLC) with thorough understanding of various phases such as requirements analysis/design and development.
Hands on experience performing ETL in Power Query Editor, Data Modelling, Reports/Dashboards creation in Power BI.
Use Power BI Desktop to build rich dashboards that visually tell the story of the business, clearly showing clients their status, strengths, weaknesses, and potential.
Worked with Power BI visualizations such as tree maps, area charts, funnel charts, and custom visuals for interactive data analysis.
Experience with migration projects to the Azure cloud and with Azure architecture decision-making to implement ETL and data movement solutions using Azure Data Factory (ADF).
Demonstrated experience in enterprise data warehouse design using Star and Snowflake schema dimensional models.
Hands-on experience using Jira, Azure DevOps Boards following agile methodology (SCRUM).
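
The Snowflake loading and SCD work summarized above follows a common incremental-load pattern; the sketch below is a minimal, illustrative version using the snowflake-connector-python package. The account, stage, and table names (ADLS_STAGE, STG_CUSTOMER, DIM_CUSTOMER) and the credentials are placeholders rather than details from any specific engagement, and the MERGE shown is a simple SCD Type 1 upsert.

```python
# Minimal, illustrative Snowflake incremental load; all identifiers and
# credentials below are placeholders, not production values.
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345",        # placeholder account locator
    user="ETL_USER",
    password="********",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="STAGING",
)
cur = conn.cursor()
try:
    # 1) Land new files from an external stage (ADLS/Blob-backed) into a staging table.
    cur.execute("""
        COPY INTO STG_CUSTOMER
        FROM @ADLS_STAGE/customers/
        FILE_FORMAT = (TYPE = 'PARQUET')
        MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
    """)
    # 2) SCD Type 1 upsert from staging into the dimension table.
    cur.execute("""
        MERGE INTO CORE.DIM_CUSTOMER AS tgt
        USING STG_CUSTOMER AS src
            ON tgt.CUSTOMER_ID = src.CUSTOMER_ID
        WHEN MATCHED THEN UPDATE SET
            EMAIL = src.EMAIL,
            CITY = src.CITY,
            UPDATED_AT = CURRENT_TIMESTAMP()
        WHEN NOT MATCHED THEN INSERT (CUSTOMER_ID, EMAIL, CITY, UPDATED_AT)
            VALUES (src.CUSTOMER_ID, src.EMAIL, src.CITY, CURRENT_TIMESTAMP())
    """)
finally:
    cur.close()
    conn.close()
```

An SCD Type 2 variant of the same pattern would expire the current dimension row (set an end date and a current-row flag) and insert the new version instead of updating in place.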
CERTIFICATION:
Microsoft Certified: Azure Data Engineer Associate (DP-203)
Microsoft Certified: Azure Data Fundamentals (DP-900)


TECHNICAL FORTE:
Databases: SQL Server, HBase, Spark-Redis, Cosmos DB, Oracle, MySQL, PostgreSQL
SDLC: Agile, Scrum, Waterfall
Data Modeling: MS Visio, Power BI
Database Programming: T-SQL, Dynamic SQL, DAX
Reporting Tools: Microsoft Power BI (Power View, Power Pivot, Power Query)
Source Control & Collaboration Tool: Jira, Confluence, SharePoint, Git, GitHub, Azure DevOps, Jenkins
Cloud Technologies: MS Azure (IaaS, PaaS, SaaS), Azure SQL, Azure SQL Data Warehouse, Azure Data Factory, PowerShell, Azure Storage, Snowflake, AWS S3, Salesforce Data Cloud.
Hadoop Core Services: HDFS, MapReduce, Spark, Hadoop, YARN.
Hadoop Distributions: Cloudera, Hortonworks, Apache Hadoop.
Data Services: Hive, Pig, Impala, Sqoop, Flume, Kafka.
Scheduling Tools: Zookeeper, Oozie.
Monitoring Tools: Cloudera Manager
Programming Languages: C, Scala, Python, R, SQL, PL/SQL
Build Tools: Jenkins, Maven

PROFESSIONAL EXPERIENCE:
Client: Cigna Healthcare, Bloomfield, Connecticut. Mar 2023 to Present
Role: Azure Snowflake Data Engineer
In the healthcare domain, the project focused on patient data analytics at Cigna Healthcare. As an Azure Snowflake Data Engineer, I integrated and processed large-scale ADT (admission, discharge, transfer) data to improve analytics, enhance patient insights, and optimize healthcare operations using Azure and Snowflake.
Responsibilities:
Handled ETL data pipelines from start to finish for patient (ADT) analytics using Azure services and Snowflake.
Led data conversion activities from legacy systems into Snowflake and Azure Synapse, ensuring accurate data mapping and validation.
Led the integration of diverse data sources, including patient data and demographics, using Azure Data Factory to efficiently collect and aggregate information.
Contributed to improving data pipeline stability and integrity by actively collaborating on ETL tasks and implementing robust error-handling mechanisms.
Used Snowflake to store and manage different data types, keeping the data scalable and easily retrievable for processing.
Developed federated queries to analyze data across BigQuery and Snowflake without duplication, enhancing cost efficiency.
Automated data pipelines and workflows with event-based triggers and scheduling mechanisms.
Designed and implemented Apache Airflow DAGs to orchestrate and monitor ETL workflows, improving pipeline observability and failure recovery (a minimal DAG sketch follows this list).
Built secure data platforms with role-based access control to protect sensitive patient data.
Deployed containerized ETL workloads using Kubernetes, ensuring scalability and efficient resource management for high-volume data processing.
Identified and resolved performance bottlenecks in data processing and storage layers, optimizing query execution and reducing data latency.
Designed and implemented Alteryx workflows to automate complex ETL processes, improving data transformation efficiency for patient analytics.
Implemented partitioning, indexing, and caching strategies in Snowflake, Snowpark and Azure services for enhanced query performance.
Optimized SQL queries and performance tuning in Snowflake and Synapse, reducing execution time.
Developed a CI/CD framework for data pipelines using Jenkins, collaborating with DevOps engineers for automation.
Utilized SQL queries, including DDL, DML, and various database objects, for data manipulation and retrieval.
Implemented data cleaning and transformation using Azure Data Factory's built-in capabilities, addressing issues like duplicates, and ensuring data integrity.
Integrated on-premises databases (MySQL, Cassandra) and cloud-based solutions (Blob storage, Azure SQL DB) using Azure Data Factory, applying transformations, and loading data into Snowflake.
Designed and implemented data integration workflows to unify structured and semi-structured data from multiple cloud platforms, ensuring seamless interoperability.
Integrated Salesforce Data Cloud with Snowflake to unify customer data, enabling advanced analytics and reporting.
Used Azure Machine Learning and Snowflake to analyze patient behavior, identifying age and interest patterns to guide decisions.
Designed and optimized data ingestion pipelines for Salesforce Data Cloud using Azure Data Factory, ensuring seamless data synchronization for real-time insights.
Built and optimized data models and schemas using technologies like Apache Hive, Apache HBase, or Snowflake to support efficient data storage and retrieval for analytics and reporting purposes.
Collaborated with Azure Logic Apps administrators to monitor and resolve issues related to process automation and data processing pipelines.
Executed Hive scripts through Hive on Spark and Spark-SQL to address diverse data processing needs.
Utilized Kafka, Spark Streaming, and Hive to process streaming data, developing a robust data pipeline for ingestion, transformation, and analysis.
Configured Azure Stream Analytics to process live healthcare telemetry data, enabling real-time anomaly detection for patient vitals.
Utilized JIRA for project reporting and created sub-tasks for development, QA, and partner validation.
Worked with Agile methods, participating in daily stand-ups, sprint planning, and sprint reviews.
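
As referenced in the Airflow bullet above, the following is a minimal, hypothetical DAG sketch of the extract-transform-load orchestration pattern used for pipeline monitoring and recovery; the DAG id, task names, callables, and schedule are illustrative assumptions, not the production pipeline.

```python
# Illustrative Airflow 2.x DAG for an extract -> transform -> load flow.
# All names and schedules are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_adt_feed(**context):
    """Pull the day's ADT extract from the landing zone (placeholder logic)."""
    print("extracting ADT files for", context["ds"])


def transform_and_validate(**context):
    """Apply data-quality checks and transformations (placeholder logic)."""
    print("transforming and validating ADT data")


def load_to_snowflake(**context):
    """Load curated data into Snowflake (placeholder logic)."""
    print("loading curated data into Snowflake")


with DAG(
    dag_id="adt_patient_analytics",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    tags=["healthcare", "etl"],
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_adt_feed)
    transform = PythonOperator(task_id="transform", python_callable=transform_and_validate)
    load = PythonOperator(task_id="load", python_callable=load_to_snowflake)

    extract >> transform >> load
```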

Environment: Azure Databricks, Azure Data Factory, Azure Logic Apps, Azure Function Apps, Snowflake, Snowflake Schema, MySQL, Azure SQL Database, HDFS, MapReduce, YARN, Apache Spark, Apache Hive, SQL, Python, Scala, PySpark, Shell Scripting, Git, JIRA, Jenkins, Apache Kafka, Azure Machine Learning, Azure Stream Analytics, Azure Personalizer, Power BI, Tableau, Apache HBase, Azure DevOps, Kubernetes, Salesforce Data Cloud.


Client: Nike, Beaverton, Oregon. July 2021 to Feb 2023
Role: Azure Data Engineer
In the retail domain, the project focused on data engineering and analytics at Nike. As an Azure Data Engineer, I designed and optimized ETL pipelines using Azure Data Factory, Databricks, and Snowflake to streamline data processing, enhance data quality, and enable real-time analytics for better business insights and decision-making.
Responsibilities:
Designed and executed data processing workflows in Azure Databricks, leveraging Spark for large-scale data transformations.
Constructed scalable Snowflake schemas using Snowpark, tables, and views to support complex analytics queries.
Achieved proficiency in Azure Data Factory, Databricks, ADLS Gen2, Delta Lake, and other Azure services.
Developed data ingestion pipelines using Azure Event Hubs and Azure Functions to enable real-time data streaming into Snowflake.
Developed and automated ETL/ELT pipelines in Azure Data Factory and Databricks to support large-scale data migration and transformation.
Leveraged Azure Data Lake Storage as a data lake for storing raw and processed data, implementing data partitioning and data retention strategies.
Utilized Azure Blob Storage for efficient storage and retrieval of data files, implementing compression and encryption techniques to optimize storage costs and enhance data security.
Integrated Azure Data Factory with Azure Logic Apps for orchestrating complex data workflows and triggering actions based on specific events.
Implemented data governance practices and data quality checks using Azure Data Factory and Snowflake, ensuring data reliability and consistency.
Designed Alteryx-based data pipelines for aggregating sales and inventory data, reducing manual intervention and enhancing decision-making processes.
Developed DBT models for transforming raw retail sales data into analytics-ready tables, improving data consistency and reporting efficiency.
Integrated DBT with Snowflake and Azure Synapse to automate SQL transformations, enabling efficient version control and modular pipeline development.
Implemented data replication and synchronization strategies between Cosmos DB, Neo4J, Snowflake, and other data platforms using Azure Data Factory and Change Data Capture techniques.
Utilized Apache Airflow to schedule and manage automated data pipelines, reducing manual intervention and optimizing resource utilization.
Developed and deployed Azure Functions for data preprocessing, data enrichment, and data validation tasks within data pipelines.
Implemented DevOps practices by automating the deployment of data pipelines and infrastructure using Azure DevOps, enabling continuous integration and continuous delivery (CI/CD) for seamless updates and scalability of the data platform.
Developed custom monitoring and alerting solutions using Azure Monitor and Snowflake Query Performance Monitoring (QPM) from Snowpark for the proactive identification and resolution of performance issues.
Integrated Snowflake with Power BI and Azure Analysis Services for creating interactive dashboards and reports, enabling self-service analytics for business users.
Used BigQuery for ad-hoc analysis of retail sales and inventory data, enabling faster business decisions.
Integrated Amazon Redshift as a secondary data warehouse for running large-scale batch analytics on historical sales data.
Implemented best practices in data integrity, validation, and transformation, ensuring high-quality analytics-ready datasets.
Optimized query performance in Redshift by implementing distribution keys and sort keys, reducing processing time for complex queries.
Designed partitioned and clustered tables in BigQuery to optimize query performance on large datasets.
Optimized data pipelines and Spark jobs in Azure Databricks for improved performance, including tuning Spark configurations, caching, and leveraging data partitioning techniques (see the PySpark sketch after this list).
Implemented data cataloguing and data lineage solutions using tools like Azure Purview and Apache Atlas to provide a comprehensive understanding of data assets and their relationships.
Consistently wrote Hive queries, creating and querying Hive tables to retrieve useful analytical information.
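
The Databricks tuning bullet above mentions caching and data partitioning; the PySpark sketch below illustrates that pattern under assumed paths, storage-account names, and column names (none are taken from the actual project).

```python
# Illustrative Databricks/PySpark job: read raw sales data, cache an intermediate
# result reused by downstream aggregations, and write a date-partitioned Delta table.
# Paths, storage account, and column names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("retail_sales_transform").getOrCreate()

raw = (
    spark.read.format("delta")
    .load("abfss://raw@storageaccount.dfs.core.windows.net/sales/orders")
)

# Keep only completed orders and derive the partition column once.
clean = (
    raw.filter(F.col("order_status") == "COMPLETED")
    .withColumn("order_date", F.to_date("order_ts"))
    .cache()  # reused by multiple downstream aggregations
)

daily_revenue = clean.groupBy("order_date", "region").agg(
    F.sum("order_amount").alias("revenue"),
    F.countDistinct("customer_id").alias("customers"),
)

(
    daily_revenue.repartition("order_date")
    .write.format("delta")
    .mode("overwrite")
    .partitionBy("order_date")
    .save("abfss://curated@storageaccount.dfs.core.windows.net/sales/daily_revenue")
)
```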

Environment: Azure Data Factory, Azure Logic Apps, Azure Databricks, Spark, Neo4j, Snowflake, Snowpark, Cosmos DB, Azure Event Hubs, Azure Functions, Snowflake Schemas, Azure Data Lake Storage, Hive, Azure Blob Storage, Power BI, Azure Analysis Services, Azure Machine Learning, Azure Purview, Apache Atlas.


Client: Barclays, Whippany, New Jersey. Feb 2019 to June 2021
Role: Big Data Developer
In the banking and financial services domain, the project focused on big data processing and ETL pipeline development at Barclays. As a Big Data Developer, I built and optimized data ingestion, transformation, and real-time analytics pipelines using Azure Data Factory, Spark, Hive, and Sqoop. My role involved processing large-scale financial data, enabling efficient reporting, and improving data accuracy for business insights and regulatory compliance.

Responsibilities:

Established an ETL framework employing ADF, Sqoop, Pig, and Hive to seamlessly extract data from diverse sources, ensuring its availability for consumption.
Collaborated with technical and business teams to establish robust data collection and analysis procedures.
Developed Spark and Scala-based ETL jobs for migrating data from Oracle to new MySQL tables (a PySpark version of this pattern is sketched after this list).
Leveraged Spark (RDDs, Data Frames, Spark SQL) and Spark-Cassandra Connector APIs for tasks such as data migration and generating business reports.
Conducted a thorough analysis of source data, efficiently managed data type modifications, and utilized Excel sheets, flat files, and CSV files to generate ad-hoc reports in Power BI.
Implemented Slowly Changing Dimensions (SCD) strategies, handling updates seamlessly and enhancing data reliability.
Developed ETL workflows in Azure Data Factory to efficiently move historical banking data into BigQuery for analytical processing.
Implemented automated data validation and reconciliation mechanisms, ensuring consistency and reliability across integrated data sources.
Built Alteryx workflows for financial data reconciliation, identifying discrepancies in transaction records with improved precision.
Utilized BigQuery ML to generate predictive insights on customer transactions, improving fraud detection strategies.
Integrated HBase with Hadoop for efficient storage and retrieval of semi-structured financial data.
Developed and executed complex HiveQL queries to perform advanced analytics on structured and semi-structured financial datasets.
Implemented automation for deployments using YAML scripts, ensuring streamlined builds and releases.
Proficiently worked with Azure Data Factory, Azure Key Vault, Azure Function Apps, Azure Logic Apps, Apache Hive, Apache Pig, HBase, Apache Spark, Zookeeper, Flume, Kafka, and Sqoop.
Leveraged Apache Hadoop's HDFS to store and process large financial datasets, ensuring high availability and fault tolerance.
Utilized YARN for resource management and optimized cluster utilization for improved performance.
Designed MapReduce jobs to aggregate customer transaction data, reducing processing time for compliance reports.
Configured and managed HDFS storage policies to optimize data placement and access efficiency.
Collaborated extensively on the creation of combiners, partitioning, and distributed cache to enhance the performance of MapReduce jobs.
Managed source code and enabled version control using Git and GitHub repositories.
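
The Oracle-to-MySQL migration jobs above were built in Spark/Scala; to keep all examples in a single language, the sketch below shows the same JDBC read/write pattern in PySpark. Hosts, credentials, schemas, and table names are placeholders, and the appropriate JDBC drivers are assumed to be on the Spark classpath.

```python
# Illustrative PySpark equivalent of a Spark JDBC migration job (Oracle -> MySQL).
# Hosts, credentials, and table names are placeholders; the production jobs were in Scala.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("oracle_to_mysql_migration").getOrCreate()

accounts = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB1")
    .option("dbtable", "FIN.ACCOUNTS")
    .option("user", "etl_user")
    .option("password", "********")
    .option("driver", "oracle.jdbc.OracleDriver")
    .option("fetchsize", "10000")
    .load()
)

# Light column rename before landing in the new schema.
migrated = accounts.withColumnRenamed("ACCT_ID", "account_id")

(
    migrated.write.format("jdbc")
    .option("url", "jdbc:mysql://mysql-host:3306/finance")
    .option("dbtable", "accounts")
    .option("user", "etl_user")
    .option("password", "********")
    .option("driver", "com.mysql.cj.jdbc.Driver")
    .option("batchsize", "5000")
    .mode("append")
    .save()
)
```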

Environment: Azure Data Factory, Azure Logic Apps, Azure Databricks, ETL, Sqoop, Pig, Hive, HDFS, Spark, Scala, Oracle, MySQL, RDDs, DataFrames, Informatica, Spark SQL, Spark-Cassandra Connector, Spark Streaming, Power BI, Excel, Flat Files, CSV, YAML, Git, GitHub, HBase, Zookeeper, Flume, Kafka, MapReduce, Data Classification Algorithms.

Client: State Farm, Bloomington, Illinois. Nov 2017 to Feb 2019
Role: Data Warehouse Developer

In the insurance domain, the project focused on data warehousing and business intelligence at State Farm. As a Data Warehouse Developer, I designed and maintained ETL pipelines, data marts, and reporting solutions using Azure Data Factory, SSIS, and Power BI. My role involved integrating and transforming diverse data sources to enhance data accessibility, automate reporting, and support business intelligence for decision-making.

Responsibilities:
Created and maintained databases for Server Inventory and Performance Inventory, utilizing Azure cloud services.
Worked in Agile Scrum methodology, participating in daily stand-up meetings; used Visual SourceSafe with Visual Studio 2010 and Trello for project tracking.
Generated Drill-through and Drill-down reports in Power BI, incorporating drop-down menu options, data sorting, and defining subtotals.
Utilized the Data Warehouse to develop a Data Mart that feeds downstream reports. Developed a User Access Tool enabling users to create ad-hoc reports and run queries for data analysis against the proposed Cube.
Designed and implemented dimensional data models using Star and Snowflake schema to optimize query performance and support efficient reporting.
Created SSIS packages for transferring data from Oracle, MS Access, flat files, and Excel files to SQL Server 2008 R2.
Deployed SSIS Packages and established jobs for efficient package execution.
Possess expertise in creating ETL packages using Azure Data Factory and SSIS to extract, transform, and load data from heterogeneous databases into the data mart.
Experienced in building Cubes and Dimensions with various architectures and data sources for Business Intelligence.
Involved in creating SSIS jobs for automating report generation and cube refresh packages.
Proficient with SQL Server Reporting Services (SSRS) for authoring, managing, and delivering both paper-based and interactive Web-based reports.
Developed stored procedures and triggers to ensure consistent data entry into the database (an illustrative invocation from Python follows this list).
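
The stored-procedure and job-automation work above was implemented in T-SQL and scheduled through SSIS/SQL Agent; purely as an illustrative companion in Python, the sketch below shows invoking such a procedure via pyodbc. The server, database, and procedure names (ServerInventory, dbo.usp_RefreshServerInventory) are assumptions, not actual project objects.

```python
# Illustrative only: invoking a SQL Server stored procedure (as would normally be
# scheduled via SSIS/SQL Agent) from Python using pyodbc. All names are placeholders.
from datetime import date

import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlserver-host;"
    "DATABASE=ServerInventory;"
    "Trusted_Connection=yes;"
)
cur = conn.cursor()
try:
    # Refresh the inventory data mart for today's snapshot (hypothetical procedure).
    cur.execute("EXEC dbo.usp_RefreshServerInventory @SnapshotDate = ?", date.today())
    conn.commit()
finally:
    cur.close()
    conn.close()
```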

Environment: Azure Data Factory, Cosmos DB, SQL, Server Inventory, Performance Inventory, Agile Scrum Methodology, Visual SourceSafe, Visual Studio 2010, Trello, Power BI, Data Warehouse, Informatica, Data Mart, User Access Tool, Cube, SSIS, Oracle, MS Access, Flat Files, Excel Files, SQL Server 2008 R2, MDX Scripting, SSRS, Stored Procedures, Triggers, Snowflake.

Client: Symmetric Solutions, Bangalore, India. July 2013 to Sept 2016
Role: Data Warehouse Developer

The project focused on data warehousing and business intelligence at Symmetric Solutions. As a Data Warehouse Developer, I designed and optimized ETL data flows and dimensional models using SSIS and SQL Server to migrate and transform data from multiple sources. My role involved building SSAS cubes, generating complex reports, and implementing Slowly Changing Dimensions (SCD) to enhance data accuracy for business intelligence and analytics.

Responsibilities:
Expert in designing ETL data flows using SSIS, creating mappings/workflows to extract data from SQL Server, and performing Data Migration and Transformation from Access/Excel Sheets using SQL Server SSIS.
Efficient in Dimensional Data Modeling for Data Mart design, identifying Facts and Dimensions, and developing fact tables, dimension tables, using Slowly Changing Dimensions (SCD).
Experience in Error and Event Handling: Precedence Constraints, Break Points, Check Points, and Logging. Experienced in Building Cubes and Dimensions with different Architectures and Data Sources for Business Intelligence and writing MDX Scripting.
Thorough knowledge of Features, Structure, Attributes, Hierarchies, Star and Snowflake Schemas of Data Marts.
Good working knowledge of Developing SSAS Cubes, Aggregation, KPIs, Measures, Partitioning Cube, Data Mining Models, and Deploying and Processing SSAS objects.
Experience in creating Ad hoc reports and reports with complex formulas, as well as querying the database for Business Intelligence.
Expertise in developing Parameterized, Chart, Graph, Linked, Dashboard, Scorecards, and Reports on SSAS Cube using Drill-down, Drill-through, and Cascading reports using SSRS.
Flexible, enthusiastic, and project-oriented team player with excellent written, verbal communication, and leadership skills to develop creative solutions for challenging client needs.

Environment: MS SQL Server 2016, Visual Studio 2017/2019, SSIS, SharePoint, MS Access, Team Foundation Server, Git.

Education:

Bachelor's in Computer Science, Jawaharlal Nehru Technological University, Hyderabad - 2013