Data Engineer - Sr. Azure Data Engineer |
[email protected] |
Location: Remote, USA |
Relocation: Yes |
Visa: GC |
Spandana Vunnam
Azure Data Engineer | Email: [email protected] | Mobile: (972) 292-8824

SHORT SUMMARY:
Experienced Azure Data Engineer with 10+ years in the IT industry, including 6+ years in Data Engineering and 4 years in Data Warehousing. Specialized in Data Integration, Transformation, and Consolidation of diverse datasets. Proficient in Python, SQL, and Scala programming, with strong expertise in Azure Databricks, Azure Data Factory, and Azure Synapse Analytics for batch and real-time data processing. Skilled in working with data streaming platforms such as Apache Kafka and Azure Event Hubs. Hands-on experience with Spark notebooks, Azure Data Lake Storage, Azure Purview, Blob Storage, and relational databases. In-depth knowledge of ETL processes, Power BI for data visualization, and migration projects to Azure. Adept at leveraging Azure services and Snowflake to design and implement scalable data solutions, ensuring data accuracy and integrity throughout the pipeline.
PROFESSIONAL SUMMARY:
- 10+ years of experience in the IT industry contributing to and implementing Data Integration, Data Transformation, and Data Consolidation procedures; currently working as an Azure Data Engineer.
- Proficient with Python, SQL, and Scala programming, with hands-on experience across source systems.
- Strong knowledge and experience in Azure Data Factory, Azure Synapse Analytics, and Azure Databricks for batch data processing.
- Developed a comprehensive strategy for leveraging ADF to streamline data integration processes and enhance reporting capabilities.
- Strong knowledge and experience in Azure Stream Analytics and Azure Databricks for real-time data processing.
- Worked with real-time data streaming platforms Apache Kafka and Azure Event Hubs.
- Good experience working with Azure Data Lake Storage Gen2, Azure Blob Storage, and Azure Data Lake Analytics.
- Involved in creating security groups in Azure Active Directory and leveraged Terraform for Infrastructure as Code (IaC) to automate provisioning, deployment, and management of Azure resources.
- Worked with Azure Key Vault secrets and certificates, ensuring security per organizational standards.
- Worked with Azure Logic Apps to automate workflows connecting various services and systems.
- Strong theoretical background and hands-on experience in Microsoft Azure services such as File Storage, databases, incremental loads, multi-dependency trigger file pipelines, migrating data from on-premises to cloud, and loading data from Snowflake and REST APIs.
- Implemented pre-processing and transformations using Azure Databricks and Azure Data Factory (Data Flows).
- Experience with migration projects to the Azure cloud and Azure architecture decision-making to implement ETL and data movement solutions using Azure Data Factory (ADF).
- Developed and optimized large-scale data processing pipelines using PySpark for ETL processes on Azure Databricks, and implemented data transformation and aggregation tasks using PySpark DataFrames and SQL APIs (a minimal sketch of this pattern follows the Technical Skills section).
- Developed and optimized Kafka topics, partitions, and replication strategies, and utilized Spark applications written in PySpark to process terabytes of data.
- Designed and implemented conceptual, logical, and physical data models for complex databases, ensuring data integrity and optimization.
- Designed and implemented data transformation workflows using DBT, enhancing data pipeline efficiency and accuracy.
- Proficient in Hadoop, Spark, Cloudera, Hive, Sqoop, Flume, and Kafka for customer behavioural data analysis.
- Comprehensive experience in real-time Big Data technologies, including Hadoop, Spark, Flink, HBase, Hive, RDDs, DataFrames, and Cassandra migration.
- Developed and implemented automated unit tests to ensure code quality and functionality, and utilized PyTest and other testing frameworks to create robust test suites.
- Performed data profiling, cleansing, and enrichment using Informatica MDM to enhance data quality.
- Performed data quality checks using Data Flows; implemented SCD Type 1 and Type 2 using Data Flows, external tables, Spark, and Synapse, with logging web services providing fast and efficient processing of Big Data.
- Implemented integration from various data sources such as Oracle Cloud, file storage, RDBMS, spreadsheets, and data lakes.
- Good experience working with data warehouses such as Snowflake and Dedicated SQL Pool.
- Demonstrated experience in enterprise data warehouse design using Star Schema and Snowflake Schema dimensional models, and utilized Snowpipe to automate continuous loading of data into Snowflake tables from Azure Blob Storage.
- Good experience working with relational databases such as MS SQL Server 2016, MySQL, PostgreSQL, and Azure SQL Database.
- Applied Data Vault 2.0 methodology to create scalable, auditable, and adaptable data warehouses, supporting evolving business requirements.
- Developed and optimized complex stored procedures in SQL Server, MySQL, and Oracle databases to improve database performance and efficiency.
- Hands-on experience performing ETL in Power Query Editor, data modelling, and report/dashboard creation in Power BI.
- Utilized visualizations such as tree maps, area charts, and funnel charts, and imported custom visuals in Power BI for interactive data analysis.
- Experienced in working with Databricks and different components in Spark, Git, SQL, and HDFS.
- Experienced in working with the version control system Git and web-based GitHub.
- In-depth knowledge of the Software Development Life Cycle (SDLC) with a thorough understanding of phases such as requirements analysis, design, and development.
- Hands-on experience using Jira and Azure DevOps Boards following Agile methodology (Scrum).

CERTIFICATION:
Microsoft Certified: Azure Data Engineer Associate (DP-203)

TECHNICAL SKILLS:
Cloud Services: MS Azure (IaaS, PaaS, SaaS), Azure SQL, Azure Databricks, Azure Purview, Azure Data Factory, Azure Key Vault, Azure Logic Apps, Azure Event Hubs, Azure Blob Storage, Snowflake, SnowSQL
Big Data Technologies: PySpark, Scala, MapReduce, Hive, Apache Flink, Apache Spark, Impala, Kafka, Zookeeper, Oozie, Cloudera, HBase
Hadoop Distributions: Cloudera, Hortonworks, Apache Hadoop
Programming Languages: Python, SQL, Scala, PL/SQL
Operating Systems: Windows (XP/7/8/10), UNIX, Linux, Ubuntu, CentOS
SDLC: Agile, Scrum, Waterfall, Trello
Source Control & Collaboration Tools: Jira, Confluence, SharePoint, Git, GitHub, Azure DevOps
Build Tools: Jenkins, Maven
Databases: MS SQL Server 2018/2016/2008, Azure SQL DB, Azure Synapse, MS Access, Oracle 11g/12c, Cosmos DB, PostgreSQL, MongoDB, T-SQL
Other Tools: Power BI, Tableau, DBT, unit testing, Data Vault, Terraform
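Illustrative example (not from any client codebase): a minimal sketch of the PySpark DataFrame and Spark SQL transformation pattern referenced in the summary above, assuming a Databricks-style environment with Delta tables; all paths, table names, and columns are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical batch ETL step: read raw events, clean them, aggregate per day,
# and write a curated Delta table. All names below are illustrative only.
spark = SparkSession.builder.appName("daily-shipment-aggregates").getOrCreate()

raw = (spark.read.format("delta")
       .load("abfss://raw@examplelake.dfs.core.windows.net/shipments"))

cleaned = (raw
           .dropDuplicates(["shipment_id"])
           .filter(F.col("event_ts").isNotNull())
           .withColumn("event_date", F.to_date("event_ts")))

# Aggregation expressed through the DataFrame API ...
daily = (cleaned.groupBy("event_date", "origin_region")
         .agg(F.count("*").alias("shipments"),
              F.avg("transit_hours").alias("avg_transit_hours")))

# ... or equivalently through the Spark SQL API on a temporary view.
cleaned.createOrReplaceTempView("shipments_clean")
daily_sql = spark.sql("""
    SELECT event_date, origin_region,
           COUNT(*)           AS shipments,
           AVG(transit_hours) AS avg_transit_hours
    FROM shipments_clean
    GROUP BY event_date, origin_region
""")

(daily.write.format("delta")
      .mode("overwrite")
      .partitionBy("event_date")
      .save("abfss://curated@examplelake.dfs.core.windows.net/daily_shipments"))
```

On Databricks the `spark` session already exists; the explicit builder is included only so the sketch is self-contained.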
PROFESSIONAL EXPERIENCE:

Client: Penske Logistics, Tampa, FL | Sep 2022 - Till Date
Role: Azure Snowflake Data Engineer
Responsibilities:
- Developed and implemented end-to-end data pipelines using Copy and Lookup activities in Azure Data Factory, seamlessly extracting, transforming, and loading diverse data sources into Azure Synapse, Oracle, Cosmos DB, and Snowflake to optimize logistics and transportation operations.
- Implemented strategic data processing workflows in Azure Databricks, harnessing Spark's capabilities to execute large-scale data transformations, contributing significantly to operational efficiency in managing package tracking, shipment details, and supply chain data.
- Implemented scalable and optimized Snowflake and Azure Synapse schemas, tables, and views, catering to complex reporting requirements and analytics queries, ultimately improving decision-making processes across the logistics and transportation landscape.
- Contributed to a 25% improvement in data pipeline stability and integrity by actively collaborating on ETL tasks and implementing robust error-handling mechanisms.
- Created and maintained dimensional models (star and snowflake schemas) to enhance data warehousing and reporting efficiency.
- Developed and maintained Terraform scripts to automate the creation and management of Azure resources such as virtual networks, storage accounts, and SQL databases.
- Implemented relational database modeling techniques to structure and organize data across various database environments, including Microsoft SQL Server and Azure SQL Database.
- Optimized data pipelines and Spark jobs in Azure Databricks using PySpark and Spark SQL for enhanced performance, including tuning Spark configurations, caching, and leveraging data partitioning techniques to support efficient processing of large datasets.
- Implemented data quality checks and validation processes in PySpark to ensure the integrity and accuracy of data, and implemented Scala and Python scripts for data extraction from various sources, including REST APIs and Azure SQL Database.
- Designed data ingestion pipelines utilizing Azure Event Hubs and Azure Functions, facilitating real-time data streaming into Azure Synapse for timely insights into package tracking and shipment information.
- Automated testing of ETL processes in Azure Data Factory and Azure Databricks environments, and used mocking and stubbing techniques to simulate data scenarios and validate transformations.
- Used Azure Data Lake Storage for efficient storage of raw and processed data, applying data partitioning and retention strategies to enhance data management and support daily operations.
- Leveraged Azure Blob Storage for streamlined storage and retrieval of data files, implementing compression and encryption techniques to optimize costs and enhance data security.
- Integrated Azure Data Factory with Azure Logic Apps to orchestrate complex data workflows, triggering actions based on specific events and significantly improving customer service through enhanced package tracking and communication.
- Utilized Microsoft Visual Studio for database development, debugging, and performance tuning in SQL Server environments.
- Integrated Delta Lake with Azure Data Lake Storage (ADLS) to provide a scalable and secure data storage solution.
- Implemented data replication and synchronization strategies between Azure Synapse, Snowflake, and other data platforms, utilizing Azure Data Factory and Change Data Capture (CDC) techniques to maintain data integrity.
- Leveraged Snowpipe to process semi-structured data formats such as JSON and Parquet, enabling efficient storage and analysis of complex data structures in Snowflake.
- Leveraged Azure Event Grid to trigger Snowpipe data loading upon file arrival in Azure storage, enhancing data processing workflows.
- Implemented Snowpipe's notification feature to trigger downstream data processing tasks in Azure Databricks and Azure Data Factory, enabling timely insights and decision-making in the logistics and transportation domain.
- Developed and deployed Azure Functions for data preprocessing, enrichment, and validation tasks within data pipelines, contributing to increased operational efficiency in managing diverse datasets.
- Executed data archiving and retention strategies utilizing Azure Blob Storage and Azure Synapse, optimizing operational efficiency in managing historical data, and troubleshot and resolved issues related to Terraform deployments.
- Implemented Slowly Changing Dimension (SCD) and Change Data Capture (CDC) techniques within these workflows to manage historical and incremental changes in customer data effectively (see the sketch following this section).
- Established custom monitoring and alerting solutions using Azure Monitor and Azure Synapse Query Performance Monitoring (QPM), ensuring proactive identification and resolution of performance issues.
- Integrated Azure Synapse and Snowflake with Power BI and Azure Analysis Services to create interactive dashboards and reports, empowering business users with self-service analytics and enhancing operational efficiency, customer service, and decision-making.
- Utilized JIRA for issue and project workflow management, and employed Git as a version control tool to maintain the code repository.
- Collaborated with DevOps engineers to establish automated CI/CD pipelines aligned with client requirements.
- Exhibited excellent time management abilities, regularly meeting project deadlines through workflow optimization, Agile methodology application, and active participation in planning and daily stand-ups.
Environment: Azure Databricks, Delta Lake, Data Factory, Azure Synapse, Logic Apps, Terraform, Airflow, Azure Data Lake Storage, Blob Storage, Snowflake, Snowpipe, Function App, MS SQL, Oracle, Informatica, HDFS, MapReduce, YARN, Spark, Hive, SQL, Python, Scala, PySpark, shell scripting, Git, JIRA, Jenkins, Kafka, ADF Pipeline, Power BI.
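Illustrative example (not from any client codebase): a minimal PySpark/Delta Lake sketch of the SCD Type 2 pattern described above; table locations, keys, and columns are invented for illustration, and the actual implementation also relied on ADF Data Flows and Synapse.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("scd2-customer-dim").getOrCreate()

DIM_PATH = "abfss://curated@examplelake.dfs.core.windows.net/dim_customer"  # hypothetical

# Incoming CDC batch with one row per changed customer (illustrative columns).
updates = (spark.read.format("delta")
           .load("abfss://staging@examplelake.dfs.core.windows.net/customer_changes")
           .withColumn("hash_diff",
                       F.sha2(F.concat_ws("||", "name", "segment", "address"), 256)))

dim = DeltaTable.forPath(spark, DIM_PATH)
current = dim.toDF().filter("is_current = true")

# Keep only rows that are genuinely new or changed versus the current dimension rows.
changed_or_new = (updates
                  .join(current.select("customer_id", F.col("hash_diff").alias("cur_hash")),
                        "customer_id", "left")
                  .filter("cur_hash IS NULL OR cur_hash <> hash_diff")
                  .drop("cur_hash"))

# Step 1: expire the existing current version of every changed customer.
(dim.alias("t")
    .merge(changed_or_new.alias("s"),
           "t.customer_id = s.customer_id AND t.is_current = true")
    .whenMatchedUpdate(set={"is_current": "false",
                            "end_date": "s.effective_date"})
    .execute())

# Step 2: append the new versions as the current records (explicit column list
# keeps the appended schema aligned with the dimension table in this sketch).
(changed_or_new
    .withColumn("start_date", F.col("effective_date"))
    .withColumn("end_date", F.lit(None).cast("date"))
    .withColumn("is_current", F.lit(True))
    .select("customer_id", "name", "segment", "address",
            "hash_diff", "start_date", "end_date", "is_current")
    .write.format("delta").mode("append").save(DIM_PATH))
```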
Client: Citigroup Inc, Dallas, TX | July 2021 to Aug 2022
Role: Azure Data Engineer with Snowflake
Responsibilities:
- Implemented end-to-end data pipelines for a fraud detection system, utilizing Azure Databricks, Azure Data Factory (ADF), and Logic Apps.
- Utilized ETL processes to gather information from diverse sources and feed it into a centralized fraud detection platform, adhering to Data Vault 2.0 methodology for flexible and scalable data integration.
- Designed and implemented fraud detection workflows using a scalable data processing framework, incorporating optimized data models to support complex analytics queries and reporting requirements specific to fraud analytics.
- Leveraged MS SQL, Oracle, and other relevant databases for efficient data storage and retrieval.
- Implemented a secure and compliant storage layer for raw and processed data, utilizing Azure Data Lake Storage and HDFS and incorporating partitioning and retention strategies aligned with fraud detection regulations.
- Integrated ADF pipelines for data ingestion, enabling real-time data streaming into the fraud detection system.
- Implemented functions within Lookup activity code for complex business logic and data transformations, including decoding, mapping, filtering, and reformatting data.
- Integrated the fraud detection system with Azure Logic Apps for orchestration, managing complex data workflows and triggering actions based on specific fraud-related events.
- Implemented robust data governance practices and quality checks using MS SQL, Oracle, and other relevant databases to ensure the accuracy and consistency of data within the fraud detection system.
- Conducted regular audits and performance tuning of DBT models to ensure optimal performance in Azure SQL Database and Synapse Analytics.
- Designed and deployed functions using Python, Scala, and PySpark specifically tailored for data preprocessing, enrichment, and validation tasks within the fraud detection pipelines.
- Implemented robust and scalable HDFS cluster architecture and data lake solutions supporting batch and real-time processing.
- Optimized data pipelines for improved performance, incorporating tuning techniques specific to Azure Databricks and Spark, and implemented Hive for data warehousing.
- Maintained and updated test cases to adapt to evolving data requirements and system enhancements, and documented test procedures and results to facilitate knowledge sharing and troubleshooting.
- Developed ETL processes to synchronize changes from operational systems to the analytics store, supporting different Slowly Changing Dimension (SCD) strategies.
- Implemented fact and dimension tables following the principles of dimensional modelling, enabling simpler queries and reporting for business users.
- Loaded data into Snowflake from various sources, including structured and semi-structured data formats, using Snowflake's native loading utilities such as SnowSQL, Snowpipe, and Snowflake connectors (see the sketch following this section).
- Tuned SQL queries and optimized query execution plans in Snowflake to improve query performance and reduce latency.
- Designed and implemented data security and compliance measures in Snowflake to ensure data protection and regulatory compliance.
- Implemented monitoring and alerting solutions using Azure Monitor and JIRA for proactive identification and resolution of performance issues within the fraud detection pipelines, and utilized Jenkins for automated pipeline deployment.
- Created interactive and visually compelling reports and dashboards using Power BI to visualize complex datasets and derive actionable insights.
- Integrated Terraform with OLTP and CI/CD pipelines to enable continuous delivery and deployment of infrastructure changes, and developed automated testing frameworks for validating infrastructure code before deployment.
- Implemented data cataloging and lineage solutions using Azure Purview, incorporating Git for version control, to provide a comprehensive understanding of data assets and their relationships within the fraud detection system.
Environment: Azure Databricks, Data Factory, Logic Apps, Data Vault, Function App, unit testing, DBT, Terraform, Snowflake, Snowpipe, SnowSQL, MS SQL, MongoDB, Oracle, HDFS, MapReduce, Spark, Hive, SQL, Python, Scala, PySpark, Git, JIRA, Jenkins, Kafka, ADF Pipeline, Power BI, Azure Kubernetes, Azure Purview.
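Illustrative example (not from any client codebase): a minimal sketch of one way to push Spark output into Snowflake with the Snowflake Spark connector, as referenced above. It assumes a Databricks cluster with the connector library attached and a Key Vault-backed secret scope; all account, scope, path, and table names are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("load-to-snowflake").getOrCreate()

# Placeholder connection options. On Databricks these would typically come from
# a secret scope (dbutils is available in notebooks) rather than being hard-coded.
sf_options = {
    "sfURL": "example_account.snowflakecomputing.com",
    "sfUser": dbutils.secrets.get("kv-scope", "sf-user"),
    "sfPassword": dbutils.secrets.get("kv-scope", "sf-password"),
    "sfDatabase": "ANALYTICS",
    "sfSchema": "FRAUD",
    "sfWarehouse": "LOAD_WH",
}

curated = spark.read.format("delta").load(
    "abfss://curated@examplelake.dfs.core.windows.net/fraud_scores")  # hypothetical path

# Append the curated DataFrame into a Snowflake table via the Spark connector.
# Outside Databricks, the long format name "net.snowflake.spark.snowflake" is used.
(curated.write
        .format("snowflake")
        .options(**sf_options)
        .option("dbtable", "FRAUD_SCORES")
        .mode("append")
        .save())
```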
Client: CITRIX, Fort Lauderdale, FL | Oct 2018 to Dec 2020
Role: Data Engineer
Responsibilities:
- Designed and implemented complex data pipelines using Azure Data Factory, ensuring seamless data integration and transformation workflows.
- Employed Azure Data Factory (ADF), Sqoop, Pig, and Hive to establish an ETL framework for extracting data from diverse sources and ensuring availability for consumption.
- Integrated Azure SQL Database with ADF for structured data storage and real-time data processing, enabling high-performance analytics.
- Implemented Azure Data Lake Storage (ADLS) for secure, scalable, and cost-effective storage of large datasets, facilitating advanced analytics.
- Leveraged Spark SQL and DataFrames for advanced data querying and manipulation, improving data accessibility and analytical capabilities.
- Engineered a robust Spark Streaming application to handle real-time data analytics, providing timely insights and enhancing business decision-making processes (see the sketch following this section).
- Developed and maintained Flink jobs to perform complex event processing, data transformations, and aggregations on streaming data.
- Processed HDFS data and created external tables in Hive, developing reusable scripts for table ingestion and repair activities across the project.
- Developed Spark and Scala-based ETL jobs for migrating data from Oracle to new MySQL tables, leveraging the capabilities of Spark for efficient data processing.
- Utilized Spark (RDDs, DataFrames, Spark SQL) and Spark-Cassandra Connector APIs for tasks such as data migration and generating business reports, demonstrating versatility in Spark usage.
- Pioneered data crunching, ingestion, and transformation activities, engineering a Spark Streaming application for real-time sales analytics and showcasing proficiency in real-time data processing.
- Conducted thorough analysis of source data, managed data type modifications, and utilized various data formats (Excel sheets, flat files, CSV files) to generate ad-hoc reports in Power BI, ensuring data-driven decision-making.
- Implemented Slowly Changing Dimension (SCD) strategies, seamlessly handling updates and enhancing data accuracy by 25%, showcasing expertise in data quality management.
- Implemented automation for deployments using YAML scripts, ensuring streamlined builds and releases and demonstrating proficiency in deployment practices.
- Collaborated extensively on the creation of combiners, partitioning, and distributed cache to enhance the performance of MapReduce jobs, demonstrating teamwork and optimization skills.
- Managed source code and enabled version control using Git and GitHub repositories, ensuring code integrity and collaboration among team members.
Environment: Azure Data Factory, Azure SQL Database, Azure Data Lake Storage, Sqoop, Pig, Hive, HDFS, Spark, Scala, MySQL, RDDs, DataFrames, Spark SQL, Spark-Cassandra Connector, Excel sheets, flat files, CSV files, Power BI, Azure Key Vault, Azure Function Apps, Azure Logic Apps, Apache HBase, Zookeeper, Flume, Kafka, Git, and GitHub.
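Illustrative example (not from any client codebase): a minimal PySpark Structured Streaming sketch of the kind of real-time analytics described above, reading sale events from a Kafka topic and maintaining windowed aggregates. The original application may well have been written in Scala; broker address, topic, schema, and paths are hypothetical, and the Kafka connector package must be available on the cluster.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("realtime-sales-analytics").getOrCreate()

# Hypothetical schema for sale events arriving as JSON on a Kafka topic.
schema = StructType([
    StructField("store_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_ts", TimestampType()),
])

raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")    # placeholder broker
       .option("subscribe", "sales-events")                   # placeholder topic
       .option("startingOffsets", "latest")
       .load())

sales = (raw.selectExpr("CAST(value AS STRING) AS json")
            .select(F.from_json("json", schema).alias("e"))
            .select("e.*"))

# 5-minute tumbling-window revenue per store, tolerating 10 minutes of late data.
revenue = (sales
           .withWatermark("event_ts", "10 minutes")
           .groupBy(F.window("event_ts", "5 minutes"), "store_id")
           .agg(F.sum("amount").alias("revenue")))

query = (revenue.writeStream
         .outputMode("append")
         .format("delta")                                         # could equally be parquet or console
         .option("checkpointLocation", "/tmp/checkpoints/sales")  # placeholder path
         .start("/tmp/output/sales_revenue"))                     # placeholder path

query.awaitTermination()
```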
Client: Nielsen Corporation, Tampa, FL | Aug 2017 to Sep 2018
Role: ETL Developer
Responsibilities:
- Created and maintained databases for Server Inventory and Performance Inventory.
- Developed stored procedures and triggers to ensure consistent data entry into the database.
- Generated drill-through and drill-down reports in Power BI, incorporating drop-down menu options, data sorting, and defined subtotals.
- Utilized the data warehouse to develop a data mart that feeds downstream reports.
- Developed a User Access Tool enabling users to create ad-hoc reports and run queries for data analysis in the proposed cube.
- Created packages for transferring data from Oracle, MS Access, flat files, and Excel files to SQL Server 2008 R2 using SSIS.
- Deployed SSIS packages and established jobs for efficient package execution.
- Created ETL packages using SSIS to extract, transform, and load data from heterogeneous databases into the data mart.
- Built cubes and dimensions with various architectures and data sources for Business Intelligence.
- Created SSIS jobs for automating report generation and cube refresh packages.
- Used SQL Server Reporting Services (SSRS) for authoring, managing, and delivering both paper-based and interactive web-based reports.
- Worked in Agile Scrum methodology, participating in daily stand-up meetings.
- Worked extensively with Visual SourceSafe for Visual Studio 2010 and utilized Trello for project tracking.
Environment: SQL Server technologies, Power BI, SSIS, SSRS, Agile Scrum methodology, Visual SourceSafe, Visual Studio 2010, and Trello for project tracking.
Client: Portware, Hyderabad, India | June 2012 to July 2016
Role: Data Warehouse Developer
Responsibilities:
- Designed ETL data flows using SSIS, creating mappings/workflows to extract data from SQL Server and performing data migration and transformation from Access/Excel sheets using SQL Server SSIS.
- Performed dimensional data modeling for data mart design, identifying facts and dimensions and developing fact tables and dimension tables using Slowly Changing Dimensions (SCD).
- Experience in error and event handling: precedence constraints, breakpoints, checkpoints, and logging.
- Built cubes and dimensions with different architectures and data sources for Business Intelligence and wrote MDX scripting.
- Thorough knowledge of features, structure, attributes, hierarchies, and star and snowflake schemas of data marts.
- Developed SSAS cubes, aggregations, KPIs, measures, cube partitioning, and data mining models, and deployed and processed SSAS objects.
- Created ad-hoc reports and reports with complex formulas, and queried the database for Business Intelligence.
- Implemented OLAP (Online Analytical Processing) and OLTP (Online Transactional Processing) to optimize data storage, retrieval, and analysis for diverse business needs.
- Developed Parameterized, Chart, Linked, Dashboard, and Scorecard reports on SSAS cubes using drill-down, drill-through, graph, and cascading reports in SSRS.
- Flexible, enthusiastic, and project-oriented team player with excellent written and verbal communication and leadership skills to develop creative solutions for challenging client needs.
Environment: MS SQL Server 2016, Visual Studio 2017/2019, SSIS, SharePoint, MS Access, Team Foundation Server.

Education: Bachelor of Technology, Jawaharlal Nehru Technological University Kakinada, 2012.

--
Thanks & Regards,
Vishwas
Role: Sr. Bench Sales Recruiter
Email: [email protected]
Phone: +1 8606093513 Ext: 158
LinkedIn: linkedin.com/in/vishwas-komuravelli-976292290
TECSHAPER INC
3458 Lakeshore Drive, Tallahassee, FL 32312, United States