Shirisha - Azure Data Engineer |
[email protected] |
Location: Dallas, Texas, USA |
Relocation: Remote |
Visa: H1B |
Name: Shirisha K
Email: [email protected] | Phone: +1 (972) 945-5150

PROFESSIONAL SUMMARY:
- 11+ years of professional IT experience spanning database engineering, ETL, and Big Data/Hadoop development.
- Over 11 years of experience in the design, development, and implementation of Big Data applications using Hadoop ecosystem frameworks and tools such as HDFS, MapReduce, Pig, Hive, Spark, Scala, Storm, HBase, Kafka, Flume, NiFi, Impala, Oozie, Zookeeper, and Airflow.
- Responsible for using Azure Backup Server to back up Azure virtual machines, MS database files, and folders, and for protecting Azure VMs in the Azure cloud using a Recovery Services vault.
- Good experience with Microsoft Azure cloud services such as Azure SQL Data Warehouse, Azure SQL Server, Azure Databricks/HDInsight, Azure Data Lake, Azure Blob Storage, Azure Data Factory, Stream Analytics, Elastic Pools, Azure CLI, Azure Analysis Services, Redis cache, and ARM.
- Experience in creating data governance policies, business glossaries, data dictionaries, reference data, metadata, data lineage, and data quality rules.
- Experience implementing ML algorithms using the distributed paradigms of Spark/Flink, in production, on Azure Databricks and AWS SageMaker.
- Extended datacenter workloads from on-premises to the Azure cloud to create a disaster recovery (DR) site using Azure Site Recovery and provide business continuity and disaster recovery (BCDR).
- Experience with SSIS, Power BI Desktop, Power BI Service, ML language integrations, and DAX.
- Traced and catalogued data processes, transformation logic, and manual adjustments to identify data governance issues.
- Expertise in developing Scala and Java applications, with good working knowledge of Python.
- Good expertise in ingesting, processing, exporting, and analyzing terabytes of structured and unstructured data on Hadoop clusters in the Healthcare, Insurance, and Technology domains.
- Experience working with SDLC methodologies such as Waterfall, Agile Scrum, and TDD for developing and delivering applications.
- Experience in gathering and analyzing requirements, providing estimates, implementation, and peer code reviews.
- Experience using Power BI to analyze data from multiple sources and create reports with interactive dashboards.
- Implemented a layered architecture for Hadoop to modularize design; developed framework scripts to enable quick development.
- Designed reusable shell scripts for Hive, Sqoop, Flink, and Pig jobs; standardized error handling, logging, and metadata management processes.
- Indexed processed data and created dashboards and alerts in Splunk to be used and actioned by support teams.
- Experienced in dimensional and relational data modeling using ER/Studio, Erwin, and Sybase PowerDesigner: star and snowflake schemas, fact and dimension tables, and conceptual, logical, and physical data models.
- Expert in creating indexed views, complex stored procedures, effective functions, and appropriate triggers to facilitate efficient data manipulation and data consistency.
- Expert in data extraction, transformation, and loading from various sources such as Oracle, Excel, CSV, and XML (a minimal sketch follows this summary).
- Demonstrated experience delivering data and analytics solutions on AWS, Azure, or similar cloud data lakes.
- Streamed data from cloud (AWS, Azure) and on-premises sources using Spark.
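Illustrative sketch only (not tied to any specific engagement): a minimal PySpark job showing the kind of file-based extract-transform-load work summarized above. The storage account, container, paths, and column names are hypothetical placeholders.

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical ADLS Gen2 locations; replace with real storage account/container names.
SOURCE_PATH = "abfss://raw@examplestorageacct.dfs.core.windows.net/sales/2021/*.csv"
TARGET_PATH = "abfss://curated@examplestorageacct.dfs.core.windows.net/sales_clean"

spark = SparkSession.builder.appName("csv-to-parquet-etl").getOrCreate()

# Extract: read raw CSV files with a header row.
raw_df = (spark.read
               .option("header", "true")
               .option("inferSchema", "true")
               .csv(SOURCE_PATH))

# Transform: basic cleanup - trim strings, drop fully null rows, standardize a date column.
clean_df = (raw_df.dropna(how="all")
                  .withColumn("customer_name", F.trim(F.col("customer_name")))
                  .withColumn("order_date", F.to_date(F.col("order_date"), "yyyy-MM-dd")))

# Load: write partitioned Parquet to the curated zone.
clean_df.write.mode("overwrite").partitionBy("order_date").parquet(TARGET_PATH)
```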
TECHNICAL SKILLS:
Cloud: Azure (Data Factory, Data Lake, Databricks, Logic Apps, ARM, Azure SQL)
ETL Tools: Azure Data Factory (V2), Azure Databricks, Azure Cloud, SSIS, SSAS, IBM IIS DataStage
Big Data: Apache NiFi, PySpark, Apache Kafka
Reporting Tools: Power BI, SAP BusinessObjects 4.1, QlikView, Tableau Desktop
Automation Tools: Azure Logic Apps, DataStage Director, Crontab, Control-M, UiPath
Code Repository Tools: GitLab, SVN Tortoise, TFS, StarTeam
Database: Oracle (10g, 11g, 12c), SQL Server (2008 R2), Db2, Netezza, MySQL, PostgreSQL
Operating System: Windows, Sun Solaris, Red Hat Linux, Ubuntu
Other Tools: Visual Studio, MS Office, WinSCP, PuTTY

PROFESSIONAL EXPERIENCE:

Client: R1RCM, Chicago, IL | Jan 2021 - Till Date
Role: Azure Data Engineer
Responsibilities:
- Involved in requirements gathering, interacting with product owners and business analysts to understand the requirements.
- Developed Spark scripts using SparkSession, Python, service principal authentication, and Spark SQL to access Hive tables in Spark for faster data processing.
- Used Amazon API Gateway, a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale.
- Exposure to Azure Data Factory activities such as Lookup, Stored Procedure, If Condition, ForEach, Set Variable, Append Variable, Get Metadata, Filter, and Wait.
- Configured Logic Apps to handle email notifications to end users and key stakeholders with the help of web service activities.
- Designed and developed workflows (Autosys) and scripts (Sqoop, PySpark, and Hive) for data ingestion and data processing such as cleanup, parsing, and data cataloging.
- Created dynamic pipelines to handle multiple sources extracting to multiple targets, and extensively used Azure Key Vault to configure connections in linked services.
- Configured and implemented Azure Data Factory triggers, scheduled and monitored the pipelines, and configured alerts for notification of failed pipelines.
- Involved in an on-premises to AWS cloud migration initiative; as part of it, created workflows (Step Functions) and scripts (AWS Lambda, Python, Databricks PySpark, Redshift) for data ingestion and data processing such as cleanup, parsing, and data cataloging.
- Extensively worked on Azure Data Lake Analytics with the help of Azure Databricks to implement SCD-1 and SCD-2 approaches (see the SCD-2 sketch after this section).
- Created Azure Stream Analytics jobs to replicate real-time data into Azure SQL Data Warehouse.
- Implemented delta-logic extractions for various sources with the help of a control table; implemented data frameworks to handle deadlocks, recovery, and logging of pipeline data.
- Kept up with the latest features introduced by Microsoft Azure (Azure DevOps, OMS, NSG rules, etc.) and utilized them for existing business applications.
- Worked on migration of data from on-premises SQL Server to cloud databases (Azure Synapse Analytics (DW) and Azure SQL DB).
- Installed and configured virtual machines, storage accounts, virtual networks, Logic Apps, and Azure Load Balancer in the Azure cloud.
- Deployed code to multiple environments through the CI/CD process, worked on code defects during SIT and UAT testing, and supported data loads for testing; implemented reusable components to reduce manual intervention.
- Responsible for using Azure Backup Server to back up Azure virtual machines, MS databases, files, and folders, and for protecting Azure VMs in the Azure cloud using a Recovery Services vault.
- Developed Spark (Scala) notebooks to transform and partition the data and organize files in ADLS.
- Worked on Azure Databricks to run Spark/Python notebooks through ADF pipelines.
- Used Databricks utilities (widgets) to pass parameters at run time from ADF to Databricks.
- Created triggers, PowerShell scripts, and parameter JSON files for deployments; worked with VSTS for the CI/CD implementation.
- Extended datacenter workloads from on-premises to the Azure cloud to create a disaster recovery (DR) site using Azure Site Recovery and provide business continuity and disaster recovery (BCDR).
- Reviewed individual work on ingesting data into Azure Data Lake and provided feedback based on reference architecture, naming conventions, guidelines, and best practices.
- Implemented end-to-end logging frameworks for Data Factory pipelines.
- Visual data lineage: provided visual representations in Alation of how data flows through systems and transformations.
- Data provenance: tracked the origin and history of data to ensure transparency and trust.
Environment: Spark, Python, Scala, Kafka, DAG, Hadoop, Apache Spark, Azure cloud, Spark SQL, Hue, GCP, Hive, Jenkins, Redshift, HDFS, Sqoop, Unix/Linux.
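Illustrative sketch only: a simplified version of the SCD Type 2 pattern referenced in this role, using the Delta Lake merge API on Databricks. The dimension table, key, hash column, and attribute names are hypothetical; a production version would also handle late-arriving and reopened records.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("scd2-sketch").getOrCreate()

# Hypothetical staging snapshot; customer_hash is assumed to be a precomputed
# hash of the tracked attributes, present in both staging and dimension tables.
updates = spark.table("staging.customer_updates")
dim = DeltaTable.forName(spark, "dw.dim_customer")

# Step 1: expire the current version of any customer whose attributes changed.
(dim.alias("t")
    .merge(updates.alias("s"),
           "t.customer_id = s.customer_id AND t.is_current = true")
    .whenMatchedUpdate(
        condition="t.customer_hash <> s.customer_hash",
        set={"is_current": "false", "end_date": "current_date()"})
    .execute())

# Step 2: append new or changed customers as the current version.
current = (spark.table("dw.dim_customer")
                .where(F.col("is_current") == True)
                .select("customer_id", F.col("customer_hash").alias("current_hash")))

changed_or_new = (updates.join(current, "customer_id", "left")
                         .where(F.col("current_hash").isNull() |
                                (F.col("current_hash") != F.col("customer_hash")))
                         .drop("current_hash")
                         .withColumn("is_current", F.lit(True))
                         .withColumn("start_date", F.current_date())
                         .withColumn("end_date", F.lit(None).cast("date")))

changed_or_new.write.format("delta").mode("append").saveAsTable("dw.dim_customer")
```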
Client: FIS Global, IA | April 2019 - Jan 2021
Role: AWS Data Engineer
Responsibilities:
- Experience in all aspects of analytics/data warehousing solutions: database issues, data modeling, data mapping, ETL development, data management, data migration, and reporting solutions.
- Strong understanding of data modeling (relational, dimensional, star and snowflake schemas) and data analysis.
- Installed and configured virtual machines, storage accounts, virtual networks, and Azure Load Balancer in the Azure cloud.
- Worked with business users and business analysts for requirements gathering and business analysis.
- Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights.
- Prepared the architecture and implementation of applications in the Azure cloud and on-premises environments.
- Involved in an on-premises to AWS cloud migration initiative; as part of it, created workflows (Step Functions) and scripts (AWS Lambda, Python, Databricks PySpark, Redshift) for data ingestion and data processing such as cleanup, parsing, and data cataloging.
- Designed and implemented application logic and database solutions in Azure SQL Data Warehouse and Azure SQL.
- Created and maintained optimal data pipeline architecture in Azure using Data Factory and Azure Databricks.
- Created self-service reporting in Azure Data Lake Storage Gen2 using an ETL approach.
- Wrote PySpark and Spark SQL transformations in Azure Databricks to perform complex transformations for business rule implementation.
- Created Data Factory pipelines that bulk copy multiple tables at once from relational databases to Azure Data Lake Gen2.
- Implemented data governance using Excel and Collibra.
- Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
- Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
- Designed and developed workflows (Autosys) and scripts (Sqoop, PySpark, and Hive) for data ingestion and data processing such as cleanup, parsing, and data cataloging.
- Used Pandas, OpenCV, NumPy, Seaborn, TensorFlow, Keras, Matplotlib, scikit-learn, and NLTK in Python for developing data pipelines and various machine learning algorithms.
- Experience with Requests, ReportLab, NumPy, SciPy, PyTables, cv2, imageio, python-twitter, Matplotlib, httplib2, urllib2, Beautiful Soup, DataFrame, and Pandas Python libraries during the development lifecycle.
- Worked on Microsoft Azure toolsets including Azure Data Factory pipelines, Azure Databricks, and Azure Data Lake Storage.
- Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data between sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back in the reverse direction.
- Created DAX queries to generate computed columns in Power BI.
- Used Power BI and Power Pivot to develop data analysis prototypes, and used Power View and Power Map to visualize reports.
- Responsible for implementing, designing, and architecting solutions for Azure cloud and network infrastructure, and for datacenter migration to public, private, and hybrid cloud.
- Created ELB security groups and Auto Scaling groups, spun up GPU large instances, and built ECS clusters consisting of task definitions and ECS services using a single CloudFormation template.
- Administered and supported the company's Azure Kubernetes infrastructure, ensuring it is secure, resilient, and performant; responsible for complete DevOps activities and coordinating with the development team.
- Worked as Kubernetes administrator; involved in configuration of web apps, Azure App Services, Azure Application Insights, Azure Application Gateway, Azure DNS, and Azure Traffic Manager.
- Set up a complete Kubernetes dev environment from scratch to deploy the latest deep learning and machine learning tools using Helm charts on on-premises bare metal for different teams.
- Created a custom logging framework for ETL pipeline logging using Append Variable activities in Data Factory.
- Developed JSON scripts for deploying ADF pipelines that process the data using SQL activities.
- Involved in the development of real-time streaming applications using Kafka, Hive, PySpark, and Apache Flink on a distributed Hadoop cluster (see the streaming sketch after this section).
- Designed, developed, and implemented Azure-based data solutions using Databricks, ADLS Gen2, virtual machines, Milvus vector DB, and Python.
- Participated in data governance working group sessions to create data governance policies.
- Configured communities, domains, asset models, and relations per the MS data governance approach and requirements.
- Used Alation as a unified platform to improve data visibility, usability, and governance, helping the organization make more informed decisions based on its data assets.
- Developed a common Flink module for serializing and deserializing AVRO data by applying a schema.
- Developed and configured Collibra workflows based on the MS data governance approach and requirements.
- Developed a server-based web traffic statistical analysis tool with RESTful APIs using Flask and Pandas.
- Enabled monitoring and Azure Log Analytics to alert the support team on usage and stats of the daily runs.
- Implemented continuous integration/continuous delivery best practices using Azure DevOps, ensuring code versioning.
- Usage analytics: tracked how data is accessed and used within the organization.
- Custom reports: allowed users to create and customize reports on data usage and catalog activity in Alation.
- Hands-on experience developing SQL scripts for automation purposes.
- Built a complex distributed system involving large-volume data handling, metrics collection, data pipelines, and analytics.
Environment: Azure Data Lake, Azure Data Factory, Flink, Pandas, Power BI, NumPy, Spark, Databricks, Azure DevOps, Agile, Python, SQL, FHIR standards, ER model, Adobe Creative Suite.
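Illustrative sketch only: a minimal PySpark Structured Streaming job of the kind referenced in the Kafka/PySpark streaming bullet above. The broker address, topic, event schema, and output paths are hypothetical, and the spark-sql-kafka connector is assumed to be available on the cluster.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-streaming-sketch").getOrCreate()

# Hypothetical schema for JSON events on the topic.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Read a stream of raw Kafka records (key/value arrive as binary).
raw = (spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
            .option("subscribe", "claims-events")                # placeholder topic
            .option("startingOffsets", "latest")
            .load())

# Parse the JSON payload and keep only the typed columns.
events = (raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
             .select("e.*"))

# Write the parsed stream to Parquet with checkpointing for fault tolerance.
query = (events.writeStream
               .format("parquet")
               .option("path", "/mnt/curated/claims_events")        # placeholder path
               .option("checkpointLocation", "/mnt/checkpoints/claims")
               .outputMode("append")
               .start())

query.awaitTermination()
```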
Client: Sprint, Overland Park, KS | Nov 2017 - Mar 2019
Role: Azure Data Engineer
Responsibilities:
- Developed Spark applications to implement various aggregation and transformation functions with Spark RDDs and Spark SQL.
- Worked on DB2 SQL connections from Spark Scala code to select, insert, and update data in the database.
- Designed and developed data ingestion using Apache NiFi and Kafka.
- Used broadcast joins in Spark to join smaller datasets to large datasets without shuffling data across nodes.
- Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, Spark SQL, and U-SQL (Azure Data Lake Analytics).
- Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
- Worked on Azure services such as IaaS and PaaS, and on storage such as Blob (page and block) and SQL Azure.
- Implemented OLAP multi-dimensional functionality using Azure SQL Data Warehouse.
- Retrieved data using Azure SQL and Azure ML, which was used to build, test, and predict on the data.
- Worked on cloud databases such as Azure SQL Database, SQL Managed Instance, SQL Elastic Pool on Azure, and SQL Server.
- Architected and implemented medium- to large-scale BI solutions on Azure using Azure data platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
- Responsible for estimating the cluster size, monitoring, and troubleshooting the Spark Databricks cluster.
- Designed and developed Azure Data Factory pipelines to extract, load, and transform data from different source systems (mainframe, SQL Server, IBM DB2, shared drives, etc.) to Azure data storage services using a combination of Azure Data Factory, Azure Databricks (PySpark, Spark SQL), Azure Stream Analytics, and U-SQL (Azure Data Lake Analytics).
- Ingested data into various Azure storage services such as Azure Data Lake, Azure Blob Storage, and Azure Synapse Analytics (formerly Azure SQL Data Warehouse).
- Configured and deployed Azure automation scripts for a multitude of applications utilizing the Azure stack (including Compute, Web & Mobile, Blobs, ADF, Resource Groups, Azure Data Lake, HDInsight clusters, Azure SQL, Cloud Services, and ARM), focusing on automation.
- Involved in migrating objects from Teradata to Snowflake and created Snowpipe for continuous data loading.
- Increased consumption of solutions including Azure SQL Database, Azure Cosmos DB, and Azure SQL.
- Created continuous integration and continuous delivery (CI/CD) pipelines on Azure to automate steps in the software delivery process.
- Deployed and managed applications in the datacenter, virtual environments, and the Azure platform.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and PySpark.
- Processed and analyzed log data stored in HBase and imported it into the Hive warehouse, enabling business analysts to write HQL queries.
- Handled importing of data from various data sources, performed transformations using Hive, and loaded data into HDFS.
- Designed, developed, and implemented performant ETL pipelines using PySpark and Azure Data Factory.
Environment: Hadoop, Hive, Azure Data Lake, Azure Data Factory, Azure cloud, Spark, Databricks, Power BI, Tableau, Python, SQL

Client: Lowe's Companies, Austin, TX | Sep 2016 - Oct 2017
Role: Data Engineer
Responsibilities:
- Designed distributed algorithms for identifying trends in data and processing them effectively.
- Used Spark and Scala to develop machine learning algorithms that analyze clickstream data.
- Good experience setting up separate application and reporting data tiers across servers using geo-replication functionality.
- Implemented disaster recovery and failover servers in the cloud by replicating data across regions.
- Good experience creating elastic pool databases and scheduling elastic jobs to execute T-SQL procedures.
- Used Kusto Explorer for log analytics and better query response, and created alerts using Kusto Query Language.
- Designed SSIS packages using Business Intelligence Development Studio (BIDS) to extract data from various data sources and load it into a SQL Server database, using multiple transformations, for further data analysis and reporting.
- Created correlated and non-correlated sub-queries to resolve complex business queries involving multiple tables from different databases.
- Performed data-quality analyses and applied business rules in all layers of the data extraction, transformation, and loading process (see the data-quality sketch after this section).
- Performed validation and verified software at all testing phases, including functional testing, system integration testing, end-to-end testing, regression testing, sanity testing, user acceptance testing, smoke testing, disaster recovery testing, production acceptance testing, and pre-prod testing.
- Involved in planning the cutover strategy and go-live schedule, including the scheduled release dates of Portfolio Central data mart changes.
Environment: Python, SQL Server, PostgreSQL, Hadoop, Databricks, AWS Kinesis, Kafka, Apache Spark, HDFS, HBase, MapReduce, Hive, Impala, Pig, Sqoop, Mahout, shell scripting, REST, LSTM, RNN, Spark MLlib, MongoDB, Tableau, Unix/Linux, SSIS, SAS, Excel
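Illustrative sketch only: a small PySpark script showing the kind of layer-to-layer data-quality checks described in the role above. The table names, key column, and business rule are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-checks-sketch").getOrCreate()

# Hypothetical staging and curated tables produced by earlier ETL steps.
staging = spark.table("staging.orders")
curated = spark.table("curated.orders")

# Check 1: row counts should match between layers (no silent drops).
staging_count = staging.count()
curated_count = curated.count()
assert curated_count == staging_count, (
    f"Row count mismatch: staging={staging_count}, curated={curated_count}")

# Check 2: the business key must be populated and unique in the curated layer.
null_keys = curated.where(F.col("order_id").isNull()).count()
distinct_keys = curated.select("order_id").distinct().count()
assert null_keys == 0, f"{null_keys} rows have a null order_id"
assert distinct_keys == curated_count, "Duplicate order_id values found in curated layer"

# Check 3: simple business rule - order amounts must be non-negative.
bad_amounts = curated.where(F.col("order_amount") < 0).count()
assert bad_amounts == 0, f"{bad_amounts} rows violate the non-negative amount rule"

print("All data-quality checks passed.")
```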
Client: T-Mobile, Dallas, TX | June 2014 - August 2016
Role: Data Warehouse Developer
Responsibilities:
- Gathered all the business requirements from the business partners.
- Responsible for building dependency analysis for object migration.
- Used Amazon QuickSight to build customer journey analytics dashboards representing channel-level communications.
- Created data ingestion modules using AWS Glue to load data into various layers in S3, with reporting via Athena and QuickSight.
- Created Athena data sources on S3 buckets for ad hoc querying and business dashboarding using QuickSight and Tableau.
- Copied fact/dimension and aggregate output from S3 to Redshift for historical data analysis using Tableau and QuickSight.
- Used the COPY command to bulk load data as part of the migration.
- Worked on views to provide better security in Snowflake.
- Designed and developed workflows (Autosys) and scripts (Sqoop, PySpark, and Hive) for data ingestion and data processing such as cleanup, parsing, and data cataloging.
- Involved in Hadoop cluster monitoring and performance tuning.
- Provisioned Airflow in EKS and designed the strategy for the Airflow setup.
- Involved in an on-premises to AWS cloud migration initiative; as part of it, created workflows (Step Functions) and scripts (AWS Lambda, Python, Databricks PySpark, Redshift) for data ingestion and data processing such as cleanup, parsing, and data cataloging.
- Working in an Agile Scrum environment, prioritized enhancements and new initiatives based on dependencies and urgency during iteration planning meetings, particularly across simultaneous work streams with a high potential for conflicts among functional stakeholders.
- Used crontab to schedule jobs.
Environment: AWS Athena, Snowflake, Snowpipe, AWS EC2, AWS S3, AWS EMR, QuickSight, AWS Lambda, shell scripting, Python, Spark, Spark SQL, Redshift, and ClickHouse

Client: ABN AMRO Bank, IL | March 2013 - May 2014
Role: Data Engineer
Responsibilities:
- Involved in architecture design, development, and implementation of Hadoop deployment, backup, and recovery systems.
- Created various kinds of reports using Power BI and Tableau based on the client's needs.
- Implemented indexes such as clustered, non-clustered, and covering indexes appropriately on data structures to achieve faster data retrieval.
- Worked with data governance tools and extract-transform-load (ETL) processing tools for data mining, data warehousing, and data cleaning using SQL.
- Optimized SQL performance, integrity, and security of the project's databases and schemas.
- Performed ETL operations to support incremental and historical data loads and transformations using SSIS (a simplified Python sketch of this pattern follows this section).
- Created SSIS packages to extract data from OLTP to OLAP systems and scheduled jobs to call the packages.
- Implemented event handlers and error handling in SSIS packages and notified process results to various user communities.
- Designed SSIS packages to import data from multiple sources and to control upstream and downstream flows of data into a SQL Azure database.
- Used advanced SSIS functionality such as complex joins, conditional splits, and column conversions for better performance during package execution.
- Developed impactful reports using SSRS, MS Excel, pivot tables, and Tableau to meet the business requirements.
- Created and deployed parameterized reports using Power BI and Tableau.
- Maintained the physical database by monitoring performance and integrity, and optimized SQL queries for maximum efficiency using SQL Profiler.
- Worked on formatting SSRS reports using global variables and expressions.
- Created Power BI reports using tabular SSAS models in Power BI Desktop and published them to dashboards using the cloud service.
- Created designs and process flows on how to standardize Power BI dashboards to meet the requirements of the business.
- Made changes to existing Power BI dashboards on a regular basis per requests from the business.
- Created data maintenance plans for backup and restore, updating statistics, and rebuilding indexes.
- Used the Oozie scheduler to automate the pipeline workflow and orchestrate the MapReduce jobs that extract data, and Zookeeper to provide coordination services to the cluster.
Environment: Hadoop (Hortonworks 2.2), Hive, Pig, HBase, Scala, Sqoop, Flume, Oozie, AWS (S3, EC2, Glue, EMR), Spring, Kafka, SQL Assistant, Python, UNIX, Teradata
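Illustrative sketch only: the incremental-load work in the role above was implemented in SSIS; the Python/pyodbc sketch below shows the same watermark-based idea purely for illustration. The connection strings, table names, and watermark column are hypothetical, and a real package would upsert rather than insert.

```python
import pyodbc

# Hypothetical connection strings for the OLTP source and the warehouse target.
SOURCE_DSN = ("DRIVER={ODBC Driver 17 for SQL Server};SERVER=oltp-server;"
              "DATABASE=Sales;Trusted_Connection=yes")
TARGET_DSN = ("DRIVER={ODBC Driver 17 for SQL Server};SERVER=dw-server;"
              "DATABASE=SalesDW;Trusted_Connection=yes")

src = pyodbc.connect(SOURCE_DSN)
tgt = pyodbc.connect(TARGET_DSN)

# 1. Find the high-water mark already loaded into the warehouse table.
watermark = tgt.cursor().execute(
    "SELECT COALESCE(MAX(ModifiedDate), '1900-01-01') FROM dbo.FactOrders").fetchval()

# 2. Pull only the source rows changed since that watermark (incremental extract).
rows = src.cursor().execute(
    "SELECT OrderID, CustomerID, OrderAmount, ModifiedDate "
    "FROM dbo.Orders WHERE ModifiedDate > ?", watermark).fetchall()

# 3. Load the delta into the warehouse.
if rows:
    cur = tgt.cursor()
    cur.fast_executemany = True
    cur.executemany(
        "INSERT INTO dbo.FactOrders (OrderID, CustomerID, OrderAmount, ModifiedDate) "
        "VALUES (?, ?, ?, ?)", [tuple(r) for r in rows])
    tgt.commit()

print(f"Loaded {len(rows)} changed rows since {watermark}.")
```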