Bhanu Prakash - Azure Data Engineer, Snowflake, Big Data
[email protected]
Location: Anderson, Alabama, USA
Relocation: open |
Visa: Green card |
BHANU PRAKASH
AZURE DATA ENGINEER
Contact: 469 770 9097

Professional Summary
Around 10 years of work experience in the development and implementation of data warehousing solutions.
Experienced in Azure Data Factory and in preparing CI/CD scripts for deployment.
Experience with AWS services including Glue, S3, Lambda, and Athena.
Experience in building Azure Stream Analytics ingestion specs that give users sub-second results in real time.
Experience in building ETL data pipelines in Azure Databricks leveraging PySpark and Spark SQL.
Experience in building orchestration in Azure Data Factory for scheduling.
Experience working with data warehouses such as Teradata, Oracle, and SAP.
Experience implementing Azure Log Analytics, providing platform-as-a-service handling of SD-WAN firewall logs.
Experience in building data pipelines with Azure Data Factory.
Selecting appropriate, cost-effective AWS/Azure services to design and deploy applications based on given requirements.
Expertise in working with databases such as Azure SQL DB and Azure SQL DW.
Solid programming experience with Python and Scala.
Strong expertise in relational database systems such as MS SQL Server, MS Access, and DB2, including design and database development using SQL, PL/SQL, SQL*Plus, TOAD, and SQL*Loader.
Highly proficient in writing, testing, and implementing triggers, stored procedures, functions, packages, and cursors using PL/SQL.
Performed loads into a Snowflake instance using the Snowflake connector in IICS to support data analytics and insight use cases for the sales team.
Hands-on experience with Snowflake utilities, SnowSQL, Snowpipe, and big data modeling techniques using Python.
Experience with Snowflake multi-cluster warehouses and building Snowpipe.
Experience working in cross-functional Agile Scrum teams.
Happy to work with teams that are in the middle of big data challenges, both on-premises and in the cloud.
Hands-on experience with Azure analytics services: Azure Data Lake Store (ADLS), Azure Data Lake Analytics (ADLA), Azure SQL DW, Azure Data Factory (ADF), Azure Databricks (ADB), etc.
Orchestrated data integration pipelines in ADF using activities such as Get Metadata, Lookup, ForEach, Wait, Execute Pipeline, Set Variable, Filter, Until, etc.
Good knowledge of PolyBase external tables in SQL DW; involved in production support activities.
Working knowledge of Azure services such as Azure Data Factory and Azure Databricks, and AWS services such as EC2, S3, EMR, Redshift, Athena, Glue metastore, and Lambda.
Expertise in working with features such as models and macros to create data pipelines using Data Build Tool (DBT) as the transformation tool.
Extracted data from various database sources such as DB2, SQL, mainframe, CSV files, and flat files.
Extensive experience in debugging, troubleshooting, monitoring, and performance tuning using CA7, Control-M, Jenkins, and DBT Cloud.
Excellent understanding and knowledge of NoSQL databases such as HBase and MongoDB.
Profound understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
Experience with ETL workflow management tools such as Apache Airflow, with significant experience writing Python scripts to implement workflows (see the sketch below).
Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
Proficiency in SQL across several dialects, commonly MySQL, PostgreSQL, Redshift, SQL Server, and Oracle.
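Illustrative sketch for the Airflow experience noted above: a minimal Python-defined DAG with two dependent tasks. The DAG id, schedule, and task bodies are hypothetical placeholders, not taken from any engagement listed in this résumé.

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    # Placeholder: pull the day's source files (e.g. from S3)
    print("extracting source data")

def load():
    # Placeholder: load the curated data into the warehouse (e.g. Redshift)
    print("loading into the warehouse")

# Hypothetical daily two-step ETL workflow
with DAG(
    dag_id="daily_etl_example",      # illustrative name only
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task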
Extensively worked with Teradata utilities FastExport and MultiLoad to export and load data to/from different source systems, including flat files.
Experienced in building automated regression scripts in Python to validate ETL processes across databases such as Oracle, SQL Server, Hive, and MongoDB.
Experience in analyzing, designing, developing, implementing, tuning, and supporting business intelligence and data warehouse solutions using Power BI.
Good knowledge of detailed data analysis; translate business requirements into logical data models and storytelling dashboards using Power BI.
Sound knowledge of Power BI features such as calculations, filters, actions, parameters, groups, sets, graphs, maps, DAX, Power Query, and blank queries.
Strong communication, decision-making, and organizational skills, along with outstanding analytical and problem-solving skills for challenging assignments.
Experience in solving complex problems, strong analytical skills, and the ability to trace issues to the root source.
Proficient in teaming up with project sponsors, analysts, QA, and architecture teams to design and deliver outstanding business intelligence and data warehouse applications.

Professional Experience

Client: Retail Business Services, Atlanta, Sep 2021 to Present
Sr. Data Engineer
Responsibilities:
Created linked services for multiple source systems (e.g., Azure SQL Server, ADLS, Blob, REST API).
Created pipelines to extract data from on-premises source systems to Azure cloud data lake storage; extensively worked on copy activities and implemented copy behaviors such as flatten hierarchy, preserve hierarchy, and merge hierarchy.
Implemented error handling through the copy activity.
Exposure to Azure Data Factory activities such as Lookup, Stored Procedure, If Condition, ForEach, Set Variable, Append Variable, Get Metadata, Filter, and Wait.
Configured Logic Apps to send email notifications to end users and key stakeholders via the web activity; created dynamic pipelines to handle extraction from multiple sources to multiple targets; extensively used Azure Key Vault to configure connections in linked services.
Configured and implemented Azure Data Factory triggers and scheduled the pipelines; monitored the scheduled Azure Data Factory pipelines and configured alerts for notification of failed pipelines.
Extensively worked on Azure Data Lake Analytics with the help of Azure Databricks to implement SCD-1 and SCD-2 approaches.
Created Azure Stream Analytics jobs to replicate real-time data into Azure SQL Data Warehouse.
Implemented delta-logic extractions for various sources with the help of a control table (sketched below); implemented data frameworks to handle deadlocks, recovery, and pipeline logging.
Kept up with the latest features introduced by Microsoft Azure (Azure DevOps, OMS, NSG rules, etc.) and utilized them for existing business applications.
Worked on migration of data from on-premises SQL Server to cloud databases (Azure Synapse Analytics (DW) and Azure SQL DB).
Deployed code to multiple environments through the CI/CD process, worked on code defects during SIT and UAT testing, and supported data loads for testing; implemented reusable components to reduce manual intervention.
Developed Spark (Scala) notebooks to transform and partition the data and organize files in ADLS.
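Illustrative sketch of the control-table-driven delta extraction and partitioned ADLS output described above, written in PySpark (the notebooks referenced above were Scala); the table names, columns, and ADLS path are assumptions for illustration only.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("delta_extract_sketch").getOrCreate()

# Hypothetical control table recording the last successful load per source
last_ts = (spark.read.table("etl_control.watermarks")
                .filter(F.col("source_name") == "orders")
                .agg(F.max("last_loaded_ts"))
                .collect()[0][0])

# Extract only rows changed since the previous run (the delta logic)
changed = (spark.read.table("staging.orders")
                .filter(F.col("modified_ts") > F.lit(last_ts)))

# Partition by load date and organize the files in ADLS (path is illustrative)
(changed.withColumn("load_date", F.current_date())
        .write.mode("append")
        .partitionBy("load_date")
        .parquet("abfss://curated@examplelake.dfs.core.windows.net/orders/"))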
Used the Spark DataFrames API on the Cloudera platform to perform analytics on Hive data and used DataFrame operations to perform the required validations on the data.
Familiar with the REST API for interacting with Kafka and its related components.
Worked on Azure Databricks to run Spark/Python notebooks through ADF pipelines.
Used the Databricks widgets utility to pass parameters at run time from ADF to Databricks.
Created triggers, PowerShell scripts, and parameter JSON files for the deployments.
Worked with VSTS for the CI/CD implementation.
Reviewed individual work on ingesting data into Azure Data Lake and provided feedback based on the reference architecture, naming conventions, guidelines, and best practices.
Implemented end-to-end logging frameworks for Data Factory pipelines.
Environment: Azure Data Factory, Azure Databricks, PolyBase, Azure DW, ADLS, Azure DevOps, Blob, Azure SQL Server, Azure Synapse.

Client: PWC, Portland, OR, Feb 2018 to Aug 2021
Data Engineer
Responsibilities:
Responsible for ingesting large volumes of user behavioral data and customer profile data into the analytics data store.
Developed custom multi-threaded Java-based ingestion jobs as well as Sqoop jobs for ingesting data from FTP servers and data warehouses.
Developed Scala-based Spark applications for data cleansing, event enrichment, aggregation, de-normalization, and data preparation for machine learning and reporting teams to consume.
Troubleshot Spark applications to make them more error tolerant.
Fine-tuned Spark applications to improve overall pipeline processing time.
Wrote Kafka producers to stream data from external REST APIs to Kafka topics.
Utilized the Spark SQL API with Scala to extract and load data and perform SQL queries.
Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase.
Experienced in handling large datasets using Spark's in-memory capabilities, broadcast variables, and effective and efficient joins, transformations, and other operations.
Manipulated raw data and ingested it from AWS S3 into Snowflake.
Responsible for ensuring data in Snowflake was updated on time and available for reporting and business analytics at the start of business each day.
Experience working with EMR clusters in the AWS cloud and with S3, Redshift, and Snowflake.
Involved in creating Hive tables and loading and analyzing data using Hive scripts; implemented partitioning, dynamic partitions, and buckets in Hive.
Good experience with continuous integration of applications using Bamboo.
Used reporting tools such as Tableau connected to Impala to generate daily data reports.
Collaborated with the infrastructure, network, database, application, and BA teams to ensure data quality and availability.
Documented operational problems following standards and procedures using JIRA.
Environment: Snowflake, AWS EMR, Spark, Scala, Python, Hive, Sqoop, Oozie, Kafka, YARN, JIRA, S3, Redshift, Athena, Shell Scripting, GitHub, Maven.

Client: American Airlines, Fort Worth, TX, July 2016 to Jan 2018
Data Engineer
Responsibilities:
Involved in processing batch files from multiple source systems.
Responsible for using Python scripts to curate raw data from Blob storage to ADLS Gen2 using Azure Data Factory.
Automated curation jobs in Azure Data Factory to convert data in different source formats to compressed CSV (sketched below).
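Illustrative sketch of the curation step described above, converting a raw Blob-storage dataset to gzip-compressed CSV in ADLS Gen2 using PySpark; the storage accounts, containers, and source format are assumptions for illustration only.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("curation_sketch").getOrCreate()

# Hypothetical raw landing zone (Blob storage) and curated zone (ADLS Gen2)
raw_path = "wasbs://landing@exampleaccount.blob.core.windows.net/customers/"
curated_path = "abfss://curated@examplelake.dfs.core.windows.net/customers/"

# Read one of several possible source formats (JSON here; Parquet/Avro handled similarly)
df = spark.read.json(raw_path)

# Write the curated output as gzip-compressed CSV with a header row
(df.write.mode("overwrite")
   .option("header", "true")
   .option("compression", "gzip")
   .csv(curated_path))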
Migrated on-premises data (Oracle, SQL Server, DB2, MongoDB) to Azure Data Lake Store (ADLS) using Azure Data Factory (ADF V1/V2).
Developed a detailed project plan and helped manage the data conversion and migration from the legacy system to the target Snowflake database.
Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob storage, and Azure SQL Data Warehouse, including write-back scenarios.
Integrated Snowflake with the Azure infrastructure using DBT as the transformation tool.
Hands-on experience understanding data transformation requirements and leveraging models and materializations in DBT.
Used DBT macros to load data from ADLS Gen2 into the Snowflake data warehouse and for schema tests and data quality tests.
Identified and implemented cleansing requirements as an ELT process in Snowflake with SQL scripts across multiple layers to curate the data, and created calculation models on top of it as required for the use cases.
Automated and created pipelines in various environments and scheduled production jobs in DBT Cloud while maintaining a single source of trusted code in GitLab.
Actively documented implementations so others could easily understand the requirements, implementation, and test conditions.
Monitored the project and improved the processing of large datasets through query optimization and clustering.
Responsible for ensuring data in Snowflake was updated on time and available for reporting and business analytics at the start of business each day.
Environment: Snowflake, Azure Blob Storage, Azure Data Factory, Data Build Tool (DBT Cloud), GitLab.

Client: Northern Trust, Chicago, IL, Nov 2014 to April 2016
Power BI Developer / Data Engineer
Responsibilities:
Understood source systems and customer requirements.
Loaded data from different data sources such as Excel and SQL using Import mode.
Created interactive data visualizations for better decision making; worked on bar charts, pie charts, pivot tables, straight tables, and gauge charts.
Participated in daily stand-up calls.
Involved in the testing and bug resolution process.
Responsible for administrative activities such as managing data connections in the Power BI service, sharing reports with users, and configuring row-level security.
Supported users on ad hoc requests and support tickets related to reports.
Gathered business requirements and interacted with users, the project manager, and SMEs to better understand the business processes.
Designed and developed interactive dashboards for desktop and mobile devices.
Resolved report performance issues.
Worked extensively on DAX calculations.
Prepared and documented SQL queries to validate the data behind each visual created in Power BI.
Configured the on-premises data gateway to schedule data refreshes in the Power BI service.
Involved in embedding Power BI reports in SharePoint and collaborating with business users.
Environment: Power BI, Excel, SQL Server, data modelling, ETL.

Client: Prism Technologies, India, May 2013 to Sep 2014
Data Analyst
Responsibilities:
Created various datasets for the reports based on the required parameters and columns.
Developed and executed several complex T-SQL procedures for creating report datasets (see the sketch at the end of this section).
Fine-tuned stored procedures to improve performance by removing unnecessary cursors and temporary tables wherever possible.
Validated and scripted VBScript and JavaScript.
Migrated DTS packages to SSIS packages.
Created Software Requirement Specification (SRS) documents.
Defined data sources for the reports to pull data from the server and database in the background.
Identified the steps needed to improve user response time by checking fragmentation and re-indexing tables for efficient application access.
Created views for easier implementation on web pages and wrote triggers on those views to provide efficient data manipulation.
Created DTS packages for moving data from MS Access to SQL Server and vice versa.
Configured and managed database maintenance plans for updating statistics, database integrity checks, and backup operations.
Implemented indexed views and tuned stored procedures to help increase database performance.
Monitored and tuned the server configuration to ensure maximum capacity and throughput.
Environment: Windows NT 4.0/2008, MS SQL Server 2008, VB, T-SQL, SQL Server Enterprise Manager.
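Illustrative sketch of calling one of the report-dataset stored procedures described above from Python using pyodbc; the driver, server, database, procedure name, and parameters are assumptions for illustration only (the original work was done directly in T-SQL and the reporting tools).

import pyodbc

# Hypothetical connection to the reporting SQL Server database
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=reporting-sql.example.local;"
    "DATABASE=ReportingDB;"
    "Trusted_Connection=yes;"
)

cursor = conn.cursor()
# Hypothetical report procedure; the parameters mirror the report's date filters
cursor.execute(
    "EXEC dbo.usp_GetSalesDataset @StartDate = ?, @EndDate = ?",
    "2014-01-01", "2014-03-31",
)
for row in cursor.fetchall():
    print(row)
conn.close()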