Sudhishna - Data Engineer
[email protected]
Location: Bernardsville, New Jersey, USA
Relocation:
Visa:
PROFESSIONAL SUMMARY:
7.1 years of experience as an Azure Cloud Data Engineer in Microsoft Azure cloud technologies including Azure Data Factory (ADF), Azure Data Lake Storage (ADLS), Azure Synapse Analytics (SQL Data Warehouse), Azure SQL Database, Azure Analysis Services, PolyBase, Azure Cosmos DB (NoSQL), Azure Key Vault, and Azure HDInsight, along with big data technologies such as Hadoop, Apache Spark, and Azure Databricks.
Hands-on experience in data modeling, dimensional modeling, implementation, and support of various applications in OLTP and data warehousing environments.
Experience with Apache Hadoop components such as HDFS, HiveQL, and Sqoop, and with big data analytics.
Led a team of data engineers and business intelligence developers across different geographies to deliver demand coming from various countries and markets of MSD/Merck.
Good experience implementing master data management and data quality solutions.
Strong knowledge of relational databases such as Redshift, Oracle, Teradata, and SQL Server 2008; proficient in writing PL/SQL procedures.
Extensively followed Agile methodologies in development, including scrum ceremonies such as stand-up meetings, scrum reviews, planning, and retrospectives.
Exposure to other BI and data warehousing technologies and services such as Tableau, QlikView, Alteryx, and Power BI.
Good knowledge of data warehousing and business intelligence concepts, with strong knowledge of data model design, e.g., star and snowflake schemas.
Ability to meet deadlines and handle multiple tasks; decisive, with strong leadership qualities, flexibility in work schedules, and good communication and interpersonal skills.
Experience using Git and Bitbucket for version control and error reporting, and project management tools such as JIRA and Rally.
Experienced with Spark, improving the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, the DataFrame API, Spark Streaming, and pair RDDs; worked extensively with PySpark.
Developed a framework for converting existing PowerCenter mappings to PySpark (Python and Spark) jobs.
Established connectivity from Azure to the on-premises data center using Azure ExpressRoute for single and multi-subscription setups.
Performed ETL operations in Azure Databricks by connecting to different relational database source systems using JDBC connectors.
Developed Python scripts to perform file validations in Databricks and automated the process using ADF (a sketch follows this summary).
Developed an automated process in Azure that ingests data daily from a web service and loads it into Azure SQL DB.
Experience in design and development of ETL methodology supporting data migration, transformation, and processing in a corporate-wide ETL solution using Teradata.
Experience with cloud databases and data warehouses (SQL Azure and Amazon Redshift/RDS).
Experience importing and exporting data with Sqoop between HDFS and relational database systems.
Expert in data analysis, data validation, data verification, and identifying data mismatches before storing on-premises.
Worked with various streaming ingestion services with batch and real-time processing using Spark Streaming, Kafka, Confluent, Storm, Flume, and Sqoop.
Took part in the software development life cycle (SDLC) of tracking systems: requirements gathering, analysis, detailed design, development, system testing, and user acceptance testing, using Waterfall and Agile Scrum methodologies.
Used tools such as GitHub, Slack, Jenkins, Docker, and JIRA to migrate legacy applications to the cloud platform.
Created Azure SQL databases and performed monitoring and restoring of Azure SQL databases; performed migration of Microsoft SQL Server to Azure SQL Database.
Experienced in data modeling and data analysis using dimensional and relational data modeling, star/snowflake schema modeling, fact and dimension tables, and physical and logical data modeling.
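A minimal, hypothetical sketch of the file-validation pattern mentioned above: a PySpark check that runs as a Databricks notebook and is scheduled through an ADF pipeline. The storage path, column names, and rules are illustrative assumptions, not code from any client project.

    # Placeholder ADLS path and expected columns.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("file-validation").getOrCreate()

    df = (spark.read.option("header", True)
          .csv("abfss://landing@examplelake.dfs.core.windows.net/orders/"))

    # Structural check: every expected column must be present.
    expected = {"order_id", "customer_id", "order_date", "amount"}
    missing = expected - set(df.columns)
    if missing:
        raise ValueError(f"Schema validation failed; missing columns: {missing}")

    # Content check: the business key must never be null.
    null_keys = df.filter(F.col("order_id").isNull()).count()
    if null_keys:
        raise ValueError(f"{null_keys} rows have a null order_id")

    # An unhandled exception fails the Databricks activity, so the ADF
    # pipeline run is marked failed and downstream activities are skipped.
    print(f"Validation passed for {df.count()} rows")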
TECHNICAL SKILLS:
Big Data Technologies: HDFS, Scala, MapReduce, Hive, Sqoop, Flume, Oozie, Spark, Kafka, Storm, Zookeeper
Languages: C, Java, Python, PL/SQL, HiveQL, Unix shell scripting
Cloud Architecture: Azure Data Lake, Azure Data Factory, Azure Databricks, Azure SQL Database, Azure SQL Data Warehouse
Reporting Tools: Tableau, Oracle Reports, ad-hoc reporting, Power BI
Frameworks: MVC, Angular, Spring, Hibernate
Databases: Oracle, SQL Server, MySQL, PL/SQL
NoSQL Databases: HBase, Cassandra, MongoDB
Web Technologies: HTML, DHTML, XML, AJAX, JavaScript, JSP, jQuery
Web Services: WSDL, SOAP, REST APIs, Microservices
Tools and IDEs: Eclipse, NetBeans
Version Control Tools: SVN, Git, CVS, TFS

PROFESSIONAL EXPERIENCE:

Role: Data Engineer
Client: Blue Cross and Blue Shield of Kansas City, USA
May 2023 to Present
Responsibilities:
Created numerous pipelines in Azure Data Factory v2 to get data from disparate source systems using activities such as Move & Transform, Copy, Filter, ForEach, and Databricks.
Developed a fully automated continuous integration system using Git, Jenkins, MySQL, and custom tools written in Python and Bash.
Created and provisioned the Databricks clusters needed for batch and continuous streaming data processing and installed the required libraries on the clusters.
Designed and implemented the various layers of the data lake and designed the star schema in BigQuery.
Implemented and managed Azure Databricks clusters to facilitate scalable, collaborative data analytics and machine learning workflows within the Azure ecosystem.
Created alerting policies for Cloud Composer and Cloud Data Fusion to notify of any job failure.
Developed Spark code using Scala and Spark SQL for faster processing of data.
Configured and optimized Azure Databricks workspaces to enable efficient data exploration, transformation, and analysis, leveraging the platform's collaborative and interactive data processing capabilities.
Developed PySpark and Spark SQL code to process data in Apache Spark on Amazon EMR, performing the necessary transformations based on the STMs developed.
Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks.
Worked with Terraform scripts that automate step execution in EMR to load data into ScyllaDB.
Processed and loaded bounded and unbounded data from Google Pub/Sub topics to BigQuery using Cloud Dataflow with Python.
Developed a Kafka consumer API in Scala for consuming data from Kafka topics.
Implemented data quality checks using Spark Streaming and flagged records as bad or passable (see the sketch following this role).
Created external tables in Snowflake to process data in the external stage in the cloud.
Implemented Spark SQL with various data sources such as JSON, Parquet, ORC, and Hive.
Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
Used HTML, XML, CSS, AJAX, and JavaScript for developing front-end pages and client-side validations.
Defined virtual warehouse sizing in Snowflake for different types of workloads.
Integrated Azure Databricks with Azure Synapse Analytics and other data storage solutions to streamline data ingestion and processing, enabling seamless transfer of data between different data sources and processing environments.
Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
Developed Spark scripts using the Scala shell as per requirements.
Environment: HDFS, Spark, Scala, Python, Databricks, Tomcat, Netezza, Oracle, Azure, Azure Databricks, Sqoop, Snowflake, Terraform, ScyllaDB, Cassandra, MySQL, Oozie, HTML/DHTML, AJAX, CSS, XML, XSLT, JavaScript
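A hedged illustration of the bad/passable flagging mentioned above, shown in batch PySpark for brevity rather than Spark Streaming. The column names, rules, and mount paths are assumptions for the sketch.

    # Assumed columns (event_id, amount) and mount points.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("dq-flags").getOrCreate()
    events = spark.read.parquet("/mnt/raw/events")

    # Tag each record so downstream jobs can route it appropriately.
    flagged = events.withColumn(
        "dq_flag",
        F.when(F.col("event_id").isNull() | (F.col("amount") < 0), F.lit("bad"))
         .otherwise(F.lit("passable")),
    )

    # Bad records are quarantined; passable records continue downstream.
    flagged.filter("dq_flag = 'bad'").write.mode("append").parquet("/mnt/quarantine/events")
    flagged.filter("dq_flag = 'passable'").write.mode("append").parquet("/mnt/curated/events")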
Role: Data Engineer
Client: Global Payments, Atlanta, GA
Aug 2022 to May 2023
Responsibilities:
Worked on Azure Data Factory to integrate data from both on-premises (MySQL, Cassandra) and cloud (Blob Storage, Azure SQL DB) sources and applied transformations to load the data back to Snowflake.
Developed custom activities using Azure Functions, Azure Databricks, and PowerShell scripts to perform data transformations, data cleaning, and data validation.
Deployed Data Factory pipelines to orchestrate data into the SQL database.
Worked on Snowflake modeling using data warehousing techniques, data cleansing, slowly changing dimensions, surrogate key assignment, and change data capture.
Applied an analytical approach to problem-solving, using Azure Data Factory, Data Lake, and Azure Synapse to solve business problems.
Developed ELT/ETL pipelines to move data to and from the Snowflake data store using a combination of Python and Snowflake SnowSQL (see the sketch following this role).
Worked with Azure Logic Apps administrators to monitor and troubleshoot issues related to process automation and data processing pipelines.
Developed and optimized Azure Functions code to extract, transform, and load data from various sources such as databases, APIs, and file systems.
Designed, built, and maintained data integration programs in Hadoop and RDBMS environments.
Developed a CI/CD framework for data pipelines using Jenkins.
Collaborated with DevOps engineers to develop an automated CI/CD and test-driven development pipeline in Azure per client requirements.
Developed and maintained shell and Unix scripts for automating data processing and system administration tasks.
Collaborated on ETL tasks, maintaining data integrity and verifying pipeline stability.
Worked with JIRA to report on projects and create subtasks for development, QA, and partner validation.
Experienced in the full breadth of Agile ceremonies, from daily stand-ups to internationally coordinated PI Planning.
Environment: Azure Databricks, Data Factory, Logic Apps, Function App, Snowflake, MS SQL, Oracle, HDFS, MapReduce, YARN, Spark, Hive, SQL, Python, Scala, PySpark, Git, JIRA, Jenkins, Kafka, ADF Pipeline, Power BI
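As an illustration of the Python plus Snowflake SQL ELT pattern above, the sketch below loads staged files with the Snowflake Python connector and applies change data capture with a MERGE. The account locator, credentials, stage, and table names are placeholders.

    # Hypothetical account, warehouse, stage, and table names.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="xy12345.east-us-2.azure",
        user="etl_user",
        password="********",
        warehouse="LOAD_WH",
        database="ANALYTICS",
        schema="STAGING",
    )
    try:
        cur = conn.cursor()
        # Load files already landed in an external stage (e.g. Blob/ADLS).
        cur.execute("""
            COPY INTO STAGING.PAYMENTS_RAW
            FROM @EXT_STAGE/payments/
            FILE_FORMAT = (TYPE = PARQUET)
            MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
            ON_ERROR = 'CONTINUE'
        """)
        # Apply change data capture into the target dimension table.
        cur.execute("""
            MERGE INTO ANALYTICS.DIM_MERCHANT t
            USING STAGING.PAYMENTS_RAW s ON t.merchant_id = s.merchant_id
            WHEN MATCHED THEN UPDATE SET t.merchant_name = s.merchant_name
            WHEN NOT MATCHED THEN INSERT (merchant_id, merchant_name)
                                  VALUES (s.merchant_id, s.merchant_name)
        """)
    finally:
        conn.close()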
Role: Data Engineer
Client: Zomato Limited, Hyderabad, India
May 2019 to July 2021
Responsibilities:
Involved in data ingestion to one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processing the data in Azure Databricks.
Worked on Microsoft Azure services such as HDInsight clusters, Blob Storage, Data Factory, and Logic Apps, and completed a POC on Azure Databricks.
Performed ETL using Azure Databricks and migrated on-premises Oracle ETL processes to Azure Synapse Analytics.
Worked on migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse, controlling and granting database access, and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
Performed data transfer using Azure Synapse and PolyBase.
Integrated Python scripts with Azure Stream Analytics to process real-time data streams.
Deployed and optimized Python web applications through Azure DevOps CI/CD to focus on development.
Developed enterprise-level solutions using batch processing and streaming frameworks such as Spark Streaming and Apache Kafka (see the sketch following this role).
Processed schema-oriented and non-schema-oriented data using Scala and Spark.
Created partitions and buckets based on state to further process data using bucket-based Hive joins.
Created automated scripts using Sqoop commands and shell scripts to schedule and run Sqoop jobs on a regular basis.
Developed a data pipeline using Kafka, Spark, and Hive to ingest, transform, and analyze data.
Worked on RDDs and DataFrames (Spark SQL) using PySpark for analyzing and processing data.
Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
Implemented CI/CD pipelines to build and deploy projects in the Hadoop environment.
Used JIRA to manage issues and project workflow.
Worked on Spark using Python (PySpark) and Spark SQL for faster testing and processing of data.
Used Git as the version control tool to maintain the code repository.
Environment: Azure Databricks, Data Factory, Logic Apps, Function App, Snowflake, MS SQL, Oracle, HDFS, MapReduce, YARN, Spark, Hive, SQL, Python, Scala, PySpark, Git, JIRA, Jenkins, Kafka, ADF Pipeline, Power BI
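A hedged sketch of the Kafka and Spark Streaming pattern referenced above, written with Spark Structured Streaming: consume a topic, cast the binary payload, and append micro-batches to storage. The broker addresses, topic name, and paths are illustrative, and the Kafka connector package is assumed to be available on the cluster.

    # Placeholder brokers, topic name, and output paths.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

    raw = (spark.readStream.format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
           .option("subscribe", "orders")
           .option("startingOffsets", "latest")
           .load())

    # Kafka delivers key/value as binary; cast the value payload to string
    # before any downstream parsing or transformation.
    events = raw.select(F.col("value").cast("string").alias("payload"),
                        F.col("timestamp"))

    # Micro-batches are appended to storage with a checkpoint for recovery.
    query = (events.writeStream
             .format("parquet")
             .option("path", "/mnt/raw/orders_stream")
             .option("checkpointLocation", "/mnt/checkpoints/orders_stream")
             .outputMode("append")
             .start())

    query.awaitTermination()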
Role: Data Engineer
Client: Foodpanda, Hyderabad, India
Sep 2018 to May 2019
Responsibilities:
Developed scalable data pipelines for real-time processing of large datasets using Azure Data Factory and Databricks, improving data processing efficiency by 30%.
Implemented a microservices architecture for data integration, reducing system dependencies and improving overall data pipeline reliability by 20%.
Collaborated with data scientists to integrate ML models using PySpark, enhancing the performance of predictive analytics models and reducing model deployment time by 25%.
Performed performance tuning and monitoring in an enterprise environment.
Designed and developed data mapping procedures for ETL (data extraction, data analysis, and loading) to integrate data using Python programming.
Used Hive and created Hive tables.

Role: Big Data Developer
Client: Uber, Hyderabad, India
July 2016 to Aug 2018
Responsibilities:
Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as HDFS, Hive, Zookeeper, and Sqoop.
Implemented partitioning, dynamic partitions, and buckets in Hive.
Installed and configured Sqoop to import and export data between Hive and relational databases.
Administered large Hadoop environments, including cluster build-out and support.
Closely monitored and analyzed MapReduce job executions on the cluster at the task level and optimized Hadoop cluster components to achieve high performance.
Used Python and SAS to extract, transform, and load source data from transaction systems and generated reports, insights, and key conclusions.
Involved in data loading and writing Hive UDFs.
Worked with the Linux server admin team in administering the server hardware and operating system.
Designed, developed, and maintained data integration programs in Hadoop and RDBMS environments.
Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS.
Environment: Hadoop YARN, Spark, Spark SQL, Python, Hive, Sqoop, MapReduce, Power BI, Oracle, Linux

EDUCATION:
Master's in Business Analytics, Dec 2022, University of Texas at Arlington.
Bachelor's in Electronics & Communication, Jun 2016, JNTU Hyderabad.