KARTHIK REDDY KARNA - DATA ENGINEER
[email protected]
Location: Allen, Texas, USA
Relocation:
Visa: Green Card
Karthik Reddy Karna
Azure Snowflake Data Engineer | Mobile: 940-536-3859 | Email: [email protected]

PROFESSIONAL SUMMARY
- Data Engineer with around 10 years of software industry experience, focused on Azure cloud services and Big Data technologies such as Spark, MapReduce, Hive, YARN, and HDFS, using programming languages including Scala and Python.
- 4 years of data warehouse experience with a deep understanding of ETL processes, data modeling, and data warehousing; committed to delivering efficient, scalable data solutions that drive business growth and support strategic decision-making.
- Results-driven Data Engineer specializing in the design and implementation of scalable data ingestion pipelines using Azure Data Factory.
- Expertise in leveraging Azure Databricks and Spark for distributed data processing and transformation tasks, ensuring optimal performance.
- Skilled in maintaining data quality and integrity through robust validation, cleansing, and transformation operations.
- Adept at architecting cloud-based data warehouse solutions on Azure, utilizing Snowflake for efficient data storage, retrieval, and analysis.
- Extensive experience with Snowflake Multi-Cluster Warehouses and a deep understanding of Snowflake cloud technology.
- Proficient in utilizing advanced Snowflake features such as Cloning and Time Travel to enhance database applications.
- Actively involved in the development, improvement, and maintenance of Snowflake database applications.
- Expertise in building logical and physical data models for Snowflake and adapting them to meet changing requirements.
- Skilled in defining roles and privileges to control access to different database objects.
- Thorough knowledge of Snowflake database, schema, and table structures, enabling efficient data organization and retrieval.
- Proven proficiency in optimizing Spark jobs and utilizing Azure Synapse Analytics for big data processing and advanced analytics.
- Track record of success in performance optimization and capacity planning to ensure scalability and efficiency.
- Experienced in developing CI/CD frameworks for automated deployment of data pipelines in collaboration with DevOps teams.
- Proficient in scripting languages such as Python and Scala, enabling efficient automation and customization.
- Skilled in utilizing Hive, Spark SQL, Kafka, and Spark Streaming for ETL tasks and real-time data processing.
- Strong working experience in the Hadoop ecosystem, including HDFS, MapReduce, Hive, and Python.
- Hands-on expertise in developing large-scale data pipelines using Spark and Hive.
- Experience in data modeling, database design, SQL scripting, and the development and implementation of client-server and Business Intelligence (SSIS, SSAS, SSRS) applications.
- Extensive experience with T-SQL in constructing triggers and tables and implementing stored procedures, functions, views, user profiles, data dictionaries, and data integrity.
- Very good experience in building analytical dashboards using Excel, SSRS, Power BI, Qlik Sense, and Tableau.
- Good experience in embedding Azure reports, dashboards, and visuals into applications and applying row-level security to datasets.
- Competent in utilizing Hive on Spark and Spark SQL to fulfill diverse data processing needs through the execution of Hive scripts.
- Collaborated with cross-functional teams to gather requirements, design data integration workflows, and implement scalable data solutions.
- Proficient in Agile and Waterfall methodologies, applying a flexible and adaptive approach to project management based on project needs.
- Experienced in utilizing JIRA for project reporting, task management, and ensuring efficient project execution within Agile methodologies.

Technical Skills:
Azure Services: Azure Data Factory, Airflow, Azure Databricks, Logic Apps, Azure Functions, Snowflake, Azure DevOps, Azure SQL Database, Azure Data Lake Storage Gen2, Azure SQL Data Warehouse, Azure Storage, Azure Blob Storage, Cosmos DB, HDInsight, Active Directory, IAM
Big Data: MapReduce, Hive, Python, PySpark, Scala, Kafka, Spark Streaming, Oozie, Sqoop, Zookeeper
Hadoop Distributions: Cloudera, Hortonworks
Programming Languages: C#, Python, Java, SQL, PL/SQL, HiveQL, Scala, T-SQL, C/C++
Operating Systems: Windows (XP/7/8/10), UNIX, Linux, Ubuntu, CentOS
Version Control: Git, GitHub, Bitbucket
IDE & Build Tools, Design: Eclipse, Visual Studio
Infrastructure as Code: Terraform
Databases: MySQL, MS Access, DB2, Azure SQL Server, Azure Synapse, SQL Server, MS SQL Server 2016/2014/2012, Oracle 11g/12c, MongoDB

Work Experience:

Role: Azure Data Engineer | Nov 2022 - Present
Client: CenterPoint Energy, Houston, TX
Responsibilities:
- Implemented highly scalable data ingestion pipelines using Azure Data Factory, efficiently ingesting data from diverse sources such as SQL databases, CSV files, and REST APIs.
- Developed comprehensive data processing workflows utilizing Azure Databricks, harnessing the power of Spark for distributed data processing and advanced transformations.
- Ensured exceptional data quality and integrity by performing thorough data validation, cleansing, and transformation operations through the seamless integration of Azure Data Factory and Databricks.
- Leveraged PowerShell scripting for automation tasks, enhancing efficiency in managing Azure resources and streamlining repetitive data engineering workflows.
- Designed, developed, and maintained data pipelines for seamless data ingestion, transformation, and loading into Snowflake (see the sketch following this section).
- Implemented ETL processes using Snowflake features such as Snowpipe, Streams, and Tasks to ensure efficient data flow and transformation.
- Designed and implemented data models and schemas within Snowflake to support reporting, analytics, and business intelligence needs.
- Optimized data warehouse performance and scalability using Snowflake features such as clustering, partitioning, and materialized views.
- Integrated Snowflake with external systems and data sources, including on-premises databases, cloud storage, and third-party APIs.
- Implemented data synchronization processes to maintain consistency and accuracy across different systems.
- Created and optimized Snowflake schemas, tables, and views, improving data storage and retrieval for high-performance analytics and reporting requirements.
- Developed and fine-tuned Spark jobs to execute intricate data transformations, perform aggregations, and accomplish machine learning tasks on large-scale datasets.
- Collaborated closely with data analysts and business stakeholders, gaining deep insights into their needs and translating them into effective data models and structures within Snowflake.
- Leveraged the capabilities of Azure Synapse Analytics to seamlessly integrate big data processing and advanced analytics, unlocking valuable data exploration and insight-generation opportunities.
- Implemented sophisticated event-based triggers and scheduling mechanisms to automate data pipelines and workflows, ensuring optimal efficiency and reliability.
- Established comprehensive data lineage and metadata management solutions, enabling efficient tracking and monitoring of data flow and transformations across the entire ecosystem.
- Worked on SQL performance measurement, query tuning, and database tuning, and set up the RBAC model at the infrastructure and data levels for enhanced security.
- Identified and addressed performance bottlenecks in both the data processing and storage layers, achieving significant improvements in query execution and reductions in data latency.
- Implemented strategies such as partitioning, indexing, and caching in Snowflake and Azure services, resulting in superior query performance and reduced processing time.
- Conducted thorough performance tuning and robust capacity planning to ensure the scalability and efficiency of the entire data infrastructure.
- Developed a robust CI/CD framework for data pipelines using Jenkins, enabling streamlined deployment and continuous integration of data workflows.
- Collaborated closely with DevOps engineers to develop and implement automated CI/CD pipelines and test-driven development practices on Azure, tailored to client requirements.
- Programmed in scripting languages such as Python and Scala, leveraging their flexibility and power to optimize data processes and enable customization.
- Contributed actively to ETL tasks, maintaining data integrity and performing rigorous pipeline stability checks.
- Designed and developed custom Power BI dashboards that provide actionable insights into complex datasets.
- Utilized Power BI's advanced analytics features for forecasting, trend analysis, and other predictive insights.
- Demonstrated hands-on expertise in utilizing Kafka, Spark Streaming, and Hive for processing streaming data in specific use cases, unlocking real-time data insights.
- Implemented row-level security in Power BI to control data access based on user roles and permissions.
- Designed and implemented end-to-end data pipelines encompassing Kafka, Spark, and Hive, effectively ingesting, transforming, and analyzing data for diverse business needs.
- Developed and fine-tuned Spark Core and Spark SQL scripts using Scala, achieving significant acceleration in data processing.
- Utilized JIRA extensively for project reporting, creating subtasks for development, QA, and partner validation, and managing projects from start to finish.
- Deeply experienced in Agile methodologies, actively participating in the full breadth of Agile ceremonies, from daily stand-ups to internationally coordinated PI Planning sessions.
Environment: Azure Databricks, Azure Data Factory, Logic Apps, Function App, Snowflake, MS SQL, Oracle, HDFS, MapReduce, YARN, Spark, Hive, SQL, Python, Scala, PySpark, Spark performance tuning, data integration, data modeling, data pipelines, production support, shell scripting, Git, JIRA, Jenkins, Kafka, Power BI.
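Illustrative code sketch (not taken from the engagement above): a minimal PySpark job of the kind described in this role, reading raw CSV files from ADLS Gen2, applying basic validation and cleansing, and loading the result into Snowflake through the Spark-Snowflake connector. All storage paths, credentials, database, and table names are hypothetical placeholders.

```python
# Sketch only: Databricks-style PySpark ingestion from ADLS Gen2 into Snowflake.
# Paths, credentials, and object names below are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("customer_ingest_example").getOrCreate()

raw_path = "abfss://raw@examplestorageacct.dfs.core.windows.net/customers/*.csv"

raw_df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv(raw_path)
)

# Basic data-quality step: drop records missing the key, normalize strings, deduplicate.
clean_df = (
    raw_df
    .filter(F.col("customer_id").isNotNull())
    .withColumn("email", F.lower(F.trim(F.col("email"))))
    .dropDuplicates(["customer_id"])
)

# Connection options for the Spark-Snowflake connector (values are placeholders;
# in practice the password would come from a Databricks secret scope).
sf_options = {
    "sfURL": "exampleaccount.snowflakecomputing.com",
    "sfUser": "ETL_SERVICE_USER",
    "sfPassword": "<retrieved-from-secret-scope>",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "STAGING",
    "sfWarehouse": "LOAD_WH",
}

(
    clean_df.write
    .format("snowflake")  # "net.snowflake.spark.snowflake" outside Databricks
    .options(**sf_options)
    .option("dbtable", "STG_CUSTOMERS")
    .mode("overwrite")
    .save()
)
```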
Role: Azure Data Engineer | Jun 2021 - Oct 2022
Client: Zions Bank, Houston, TX
Responsibilities:
- Designed and implemented end-to-end data pipelines using Azure Data Factory to efficiently extract, transform, and load (ETL) data from diverse sources into Snowflake.
- Utilized Azure Databricks to design and implement data processing workflows, leveraging the power of Spark for large-scale data transformations.
- Built optimized and scalable Snowflake schemas, tables, and views to support complex analytics queries and meet stringent reporting requirements.
- Developed data ingestion pipelines using Azure Event Hubs and Azure Functions, enabling real-time data streaming into Snowflake for timely insights (see the sketch following this section).
- Leveraged Azure Data Lake Storage as a robust data lake solution, implementing effective data partitioning and retention strategies.
- Utilized Azure Blob Storage for efficient data file storage and retrieval, implementing compression and encryption techniques for enhanced security and cost optimization.
- Integrated Azure Data Factory with Azure Logic Apps to orchestrate complex data workflows and trigger actions based on specific events.
- Implemented rigorous data governance practices and data quality checks using Azure Data Factory and Snowflake, ensuring data accuracy and consistency.
- Implemented efficient data replication and synchronization strategies between Snowflake and other data platforms, leveraging Azure Data Factory and Change Data Capture techniques.
- Developed and deployed Azure Functions to handle data preprocessing, enrichment, and validation tasks within data pipelines.
- Leveraged Azure Machine Learning in conjunction with Snowflake to implement advanced analytics and machine learning workflows, enabling predictive analytics and data-driven insights.
- Designed and implemented data archiving and retention strategies using Azure Blob Storage and Snowflake's Time Travel feature.
- Developed custom monitoring and alerting solutions using Azure Monitor and Snowflake Query Performance Monitoring (QPM), ensuring proactive identification and resolution of performance issues.
- Integrated Snowflake with Power BI and Azure Analysis Services, enabling the creation of interactive dashboards and reports for self-service analytics by business users.
- Optimized data pipelines and Spark jobs in Azure Databricks to improve performance, leveraging techniques such as Spark configuration tuning, caching, and data partitioning.
- Implemented comprehensive data cataloging and data lineage solutions using tools such as Azure Purview and Apache Atlas, providing a holistic view of data assets and their relationships.
- Used Airflow to automate end-to-end data pipelines, encompassing data extraction, transformation, loading, and integration across various data sources and destinations.
- Implemented CI/CD pipelines to build and deploy projects in the Hadoop environment.
- Attended daily sync-up calls between onsite and offshore teams to discuss ongoing features and work items, issues, blockers, and ideas to improve the performance, readability, and experience of the data presented to end users.
Environment: Azure Databricks, Azure Data Factory, Azure SQL, Logic Apps, Synapse serverless pools, Synapse Spark pools, Synapse Hadoop, Azure Purview, batch processing, Snowflake, map-side joins, Hive, Spark SQL, RDD, PowerApps, Hortonworks, buckets, Azure DevOps, Function App, Parquet, Kafka, Git, Lookup, Get Metadata, HDInsight, clusters, Python, pipelines, PySpark, DataFrames.
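Illustrative code sketch (not taken from the engagement above): a minimal Spark Structured Streaming job of the kind described in this role, consuming events from an Event Hubs Kafka-compatible endpoint and landing them in ADLS Gen2 for downstream loading into Snowflake. The namespace, topic, schema, and paths are hypothetical placeholders, and authentication details are intentionally omitted.

```python
# Sketch only: Structured Streaming from an Event Hubs Kafka endpoint into ADLS Gen2.
# Endpoint, topic, schema, and paths are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("eventhub_stream_example").getOrCreate()

event_schema = StructType([
    StructField("transaction_id", StringType()),
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

stream_df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "example-namespace.servicebus.windows.net:9093")
    .option("subscribe", "transactions")
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    # The SASL JAAS config carrying the Event Hubs connection string is omitted here.
    .option("startingOffsets", "latest")
    .load()
)

# Parse the JSON payload and derive a partition column for the landing zone.
parsed_df = (
    stream_df
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
    .withColumn("ingest_date", F.to_date("event_time"))
)

query = (
    parsed_df.writeStream
    .format("parquet")
    .option("path", "abfss://landing@examplestorageacct.dfs.core.windows.net/transactions/")
    .option("checkpointLocation", "abfss://landing@examplestorageacct.dfs.core.windows.net/_checkpoints/transactions/")
    .partitionBy("ingest_date")
    .trigger(processingTime="1 minute")
    .start()
)
```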
Role: Data Engineer | Jun 2020 - May 2021
Client: AIG, Jersey City, NJ
Responsibilities:
- Developed pipelines in Azure Data Factory (ADF) using various activities, linked services, and datasets to extract data from different sources, including on-premises systems, and write it back into Azure cloud storage.
- Worked on the migration of data from on-premises systems to the Azure cloud, into Azure SQL, Azure Synapse Analytics, and Azure Data Lake.
- Implemented the design plan for Azure Synapse Analytics with optimization solutions.
- Built transformations using Databricks, Spark SQL, and Scala/Python, with results stored in ADLS/Azure Blob Storage.
- Created external tables and views in Azure Synapse Analytics (DW) and created stored procedures to move data from external to internal tables.
- Created and maintained IAM policies in compliance with industry standards and organizational requirements.
- Utilized SQL queries, including DDL, DML, and various database objects, for data manipulation and retrieval.
- Integrated on-premises (MySQL, Cassandra) and cloud-based (Blob Storage, Azure SQL DB) data using Azure Data Factory, applying transformations and loading data into Snowflake.
- Developed notebooks in Databricks for data extraction from various file formats, transforming and loading detailed and aggregate data into Azure Data Lake and transmitting data into external data warehouses.
- Orchestrated seamless data movement into SQL databases using Data Factory pipelines.
- Developed serverless functions and triggers in Azure Functions to respond to changes in Cosmos DB data, enabling real-time data processing (see the sketch following this section).
- Designed and implemented scalable data ingestion pipelines using Apache Kafka, Apache Flume, and Apache NiFi to collect and process large volumes of data from different sources.
- Implemented data quality checks and data cleansing techniques to ensure accurate and reliable data throughout the pipeline.
- Created ETL transformations and validations using Spark SQL and Spark DataFrames with Azure Databricks and Azure Data Factory.
- Collaborated with Azure Logic Apps administrators to monitor and resolve issues related to process automation and data processing pipelines.
- Optimized code for Azure Functions to extract, transform, and load data from various sources, including databases, APIs, and file systems.
- Designed, built, and maintained data integration programs within Hadoop and relational database management system (RDBMS) environments.
- Implemented a CI/CD framework for data pipelines using Jenkins to automate and streamline deployment.
- Developed and maintained RBAC policies and configurations in ECreLT, conducting regular reviews and updates to align with organizational requirements and compliance standards, thereby enhancing data security and mitigating the risk of unauthorized access or data breaches.
- Executed data integration processes, including MSK Kafka Connect, and collaborations with partners such as Delta Lake (Databricks).
- Built and optimized data models and schemas using Apache Hive, Apache HBase, or Snowflake for efficient data storage and retrieval for analytics and reporting.
- Created complex stored procedures, triggers, functions, indexes, tables, views, and other T-SQL code and SQL joins for applications, following SQL coding standards.
Environment: Azure Databricks, Azure Data Factory, Azure SQL, Logic Apps, Synapse serverless pools, Synapse Spark pools, Synapse Hadoop, Azure Purview, batch processing, Snowflake, map-side joins, Hive, Spark SQL, RDD, PowerApps, Hortonworks, buckets, Azure DevOps, Function App, Parquet, Kafka, Git, Lookup, Get Metadata, HDInsight, clusters, Python, pipelines, Spark, DataFrames.
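Illustrative code sketch (not taken from the engagement above): a minimal Python Azure Function using the v1 programming model with a Cosmos DB change-feed trigger, along the lines of the serverless triggers described in this role. It assumes a function.json with a "cosmosDBTrigger" binding named "documents"; the container and field names are hypothetical placeholders.

```python
# Sketch only: Azure Function (Python v1 model) reacting to Cosmos DB change-feed events.
# Assumes a function.json "cosmosDBTrigger" binding named "documents".
import logging

import azure.functions as func


def main(documents: func.DocumentList) -> None:
    """Process inserts/updates from a Cosmos DB container in near real time."""
    if not documents:
        return

    for doc in documents:
        # Basic validation/enrichment before handing off to the next pipeline stage.
        if "customer_id" not in doc:
            logging.warning("Skipping document %s: missing customer_id", doc.get("id"))
            continue

        logging.info(
            "Processing change for customer %s (document id=%s)",
            doc["customer_id"],
            doc.get("id"),
        )
        # Downstream handoff (e.g., writing to a queue or staging store) would go here.
```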
Role: Big Data Developer | Jun 2018 - May 2020
Client: Charter Communications, Englewood, CO
Responsibilities:
- Designed and implemented a robust ETL framework utilizing Sqoop, Pig, and Hive to efficiently extract data from diverse sources and make it readily available for consumption.
- Performed data processing on HDFS and created external tables using Hive, while also developing reusable scripts for table ingestion and repair across the project.
- Developed ETL jobs using Spark and Scala to seamlessly migrate data from Oracle to new MySQL tables (see the sketch following this section).
- Leveraged Spark (RDDs, DataFrames, Spark SQL) and the Spark-Cassandra Connector APIs for a range of tasks, including data migration and generation of business reports.
- Created a Spark Streaming application for real-time sales analytics, enabling timely insights and decision-making.
- Conducted comprehensive analysis of source data, efficiently managing data type modifications and leveraging Excel sheets, flat files, and CSV files for ad-hoc report generation in Power BI.
- Implemented dynamic cluster scaling strategies in Hive, allowing automatic adjustment of cluster resources based on workload demands and ensuring optimal resource allocation during peak processing periods.
- Configured and optimized Hive clusters to handle large-scale data sets, accommodating varying workloads and ensuring consistent performance in data processing tasks.
- Utilized Hive configuration settings to fine-tune cluster parameters, balancing resource usage across nodes to achieve optimal performance in data processing tasks.
- Implemented robust monitoring solutions for Hive clusters, ensuring real-time visibility into system metrics and proactively addressing issues to maintain uninterrupted data processing capabilities.
- Analyzed SQL scripts and designed solutions using PySpark, ensuring efficient data extraction, transformation, and loading processes.
- Utilized Sqoop to extract data from various data sources into HDFS, enabling seamless integration of data into the big data environment.
- Handled data import from multiple sources, performed transformations using Hive and MapReduce, and loaded processed data into HDFS.
- Leveraged Sqoop to extract data from MySQL into HDFS, ensuring seamless integration and availability of MySQL data within the big data environment.
- Implemented deployment automation using YAML scripts, streamlining the build and release processes for efficient project delivery.
- Worked extensively with a range of big data technologies, including Apache Hive, Apache Pig, HBase, Apache Spark, Zookeeper, Flume, Kafka, and Sqoop.
- Implemented data classification algorithms using well-established MapReduce design patterns, enabling efficient data classification and analysis.
- Utilized advanced techniques such as combiners, partitioning, and the distributed cache to optimize the performance of MapReduce jobs.
- Leveraged Git and GitHub repositories for efficient source code management and version control, ensuring seamless collaboration and code tracking.
Environment: HDFS, Apache Hive, Apache Pig, Apache Spark, Apache HBase, Zookeeper, Apache Flume, Apache Kafka, Sqoop, MySQL, Oracle, Git and GitHub, Power BI.
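Illustrative code sketch (not taken from the engagement above): the Oracle-to-MySQL migration described in this role was built in Spark with Scala; the sketch below shows the same pattern in PySpark over JDBC. Connection URLs, credentials, and table names are hypothetical placeholders, and the appropriate JDBC driver JARs are assumed to be on the Spark classpath.

```python
# Sketch only: PySpark JDBC migration of a table from Oracle to MySQL.
# URLs, credentials, and table names are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("oracle_to_mysql_example").getOrCreate()

# Read the source table from Oracle, partitioned on a numeric key for parallel extraction.
orders_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB1")
    .option("dbtable", "SALES.ORDERS")
    .option("user", "etl_user")
    .option("password", "<secret>")
    .option("driver", "oracle.jdbc.OracleDriver")
    .option("partitionColumn", "ORDER_ID")
    .option("lowerBound", "1")
    .option("upperBound", "10000000")
    .option("numPartitions", "8")
    .load()
)

# Light cleanup before loading: normalize column names for the MySQL target.
renamed_df = orders_df.toDF(*[c.lower() for c in orders_df.columns])

# Write into the new MySQL table in batches.
(
    renamed_df.write.format("jdbc")
    .option("url", "jdbc:mysql://mysql-host:3306/sales")
    .option("dbtable", "orders")
    .option("user", "etl_user")
    .option("password", "<secret>")
    .option("driver", "com.mysql.cj.jdbc.Driver")
    .option("batchsize", "10000")
    .mode("append")
    .save()
)
```

Partitioning the JDBC read on a numeric key lets Spark pull the source table through several parallel connections instead of a single one, which is usually the deciding factor for migration throughput.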
Role: Data Warehouse Developer | Feb 2014 - May 2018
Client: Truist Bank, Atlanta, GA
Responsibilities:
- Managed and administered SQL Server databases, including the creation, manipulation, and support of database objects.
- Contributed to data modeling and designing physical and logical database structures.
- Assisted in integrating front-end applications with the SQL Server backend, ensuring smooth data interactions and seamless functionality.
- Created stored procedures, triggers, indexes, user-defined functions, and constraints to optimize database performance and retrieve the desired results.
- Utilized Data Transformation Services (DTS) to import and export data between servers, ensuring smooth data transfer and synchronization.
- Wrote T-SQL statements to retrieve data and conducted performance tuning on T-SQL queries for improved query execution times (see the sketch following this section).
- Transferred data between sources and SQL Server using SSIS/DTS, employing features such as data conversion and derived column creation as per requirements.
- Collaborated with stakeholders to understand data migration requirements and objectives, ensuring seamless integration of SSIS packages into the new data ecosystem.
- Generated migration reports to track progress and identify issues, facilitating a smooth migration process for SSRS reports.
- Supported the team in resolving issues related to SQL Reporting Services and T-SQL, leveraging expertise in report creation, including cross-tab, conditional, drill-down, top-N, summary, form, OLAP, and sub-reports, while ensuring proper formatting.
- Provided application support via phone, offering assistance and resolving queries related to the SQL Server environment.
- Developed and tested Windows command files and SQL Server queries for monitoring the production database in a 24/7 support environment.
- Implemented comprehensive logging for ETL loads at both the package and task levels, capturing and recording the number of records processed by each package and task using SSIS.
- Developed, monitored, and deployed SSIS packages for efficient data integration and transformation processes.
Environment: IBM WebSphere DataStage EE 7.0/6.0 (Manager, Designer, Director, Administrator), Ascential ProfileStage 6.0, Ascential QualityStage 6.0, Erwin, TOAD, Autosys, Oracle 9i, PL/SQL, SQL, UNIX shell scripts, Sun Solaris, Windows 2000.
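Illustrative code sketch (not taken from the engagement above): the T-SQL and monitoring work described in this role was done directly in SQL Server and SSIS; purely for illustration, the sketch below drives the same kinds of T-SQL (a stored-procedure call and a row-count monitoring query) from Python via pyodbc. Server, database, and object names are hypothetical placeholders.

```python
# Sketch only: executing T-SQL (stored procedure + monitoring query) via pyodbc.
# Server, database, and object names are hypothetical placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=example-sql-server;DATABASE=SalesDW;Trusted_Connection=yes;"
)
cursor = conn.cursor()

# Call a stored procedure that loads a reporting table for a given business date.
cursor.execute("EXEC dbo.usp_LoadDailySales @LoadDate = ?", "2024-01-31")
conn.commit()

# Simple monitoring query: approximate row counts per table, useful in 24/7 support.
cursor.execute(
    """
    SELECT t.name AS table_name, SUM(p.rows) AS row_count
    FROM sys.tables AS t
    JOIN sys.partitions AS p
        ON p.object_id = t.object_id AND p.index_id IN (0, 1)
    GROUP BY t.name
    ORDER BY row_count DESC;
    """
)
for table_name, row_count in cursor.fetchall():
    print(f"{table_name}: {row_count} rows")

conn.close()
```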