Dinesh Kumar Kuppam - Lead Data Engineer
[email protected]
Location: Levittown, New York, USA
Relocation: Yes
Visa: H1B |
DINESH K
(912) 713-6930 | [email protected] | https://www.linkedin.com/in/dinesh-kuppam-24251b127/

Innovative Data Engineer/Database Developer with 12+ years of experience, specializing in the design, development, and optimization of real-time data pipelines and streaming architectures. Proven track record of leveraging cutting-edge technologies to transform raw data into actionable insights, empowering organizations to make data-driven decisions with speed and accuracy.

Professional Summary
- 12+ years of IT experience across the Apache Hadoop ecosystem and big data analytics, along with development and support of database applications.
- Hands-on experience with ecosystem components such as Hadoop MapReduce, HDFS, HBase, ZooKeeper, Oozie, Hive, Cassandra, Sqoop, Pig, and Flume.
- Hands-on experience with SQL and NoSQL databases such as Snowflake, HBase, Cassandra, Teradata, and MongoDB.
- Hands-on experience creating real-time data streaming solutions using Apache Spark Core, Spark SQL, and DataFrames (illustrated in the sketch below).
- Extensive knowledge of implementing, configuring, and maintaining Amazon Web Services (AWS), including EC2, S3, Redshift, Glue, Athena, Elastic Beanstalk, EBS, Auto Scaling, Security Groups, EC2 Container Service (ECS), CodeCommit, CodePipeline, CodeBuild, CodeDeploy, DynamoDB, CloudWatch, CloudFormation, CloudTrail, OpsWorks, Kinesis, IAM, SQS, SNS, and SES, with a focus on high availability, fault tolerance, and scalability.
- Proficient in designing and implementing big data solutions using Azure Databricks, leveraging its collaborative Apache Spark-based analytics platform.
- Expertise in designing, implementing, and optimizing data warehouse solutions using Snowflake on AWS, leveraging its elasticity, scalability, and performance capabilities.
- Experience in extraction, cleansing, integration, and loading of data from/to disparate data sources.
- Strong experience loading and maintaining data warehouses and data marts using ETL processes.
- Experienced in database fine-tuning using Explain Plan and analysis tools: monitoring performance, identifying bottlenecks, tuning the system, and monitoring and analyzing indexes for performance tuning and query optimization.
- Hands-on experience with AWS components such as EMR, EC2, S3, RDS, IAM, Auto Scaling, CloudWatch, SNS, Athena, Glue, Kinesis, Lambda, Redshift, and DynamoDB to ensure a secure zone for an organization in the AWS public cloud.
- Good knowledge of AWS services including Redshift, CloudFront, RDS instances, Auto Scaling, and IAM policies.
- Skilled in setting up and configuring Azure Databricks clusters for processing large-scale datasets efficiently.
- Experienced with utilities such as SQL*Loader, external tables, Import, and Export to extract and load large volumes of data.
- Extensively worked on Jenkins, installing, configuring, and maintaining it for continuous integration (CI) and end-to-end automation of all builds and deployments.
- Experience in branching, tagging, and maintaining versions across environments using Software Configuration Management (SCM) tools such as Subversion (SVN) and Git.
- Worked on web servers such as Apache and application servers such as WebLogic and Tomcat to deploy code.
- Developed various Unix shell and Python scripts to manipulate data files, set up environment variables, provide a custom FTP utility, archive files, and automate builds/deployments.
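For illustration of the real-time Spark streaming work summarized above, a minimal PySpark Structured Streaming sketch that reads JSON events from Kafka and produces windowed aggregates; the broker address, topic, schema, and checkpoint path are hypothetical placeholders, and the spark-sql-kafka-0-10 connector package is assumed to be on the classpath.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

# Illustrative sketch only; broker, topic, schema, and paths are placeholders.
spark = SparkSession.builder.appName("events_stream_demo").getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_ts", TimestampType()),
])

# Read the raw stream from Kafka (requires the spark-sql-kafka-0-10 package).
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
)

# Kafka values arrive as bytes; decode and parse the JSON payload.
events = (
    raw.selectExpr("CAST(value AS STRING) AS json")
    .select(F.from_json("json", event_schema).alias("e"))
    .select("e.*")
)

# Windowed aggregation using Spark SQL functions on DataFrames.
counts = (
    events.withWatermark("event_ts", "10 minutes")
    .groupBy(F.window("event_ts", "5 minutes"))
    .agg(F.count("event_id").alias("events"),
         F.sum("amount").alias("total_amount"))
)

query = (
    counts.writeStream.outputMode("update")
    .format("console")  # sink kept simple for the sketch
    .option("checkpointLocation", "/tmp/chk/events_stream")
    .start()
)
query.awaitTermination()
```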
- Proficient in architecting Snowflake data warehouses, including schema design, table structures, clustering, and partitioning strategies to optimize query performance and resource utilization.
- Extensive work experience on UNIX, Linux, and Windows enterprise applications.
- Hands-on experience in logical/physical data modeling using Erwin.
- Experience in data warehouse management and in using automation/scheduling tools.
- Strong understanding of object-oriented JavaScript; good working knowledge of XML, HTML, CSS, and JSON.
- Extensive exposure to star and snowflake schemas and multidimensional data models.
- Good knowledge of continuous integration (CI) and continuous deployment (CD) methodologies.
- Troubleshot Java applications with different test cases.
- Proficient in translating business requirements into logical data models and creating modularized enterprise applications.
- Experienced in developing and deploying data engineering pipelines on Azure Databricks, integrating with Azure services such as Azure Data Lake Storage Gen2 and Azure Synapse Analytics.
- Extensive experience migrating on-premises data warehouses and legacy systems to Snowflake on AWS, ensuring minimal downtime and maximum data integrity.
- Experience interacting with users, analyzing client business processes, documenting business requirements, performing design analysis, and developing technical design specifications.
- Able to work in teams and independently with minimal supervision to meet deadlines.
- Excellent communication, interpersonal, and business-analytical skills, with strong troubleshooting and organizational ability and a demonstrated capacity to learn new programming languages, tools, and concepts quickly in a fast-paced environment.
- Project management: proficiency in software design and development using object-oriented analysis and design.
- Analysis/design: unit, integration, and functional testing.

Skills
Programming Languages: C, PL/SQL, SQL
Big Data Ecosystem: Hadoop, MapReduce, HDFS, HBase, ZooKeeper, Hive, Pig, Sqoop, Oozie, Flume
Operating Systems: Windows XP/Vista/7, Linux, UNIX, Ubuntu, macOS
Tools/Utilities: SQL Developer, SQL*Plus, TOAD, Oracle APEX
Databases: Oracle, Snowflake, MongoDB, Cassandra, HBase, Redshift, PostgreSQL, MySQL
CI/CD Tools: Jenkins
Version Control: SVN, Git
SDLC Methodologies: Agile, Waterfall
Project Management: MS Office Suite, Clarity
ETL/ELT Tools: SSIS, Informatica PowerCenter, IBM DataStage 8.5, dbt
Automation Tools: AutoSys, Control-M
Scripting/Build: Unix shell, Python, Ant, Maven
Testing Tools: JMeter, JUnit, Cucumber

Work History

Apple, Austin, Texas | June 2020 - Present
Role: Lead Data Engineer
Responsibilities:
- Implemented various services to support multiple reconciliation (recon) projects for Apple Products Reconciliation, internal to Apple.
- Worked on expansion strategy and enhancement requests from business stakeholders across the EMEIA, AMR, and APAC regions.
- Designed and implemented scalable data ingestion pipelines using Azure Data Factory, ingesting data from sources such as SQL databases, CSV files, and REST APIs.
- Developed data processing workflows using Azure Databricks, leveraging Spark for distributed data processing and transformation tasks (a brief sketch follows).
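A minimal PySpark sketch of the kind of Databricks transformation workflow described above; the ADLS Gen2 storage account, dataset, schema, and aggregation logic are hypothetical stand-ins rather than details of the actual pipelines.

```python
from pyspark.sql import SparkSession, functions as F

# Illustrative batch transformation job; paths and column names are placeholders.
spark = SparkSession.builder.appName("orders_daily_agg").getOrCreate()

orders = (
    spark.read
    .option("header", True)
    .option("inferSchema", True)
    .csv("abfss://raw@<storage-account>.dfs.core.windows.net/orders/")  # hypothetical ADLS Gen2 path
)

# Basic validation and cleansing steps before aggregation.
clean = (
    orders
    .dropDuplicates(["order_id"])              # remove duplicate records
    .filter(F.col("order_total") >= 0)          # drop invalid rows
    .withColumn("order_date", F.to_date("order_ts"))
)

# Daily aggregates per region for downstream reporting.
daily = clean.groupBy("order_date", "region").agg(
    F.sum("order_total").alias("total_sales"),
    F.countDistinct("customer_id").alias("unique_customers"),
)

# Delta Lake is available by default on Databricks clusters; swap for parquet elsewhere.
daily.write.mode("overwrite").format("delta").save(
    "abfss://curated@<storage-account>.dfs.core.windows.net/daily_sales/"
)
```

The same logic runs on any Spark cluster; only the storage paths and the Delta dependency would change outside Databricks.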
- Ensured data quality and integrity by performing data validation, cleansing, and transformation operations using Azure Data Factory and Databricks.
- Designed and implemented a cloud-based data warehouse solution using Snowflake on Azure, leveraging its scalability and performance capabilities.
- Created and optimized Snowflake schemas, tables, and views to support efficient data storage and retrieval for analytics and reporting.
- Developed and optimized Spark jobs to perform data transformations, aggregations, and machine learning tasks on large data sets.
- Leveraged Azure Synapse Analytics to integrate big data processing and analytics capabilities, enabling seamless data exploration and insight generation.
- Configured event-based triggers and scheduling mechanisms to automate data pipelines and workflows.
- Implemented data lineage and metadata management solutions to track and monitor data flow and transformations.
- Identified and resolved performance bottlenecks in the data processing and storage layers, optimizing query execution and reducing data latency.
- Implemented partitioning, indexing, and caching strategies in Snowflake and Azure services to enhance query performance and reduce processing time.
- Conducted performance tuning and capacity planning exercises to ensure the scalability and efficiency of the data infrastructure.
- Developed a CI/CD framework for data pipelines using Jenkins.
- Collaborated with DevOps engineers to build an automated CI/CD and test-driven development pipeline on Azure per client requirements.
- Ran Hive scripts through Hive on Spark, and some through Spark SQL.
- Developed a data pipeline using Kafka, Spark, and Hive to ingest, transform, and analyze data.
- Designed and implemented real-time data processing solutions using Kafka and Spark Streaming, enabling the ingestion, transformation, and analysis of high-volume streaming data.
- Developed Spark Core and Spark SQL scripts in Scala for faster data processing.

Bimbo Bakeries USA, New York | March 2015 - June 2020
Role: Senior Data Engineer
- Worked with distributed systems and Spark job optimization.
- Designed and deployed scalable, fault-tolerant streaming data pipelines on AWS, processing over 1 TB of data daily with sub-second latency.
- Experienced with PySpark and Spark SQL; developed Spark applications according to business requirements.
- Developed Spark programs using the Python APIs to migrate applications from SQL.
- Worked with Amazon RDS, AWS's managed relational database service that simplifies the setup, operation, and scaling of relational databases and supports engines such as MySQL, PostgreSQL, Oracle, and SQL Server.
- Proficient in managing Snowflake objects and resources using Snowflake's native interface, the SnowSQL CLI, and Snowflake's web interface for administration and monitoring.
- Designed databases to implement a microservice architecture using the Spring Boot framework.
- Developed normalized logical and physical database models for authentication and workflow approval processes.
- Designed and implemented a custom ETL solution in Java to enable near-real-time sync between disparate systems.
- Responsible for smooth deployment of dbt scripts from Git and dbt schedules integrating with Snowflake.
- Created a Snowpipe to move data from S3 into Snowflake (see the sketch below).
- Maintained the data warehouse with improved reliability and security.
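As an illustrative sketch of the S3-to-Snowflake Snowpipe setup mentioned above, the following uses the snowflake-connector-python package to create an external stage and an auto-ingest pipe; the account, credentials, bucket, storage integration, stage, pipe, and table names are hypothetical placeholders.

```python
import snowflake.connector

# Connect via snowflake-connector-python (all credentials below are placeholders).
conn = snowflake.connector.connect(
    account="<account_identifier>",
    user="<user>",
    password="<password>",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="RAW",
)
cur = conn.cursor()

# External stage over the S3 landing bucket (bucket and integration are illustrative).
cur.execute("""
    CREATE STAGE IF NOT EXISTS raw_orders_stage
      URL = 's3://example-landing-bucket/orders/'
      STORAGE_INTEGRATION = s3_int
      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
""")

# Snowpipe that auto-ingests new files from the stage into the target table.
cur.execute("""
    CREATE PIPE IF NOT EXISTS raw_orders_pipe
      AUTO_INGEST = TRUE
      AS COPY INTO RAW.ORDERS
         FROM @raw_orders_stage
""")

cur.close()
conn.close()
```

With AUTO_INGEST enabled, new files landing in the bucket are picked up through the S3 event notifications configured for the pipe's stage.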
- Set up dbt from scratch, created 100+ dbt models connecting to Snowflake, and created jobs to process data.
- Applied Snowflake security features such as role-based access control (RBAC), encryption, and data masking to ensure data privacy and compliance with regulatory requirements.
- Spearheaded the adoption of Snowflake for real-time analytics, transforming batch-oriented data warehouses into real-time data platforms that enable near-instantaneous insights for business users.
- Handled project and team management, including team capacity, requirement gathering, and prioritization of tasks before sprint planning.
- Ensured the Scrum process was rigorously followed by the team, conducting ceremonies such as Sprint Planning, Daily Standup, Sprint Review, and Sprint Retrospective.
- Proposed and won approval to configure Jenkins CI for the entire build and deployment process, easing quick deployment to all environments, including production; set up user accounts for the different teams and other configurations needed to support the CI process.
- Involved in replacing a custom ETL platform with Apache Kafka.

TIAA-CREF, Charlotte | March 2014 - March 2015
Role: Data Engineer
- Designed and implemented a semi-structured data analytics platform leveraging Hadoop.
- Worked on performance analysis and improvements for Hive and Pig scripts at the MapReduce job-tuning level.
- Used Sqoop to load data from RDBMS into HDFS and to transform data from legacy tables into HDFS and HBase tables.
- Implemented test scripts to support test-driven development and continuous integration.
- Exported analyzed data to relational databases using Sqoop for visualization and generated reports for the BI team.
- Good understanding of ETL tools and their application in a big data environment.
- Involved in a Siebel-to-Salesforce.com migration; wrote complex Oracle SQL queries to pull Opportunities and Activities data from Siebel.
- Optimized query performance by removing unnecessary columns, eliminating redundant and inconsistent data, establishing necessary joins, and creating useful indexes.
- Wrote packages to store business rules and exceptions, and wrote PL/SQL code using ref cursors and collections.
- Built data pipelines and ETL processes using Snowflake's native features and third-party tools such as Matillion, Informatica, and Talend to ingest, transform, and load data into Snowflake data warehouses.
- Worked to understand business processes and coordinated with business analysts to gather specific user requirements.
- Created error files and log tables containing data discrepancies to analyze and reprocess data.
- Participated in iteration planning under the Agile Scrum methodology.
- Prepared test plans and performed unit testing, system integration testing, implementation, and maintenance.
- Involved in UAT testing and bug fixing before the code was released to production.

Quill, Chicago | November 2012 - March 2014
Role: Oracle/ETL Developer
- Implemented several POCs to validate and fit various Hadoop ecosystem tools on CDH and Hortonworks distributions.
- Analyzed and implemented an ODS, data marts, and a data warehouse for Staples Europe.
- Migrated and loaded data from different sources into an Oracle database, performing all types of transformations using SSIS Control Flow and Data Flow.
- Worked on performance tuning using partitioning and indexing concepts (local and global indexes on partitioned tables).
- Maintained and updated procedures for the ETL process.
- Gained good business knowledge of predictive analytics, which encompasses a variety of statistical techniques from predictive modeling and data mining.
- Involved in a database migration from Oracle 10g to 11g.
- Created materialized views on the remote source database and automated the scheduled refresh of the materialized views on the source side; created DB links pointing to the remote database to access the source materialized views.
- Extracted data from the source views by writing procedures to load the data into staging tables.
- Created a SQL*Loader script generator application using UNIX shell scripting and PL/SQL, and loaded Orders, Products, Customers, and Sales data into the data warehouse tables using the SQL*Loader scripts.
- Extensively tuned poorly performing SQL using Explain Plan, SQL trace, and AWR reports.
- Responsible for loading unstructured data into the Hadoop Distributed File System (HDFS); imported and exported data between HDFS and Hive using Sqoop, and designed and implemented incremental imports into Hive tables.
- Wrote complex Oracle SQL queries using inner/outer joins and the UNION ALL, INTERSECT, and MINUS set operators; created indexes on tables and optimized procedure queries.
- Prepared UNIX shell scripts scheduled in AutoSys for automatic execution at specific times.
- Responsible for unit, system, and UAT testing of the data and provided test evidence reports.
- Involved in UAT testing and bug fixing before the code was released to production.

Education
Master of Science, Electrical and Computer Engineering - Southern Illinois University Carbondale (SIUC)
Bachelor of Technology, Electrical and Communications Engineering - Jawaharlal Nehru Technological University (JNTU)

Certifications
Oracle PL/SQL Certified, Oracle - Nov 2017
SnowPro Core Certified, Snowflake - Jun 2024
AWS Cloud Practitioner - Jul 2024
Databricks Fundamentals - Jul 2024