RAJA RAMADUGU - Lead Data Engineer
[email protected]
Location: Plano, Texas, USA
Relocation: Yes |
Visa: H1B |
Professional Summary
10 years of strong experience in all phases of analysis, development, implementation, and support of data warehousing applications, including 5+ years in big data ecosystems and cloud computing.
Proficient in analyzing the client's business processes and delivering accurate solutions that incorporate the same business logic into the software product.
Experience in database design, entity relationships, and database analysis; programming SQL, PL/SQL stored procedures, packages, and triggers in Oracle.
Working experience in ingesting data into HDFS using Sqoop and Talend.
Experience in importing and exporting data using Sqoop between HDFS and relational database systems (RDBMS).
Exposure to ETL methodology for data integration, extraction, transformation, and loading into the data lake layer using ETL tools.
Hands-on experience with Hadoop/big data technologies for storage, querying, processing, and analysis of data.
Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/big data concepts.
Hands-on experience with big data core components and the ecosystem (Spark, Spark SQL, Hadoop, HDFS, MapReduce, YARN, ZooKeeper, Hive, Pig, HBase, Sqoop, Python/PySpark).
Developed Spark-based applications to load streaming data with low latency using PySpark.
Hands-on experience in Snowflake creating materialized views, external tables, and procedures.
Hands-on experience with AWS services such as Glue, Redshift, Athena, IAM, EC2, Step Functions, S3, EMR, and RDS, as well as Hive queries.
Experienced in using AWS services to manage applications in the cloud and to create or modify instances.
Hands-on experience with serverless technologies such as AWS Glue to perform ETL operations and Lambda functions to trigger pipelines.
Experience in installing, configuring, supporting, and monitoring Hadoop clusters using Apache and Cloudera distributions and AWS.
Good experience with Tableau for data visualization and analysis of large data sets, drawing various conclusions.
Created Airflow DAGs based on the requirements to trigger PySpark jobs in Glue and EMR using the managed Airflow service in AWS (an illustrative sketch follows the Education Summary below).
Experience working with different data sources such as flat files, XML files, and databases.
Excellent working experience with Scrum/Agile and Waterfall project execution methodologies.
Good understanding and knowledge of NoSQL databases.

Technical Skill Set
ETL Tools: Talend, ODI (Oracle Data Integrator) 11g
Cloud Services: Amazon AWS (S3, EMR, EC2, Glue, Redshift, Athena, Step Functions, Lambda)
Programming Languages: SQL, PL/SQL, Scala, Python/PySpark, Shell Scripting
Data Visualization: Tableau
Databases: Oracle 10g/11g, MySQL
Hadoop/Big Data: HDFS, MapReduce, Sqoop, Hive, Pig, HBase, Flume, AWS, Cloudera
Operating Systems: Windows, Unix
Orchestration: Apache Airflow, Dataflow, Redwood
Data/Stream Processing: Apache Storm, Apache Kafka, Apache Spark

Work Experience
Worked as Technology Lead at Infosys Limited from November 2018 to October 2023.
Worked as Engineer-Technology at Virtusa India Pvt Ltd, Chennai, from March 2015 to November 2018.
Worked as Software Engineer at Ingenious Solutions from June 2013 to February 2015.

Education Summary
B.Tech (Computer Science Engineering) from Kakatiya University in 2013 with an aggregate of 71.6%.
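Illustrative only (not code from any of the projects below): a minimal sketch of the Airflow-to-Glue orchestration pattern described in the summary above, assuming the Amazon provider's GlueJobOperator; the DAG id, Glue job name, and job arguments are hypothetical.

from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

# Hypothetical daily DAG that triggers an existing Glue PySpark job.
with DAG(
    dag_id="daily_glue_load",            # illustrative DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    run_glue_job = GlueJobOperator(
        task_id="run_glue_job",
        job_name="grid_mod_daily_load",   # assumed Glue job name
        script_args={"--load_type": "daily"},  # illustrative job argument
        wait_for_completion=True,
    )

With wait_for_completion=True, the operator submits the Glue job run and holds the Airflow task until that run finishes, so downstream tasks only start on success.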
Project Summary

PROJECT 1
Project Name: Grid Mod
Client: Southern California Edison
Duration: April 2020 to October 2023
Role: Lead Data Engineer
Description: The main aim of this project is to create a data source in Hadoop based on the Grid Sensitive Application data available in the source database (Oracle/NAS) after preprocessing, which is then used for downstream analysis of power transmission and distribution on a regular basis and visualized in Power BI for users at regular intervals.
Roles & Responsibilities:
Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
Responsible for big data initiatives and engagement, including analysis, POCs, and architecture.
Loaded and transformed large sets of XML files using Hadoop/big data concepts.
Installed and configured Apache Hadoop clusters for application development and Hadoop tools.
Worked on source data ingestion to HDFS through Talend.
Automated jobs using Redwood to run the daily and weekly jobs.
Analyzed the provided business requirements and wrote HQL/PySpark to extract the required data accordingly.
Worked on extraction, transformation, and loading (ETL) of data from various sources into the data warehouse.
Worked with the production support team to provide necessary support for issues with the CDH cluster and the data ingestion platform.
Implemented the big data solution using Hadoop, Hive, and Talend to pull/load the data into HDFS.
Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames in Python (an illustrative sketch follows this project's responsibilities).
Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
Worked on performance tuning and enhancements of Hive jobs.
Implemented Spark using Python and Spark SQL for faster processing of data.
Worked on unit testing to validate all the basic flows.
Developed ETL processes to extract data from various source systems and load it into Snowflake.
Wrote complex SQL queries and stored procedures to transform and analyze data within Snowflake.
Worked on Tableau for data visualization and analysis of large data sets.
Managed data storage and retrieval within Snowflake, optimizing storage utilization.
Implemented data lifecycle management strategies to efficiently manage historical data and optimize storage costs.
Configured and set up Airflow DAGs as per the requirements to run our Spark commands.
Knowledge of Spark Core and DataFrames; implemented RDD transformations.
Built reusable and modular code using Jinja2 templating in dbt.
Maintained data documentation and definitions within dbt while building and developing lineage graphs.
Designed and implemented data warehouse schemas in Snowflake, including defining tables, views, and relationships to support reporting and analytics requirements.
Worked on the data migration activity from Hadoop to Snowflake.
Worked in Snowflake on creating materialized views, external tables, and procedures.
Supported team members in resolving various issues faced during development and testing.
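For illustration, a minimal sketch of the Hive-to-Spark-DataFrame conversion pattern referenced above; the database, table, and column names (grid.sensor_readings, circuit_id, load_kw, and so on) are assumptions, not the actual project schema.

from pyspark.sql import SparkSession, functions as F

# Hive-enabled Spark session (application and object names below are hypothetical).
spark = (
    SparkSession.builder
    .appName("grid_mod_preprocess")
    .enableHiveSupport()
    .getOrCreate()
)

# Equivalent of a HiveQL aggregation, expressed as DataFrame transformations:
# SELECT circuit_id, to_date(reading_ts) AS reading_date,
#        AVG(load_kw) AS avg_load_kw, MAX(load_kw) AS peak_load_kw
# FROM grid.sensor_readings WHERE status = 'VALID'
# GROUP BY circuit_id, to_date(reading_ts)
readings = spark.table("grid.sensor_readings")
daily_load = (
    readings
    .filter(F.col("status") == "VALID")
    .groupBy("circuit_id", F.to_date("reading_ts").alias("reading_date"))
    .agg(
        F.avg("load_kw").alias("avg_load_kw"),
        F.max("load_kw").alias("peak_load_kw"),
    )
)

# Persist the result back to the Hive metastore for downstream reporting.
daily_load.write.mode("overwrite").saveAsTable("grid.daily_circuit_load")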
PROJECT 2
Client: American Health Insurance
Duration: November 2018 to April 2020
Role: Big Data Developer
Description: The Federal Communications Commission, in conjunction with the Federal Trade Commission (FTC), established strict rules and regulations for telephone calling campaigns for sales and non-sales information. These rules include enterprise policies and procedures, an enterprise consolidated DO NOT CALL registry, express written consent to call cell phone numbers, and the ability to filter out phone numbers based on national and state DNC registries. The project has been developed using Spark.
Responsibilities:
Understood the business functionality and analyzed the business requirements.
Created shell scripts and performed validations on the files received from different vendors.
Created tables in Hive and stored the data into Hive tables after applying transformations according to the business requirements.
Loaded and transformed large sets of structured data using Hadoop/big data concepts.
Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
Created Spark DataFrames and created logs for all the different vendors.
Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames in Scala.
Developed Spark jobs using Scala in the test environment for faster data processing and used Spark SQL for querying.
Experienced with the AWS cloud environment, S3 storage, and EC2 instances.
Used Spark for parallel data processing and better performance using Scala.
Loaded data from SQL Server to S3 using Sqoop to get good performance with the EMR service in AWS.
Created Glue PySpark and Spark-Scala jobs for the Provider and Commercial models based on the data models, using the Glue and EMR services.
Created EMR clusters to run the PySpark jobs in both persistent and step-execution modes.
Worked on job performance and root cause analysis (RCA) to deliver the product without defects.
Worked on Lambda functions to automate the S3 event trigger whenever a file lands in S3, using the boto3 library (an illustrative sketch follows the project summaries below).
Created DAGs to run the jobs in Airflow based on the requirements.
Maintained project documentation and the knowledge base.

PROJECT 3
Title: Data Engineer
Client: British Telecommunication
Duration: October 2016 to October 2018
Description: The Data Integrity (DI) team validates the quality of data and generates reports from legacy data such as CRM, performs root cause analysis (RCA), and identifies the tactical fix for issues in various BT services such as PSTN, Broadband, and BT Sport. The issues are then categorized into system/DI issues and process issues. Accordingly, enhancements are suggested to improve the customer experience as well as help BT realize revenue.
Roles & Responsibilities:
Interacted with the client to gather and understand the requirements.
Used Sqoop to import and export data between Oracle and HDFS.
Involved in loading data from the local file system to HDFS.
Created and performed operations on Hive tables for business analysis to generate reports.
Scheduled Oozie jobs for automation based on the required frequency.
Implemented report generation with Spark-Scala.

PROJECT 4
Project: British Telecommunication Data Migration
Client: British Telecommunication
Duration: May 2014 to September 2016
Description: The project involves data migration from the legacy stack to the WLMS2 system, i.e., the new Harlequins system. The program covers BT TSO/BTW's requirement to provide services for the successful data migration of ~1.5 million customers from the legacy stack to the Harlequins stack.
Roles & Responsibilities:
Gathered the requirements by interacting with the onsite team and then analyzed whether those requirements were technically feasible.
Expertise in deployment activities.
Experience with defect tracking tools such as HP Quality Center.
Involved in writing procedures, functions, triggers, packages, cursors, and views.
Wrote test cases for unit and integration testing.
Checked fallouts and provided code fixes after migration events.
Supported the testing team during ETL runs.
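For illustration, a minimal sketch of the S3-event Lambda trigger pattern mentioned in the American Health Insurance project above: a handler that reacts to an ObjectCreated notification and starts a downstream Glue job via boto3. The Glue job name and argument key are hypothetical.

import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    # Invoked by an S3 ObjectCreated notification; starts the downstream Glue job
    # for each newly landed object. "provider_model_load" and "--source_path" are
    # assumed names used only for this sketch.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        glue.start_job_run(
            JobName="provider_model_load",
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )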
Individual Traits
Willingness to learn and explore and to support external/internal management tasks.
Ability to multitask; team facilitator.