Suneetha - Big Data Engineer | [email protected] | Location: Indianapolis, Indiana, USA | Relocation: | Visa: USC
Suneetha Chebrolu
Big Data Developer
Ph: 360-851-9597 | [email protected] | Work Authorization: US Citizen

Professional Summary:
9+ years of IT experience in Big Data (Hadoop/Spark) and J2EE, covering requirements analysis, design, development, implementation, support, maintenance and enhancements in the Finance and Insurance domains.
7+ years of experience as a Hadoop/Spark Developer with good knowledge of Java MapReduce, Hive, Pig Latin, Scala and Spark.
Organize data into tables, perform transformations and simplify complex queries with Hive.
Perform real-time interactive analysis on massive data sets stored in HDFS.
Strong knowledge of and experience with Hadoop architecture and its components, including HDFS, YARN, Pig, Hive, Sqoop, Oozie, Flume, Spark, Kafka and the MapReduce programming paradigm.
Developed many MapReduce programs.
Experience in analyzing data using Spark SQL, HiveQL and Pig Latin, and in developing custom UDFs for Pig and Hive (a brief sketch follows the Technical Skills list below).
Experience in importing and exporting data with Sqoop between HDFS and relational database systems.
Good knowledge of job scheduling tools such as Oozie.
Experienced with IDE tools such as Eclipse 3.x and IBM RAD 7.0.
Experience in requirement gathering, analysis, planning, design, coding and unit testing.
Strong work ethic and the desire to succeed and make significant contributions to the organization.
Strong problem-solving, communication and interpersonal skills; a good team player.
Motivated to take on independent responsibility as well as to contribute as a productive team member.

Technical Skills:
Hadoop Technologies: Hadoop, HDFS, Hadoop MapReduce, Hive, Sqoop, Oozie, Avro, Pig Latin, Hue, CDH, Parquet, Scala, Spark, Python, AWS, S3, Athena, EMR, Apache NiFi, Lambda, Glue, Terraform
NoSQL: HBase, DynamoDB
IDE/Tools: Eclipse, IntelliJ
Web and Application Servers: WebSphere, JBoss, Tomcat
Core Competency Technologies: Java, OOP, JSP, Servlets, JDBC, Java 5/6/7, C, C++, shell scripting, Spark, SAS EG, Scala, Spark Streaming, Kafka
Testing & Issue Log Tools: JUnit 4, Bugzilla, HP Quality Center
SCM/Version Control Tools: PVCS, CVS, Subversion, Bitbucket, Git
Build and Continuous Integration: Maven, SBT
Databases: Snowflake, Oracle 8i/9i/10g, DB2, MySQL 4.x/5.x
OS: UNIX, Linux, Windows
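The Spark SQL, HiveQL and custom UDF experience summarized above can be illustrated with a short PySpark sketch. This is illustrative only, not code from any of the engagements below; the Spark session setup, the "claims" Hive table and the threshold are hypothetical.

    # Minimal sketch: query a Hive table through Spark SQL and apply a small Python UDF.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("hive-udf-sketch").enableHiveSupport().getOrCreate()

    @udf(returnType=StringType())
    def risk_band(amount):
        # Bucket an amount into a coarse band (threshold is made up for the example).
        return "HIGH" if amount is not None and amount > 10000 else "LOW"

    claims = spark.sql("SELECT claim_id, claim_amount FROM claims")   # HiveQL via Spark SQL
    enriched = claims.withColumn("risk_band", risk_band(col("claim_amount")))
    enriched.write.mode("overwrite").saveAsTable("claims_enriched")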
Professional Experience:

Role: Big Data Developer | Aug 2022 - July 2023
Client: Bank of America, North Carolina
Bank of America is one of the world's largest financial institutions, serving individuals, small- and middle-market businesses and large corporations with a full range of banking, investing, asset management and other financial and risk management products and services.
Responsibilities:
Gathered requirements and ensured the required functionality was delivered as specified.
Implemented Spark ETL in PySpark to generate outbound data for various products.
Used Hive tables to store the ETL output, which feeds different insights.
Developed the PySpark code for AWS Glue jobs and for EMR.
Delivered reports from the ETL output based on business requirements.
Enriched the data with multiple validations and rules through a Spark application written in PySpark.
Built and maintained data pipelines using dbt.
Developed a high-speed BI layer on the Hadoop platform with Kafka, Apache Spark and Python.
Designed, architected and developed data analytics and data management solutions with PySpark.
Developed and scheduled various Spark Streaming and batch jobs in Python (PySpark).
Used Autosys to schedule the jobs.
Used SQL to store the data tables after ingestion.
Environment: Tableau, dbt, Autosys, SQL, Spark, PySpark, Hadoop, Python, Snowflake, Glue, Hive, Git

Project Title: Data Intelligence Team | July 2021 - July 2022
Role: Big Data Developer
Client: Maxar Technologies, Colorado
Project Description: Maxar Technologies Inc. is a space technology company specializing in manufacturing communications, Earth observation, radar and on-orbit servicing satellites, satellite products and related services.
Responsibilities:
Used AWS Glue for data integration and transformation by running Python scripts on the Glue ETL engine, and used Redshift to store the transformed data.
Used Java/J2EE technologies to develop web applications and add functionality to existing applications.
Built Spark applications on EMR to ingest data into S3 from various sources.
Visualized the transformed data using the Tableau connector for AWS Redshift.
Helped with the architecture design and automated the end-to-end data/ETL pipelines using Lambda, Glue and Python for the transition from RDBMS storage to the cloud.
Designed and built ETL pipelines to automate ingestion of various forms of data for analysis and visualization.
Designed and developed automation for Spark, UNIX and Python scripts using Airflow DAGs (see the sketch after this project).
Helped move all raw data as well as transformed/cleaned data into S3 for further processing.
Built a batch process in Spark to generate feeds for downstream applications.
Assisted in building and operating distributed systems for extraction, ingestion and processing of large data sets from multiple sources.
Worked on Airflow to schedule the Spark jobs.
Visualized the transformed data using Tableau.
Environment: AWS (S3, EC2, EMR, Athena), Snowflake, Airflow, Glue, PySpark, Tableau
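A minimal sketch of the Airflow-based Spark scheduling described in the two roles above. It assumes Airflow 2.x with the Apache Spark provider installed; the DAG id, schedule, script path and connection id are hypothetical.

    # Minimal sketch: an Airflow DAG that submits a PySpark job once a day.
    from datetime import datetime
    from airflow import DAG
    from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

    with DAG(
        dag_id="daily_outbound_etl",                  # hypothetical name
        start_date=datetime(2023, 1, 1),
        schedule_interval="0 6 * * *",                # run daily at 06:00
        catchup=False,
    ) as dag:
        run_etl = SparkSubmitOperator(
            task_id="run_spark_etl",
            application="/opt/jobs/outbound_etl.py",  # hypothetical PySpark script
            conn_id="spark_default",
            application_args=["--run-date", "{{ ds }}"],
        )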
Project Title: Data Migration | Sep 2020 - July 2021
Role: Big Data Developer
Client: Citi Bank, Dallas
Project Description: Citibank is the consumer division of the financial services multinational Citigroup. The objective of the project is to migrate the existing implementation to Spark.
Responsibilities:
Performed requirement analysis and created the mapping documents.
Involved in requirements gathering, analysis, design, development and testing.
Performed ETL mapping, job development and complex transformations.
Analyzed the systems and met with end users and business teams to define the requirements.
Developed Spark applications in Java and implemented an Apache Spark data processing project to handle data.
Wrote a Java program to retrieve data from HDFS and provide it to REST services.
Used Maven to build JAR files for the MapReduce programs and deployed them to the cluster.
Implemented business logic by writing UDFs in Java and used various UDFs from other sources.
Wrote multiple Java programs for data extraction, transformation and aggregation from multiple file formats, including XML, JSON, CSV and other codec file formats.
Led the offshore team and coordinated with the onsite team.
Created a coding architecture that leverages a reusable framework.
Provided weekly status updates to higher management.
Ensured timely delivery of projects to meet client needs.
Conducted unit testing, system testing and performance testing.
Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift.
Used a Lambda function for infrastructure automation: automated EC2 snapshot creation.
Environment: Java, Spark, SQL, Hive, PySpark, Bitbucket, Hadoop, HDFS, Lambda, Athena

Project Title: Data Analytics | Sep 2019 - March 2020
Role: Big Data Developer
Client: Nike, Portland, OR
Responsibilities:
Member of the Nike Consumer Data Engineering (CDE) team responsible for building the data pipelines that ingest Nike consumer data.
Designed and developed cloud solutions for data and analytical workloads such as warehouses, big data, data lakes, real-time streaming and advanced analytics.
Solicited detailed requirements, developed designs with input from the Sr. Data Architect, and developed code consistent with existing practice patterns and standards.
Responsible for migrating the process from Cloudera Hadoop to AWS NGAP 2.0 (Nike Global Analytics Platform).
Designed and developed Airflow DAGs to validate upstream files and to source and load data into Hive tables built on AWS S3.
Worked on Airflow to schedule the Spark jobs.
Migrated the existing Hive scripts to PySpark scripts and optimized the process (see the sketch after this project).
Used AWS Glue for data transformation, validation and cleansing.
Created filters, groups and sets on Tableau reports.
Migrated reports and dashboards from Cognos to Tableau and Power BI.
Environment: AWS (S3, EC2, EMR, Athena), Snowflake, Airflow, Glue, PySpark, Tableau, Hive and YARN
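A minimal sketch of the Hive-to-PySpark migration pattern mentioned in the Nike role above. The Hive script, table names and aggregation are hypothetical, shown only to illustrate the rewrite from HiveQL to the DataFrame API.

    # Minimal sketch: a HiveQL aggregation rewritten with the PySpark DataFrame API.
    # Hypothetical original Hive script:
    #   INSERT OVERWRITE TABLE daily_sales_summary
    #   SELECT sku, SUM(quantity) AS total_qty FROM raw_sales GROUP BY sku;
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import sum as spark_sum

    spark = SparkSession.builder.appName("hive-to-pyspark").enableHiveSupport().getOrCreate()

    summary = (
        spark.table("raw_sales")                      # Hive table backed by S3
             .groupBy("sku")
             .agg(spark_sum("quantity").alias("total_qty"))
    )
    summary.write.mode("overwrite").saveAsTable("daily_sales_summary")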
Project Title: Data Analytics | Dec 2017 - Sep 2019
Role: Big Data Developer
Client: Hilton Worldwide, McLean, VA
Project Description: Hilton is a forward-thinking global leader in hospitality and a highly data-driven company that uses a Hadoop big data architecture to make key business decisions. Data ingested from different upstream applications is transformed and loaded into Redshift, which the MicroStrategy reporting team uses to generate dashboards that aid key business decisions.
Responsibilities:
Used NiFi as the dataflow automation tool to ingest data into HDFS from different source systems.
Developed common methods to bulk load raw HDFS files into data frames.
Developed common methods to persist data frames to S3, Redshift, HDFS and Hive.
Pruned the ingested data to remove duplicates by applying window functions, and performed complex transformations to derive various metrics.
Used the Oozie scheduler to trigger Spark jobs.
Created UDFs in Spark for use in Spark SQL.
Used the Spark API to perform analytics on data in Hive using Scala.
Optimized existing algorithms in Hadoop using Spark context, DataFrames and Hive context.
Used AWS Glue for the data transformations.
Created Spark DataFrames in Scala for all the data files, which then undergo transformations.
Aggregated and transformed the filtered DataFrames based on the business rules and saved them as temporary Hive tables for intermediate processing.
The RDDs and DataFrames undergo various transformations and actions and are stored in HDFS/S3 as Parquet files.
Copied data from S3 to Redshift tables using a shell script.
Performed performance tuning of Spark jobs using broadcast joins, the correct level of parallelism and memory tuning.
Analyzed and defined the client's business strategy and determined the system architecture requirements to achieve business goals.
Environment: Spark 2.2, Hadoop, Hive 2.1, HDFS, Java 1.8, Scala 2.11, HDP, AWS, Glue, Redshift, Oozie, IntelliJ, ORC, Athena, Shell Scripting, Bitbucket, Airflow, Python, PySpark

Project Title: Corporate Analytics | Apr 2016 - Dec 2017
Role: Big Data Developer
Client: CDPHP, Albany, New York
Project Description: CDPHP is one of the top health insurance companies in New York State. The objective of the project is to generate dashboards based on customer claim information to assist the business in rapid decision making. The project introduces a big data competency to analyze claims and perform fraud analysis to reduce inappropriate practices by certain PCPs and in shared savings programs. It involved developing cloud-based, scalable infrastructure to support insight generation.
Responsibilities:
Involved in loading raw data from AWS S3 to Redshift.
Involved in developing and scheduling DAGs using Airflow.
Involved in implementing transformations on raw data and moving data to the cleansed layer in Parquet file format.
Involved in building logic for the incremental data ingestion process.
Involved in data ingestion from Oracle to AWS S3.
Evaluated Spark JDBC (Scala/PySpark/SparkR) versus Sqoop for data ingestion from on-premise and cloud databases to AWS S3 (see the sketch after this project).
Created shell scripts to invoke PySpark commands.
Involved in creating a handshake process between Autosys and Airflow using the Autosys API in Python.
Involved in code automation for DAGs in Airflow using Jinja templates.
Responsible for performing unit testing.
Involved in several proofs of concept for building change data capture using next-generation platform tools.
Created Spark RDDs for all the data files, which then undergo transformations.
Aggregated and curated the filtered RDDs based on the business rules, converted them into DataFrames and saved them as temporary Hive tables for intermediate processing.
The RDDs and DataFrames undergo various transformations and actions and are stored in HDFS as Parquet files.
Environment: Spark JDBC, PySpark, Airflow, AWS S3, Athena, EMR, Kerberos, Autosys, Avro, Parquet, GitHub, Python, Oracle, Sqoop, Teradata, CloudWatch, Redshift, Confluence, Agile
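A minimal PySpark sketch of the Spark JDBC ingestion path evaluated in the CDPHP role above. The JDBC URL, schema, table, credentials and S3 bucket are hypothetical; a real job would pull credentials from a secret store rather than hard-coding them.

    # Minimal sketch: read an Oracle table over JDBC and land it in S3 as Parquet.
    # Assumes the Oracle JDBC driver is on the Spark classpath and S3 access is configured.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("oracle-to-s3").getOrCreate()

    claims = (
        spark.read.format("jdbc")
        .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB")   # hypothetical host/service
        .option("dbtable", "CLAIMS.RAW_CLAIMS")                     # hypothetical schema.table
        .option("user", "etl_user")                                 # placeholder credentials
        .option("password", "********")
        .option("fetchsize", "10000")
        .load()
    )

    claims.write.mode("overwrite").parquet("s3://example-raw-bucket/claims/")   # hypothetical bucket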
Project Title: PASSPORT | April 2013 - Apr 2016
Role: Java Developer
Client: Prudential Financial, Iselin, NJ
Project Description: Prudential is one of the largest insurance and financial services institutions in the United States. The PASSPORT (Plan Admin Setup System Portal) application is built for plan setup. It helps set up plans and maintain the tool that handles Vision and peripheral system setup for defined contribution plans.
Responsibilities:
Involved in software development of web-based front-end applications.
Involved in development of the CSV files using the data load.
Performed unit testing of the developed modules.
Involved in bug fixing and writing SQL queries and unit test cases.
Used Rational Application Developer (RAD).
Used Oracle as the backend database.
Involved in configuration and deployment of the front-end application on RAD.
Involved in developing JSPs for the graphical user interface.
Implemented code for validating input fields and displaying error messages.
Environment: Java, JSP, Servlets, Apache Struts framework, WebSphere, RAD, Oracle, PVCS, TOAD

Education:
Bachelor's Degree, Acharya Nagarjuna University, 2000