Suneetha Chebrolu
Big Data Developer
Ph: 360-851-9597
Email: [email protected]
Location: Indianapolis, Indiana, USA
Work Authorization: US Citizen
Professional Summary:
9+ years of IT experience in Big Data (Hadoop/Spark) and J2EE, covering requirements analysis, design, development, implementation, support, maintenance, and enhancements in the Finance and Insurance domains.
7+ years of experience as a Hadoop/Spark developer with good knowledge of Java MapReduce, Hive, Pig Latin, Scala, and Spark.
Organizing data into tables, performing transformations, and simplifying complex queries with Hive.
Performing real-time interactive analysis on massive data sets stored in HDFS.
Strong knowledge of and experience with the Hadoop architecture and components such as HDFS, YARN, Pig, Hive, Sqoop, Oozie, Flume, Spark, Kafka, and the MapReduce programming paradigm.
Developed numerous MapReduce programs.
Experience in analyzing data using Spark SQL, HiveQL, and Pig Latin, and in developing custom UDFs for Pig and Hive.
Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
Good knowledge of job-scheduling tools such as Oozie.
Experienced with IDE tools such as Eclipse 3.x and IBM RAD 7.0.
Experience in requirement gathering, analysis, planning, designing, coding and unit testing.
Strong work ethic with desire to succeed and make significant contributions to the organization.
Strong problem-solving, communication, and interpersonal skills; a good team player.
Motivated to take on independent responsibility as well as to contribute as a productive team member.

Technical Skills:

Hadoop Technologies: Hadoop, HDFS, Hadoop MapReduce, Hive, Sqoop, Oozie, Avro, Pig Latin, Hue, CDH, Parquet, Scala, Spark, Python, AWS, S3, Athena, EMR, Apache NiFi, Lambda, Glue, Terraform
NoSQL: HBase, DynamoDB
IDE/Tools: Eclipse, IntelliJ
Web and Application Servers: WebSphere, JBoss, Tomcat
Core Competency Technologies: Java, OOP, JSP, Servlets, JDBC, Java 5/6/7, C, C++, shell scripting, Spark, SAS EG, Scala, Spark Streaming, Kafka
Testing & Issue Log Tools: JUnit 4, Bugzilla, HP Quality Center
SCM/Version Control Tools: PVCS, CVS, Subversion, Bitbucket, Git
Build and Continuous Integration: Maven, SBT
Database: Oracle 8i/9i/10g, DB2, MySQL 4.x/5.x
OS: UNIX, Linux, Windows

Professional Experience:


Role: Big Data Developer Aug 2022 - July 2023
Client: Bank of America, North Carolina

Bank of America is one of the world's largest financial institutions, serving individuals, small- and middle-market businesses and large corporations with a full range of banking, investing, asset management and other financial and risk management products and services.
Gathered requirements and ensured the required functionality was delivered per the requirements.
Implemented Spark ETL in PySpark to generate outbound data for various products (see the sketch following this section).
Used Hive tables to store the ETL output, which is consumed for different insights.
Developed PySpark code for AWS Glue jobs and for EMR.
Delivered reports from the ETL output based on business requirements.
Enriched the data by applying multiple validation rules through a Spark application written in PySpark.
Built and maintained data pipelines using DBT.
Developed a high-speed BI layer on the Hadoop platform with Kafka, Apache Spark, and Python.
Designed, architected, and developed data analytics and data management solutions with PySpark.
Developed and scheduled various Spark Streaming and batch jobs in Python (PySpark).
Used Autosys to schedule the jobs.
Stored the data in SQL tables after ingestion.
Environment: Tableau, DBT, Autosys, SQL, Spark, PySpark, Hadoop, Python, Snowflake, Glue, Hive, Git
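A minimal sketch of the kind of PySpark ETL described above, suitable for running on Glue or EMR and persisting output to a Hive table; the bucket, table, and column names are hypothetical placeholders, not the actual project artifacts:

# Minimal PySpark ETL sketch: read raw product data, apply a simple validation
# rule, and persist the outbound data to a Hive table. All names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("outbound-product-etl")
         .enableHiveSupport()
         .getOrCreate())

raw = spark.read.parquet("s3://example-bucket/raw/products/")   # assumed input location

outbound = (raw
            .filter(F.col("status") == "ACTIVE")                # example validation rule
            .withColumn("load_dt", F.current_date()))           # audit column

# Store the ETL output in a Hive table for downstream insights
outbound.write.mode("overwrite").saveAsTable("analytics.outbound_products")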

Project Title: Data Intelligence Team July 2021 - July 2022
Role: Big Data Developer
Client: Maxar Technologies, Colorado

Project Description: Maxar Technologies Inc. is a space technology company specializing in manufacturing communications, Earth observation, radar, and on-orbit servicing satellites, satellite products, and related services.

Used AWS Glue for data integration and transformation by running Python scripts on the Glue ETL engine, and used Redshift to store the transformed data.
Used Java/J2EE technologies to develop web applications and add functionality to existing applications.
Built Spark applications on EMR to ingest data from various sources into S3.
Visualized the transformed data using the Tableau connector for AWS Redshift.
Helped with the architecture design and automated end-to-end data/ETL pipelines using Lambda, Glue, and Python for the transition from RDBMS storage to the cloud.
Designed and built ETL pipelines to automate ingestion of various forms of data for analysis and visualization.
Designed and developed automation for Spark, UNIX, and Python scripts using Airflow DAGs (see the sketch following this section).
Helped move all raw data, as well as transformed/cleansed data, into S3 for further processing.
Built a batch process in Spark to generate feeds for downstream applications.
Assisted in building and operating distributed systems for the extraction, ingestion, and processing of large data sets from multiple sources.
Worked with Airflow to schedule the Spark jobs.
Environment: AWS (S3, EC2, EMR, Athena), Snowflake, Airflow, Glue, PySpark, Tableau
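A minimal Airflow sketch of the scheduling pattern described above: a daily DAG that submits a Spark ingestion job with spark-submit. The DAG id, schedule, and script path are hypothetical.

# Minimal Airflow DAG sketch for scheduling a nightly Spark job via spark-submit.
# All identifiers and paths are hypothetical placeholders.
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="spark_ingest_to_s3",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    run_spark_job = BashOperator(
        task_id="run_spark_ingest",
        bash_command=(
            "spark-submit --deploy-mode cluster "
            "s3://example-bucket/jobs/ingest_to_s3.py "
            "--run-date {{ ds }}"   # Airflow-templated execution date
        ),
    )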



Project Title: Data Migration Sep 2020 - July 2021
Role: Big Data Developer
Client: Citibank, Dallas
Project Description: Citibank is the consumer division of the financial services multinational Citigroup. The objective of the project was to migrate the existing implementation to Spark.


Responsibilities:
Requirement analysis and mapping document creation.
Involved in requirements gathering, analysis, design, development and test.
ETL mapping, job development and complex transformations.
Analyzed the systems and met with end users and business teams to define the requirements.
Developed Spark applications using Java and implemented an Apache Spark data-processing project to handle the data.
Wrote Java programs to retrieve data from HDFS and provide it to REST services.
Used Maven to build JAR files for the MapReduce programs and deployed them to the cluster.
Implemented business logic by writing UDFs in Java and used various UDFs from other sources.
Wrote multiple Java programs for data extraction, transformation, and aggregation across multiple file formats, including XML, JSON, CSV, and other compressed formats.
Led the offshore team and coordinated with the onsite team.
Created a coding architecture that leverages a reusable framework.
Provided weekly status updates to senior management.
Ensured on-time delivery of projects to meet client needs.
Conducted unit testing, system testing, and performance testing.
Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift (see the sketch following this section).
Used Lambda functions for infrastructure automation, such as automated EC2 snapshot creation.
Environment: Java, Spark, SQL, Hive, PySpark, Bitbucket, Hadoop, HDFS, Lambda, Athena
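An illustrative PySpark sketch of the S3-to-Redshift load pattern described above (the project itself was largely Java-based): read campaign data in Parquet from S3 and write it to Redshift over JDBC. The bucket, table, connection URL, and credentials are hypothetical placeholders, and the Redshift JDBC driver jar is assumed to be on the Spark classpath.

# Hypothetical sketch: S3 Parquet -> Redshift staging table over JDBC.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("campaign-to-redshift").getOrCreate()

campaigns = spark.read.parquet("s3://example-bucket/campaign/parquet/")

(campaigns.write
    .format("jdbc")
    .option("url", "jdbc:redshift://example-cluster:5439/analytics")
    .option("dbtable", "public.campaign_stage")
    .option("user", "etl_user")
    .option("password", "****")       # in practice, pulled from a secrets store
    .option("driver", "com.amazon.redshift.jdbc42.Driver")
    .mode("append")
    .save())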

Project Title: Data Analytics Sep 2019 - March 2020
Role: Big Data Developer
Client: Nike, Portland, OR

Responsibilities:
Member of the Nike Consumer Data Engineering (CDE) team, responsible for building the data pipelines that ingest Nike consumer data.
Designed and developed cloud solutions for data and analytical workloads such as warehouses, Big Data, data lakes, real-time streaming, and advanced analytics.
Solicited detailed requirements, developed designs with input from the Sr. Data Architect, and developed code consistent with existing practice patterns and standards.
Responsible for migrating the process from Cloudera Hadoop to AWS NGAP 2.0 (Nike Global Analytics Platform).
Designed and developed Airflow DAGs to validate upstream files and to source and load data into Hive tables built on AWS S3.
Worked with Airflow to schedule the Spark jobs.
Migrated the existing Hive scripts to PySpark scripts and optimized the process (see the sketch following this section).
Used AWS Glue for data transformation, validation, and cleansing.
Created filters, groups, and sets on Tableau reports.
Migrated reports and dashboards from Cognos to Tableau and Power BI.
Environment: AWS (S3, EC2, EMR, Athena), Snowflake, Airflow, Glue, PySpark, Tableau, Hive and YARN.
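A minimal sketch of the Hive-to-PySpark migration pattern described above: the original HiveQL can first be run unchanged through spark.sql, then rewritten with the DataFrame API for easier optimization and testing. The table and column names are hypothetical.

# Hypothetical example of moving a Hive aggregation into PySpark.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-to-pyspark")
         .enableHiveSupport()
         .getOrCreate())

# Step 1: run the original Hive logic as-is through Spark SQL
daily_sql = spark.sql("""
    SELECT order_dt, COUNT(*) AS orders
    FROM consumer.orders
    GROUP BY order_dt
""")

# Step 2: equivalent DataFrame version of the same aggregation
daily_df = (spark.table("consumer.orders")
            .groupBy("order_dt")
            .agg(F.count("*").alias("orders")))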


Project Title: Data Analytics Dec 2017 - Sep 2019
Role: Big Data Developer
Client: Hilton Worldwide, McLean, VA
Project Description: Hilton is a forward-thinking global leader in hospitality. Hilton is a highly data-driven company that uses a Hadoop Big Data architecture to make key business decisions. Data ingested from different upstream applications is transformed and loaded into Redshift, which the MicroStrategy reporting team uses to generate dashboards that aid key business decisions.

Responsibilities:
Used NiFi as a dataflow automation tool to ingest data into HDFS from different source systems.
Developed common methods to bulk-load raw HDFS files into DataFrames.
Developed common methods to persist DataFrames to S3, Redshift, HDFS, and Hive.
Pruned the ingested data to remove duplicates by applying window functions (see the sketch following this section) and performed complex transformations to derive various metrics.
Used the Oozie scheduler to trigger Spark jobs.
Created UDFs in Spark for use in Spark SQL.
Used the Spark API to perform analytics on data in Hive using Scala.
Optimized existing Hadoop algorithms using SparkContext, DataFrames, and HiveContext.
Used AWS Glue for data transformations.
Created Spark DataFrames in Scala for all the data files, which then undergo transformations.
Aggregated and transformed the filtered DataFrames based on business rules and saved them as temporary Hive tables for intermediate processing.
Applied various transformations and actions to the RDDs and DataFrames and stored the results in HDFS/S3 as Parquet files.
Copied data from S3 to Redshift tables using shell scripts.
Tuned Spark job performance using broadcast joins, the correct level of parallelism, and memory tuning.
Analyzed and defined the client's business strategy and determined system architecture requirements to achieve business goals.
Environment: Spark 2.2, Hadoop, Hive 2.1, HDFS, Java 1.8, Scala 2.11, HDP, AWS, Glue, Redshift, Oozie, IntelliJ, ORC, Athena, shell scripting, Bitbucket, Airflow, Python, PySpark
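A minimal sketch of the window-function de-duplication pattern described above: keep only the latest record per key by event timestamp. The paths and column names are hypothetical, and the example is in PySpark although the project code was largely Scala.

# Hypothetical de-duplication with a window function: latest record per booking_id.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dedup-bookings").getOrCreate()

raw = spark.read.parquet("s3://example-bucket/raw/bookings/")

w = Window.partitionBy("booking_id").orderBy(F.col("event_ts").desc())

deduped = (raw
           .withColumn("rn", F.row_number().over(w))   # rank records within each key
           .filter(F.col("rn") == 1)                   # keep only the latest
           .drop("rn"))

deduped.write.mode("overwrite").parquet("s3://example-bucket/cleansed/bookings/")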

Project Title: Corporate Analytics Apr 2016 - Dec 2017
Role: Big Data Developer
Client: CDPHP, Albany, New York

Project Description: CDPHP is one of the top health insurance companies in New York State. The objective of the project is to generate dashboards based on customer claim information to assist the business in rapid decision making. The project introduces Big Data competency to analyze claims and perform fraud analysis to reduce inappropriate practices from certain PCPs and shared-savings programs. It involved developing cloud-based, scalable infrastructure to support insight generation.

Responsibilities:
Involved in loading raw data from AWS S3 into Redshift.
Involved in developing and scheduling DAGs using Airflow.
Involved in implementing transformations on raw data and moving the data to the cleansed layer in Parquet file format.
Involved in building the logic for the incremental data ingestion process.
Involved in data ingestion from Oracle to AWS S3 (see the sketch following this section).
Evaluated Spark JDBC (Scala/PySpark/SparkR) versus Sqoop for data ingestion from on-premise/on-cloud databases to AWS S3.
Created shell scripts to invoke PySpark commands.
Involved in creating a handshake process between Autosys and Airflow using the Autosys API in Python.
Involved in code automation for Airflow DAGs using Jinja templates.
Responsible for performing unit testing.
Involved in performing several proofs of concept for building change data capture using next-generation platform tools.
Created Spark RDDs for all the data files, which then undergo transformations.
Aggregated and curated the filtered RDDs based on business rules, converted them into DataFrames, and saved them as temporary Hive tables for intermediate processing.
Applied various transformations and actions to the RDDs and DataFrames and stored the results in HDFS as Parquet files.
Environment: Spark JDBC, PySpark, Airflow, AWS S3, Athena, EMR, Kerberos, Autosys, Avro, Parquet, GitHub, Python, Oracle, Sqoop, Teradata, CloudWatch, Redshift, Confluence, Agile
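A minimal sketch of the Spark JDBC ingestion pattern evaluated here: pull a table from Oracle and land it in S3 as Parquet. The connection URL, credentials, and table names are hypothetical placeholders, and the Oracle JDBC driver jar is assumed to be on the Spark classpath.

# Hypothetical Oracle-to-S3 ingestion over Spark JDBC.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("oracle-to-s3").getOrCreate()

claims = (spark.read
          .format("jdbc")
          .option("url", "jdbc:oracle:thin:@//example-host:1521/ORCLPDB1")
          .option("dbtable", "CLAIMS.CLAIM_HEADER")
          .option("user", "etl_user")
          .option("password", "****")          # in practice, read from a vault
          .option("driver", "oracle.jdbc.OracleDriver")
          .option("fetchsize", "10000")        # tune for large tables
          .load())

claims.write.mode("overwrite").parquet("s3://example-bucket/raw/claims/")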


Project Title: PASSPORT April 2013 - Apr 2016
Role: Java Developer
Client: Prudential Financial, Iselin, NJ

Project Description: Prudential is one of the largest insurance and financial services institutions in the United States. The PASSPORT (Plan Admin Setup System Portal) application was built for plan setup. It helps set up a plan and serves as a maintenance tool that handles Vision and peripheral-system setup for defined-contribution plans.

Responsibilities:
Involved in software development for web-based front-end applications.
Involved in developing the CSV files used for data loads.
Performed unit testing of the developed modules.
Involved in bug fixing and writing SQL queries and unit test cases.
Used Rational Application Developer (RAD).
Used Oracle as the backend database.
Involved in the configuration and deployment of the front-end application on RAD.
Involved in developing JSPs for the graphical user interface.
Implemented code to validate input fields and display error messages.

Environment: Java, JSP, Servlets, Apache Struts framework, WebSphere, RAD, Oracle, PVCS, TOAD

Education:
Bachelor's Degree from Acharya Nagarjuna University.
Year of passing: 2000
