Pradeep Reddy
Data Engineer
Email: [email protected]
Mobile: 972-945-5529
Location: Williamsburg, Virginia, USA
Relocation: Yes
Visa: H1B
Professional Summary
Over 10 years of IT experience in designing, implementing, and maintaining solutions on the Big Data ecosystem.
Adept at implementing end-to-end Big Data solutions using the Hadoop framework; designed and executed big data solutions on multiple distributions such as Cloudera (CDH3 & CDH4) and Hortonworks.
Expertise in designing data-intensive applications using the Hadoop ecosystem, Big Data analytics, cloud data engineering, data warehousing, data visualization, reporting, and data quality solutions.
Expertise in Big Data processing using Hadoop and its ecosystem (MapReduce, Spark, Scala, Hive, Sqoop, Flume, Pig, HBase, Cassandra, MongoDB, and Kafka), including implementation, maintenance, ETL, and Big Data analysis operations.
Experience in transforming and processing raw data for further analysis, visualization, and modeling.
Hands-on experience in designing and implementing data engineering pipelines and analyzing data using
Hadoop ecosystem tools like HDFS, MapReduce, Yarn, Spark, Sqoop, Hive, Kafka, Impala, Oozie, and HBase.
Hands-on experience with Amazon EC2, S3, RDS, IAM, Auto Scaling, CloudWatch, SNS, Athena, Glue, Kinesis,
Lambda, EMR, Redshift, DynamoDB.
Used Python Boto3 to configure AWS services such as EC2, S3, and Redshift (see the Boto3 sketch at the end of this summary).
Worked with monitors, alarms, notifications, and logs for Lambda Functions, Glue Jobs using CloudWatch.
Experience implementing cloud-based Linux environments in AWS to develop scalable applications with Python.
Experience with Shell scripting to automate various activities.
Experience in designing and implementing plans for hosting complex application workloads on Microsoft Azure.
Proficient with Azure Data Lake Storage (ADLS), Databricks and IPython notebooks, Databricks Delta Lake, and Amazon Web Services (AWS).
Experience in Azure API Management, security, and cloud-to-cloud integration.
Extensively worked as an Azure Data Engineer using Azure Cloud, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Azure Cosmos DB (NoSQL), Azure HDInsight, Big Data technologies (Hadoop and Apache Spark), and Databricks.
Hands-on experience creating pipelines in Azure Data Factory V2 using activities such as Move & Transform, Copy, Filter, ForEach, Get Metadata, Lookup, and Databricks.
Expertise in design and development of various web and enterprise applications using Typesafe technologies like
Scala, Akka, and Play framework.
Hands-on experience with message brokers such as Apache Kafka.
Proficient in designing and implementing transformations, jobs, and workflows using Pentaho tools.
Good knowledge of developing data ingestion processes using Flume agents and Spark Streaming for real-time and near-real-time data analysis.
Experience in developing workflows using Flume agents with multiple sources such as web server logs and REST APIs, and multiple sinks such as the HDFS sink and the Kafka sink.
Knowledge of GCP services including BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, the gsutil and bq command-line utilities, Dataproc, and Stackdriver.
Knowledge of Google BigQuery and architecting data pipelines from on-premises systems to GCP.
Knowledge of GCS buckets and GCP Dataproc clusters.

Performed ETL integration with SSIS and handled FTP functionality within those packages.
Experience with Snowflake utilities, SnowSQL, Snowpipe, and big data modeling techniques using Python.
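A minimal Boto3 sketch of the kind of AWS service configuration referenced above; the region, bucket name, AMI id, and instance type are hypothetical placeholders, not values from any specific project.

```python
# Illustrative only: resource names, regions, and ids below are placeholders.
import boto3

session = boto3.Session(region_name="us-east-1")

# S3: create a landing bucket (placeholder name).
s3 = session.client("s3")
s3.create_bucket(Bucket="example-raw-landing-zone")

# EC2: launch a small worker instance (placeholder AMI id).
ec2 = session.client("ec2")
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="t3.medium",
    MinCount=1,
    MaxCount=1,
)

# Redshift: list existing clusters to verify access.
redshift = session.client("redshift")
for cluster in redshift.describe_clusters()["Clusters"]:
    print(cluster["ClusterIdentifier"], cluster["ClusterStatus"])
```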
Technical Skills:
Big Data Technologies: Hadoop, MapReduce 2 (YARN), Hive, Pig, Apache Spark, HDFS, Sqoop, Cloudera Manager, Kafka, Amazon EC2, Azure
Programming/Scripting Languages: Scala, Python, REST, Java, XML, SQL, PL/SQL, HTML, Shell Scripting, PySpark
Databases: HDFS, Oracle 9i/10g/11g, MySQL, DB2, HBase, BigQuery
Cloud Environments: Azure, AWS, GCP
Tools and Services: Jenkins, GitHub, Redmine, Jira, Confluence
Visualization: Tableau, ThoughtSpot, Plotly, Amazon QuickSight, MS Excel
Methodologies: Agile Scrum, Waterfall, Design Patterns
Operating Systems: Windows, Unix, Linux
Professional Experience
Wells Fargo, Charlotte, NC May 2020 to Present
Data Engineer
Responsibilities:
Worked closely with business analysts to convert business requirements into technical requirements and prepared low-level and high-level documentation.
Imported required tables from RDBMS to HDFS using Sqoop and used Storm/Spark Streaming and Kafka to stream data into HBase in real time (see the streaming sketch after this list).
Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on
Hive and AWS cloud.
Used AWS Redshift, S3, Redshift Spectrum, and Athena to query large amounts of data stored in S3 and create a virtual data lake without going through an ETL process.
Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift (see the Glue sketch after this list).
Wrote Hive UDFs as required to handle different schemas and XML data.
Implemented ETL code to load data from multiple sources into HDFS using Spark.
Developed data pipelines using Python and Hive to load data into the data lake; performed data analysis and data mapping for several data sources.
Loaded data into S3 buckets using AWS Glue and PySpark. Involved in filtering data stored in S3 buckets using
Elasticsearch and loaded data into Hive external tables.

Used AWS EMR to transform and move large amounts of data into and out of other AWS data stores and
databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.
Designed a new member and provider booking system that allows providers to book new slots, sending the member leg and provider leg directly to TP through Datalink.
Opened SSH tunnels to Google Dataproc to access the YARN resource manager and monitor Spark jobs.
Analyzed various types of raw files such as JSON, CSV, and XML with Python using Pandas, NumPy, etc.
Developed Spark applications using Scala for easy Hadoop transitions; hands-on experience writing Spark jobs and using the Spark Streaming API with Scala and Python.
Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive; developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
Automated the existing scripts for performance calculations using scheduling tools like Airflow.
Designed and developed the core data pipeline code, involving work in Java and Python and built on Kafka and
Storm.
Good experience with partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive for optimized performance (see the Hive DDL sketch after this list).
Performed performance tuning using partitioning and bucketing of Impala tables.
Hands-on experience fetching live stream data from DB2 into HBase tables using Spark Streaming and Apache Kafka.
Experience in job workflow scheduling and monitoring tools like Oozie and Zookeeper.
Worked on NoSQL databases including HBase and Cassandra.
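A minimal sketch of the Kafka-to-Spark Structured Streaming pattern referenced above; the broker address, topic name, schema, and HDFS paths are hypothetical, and the HBase sink used on the project is replaced with a simple Parquet sink to keep the example self-contained.

```python
# Illustrative only; requires the spark-sql-kafka package on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Placeholder schema for the incoming JSON events.
schema = StructType([
    StructField("account_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", StringType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
    .option("subscribe", "transactions")                 # placeholder topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Parquet sink stands in for the HBase sink used on the project.
query = (
    events.writeStream.format("parquet")
    .option("path", "hdfs:///data/streams/transactions")
    .option("checkpointLocation", "hdfs:///checkpoints/transactions")
    .start()
)
query.awaitTermination()
```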
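A hedged sketch of a Glue job along the lines described above (Parquet in S3 loaded into Redshift); the connection name, S3 paths, database, and table are placeholders, and a pre-defined Glue connection to Redshift is assumed.

```python
# Illustrative only: connection names, paths, and tables are placeholders.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read campaign data landed in S3 as Parquet.
campaigns = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-campaign-landing/parquet/"]},
    format="parquet",
)

# Write to Redshift through a pre-defined Glue catalog connection.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=campaigns,
    catalog_connection="example-redshift-connection",
    connection_options={"dbtable": "stage.campaigns", "database": "analytics"},
    redshift_tmp_dir="s3://example-glue-temp/redshift/",
)
job.commit()
```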
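A sketch of the Hive managed/external table, partitioning, and bucketing patterns mentioned above. Database, table, and column names are placeholders; the DDL is issued through spark.sql only to keep these sketches in one language, and on a real cluster it would typically be run in Hive/beeline (some Spark versions do not support creating Hive bucketed tables).

```python
# Illustrative DDL only; all names are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-ddl-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# External table partitioned by date: dropping the table keeps the data files.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS finance.transactions_ext (
        txn_id     STRING,
        account_id STRING,
        amount     DECIMAL(18, 2)
    )
    PARTITIONED BY (txn_date STRING)
    STORED AS ORC
    LOCATION 'hdfs:///data/finance/transactions'
""")

# Managed table bucketed on the join key to speed up joins and sampling.
spark.sql("""
    CREATE TABLE IF NOT EXISTS finance.transactions_bkt (
        txn_id     STRING,
        account_id STRING,
        amount     DECIMAL(18, 2)
    )
    PARTITIONED BY (txn_date STRING)
    CLUSTERED BY (account_id) INTO 32 BUCKETS
    STORED AS ORC
""")
```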
Environment: Map Reduce, HDFS, Hive, HBase, Python, SQL, Sqoop, Impala, Scala, Spark, Apache Kafka, AWS,
Zookeeper, J2EE, Linux Red Hat, Cassandra
PWC, Tampa, FL May 2019 to April 2020
Data Engineer
Responsibilities:
Understood the business needs and objectives of the system, interacted with end clients/users, and gathered requirements for the integrated system.
Experience developing data processing tasks using PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading into target data destinations.
Wrote Python utilities and scripts to automate tasks in AWS using Boto3 and the AWS SDK; automated backups using the AWS SDK (Boto3) to transfer data into S3 buckets.
Developed and deployed solutions using Spark and Scala code on a Hadoop cluster running on GCP.
Worked in the API group running Jenkins in a Docker container, with RDS and GCP build agents in Amazon AWS.
Used APIs, OpenNLP, and Stanford NLP for natural language processing and sentiment analysis.
Created APIs with the Serverless Framework in Python 3.6.
Installed and configured Splunk DB Connect and supported syslog-ng, rsyslog, and the Security Operations Centre (SOC).
Worked on different file formats such as JSON, CSV, and XML using Spark SQL.
Implemented an incremental load approach in Spark for very large tables (see the sketch after this list).
Used RESTful APIs with JSON to extract network traffic and memory performance information.
Used Amazon Web Services (AWS) for storage and processing of data in the cloud.

Created the incremental eligibility document and developed code for the initial load process.
Performed transformations and actions using Spark to improve performance.
Loaded the transformed data into Hive using saveAsTable in Spark.
Designed and developed user-defined functions (UDFs) for Hive and Pig to pre-process data for analysis, with experience in UDAFs for custom, data-specific processing.
Performed transformations using Hive and MapReduce; hands-on experience copying .log and Snappy files into HDFS from Greenplum using Kafka and extracting data into HDFS from MySQL using Sqoop.
Wrote MapReduce jobs for text mining and worked with the predictive analytics team; experience working with Hadoop components such as HBase, Spark, YARN, Kafka, Zookeeper, Pig, Hive, Sqoop, Oozie, Impala, and Flume.
Used build tools like Maven to build projects.
Extensive experience in unit testing by creating test cases.
Used Kafka and Spark Streaming for streaming workloads.
Experience in development methodologies like Agile and Waterfall.
Experience in code repositories like GitHub.
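A minimal PySpark sketch of an incremental (watermark-based) load of the kind mentioned above; the table names, the S3 path, and the updated_at watermark column are hypothetical.

```python
# Illustrative only: tables, paths, and the watermark column are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("incremental-load-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Highest watermark already present in the curated target table.
last_ts = (
    spark.table("curated.orders")
    .agg(F.max("updated_at").alias("max_ts"))
    .collect()[0]["max_ts"]
)

# Read the raw extract and keep only rows changed since the previous run.
source = spark.read.parquet("s3://example-raw-zone/orders/")
delta = source.filter(F.col("updated_at") > F.lit(last_ts)) if last_ts else source

# Append the delta; column order must match the target table definition.
delta.write.mode("append").insertInto("curated.orders")
```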
Environment: Apache Spark, Scala, Eclipse, HBase, Talend, Python, Pig, Flume, PySpark, Hortonworks, SparkSQL,
Hive, Teradata, Hue, Spark Core, Linux, GitHub, AWS, JSON.
Ascena Retail Group, Patskala, Ohio Feb 2016 to April 2019
Data Engineer
Responsibilities:
Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL Database.
Architected and implemented medium- to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
Enhanced the Vertex configuration utilizing Tax Assist and Taxability Drivers, and created and edited Taxability Categories and Tax Rule overrides.
Developed stored procedures and views in Snowflake and used them in Talend for loading dimensions and facts (see the Snowflake sketch after this list).
Designed Azure Data Warehouse, Azure Blob storage, and Redshift feeds for BI reporting.
Designed, set up, maintained, and administered Azure SQL Database, Azure Analysis Services, Azure SQL Data Warehouse, and Azure Data Factory.
Good hands-on experience with Data Vault concepts and data models; well versed in understanding and implementing data warehousing and Data Vault concepts.
Designed, reviewed, and created primary objects such as views and indexes based on logical design models, user requirements, and physical constraints.
Developed a procedure for cross-referencing Customer, Vendor, and Exemption Certificate Manager master data using the Microsoft Access DBMS for import into the Vertex O Series version 7 Taxability Manager system.
Evaluated Snowflake design considerations for any change in the application.
Worked with stored procedures for data set results for use in Reporting Services to reduce report complexity and
to optimize the run time. Exported reports into various formats (PDF, Excel) and resolved formatting issues.
Defined virtual warehouse sizing in Snowflake for different types of workloads.
Designed packages to extract data from SQL databases and flat files and load it into an Oracle database.
Responsible for defining cloud network architecture using Azure virtual networks, VPN, and ExpressRoute to establish connectivity between on-premises and cloud environments.

Extensively used SQL queries to verify the storage and accuracy of data in database tables and to query the SQL database.
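A hedged sketch of creating a Snowflake view with the Python connector, along the lines of the stored procedures and views mentioned above; the account, credentials, warehouse, and object names are placeholders.

```python
# Illustrative only: connection details and object names are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",
    user="etl_user",
    password="********",
    warehouse="LOAD_WH",
    database="EDW",
    schema="DIM",
)

try:
    cur = conn.cursor()
    # View consumed downstream (e.g. by a Talend job) when loading a dimension.
    cur.execute("""
        CREATE OR REPLACE VIEW DIM.V_CUSTOMER_DIM AS
        SELECT customer_id, customer_name, region,
               CURRENT_TIMESTAMP() AS load_ts
        FROM STAGE.CUSTOMER_RAW
    """)
finally:
    conn.close()
```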
Environment: Azure Data Factory, Spark (Python/Scala), Hive, Jenkins, Kafka, Spark Streaming, Docker Containers,
PostgreSQL, RabbitMQ, Celery, Flask, ELK Stack, MS-Azure, Azure SQL Database, Azure functions Apps, Azure Data
Lake, BLOB Storage, SQL server
Cybage Software Private Limited, Hyderabad, India July 2013 to Aug 2015
Data Engineer
Responsibilities:
Enhanced data collection procedures to include information that is relevant for building analytic systems.
Maintained and developed complex SQL queries, views, functions and reports that qualify customer requirements
on Snowflake.
Performed analysis, auditing, forecasting, programming, research, report generation, and software integration for
an expert understanding of the current end-to-end BI platform architecture to support the deployed solution
Developed test plans, validated and executed test cases, and created validation final and summary reports.
Worked with the ETL team to document the transformation rules for Data migration from OLTP to Warehouse
environment for reporting purposes.
Worked on ingestion of applications/files from one Commercial VPC to OneLake.
Worked on building EC2 instances, creating IAM users and groups, and defining policies.
Worked on creating S3 buckets and applying bucket policies per client requirements (see the sketch after this list).
Understood the use cases for data analytics and built big data solutions using open-source technologies such as Spark and Python.
Engaged with business users to gather requirements, design visualizations and train them to use self-service BI
tools.
Pulled data into Power BI from various sources such as SQL Server, Excel, and Oracle.
Designed easy to follow visualizations using Tableau software and published dashboards on web and desktop
platforms.
Created high-level and low-level design documents per the business requirements and worked with the offshore team to guide them on design and development.
Continuously monitored processes that were taking longer than expected to execute and tuned them.
Carried out necessary research and root cause analysis to resolve production issues during weekend support.
Monitored system life cycle deliverables and activities to ensure that procedures and methodologies were followed and that appropriate, complete documentation was captured.
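A minimal Boto3 sketch of the S3 bucket and bucket-policy setup described above; the bucket name, account id, and role ARN are hypothetical placeholders.

```python
# Illustrative only: bucket name, account id, and role ARN are placeholders.
import json

import boto3

s3 = boto3.client("s3", region_name="us-east-1")
bucket = "example-client-ingest-bucket"
s3.create_bucket(Bucket=bucket)

# Restrict object access to a single application role (placeholder ARN).
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowAppRoleReadWrite",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::123456789012:role/example-app-role"},
        "Action": ["s3:GetObject", "s3:PutObject"],
        "Resource": f"arn:aws:s3:::{bucket}/*",
    }],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```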
Environment: SQL, MS Excel, Power BI, GIT, Jenkins Pipelines, Data Models, Shell Scripts, Linux, Python, Agile