
Name: Prashant G C
Role: Data Engineer
Email: [email protected]
Ph#:
Location: Denton, Texas, USA
Relocation: Ready to relocate
Visa: OPT EAD
Professional Summary:
6+ years of experience in Data Engineering, Data Pipeline Design, Development, and Implementation as a Data Engineer, Data Developer, and Data Modeler.
Experience with the SDLC (Software Development Life Cycle), with involvement in all phases of projects.
Experience with Spark Architecture including Spark Core, Spark SQL, Spark Streaming and Spark MLlib.
Experience using Spark with Scala to load data from local file systems, HDFS, Amazon S3, and relational and NoSQL databases via Spark SQL, import data into RDDs and DataFrames, and ingest data from a range of sources using Spark Streaming.
Experience in Data Warehouse/Data Mart, ODS, OLTP, and OLAP implementations, spanning project scoping, analysis, requirements gathering, data modeling, effort estimation, ETL design, development, system testing, implementation, and production support.
Hands-on experience with Tableau Desktop, Tableau Server, Tableau Online, and Tableau Reader across various versions.
Experience in building and publishing Power BI reports utilizing complex calculated fields, table calculations, filters, and parameters.
Experience in streaming applications using Apache Kafka.
Experience in Data Analysis, Data Validation, Data Cleansing, Data Verification, identifying data mismatches, data quality, metadata management, and master data management.
Experience performing structural modifications using MapReduce and analyzing data with visualization/reporting tools (Tableau).
Experienced in using Pig scripts for transformations, event joins, filters, and pre-aggregations before storing data in HDFS.
Hands-on experience working with Amazon Web Services (AWS), using Elastic MapReduce (EMR), Redshift, and EC2 for data processing.
Experience in Dimensional Modeling using Star and Snowflake schema methodologies for Data Warehouse and Integration projects.
Experience with Apache Airflow for scheduling and running complex data pipelines so that each task executes in the correct order.
Experience in integration of various data sources with multiple Relational Databases like SQL Server, Teradata, and Oracle.
Good communication skills, a strong work ethic, and the ability to work efficiently in a team, along with good leadership skills.

Technical Skills:
Databases: Snowflake, AWS RDS, Teradata, Oracle, MySQL, Microsoft SQL Server, PostgreSQL
NoSQL Databases: MongoDB, Hadoop HBase, Apache Cassandra
Programming Languages: Python, SQL, Scala, MATLAB
Cloud Technologies: AWS, Docker
Data Formats: CSV, JSON
Querying Languages: SQL, NoSQL, PostgreSQL, MySQL, Microsoft SQL Server
Integration Tools: Jenkins
Scalable Data Tools: Hadoop, Hive, Apache Spark, Pig, MapReduce, Sqoop
Operating Systems: Red Hat Linux, Unix, Windows, macOS
Reporting & Visualization: Tableau, Matplotlib

Professional Experience:
Client: American Airlines, Dallas, TX    Jul 2022 – Present
Role: Data Engineer
Responsibilities:
Worked with business users to gather and define business requirements and analyze possible technical solutions.
Developed Spark applications using PySpark and Spark SQL to extract, transform, and aggregate data from multiple file formats for analysis.
Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
Developed predictive analytics using Apache Spark Scala APIs.
Developed Python APIs to dump the array structures in the processor at the point of failure for debugging.
Designed and developed Pig Latin scripts and Pig command-line transformations for data joins and custom processing of MapReduce outputs.
Developed ETL processes for data extraction, data mapping, and data conversion using SQL, PL/SQL, and various ETL scripts.
Involved in ETL processes and data warehousing methodologies and concepts, including star schemas, snowflake schemas, dimensional modeling, reporting tools, Operational Data Store concepts, data marts, and OLAP technologies.
Implemented visual BI reports with Tableau.
Administered users, user groups, and scheduled instances for reports in Tableau; monitored Tableau Servers for high availability to users.
Deployed web-embedded Power BI dashboards refreshed through gateways using workspaces and data sources.
Wrote AWS Lambda functions in Python that invoke scripts to perform various transformations and analytics on large data sets in EMR clusters.
Involved in modeling (Star Schema methodologies), building the logical data model and translating it into Dimensional Models.
Developed DAGs using the Airflow orchestration tool and monitored the weekly processes (a minimal sketch follows this list).
Created Stored Procedures to transform the data and worked extensively in PL/SQL on the various transformations needed while loading the data.
Worked in an Agile methodology and attended Scrum and stand-up meetings.
Involved in Daily and Weekly Status meetings.
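
The following is a minimal sketch of the kind of weekly Airflow DAG referenced above; the DAG id, task name, and transform function are illustrative placeholders rather than the actual production pipeline.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def transform(**context):
    # Placeholder for the weekly transformation step; the real pipeline
    # extracted, transformed, and loaded the weekly batch here.
    print("transforming weekly batch")


default_args = {"owner": "data-eng", "retries": 1, "retry_delay": timedelta(minutes=5)}

with DAG(
    dag_id="weekly_pipeline",            # hypothetical DAG name
    start_date=datetime(2022, 7, 1),
    schedule_interval="@weekly",         # weekly process, as described above
    default_args=default_args,
    catchup=False,
) as dag:
    PythonOperator(task_id="transform", python_callable=transform)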

Environment: Spark, Scala, PySpark, Python, Pig, AWS, Docker, RESTful, HDFS, Tableau, Snowflake, Apache Airflow, Power BI, ETL, Agile, and SQL.

Client: Paychex, Rochester, NY    Oct 2021 – Jun 2022
Role: Data Engineer
Responsibilities:
Involved in the requirements-gathering phase, working with business users to continuously accommodate changing user requirements.
Developed Spark scripts in Python using the PySpark shell during development.
Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster data processing.
Wrote data quality checks, based on the existing source code, using Python and PySpark DataFrames on the Databricks platform, which improved processing time (a minimal sketch follows this list).
Responsible for extracting and loading the data using Python.
Designed and developed efficient PySpark programs on cloud-based data platforms (EMR) to extract, transform, and load data between various data warehouse applications.
Developed ETL procedures to transform the data in the intermediate tables according to the business rules and functionality requirements.
Implemented ETL restart capability for a specific date, a date range, from the point of failure, or from the beginning.
Created report schedules on Tableau server.
Validated the Tableau insight center reports to make sure all the data is populated as per requirements.
Created automated Python scripts to convert data from different sources and to generate ETL pipelines.
Created multiple kinds of reports in Power BI and presented them using Story Points.
Designed the data marts in dimensional data modeling using star and snowflake schemas.
Performed analysis on unused user navigation data by loading it into HDFS and writing MapReduce jobs.
Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift.
Developed and wrote Pig scripts to store unstructured data in HDFS.
Created and modified several database objects such as Tables, Views, Indexes, Constraints, Stored procedures, Packages, Functions and Triggers using SQL and PL/SQL.
Involved in Agile methodologies, daily Scrum meetings, and sprint planning.
Actively participated and provided constructive, insightful feedback during weekly iteration review meetings to track progress for each iterative cycle and identify issues.
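
A minimal sketch of the kind of PySpark DataFrame data quality check referenced above; the table and column names (payroll_stage, employee_id, pay_date) are hypothetical placeholders, not the client's actual schema.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# On Databricks a SparkSession is already provided; getOrCreate() also works locally.
spark = SparkSession.builder.appName("dq-checks").getOrCreate()

df = spark.table("payroll_stage")

# Two simple checks: null business keys and duplicate (employee_id, pay_date) rows.
null_ids = df.filter(F.col("employee_id").isNull()).count()
dupes = (
    df.groupBy("employee_id", "pay_date")
      .count()
      .filter(F.col("count") > 1)
      .count()
)

if null_ids > 0 or dupes > 0:
    raise ValueError(f"DQ check failed: {null_ids} null ids, {dupes} duplicate keys")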

Environment: Spark, Scala, PySpark, Python, Pig, AWS, Docker, RESTful, HDFS, Tableau, Snowflake, Apache Airflow, Power BI, ETL, Agile, and SQL.

Client: GrowByData, Nepal    Sep 2019 – Jul 2021
Role: Data Engineer
Responsibilities:
Gathered business requirements, performed business analysis, and designed various data products.
Developed Scala-based Spark applications for data cleansing, event enrichment, data aggregation, de-normalization, and data preparation needed for machine learning and reporting teams to consume.
Developed Spark applications in Python (PySpark) on a distributed environment to load large numbers of CSV files with different schemas into Hive ORC tables (a minimal sketch follows this list).
Developed simple to complex MapReduce jobs using Hive and Pig.
Wrote pre-processing queries in Python for internal Spark jobs.
Reviewed basic SQL queries and edited inner, left, and right joins in Tableau Desktop by connecting live/dynamic and static datasets.
Performed Tableau type conversion functions when connected to relational data sources.
Extracted, transformed, and loaded (ETL) data from multiple federated data sources (JSON, relational databases, etc.) with DataFrames in Spark.
Used Informatica PowerCenter for ETL extraction, transformation, and loading of data from heterogeneous source systems into the target database.
Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
Created MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, and sequence files for log processing.
Built data warehouse structures, creating fact, dimension, and aggregate tables through dimensional modeling with Star and Snowflake schemas.
Worked on SQL queries in dimensional data warehouses and relational data warehouses. Performed Data Analysis and Data Profiling using Complex SQL queries on various systems.
Followed agile methodology and involved in daily SCRUM meetings, sprint planning, showcases and retrospective.
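
A minimal sketch of loading CSV files into a Hive ORC table with PySpark, as referenced above; the input path, database, and table names are hypothetical placeholders.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("csv-to-hive-orc")
    .enableHiveSupport()       # required so saveAsTable writes to the Hive metastore
    .getOrCreate()
)

# Read all landing-zone CSVs; header/schema options depend on the source files.
df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("s3://bucket/landing/*.csv")   # hypothetical input path
)

# Append into an ORC-backed Hive table (hypothetical database.table name).
df.write.mode("append").format("orc").saveAsTable("analytics.events")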

Environment: Spark, Scala, PySpark, MapReduce, Pig, Python, ETL, Tableau, Power BI, AWS, JSON, Snowflake, Airflow, SQL, Agile and Windows.

Client: Skeinsoft, Nepal    Sep 2017 – Aug 2019
Role: Data Engineer
Responsibilities:
Performed Data analysis, Data Profiling and Requirement Analysis.
Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
Ran Spark jobs with the Spark Core and Spark SQL libraries for processing the data.
Worked on migrating MapReduce programs into Spark transformations using Scala.
Integrated data quality plans as a part of ETL processes.
Built data pipelines and complex ETL to process external client data using Python and Spark.
Performed Data cleaning and Preparation on XML files.
Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms.
Developed various Python scripts to find vulnerabilities in SQL queries through SQL injection testing, permission checks, and analysis.
Wrote reports using Tableau Desktop to extract data for analysis using filters based on the business use case.
Performed Tableau type conversion functions when connected to relational data sources.
Built data warehouse structures, creating fact, dimension, and aggregate tables through dimensional modeling with Star and Snowflake schemas.
Used SQL queries and other tools to perform data analysis and profiling.
Analyzed the SQL scripts and designed the solution for implementation in PySpark (a minimal sketch follows this list).
Used the Scrum Agile methodology for all work performed.
Involved in weekly walkthroughs and inspection meetings to verify the status of the testing efforts and the project as a whole.
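
A minimal sketch of re-implementing a SQL script through PySpark's Spark SQL interface, as referenced above; the source path, table, and query are hypothetical placeholders rather than the client's actual scripts.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-to-pyspark").getOrCreate()

# Register the source data as a temporary view so the original SQL can run unchanged.
orders = spark.read.parquet("/data/orders")      # hypothetical source path
orders.createOrReplaceTempView("orders")

# The original SQL aggregation, executed via Spark SQL.
daily_totals = spark.sql(
    "SELECT order_date, SUM(amount) AS total FROM orders GROUP BY order_date"
)

daily_totals.write.mode("overwrite").parquet("/data/daily_totals")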

Environment: Spark, Scala, Hive, JSON, AWS, MapReduce, Hadoop, Python, XML, NoSQL, HBase, and Windows.