Goutham - Data Engineer
[email protected]
Location: Baltimore, Maryland, USA
Relocation: Only Remote
Visa: H1B
Goutham Marthi
Data Engineer
Phone: (908) 745-9441
Email: [email protected]
SUMMARY
6+ years of professional experience in analysis, design, development, and implementation as a Data Engineer.
Experience in analyzing data using Python, SQL, Microsoft Excel, Hive, PySpark, and Spark SQL for data mining, data cleansing, and machine learning.
Ability to build Spark applications using Python (PySpark) and Scala.
Experienced in Amazon AWS Cloud infrastructure services like EC2, VPC, S3, SNS, Glue, CloudWatch, CloudFront, and Elastic Load Balancers.
Hands-on experience working with Amazon Web Services (AWS), using Elastic MapReduce (EMR), Redshift, and EC2 for data processing.
Hands-on experience with Unified Data Analytics on Databricks, including the Databricks Workspace user interface, managing Databricks notebooks, and Delta Lake with Python and Spark SQL.
Experience in machine learning with large data sets of structured and unstructured data, data acquisition, data validation, predictive modeling, and data visualization.
Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
Experience working with multi-cluster and virtual warehouses in Snowflake.
Expertise in creating Spark applications using Python (PySpark) and Scala.
Good understanding of data modeling (dimensional and relational) concepts like Star Schema modeling, Snowflake Schema modeling, and fact and dimension tables.
Exceptional skills in SQL Server Reporting Services, Analysis Services, Tableau, and data visualization tools.
Set up data in AWS using S3 buckets and configured instance backups to S3.
Expert in working with the Hive data warehouse tool: creating tables, distributing data by implementing partitioning and bucketing, and writing and optimizing Hive SQL queries (a brief sketch follows this summary).
Experience with MapReduce, Pig, the programming model, and installation and configuration of Hadoop, HBase, ETL, Sqoop, and Flume using Unix commands.
Responsible for building scalable distributed data solutions in both batch and streaming mode on BigQuery using Kafka, Spark, and Core Java.
Substantial experience working with big data infrastructure tools such as Python and Redshift; also proficient in Scala, Spark, and Spark Streaming.
Strong analytical, presentation, communication, and problem-solving skills, with the ability to work independently as well as in a team and to follow the best practices and principles defined for the team.
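For illustration, a minimal PySpark sketch of the Hive partitioning and bucketing mentioned above; the database, table, columns, and bucket count are hypothetical assumptions, not details from the projects listed here.

```python
from pyspark.sql import SparkSession

# Minimal sketch: create a partitioned, bucketed Hive table with Spark SQL.
# Database, table, columns, and bucket count are illustrative assumptions.
spark = (SparkSession.builder
         .appName("hive_ddl_sketch")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("CREATE DATABASE IF NOT EXISTS analytics")
spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.orders (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DOUBLE,
        order_date  DATE
    )
    USING PARQUET
    PARTITIONED BY (order_date)
    CLUSTERED BY (customer_id) INTO 16 BUCKETS
""")
```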
TECHNICAL SKILLS
Databases: MySQL, NoSQL, SQL Server, Cassandra, MongoDB, SQL*Loader, Teradata, PostgreSQL, Oracle
BI/Reporting Tools: SSRS, SSIS (ETL), Tableau, SAP Crystal Reports
Programming/Scripting Languages: SQL, T-SQL, Python, Java, Spark (PySpark, Scala), JavaScript, Shell, JSON
Big Data Ecosystem: HDFS, Hive, HBase, MapReduce, Kafka, Sqoop, Airflow
Cloud Computing Tools: Amazon AWS (EMR, EC2, S3, RDS, Redshift, Glue, Elasticsearch, Kinesis), Microsoft Azure (Data Lake, Data Storage, Databricks, Azure Data Factory, Machine Learning, data pipelines, data analytics), Snowflake, SnowSQL
Web Technologies: HTML, XML, JSON, CSS, jQuery, JavaScript
Software/Integration Tools: Docker, Jenkins, GitHub, Kubernetes
Web Services/APIs: SOAP, JMS, Apache Tomcat, Web API, Apache HTTP Server
EDUCATION
Master of Science, Computer and Information Systems, Wilmington, DE
Wilmington University, Dec 2022
PROFESSIONAL EXPERIENCE
Pure Insurance Sept 2022 – Present
Role: Data Engineer
Responsibilities:
Worked on an EMR cluster to run PySpark jobs for data ingestion.
Developed a data pipeline using Airflow and Python to ingest current and historical data into the data staging area.
Responsible for defining the data flow into the AWS S3 staging bucket and from there to the curated bucket.
Wrote PySpark scripts for data ingestion into AWS Redshift tables.
Developed Airflow DAGs for data ingestion from source systems to the data warehouse.
Developed Airflow DAGs for data ingestion from AWS S3 to AWS Redshift tables (a DAG sketch follows the environment list for this role).
Developed PySpark code to convert .csv files to Parquet (see the sketch after this list).
Developed frameworks involving PySpark code to bring data from different source systems into AWS S3 and to move data from the AWS S3 staging area to the curated area.
Working experience with Databricks on AWS, organizing data into notebooks and making it easy to visualize data using dashboards.
Worked on SnowSQL and Snowpipe; converted SQL Server mapping logic to SnowSQL queries.
Built different visualizations and reports in Tableau using Snowflake data.
Recreated existing SQL Server objects in Snowflake.
Implemented a one-time data migration of multi-state-level data from SQL Server to Snowflake using Python and SnowSQL.
Extensively worked on Spark Streaming and Apache Kafka to fetch live stream data.
Experience in writing queries in SQL and R to extract, transform, and load (ETL) data from large datasets using data staging.
Implemented CI/CD pipelines using Jenkins to build and deploy applications.
Completed a highly immersive data science program involving data manipulation and visualization, machine learning, Python programming, SQL, Git, Unix commands, NoSQL, MongoDB, and Hadoop.
Experience with SSIS Script Tasks, lookup transformations, and data flow tasks using T-SQL.
Converted SQL code to Spark code using Scala, PySpark, and Spark SQL for faster testing and processing of data.
Good knowledge of setting up batch intervals, split intervals, and window intervals in Spark Streaming.
Expertise in building end-to-end ETL solutions from source systems such as SQL Server, Oracle, and flat files, performing different types of transformations using Python modules and functions, and loading into the global data warehouse (Hadoop and Teradata) and data marts.
Developed, configured, and monitored Apache Hadoop, HDFS, and SQL databases; performed monitoring, administration, and performance tuning of database queries on distributed systems.
Designed AWS CloudFormation templates to create VPCs and subnets to ensure successful deployment of web applications and database templates.
Modernized the data analytics environment by using a cloud-based Hadoop platform and the Git version control system.
Responsible for implementing monitoring solutions with Terraform, Docker, and Jenkins.
Designed and implemented a test environment on AWS.
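For illustration, a minimal PySpark sketch of the CSV-to-Parquet conversion from staging to the curated area described above; bucket names and prefixes are hypothetical.

```python
from pyspark.sql import SparkSession

# Minimal sketch: read CSV files from the S3 staging area and rewrite them as Parquet
# in the curated area. Bucket names and prefixes are illustrative assumptions.
spark = SparkSession.builder.appName("csv_to_parquet").getOrCreate()

staged_df = (spark.read
             .option("header", "true")
             .option("inferSchema", "true")
             .csv("s3://example-staging-bucket/ingest/"))

(staged_df.write
          .mode("overwrite")
          .parquet("s3://example-curated-bucket/ingest/"))
```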
Environment: Spark SQL, Python, Scala, Tableau, AWS, Hive, NoSQL, R, ETL, Cassandra, MongoDB, Hadoop, Docker, Jenkins, GitHub, MapReduce, Snowflake, Teradata.
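The Airflow DAGs noted above could look roughly like the following sketch, assuming the Amazon provider package's S3ToRedshiftOperator is available; the DAG id, connection ids, bucket, schema, and table names are hypothetical.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.transfers.s3_to_redshift import S3ToRedshiftOperator

# Hedged sketch of a daily DAG that copies staged S3 files into a Redshift table.
# All identifiers below are illustrative assumptions.
with DAG(
    dag_id="s3_to_redshift_ingest",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    copy_staged_files = S3ToRedshiftOperator(
        task_id="copy_staged_files",
        schema="analytics",                    # hypothetical target schema
        table="policy_events",                 # hypothetical target table
        s3_bucket="example-curated-bucket",
        s3_key="ingest/",
        copy_options=["FORMAT AS PARQUET"],
        redshift_conn_id="redshift_default",
        aws_conn_id="aws_default",
    )
```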
Harrison Walker and Harper Sept 2021 – Sept 2022
Role: Software Engineer
Responsibilities:
Created database objects like tables, views, stored procedures, functions, joins, indexes, and triggers in T-SQL based upon the functional specifications.
Implemented performance tuning of slow-running queries and stored procedures using SQL Profiler and execution plans.
Created and scheduled SSIS packages to pull data from various data sources like Excel spreadsheets and flat files and load them into SQL Server.
Used SSIS to create ETL packages to validate, extract, transform, and load data.
Hands-on development of database objects, ETL packages, and data validation routines; implemented the concepts of Incremental Data Load and Slowly Changing Dimension (SCD) Type 2 for packages that move data from staging to dimension tables in SSIS.
Scheduled SQL jobs to run SSIS packages on a daily, weekly, and monthly basis using MS SQL Server Integration Services (SSIS) and dbt (Data Build Tool).
Created ETL projects using SSIS 2015 in both the Package Deployment Model and the Project Deployment Model, and efficiently used the SSIS catalog for ETL monitoring.
Extensively worked on the OLAP cube life cycle process.
Experience in building tabular models in cubes and using various DAX expressions to implement cubes based on business requirements.
Created shared dimension tables, measures, hierarchies, levels, cubes, and aggregations on MS OLAP / Analysis Services (SSAS).
Wrote Python scripts to execute SSIS packages (see the sketch after this list).
Worked in a CI/CD environment utilizing AWS to facilitate SSIS package execution without failure.
Worked on creating tabular data model databases and created SSRS reports on top of them; involved in creating parameterized, cascaded, drill-down, cross-tab, and drill-through reports using SSRS.
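As a rough illustration of executing SSIS packages from Python, here is a hedged sketch that shells out to the dtexec utility; the package path is hypothetical and dtexec is assumed to be installed and on the PATH.

```python
import subprocess

# Hedged sketch: run an SSIS package from Python by invoking the dtexec utility.
# The package path is a hypothetical example; dtexec must be on the PATH.
package_path = r"C:\ssis\packages\LoadDimensions.dtsx"

result = subprocess.run(
    ["dtexec", "/F", package_path],  # /F executes a package stored on the file system
    capture_output=True,
    text=True,
)

if result.returncode != 0:
    # dtexec returns a non-zero exit code when package execution fails
    raise RuntimeError(f"SSIS package failed:\n{result.stdout}\n{result.stderr}")
print("SSIS package completed successfully")
```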
Environment: MS SQL Server 2014/2012/2008 R2, T-SQL, SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), Data Warehousing (SSAS), DAX, MDX, MS Access 2003/2005, VB.NET, C#, SQL Profiler, AWS, Neo4j, Python, Azure.
Ola Aug 2018 – Dec 2020
Role: Data Engineer
Responsibilities:
Involved in designing and deploying multi-tier applications using AWS services (EC2, S3, RDS, DynamoDB, SNS, SQS, IAM), focusing on high availability, fault tolerance, and auto-scaling with AWS CloudFormation.
Using Spark, performed various transformations and actions; the resulting data was saved back to HDFS and from there to the target database, Snowflake.
Migrated an existing on-premises application to AWS. Used AWS services like EC2 and S3 for small data set processing and storage; experienced in maintaining the Hadoop cluster on AWS EMR.
Worked on building ETL pipelines for data ingestion, data transformation, and data validation on the AWS cloud, working along with data stewards under data compliance.
Scheduled all jobs using Airflow scripts written in Python, added different tasks to DAGs, and used AWS Lambda.
Implemented a CI/CD pipeline using Jenkins and Airflow for containers with Docker and Kubernetes.
Experience in moving high- and low-volume data objects from Teradata and Hadoop to Snowflake.
Used PySpark for extracting, filtering, and transforming data in data pipelines.
Used Data Build Tool (dbt) for transformations in the ETL process, along with AWS Lambda and AWS SQS.
Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
Responsible for implementing monitoring solutions with Terraform, Docker, and Jenkins.
Designed and implemented a test environment on AWS.
Responsible for account management, IAM management, and cost management.
Designed AWS CloudFormation templates to create VPCs and subnets to ensure successful deployment of web applications and database templates.
Created S3 buckets, managed policies for S3 buckets, and utilized S3 and Glacier for storage and backup on AWS.
Experience managing IAM users: creating new users, giving them limited access as per needs, and assigning roles and policies to specific users.
Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster.
Created Unix shell scripts to automate the data load processes to the target data warehouse.
Implemented Apache Spark code to read multiple tables from the real-time records and filter the data based on requirements (see the sketch after this list).
Stored final computation results in Cassandra tables and used Spark SQL and Spark Datasets to perform data computation.
Used Spark for data analysis and stored final computation results in HBase tables.
Troubleshot and resolved complex production issues while providing data analysis and data validation.
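For illustration, a minimal PySpark sketch of the read-filter-persist pattern described above; the table, columns, and output path are hypothetical, and a Cassandra or HBase sink could replace the Parquet write via the corresponding connector.

```python
from pyspark.sql import SparkSession

# Minimal sketch: read source records, filter them with Spark SQL, and persist the result.
# Table, column, and path names are illustrative assumptions.
spark = (SparkSession.builder
         .appName("filter_records_sketch")
         .enableHiveSupport()
         .getOrCreate())

spark.table("staging.ride_events").createOrReplaceTempView("ride_events")

# Keep only completed rides with a positive fare, per the stated filtering requirement.
filtered = spark.sql("""
    SELECT ride_id, driver_id, fare_amount, completed_at
    FROM ride_events
    WHERE status = 'COMPLETED' AND fare_amount > 0
""")

# Persist the computed result back to HDFS as Parquet (a Cassandra or HBase sink
# could be swapped in here via the appropriate Spark connector).
filtered.write.mode("overwrite").parquet("hdfs:///curated/ride_events_completed")
```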
Environment: SQL Server, Hadoop, ETL operations, Data Warehousing, Data Modeling, Teradata, Cassandra, Snowflake, AWS cloud computing architecture, EC2, S3, Python, Spark, Scala, Spark SQL.