
Chetan - Data Engineer (Databricks)
[email protected]
Location: Flagstaff, Arizona, USA
Relocation: Yes
Visa: OPT EAD
Chetan S

Professional Summary
Experienced Data Engineer with 7 years of hands-on experience in designing, implementing, and optimizing data pipelines across industries such as healthcare, e-commerce, and finance.
Skilled in Cloud Technologies like Databricks, AWS, Azure, Cosmos DB and GCP, specializing in delivering scalable data solutions, including Data Warehouses, Big Data analytics pipelines, and Data Visualization and Reporting tools.
Proficient in Batch and Streaming Data Processing, using tools like Apache Spark, Sqoop, Spark Streaming, and Apache Kafka for both real-time and batch data ingestion and processing.
Experience with Migration Projects from on-premise to Cloud platforms (AWS, Azure, GCP), as well as building features and solutions from scratch for new business initiatives.
Business Insights and SQL Expertise, translating complex business requirements into efficient SQL queries to extract actionable insights from structured and unstructured datasets.
Expert in Real-Time Data Processing, designing and implementing pipelines using Kafka, Spark Streaming, and AWS services to ensure low-latency, real-time data ingestion and processing.
Big Data Ecosystem Proficiency, with expertise in tools like Hadoop, PySpark, and Hive, as well as optimizing ETL workflows for handling large-scale datasets.
Performance Optimization experience, achieving a 30% improvement in data processing efficiency, a 25% reduction in pipeline downtime, and a 20% faster data retrieval through techniques such as Z-order clustering and partitioning.
Data Governance and Security Expert, utilizing Databricks Unity Catalog and AWS IAM to enforce data governance.
Advanced Analytics and Visualization, leveraging tools like Amazon QuickSight, Power BI, and Tableau to build interactive dashboards that provide actionable insights, such as customer segmentation improvements.
Collaborative Team Player, working closely with cross-functional teams including data scientists, analysts, and IT professionals to deliver high-quality data solutions that support business decision-making.
Automation and Workflow Optimization, developing automated ETL workflows with AWS Step Functions, Apache Airflow, and Azure Data Factory, reducing manual intervention by 30%.
Data Modeling Expertise, optimizing data models for both structured and unstructured datasets, resulting in a 20% improvement in query performance and faster business insights.
CI/CD Pipeline Automation, building and automating CI/CD pipelines using Jenkins, GitLab, and Terraform for seamless ETL job deployments, improving deployment frequency by 40%.
Containerization with Docker and Kubernetes, deploying containerized data pipelines to improve scalability, reduce infrastructure complexity, and simplify data solution deployment.
Monitoring and Alerting Systems, integrating real-time monitoring tools like AWS CloudWatch and Datadog, reducing pipeline failure detection times by 50% and improving overall system reliability.
Fine-Tuning and Optimization, experienced in optimizing distributed data solutions for batch and stream processing using Apache Spark, Apache Flink, and Kafka, enhancing processing efficiency by 30%.
Cloud Architecture and Data Pipelines, skilled in building large-scale, enterprise-grade cloud-based data solutions using AWS, Azure, and GCP for both batch and streaming data.
Problem Solver with a Focus on Data Quality, improving data accuracy by 15% through validation scripts and governance enhancements across projects.
Strong experience in on-premise to cloud migrations using Azure, AWS, and GCP, specifically migrating data from sources like MySQL Server and Oracle DB into cloud-based systems.
Earned the Azure Data Engineer Associate certification from Microsoft, demonstrating expertise in designing and implementing data solutions on Microsoft Azure.


Technical Skills
Programming Languages: Python, SQL, Java, R
IDEs: PyCharm, Jupyter Notebook
Cloud Technologies: Azure (Azure Data Factory, Data Lake, Synapse, Blob Storage, Cosmos DB, Azure DevOps, Databricks), GCP (BigQuery, Google Cloud Storage, Dataflow, Dataproc, Cloud Composer, Cloud Monitoring, Google Cloud Deployment Manager, Google Cloud Data Catalog), AWS (EC2, S3, RDS, ECS, ECR, Batch, Lambda, Glue, Athena, AWS Data Pipeline, Redshift, Database Migration Service, CloudWatch, SQS, SNS, SageMaker)
Big Data Ecosystem: Spark, Databricks, Apache Airflow, Hadoop, Hive, Apache Kafka, Apache Flink, Sqoop, Informatica
Visualization: Power BI, Tableau, Excel
Packages & Data Processing: NumPy, Pandas, Matplotlib, Seaborn, TensorFlow, PySpark, Data Pipelines, Jenkins
Version Control & Databases: GitHub, Git, SQL Server, PostgreSQL, MongoDB, DynamoDB, MySQL, Snowflake

Certifications
Microsoft Certified: Azure Data Engineer Associate
Databricks Certified: Databricks Lakehouse Fundamentals
Professional Experience
Data Engineer
Live Nation, Los Angeles, California May 2023 - Present

Technologies: Databricks, Azure Data Factory (ADF), Azure Cloud, PySpark, SparkSQL, Medallion Architecture, Databricks Unity Catalog, Delta Live Tables, Azure Data Lake Storage (ADLS), Z-order Clustering, SQL.
Collaborating within a team to develop and maintain scalable data pipelines using Databricks, Azure Data Factory (ADF), and Azure Cloud technologies, ensuring seamless data integration and processing.
Designing and managing data workflows in Azure Data Factory (ADF), automating ETL processes to handle complex data transformations and orchestrations.
Integrating Azure Databricks with other Azure services, such as Azure Data Lake Storage, Azure SQL Data Warehouse, and Azure Cosmos DB, for seamless data flow and processing.
Writing and optimizing data processing tasks using PySpark and SparkSQL in Databricks, improving the performance and scalability of data pipelines.
Currently working on the staging and dimension layers within the Medallion architecture, focusing on data organization and retrieval to enhance analytical capabilities.
Implementing and automating data workflows in ADF to streamline ETL processes, ensuring data accuracy and consistency across the pipeline.
Utilizing Databricks Unity Catalog for centralized data governance, managing metadata, and enforcing data security and compliance across the organization.
Building and managing Delta Live Tables in Databricks to create reliable and high-quality data pipelines with automated data lineage tracking and monitoring.
Integrating and optimizing data storage in Azure Data Lake Storage (ADLS), ensuring efficient data retrieval and reduced latency for analytics workloads.
Optimizing code performance and storage efficiency by implementing Z-order clustering, resulting in faster query execution and reduced data retrieval times (a brief sketch follows this list).
Leveraging Kafka to stream and process data collected from various sources, including Oracle DB and MySQL Server, for real-time analytics and event-driven data processing.
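
For illustration, a minimal sketch of the Z-order optimization noted above, assuming a Databricks notebook where spark is the provided SparkSession; the table and column names are hypothetical placeholders:

```python
# Minimal sketch of Z-order clustering on a Delta table.
# Assumes a Databricks notebook where `spark` is already provided;
# the table and column names below are hypothetical placeholders.
spark.sql("""
    OPTIMIZE sales_silver.ticket_transactions
    ZORDER BY (event_id, transaction_date)
""")
```

Z-ordering co-locates related values in the same data files, so queries that filter on the Z-ordered columns can skip unrelated files and return results faster.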
Data Engineer
Cognizant Technology Solutions (Client: Kredietbank and CERA), Chennai, India Mar 2021 - Dec 2022
Worked as part of a team to develop and optimize scalable data pipelines using Databricks on Azure, resulting in a 30% improvement in data processing efficiency. Implemented Delta Lake tables, ensuring data consistency and reliability.
Leveraged Databricks Auto Loader for efficient data ingestion and schema enforcement, improving data loading performance (an ingestion sketch follows this list).
Used Z-order indexing to optimize data retrieval and reduce query latency in Delta Lake tables.
Employed partitioning strategies to enhance data management and performance within Delta Lake.
Mounted external storage in Databricks for seamless access to Azure Data Lake Storage and AWS S3 data.
Contributed to the implementation of Time Travel features in Delta Lake for easy data versioning and historical data access.
Applied the Medallion Architecture to organize data into Bronze, Silver, and Gold layers for improved data quality and query performance.
Assisted in setting up Databricks Clusters for scalable and efficient data processing.
Participated in setting up Databricks Workflows for automated data pipeline execution and orchestration.
Engaged in real-time data streaming using Kafka, enabling timely insights and fast decision-making.
Contributed to monitoring and optimizing Delta Cache to improve the performance of data queries.
Supported Delta Live Tables for real-time data processing and updating.
Worked with Databricks Unity Catalog to manage and govern data access and metadata.
Assisted in applying the Star Schema for effective data modeling and reporting.
Contributed to implementing vacuum commands to clean up old data and optimize Delta Lake storage.
Utilized PySpark and SparkSQL for data transformation and querying, enhancing data processing efficiency and scalability.
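
For illustration, a minimal Auto Loader sketch in the spirit of the Bronze-layer ingestion described above, assuming a Databricks runtime where spark is provided; the paths, file format, and table name are hypothetical placeholders:

```python
# Minimal Auto Loader sketch: incrementally ingest landing files into a Bronze
# Delta table. Assumes a Databricks runtime (`spark` provided); paths, format,
# and table name are hypothetical placeholders.
bronze_stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/bronze/_schemas/transactions")
    .load("/mnt/landing/transactions")
)

(
    bronze_stream.writeStream
    .option("checkpointLocation", "/mnt/bronze/_checkpoints/transactions")
    .trigger(availableNow=True)  # process all currently available files, then stop
    .toTable("bronze.transactions")
)
```

The schemaLocation option lets Auto Loader track and enforce the inferred schema across runs, while the checkpoint makes the ingestion incremental and restartable.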

Project 1
Technologies: Databricks, Azure Synapse Analytics, Azure Data Factory, PySpark, SparkSQL, Azure Data Lake Storage (ADLS), Azure SQL Database, Python
Assisted in ingesting and processing customer data from Azure Data Lake Storage (ADLS), working with CSV, Parquet, and JSON files to prepare the data for loading into Azure Synapse Analytics using Databricks for scalable analytics.
Contributed to the development of ETL pipelines in Azure Data Factory, supporting data extraction from ADLS, transforming data using PySpark in Databricks, and loading it into Azure SQL Database, helping reduce pipeline execution time by 25%.
Worked with the team on building complex data models and analytical queries in Azure Synapse Analytics, using Databricks for processing large datasets, which supported more accurate customer segmentation for targeted marketing campaigns.
Participated in data processing by refining raw customer data from various file formats in ADLS using PySpark and SparkSQL within Databricks, helping improve data accuracy by 10% through consistent data cleaning and standardization.
Assisted in implementing data security measures in Databricks, ensuring sensitive customer data was protected during data transformation processes, contributing to compliance with banking industry regulations.
Supported the automation of data pipelines, collaborating on workflows that ingested data from ADLS into Azure SQL Database using Azure Data Factory, reducing manual intervention by 30%.
Worked alongside marketing teams to validate and ensure the processed data met their analytics needs, contributing to more personalized customer engagement strategies.
Helped optimize data processing jobs within Databricks, improving PySpark job efficiency and helping reduce data pipeline runtime by 25%, resulting in faster data availability for analysis.
Used PySpark and SparkSQL in Databricks as part of the team's efforts to perform large-scale data transformations and load processed data into Azure SQL Database, improving data quality and accessibility (see the sketch below).
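
For illustration, a minimal sketch of the ADLS-to-Azure SQL Database flow described in this project, assuming Databricks provides spark and dbutils; the paths, columns, JDBC URL, and secret scope are hypothetical placeholders:

```python
# Minimal sketch: read raw customer data from ADLS, clean it with PySpark,
# and write it to Azure SQL Database over JDBC. Assumes a Databricks cluster
# where `spark` and `dbutils` are provided; all names are hypothetical.
from pyspark.sql import functions as F

customers = spark.read.parquet(
    "abfss://raw@storageaccount.dfs.core.windows.net/customers/"
)

cleaned = (
    customers
    .dropDuplicates(["customer_id"])
    .filter(F.col("customer_id").isNotNull())
    .withColumn("email", F.lower(F.trim(F.col("email"))))
)

(
    cleaned.write.format("jdbc")
    .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;database=marketing")
    .option("dbtable", "dbo.customers_clean")
    .option("user", dbutils.secrets.get("kv-scope", "sql-user"))
    .option("password", dbutils.secrets.get("kv-scope", "sql-password"))
    .mode("overwrite")
    .save()
)
```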

Project 2
Technologies: Databricks, AWS Glue, AWS S3, PySpark, SparkSQL, Amazon QuickSight, Kafka, AWS Redshift, Delta Lake
Designed a real-time data pipeline for processing and analyzing e-commerce transaction data using Databricks and AWS Glue.
Implemented Delta Lake tables on Databricks, managing both structured and unstructured data to ensure efficient data storage and retrieval.
Configured real-time data streaming and batch processing using Kafka and AWS Glue to handle high-velocity data inputs, improving the timeliness and accuracy of insights (a streaming sketch follows this list).
Developed interactive dashboards using Amazon QuickSight to visualize key performance metrics, such as sales trends and customer behavior, which led to actionable business insights.
Tuned the pipeline to enhance performance, achieving a 20% reduction in data processing time.
Collaborated with business analysts to translate data into actionable insights, improving decision-making processes.
Implemented security measures to protect sensitive customer data throughout the pipeline.
Set up continuous monitoring of the data pipeline using AWS CloudWatch, ensuring timely detection and resolution of issues.
Utilized PySpark for data processing tasks and SparkSQL for efficient querying of the e-commerce data.
Automated data pipeline workflows using AWS Step Functions to streamline processing and reduce manual intervention.
Integrated data from multiple sources, including AWS S3 and AWS Redshift, to provide a comprehensive view of e-commerce operations and performance.
Conducted regular performance reviews and optimizations, resulting in improved scalability and reliability of the data processing infrastructure.
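
For illustration, a minimal Structured Streaming sketch of the Kafka-to-Delta flow described in this project; the broker, topic, schema, and checkpoint path are hypothetical placeholders, and spark is the Databricks-provided SparkSession:

```python
# Minimal sketch: consume e-commerce transactions from Kafka, parse the JSON
# payload, and append it to a Bronze Delta table. All names are hypothetical.
from pyspark.sql import functions as F
from pyspark.sql.types import (StructType, StructField, StringType,
                               DoubleType, TimestampType)

txn_schema = StructType([
    StructField("order_id", StringType()),
    StructField("customer_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

transactions = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092")
    .option("subscribe", "ecommerce.transactions")
    .option("startingOffsets", "latest")
    .load()
    .select(F.from_json(F.col("value").cast("string"), txn_schema).alias("txn"))
    .select("txn.*")
)

(
    transactions.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/delta/_checkpoints/ecommerce_txn")
    .toTable("ecommerce.transactions_bronze")
)
```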
Junior Data Engineer
MSR Cosmos / Client - Syneos Health May 2017 - Feb 2021

Contributed to a team project migrating large volumes of patient data (over 10 million records) from Oracle databases to Hadoop Distributed File System (HDFS), ensuring data integrity throughout the process.
Assisted in developing and maintaining ETL pipelines using Informatica and SQOOP for automating data ingestion, transformation, and loading into Hadoop, reducing manual workload by 50%.
Supported the implementation of data quality checks, working closely with senior engineers to identify and address inconsistencies, which helped improve data accuracy by 15%.
Participated in analyzing patient data using SQL and Python, contributing to insights that aided medical research teams in identifying key trends and metrics.
Helped create visualizations (charts, graphs) using Power BI under the guidance of senior team members, making it easier for stakeholders to interpret data findings.
Collaborated with research teams, providing technical support for their data-driven investigations and learning about data processing in clinical trial contexts.
Gained hands-on experience with Hadoop components like HDFS, MapReduce, and Hive, supporting data processing tasks and improving system performance under the supervision of senior engineers (see the sketch below).
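
For illustration, a minimal sketch of Hive-backed analysis in the spirit of the patient-data work above; the database, table, and columns are hypothetical, and a configured Hive metastore is assumed:

```python
# Minimal sketch: query a Hive table with Spark to summarize patient visits.
# Assumes a cluster with a configured Hive metastore; all names are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("patient-visit-metrics")
    .enableHiveSupport()
    .getOrCreate()
)

monthly_patients = spark.sql("""
    SELECT visit_month, COUNT(DISTINCT patient_id) AS distinct_patients
    FROM clinical.patient_visits
    GROUP BY visit_month
    ORDER BY visit_month
""")
monthly_patients.show(truncate=False)
```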