Ashish Jain - Data Engineer
Email: ashishhastimal@gmail.com | Phone: 980-485-4544*104
Location: Orlando, Florida, USA | Relocation: Open | Visa: H1-B

SUMMARY

Data Engineer with around 7 years of experience in data processing, ELT pipelines, and cloud architecture. Deep hands-on experience designing and developing batch and real-time data solutions with PySpark, Databricks, Kafka, Airflow, Snowflake, SQL, and the leading cloud platforms Azure and AWS. Proven ability to optimize data systems for scalability, cost, and performance.

SKILLS

Languages & OS: Python, SQL, Java, Scala, Go, JavaScript, HTML, CSS, JSON, XML, Shell Scripting; Windows, Linux, macOS
Cloud Technologies: AWS (S3, IAM, EC2, CloudWatch, Redshift, Lambda, Glue, EMR, Kinesis, Athena, SNS, SQS, Firehose), Kafka, Kubernetes, Snowflake; GCP (GCS buckets, BigQuery, Cloud Composer, Cloud Functions)
Azure: Data Factory, ADLS Gen2, Key Vault, VM, Elasticsearch, Functions, ACI, AKS, Databricks, Power BI
Logging, Monitoring & Version Control: Dynatrace, Elasticsearch, Splunk, GitLab, Harness, Git, GitHub, Bitbucket, Tableau
Databases & Tools: SQL Server, PostgreSQL, NoSQL, Redshift, SSIS, Cassandra, Teradata, Airflow, Spark, Hive, Oracle

PROFESSIONAL EXPERIENCE

Disney, Orlando, FL | Sr. Data Engineer Apr 2023 - Present
Tech Stack: Databricks, Snowflake, DynamoDB, AWS, Python, SQL, Splunk, Alation, PySpark, Airflow, Oracle, PostgreSQL, Docker, Terraform, Azure, dbt, Lambda
DCL and WDW Parks Project
Designed and implemented scalable data pipelines that automate real-time ELT workflows for Adobe Clickstream API data feeds, loading data into Databricks where it is consumed by an ML model to make near real-time inferences for Parks and DCL.
Built Databricks Delta Lake tables from multiple source schemas using a CDC process to preprocess DCL and Parks transactions consolidated from Seaware tables, improving query performance by 20% through partitioning and clustering strategies (a merge sketch follows this section).
Leveraged GAM, DScribe, availability, pricing, admin, forecaster, eligibility, and Virtual Queue APIs to ensure seamless data ingestion and alignment with business logic, feeding data into an ML-driven recommendation model hosted on the Google Cloud Vertex AI service.
Containerized recommender applications for training, inference, and data processing using Docker and deployed them as serverless workloads on AWS Lambda, reducing latency and optimizing costs by 15%.
Collaborated with product managers and cross-functional teams to gather requirements, prioritize tasks in agile sprints, and deliver solutions that meet SLAs for data freshness and accuracy.
Developed interactive data visualizations using Python libraries, SQL, and Streamlit within Databricks notebooks to analyze large-scale datasets, enabling business stakeholders to identify trends, validate data quality, and drive actionable insights for Disney Cruise Line (DCL) initiatives.
Developed ETL jobs on Snowflake to ingest and load membership files across WDW and DCL into MDM GAM/Keyring tables, enhancing system performance by 25%.
Leveraged Snowpipe to ingest data from Azure storage locations into the Snowflake data warehouse in real time for high-priority business requirements using SnowSQL, achieving near real-time data accessibility.
Led data curation efforts aligning with star schema data model, optimizing data structure to meet business needs, and facilitating accessibility for data science and analytics teams.
Authored Triggers, Stored Procedures, and Functions using Transact-SQL (TSQL) to facilitate robust healthcare data operations.
Developed robust pipelines to seamlessly transfer data from PostgreSQL to Snowflake and S3, empowering the recommendation system with timely and accurate data insights.
Orchestrated ELT workloads from various sources to Snowflake using Airflow, facilitating data transformation, governance, testing, and quality control via GitHub versioning and PyDeequ.
Ensured code quality and documentation excellence to enhance readability and facilitate efficient debugging processes, contributing to streamlined operations and enhanced team collaboration.
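
A minimal PySpark sketch of the CDC merge pattern behind the Delta Lake preprocessing work above. The staging table, target table, key column, and the `op` change-flag column are placeholder names for illustration, not the production Seaware schema.

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()  # provided automatically on Databricks

# Incoming CDC batch of consolidated DCL/Parks transactions (placeholder names).
changes = spark.read.table("staging.seaware_transactions_cdc")

target = DeltaTable.forName(spark, "curated.park_dcl_transactions")

(target.alias("t")
    .merge(changes.alias("s"), "t.transaction_id = s.transaction_id")
    .whenMatchedDelete(condition="s.op = 'D'")         # deletes from the change feed
    .whenMatchedUpdateAll(condition="s.op <> 'D'")     # updates to existing rows
    .whenNotMatchedInsertAll(condition="s.op <> 'D'")  # brand-new rows
    .execute())

# The curated table itself would be created partitioned (e.g. by booking date),
# with OPTIMIZE ... ZORDER BY on frequent filter columns for the clustering gains cited above.
```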



Blue Cross Blue Shield, Little Rock, AR | Data Engineer Dec 2021 - Mar 2023
Tech Stack: PySpark, Kafka, Azure, Databricks, Docker, DB2, Snowflake, MongoDB, Python, SQL, dbt, Java, Azure Data Factory, VM
Radiant Project
Developed complex SQL transformation jobs on Databricks to ingest and load membership files, claims for various payers and providers, and MDM tables, enhancing system performance by 25%.
Migrated Ab Initio ETL workloads to Databricks, facilitating data transformation, governance, testing, and quality control via GitHub versioning and PyDeequ (a data-quality check sketch follows this section).
Designed high-volume transactional systems, leveraging partitioning strategies, clustering keys, and materialized views to optimize query performance by 40%.
Automated CI/CD pipelines using GitHub Actions for seamless deployment of Airflow code and Snowflake scripts, keeping configuration and code isolated for easier understanding and debugging of downstream issues.
Optimized query performance by analyzing execution plans and indexes, reducing resource consumption and processing time.
Migrated 5+ TB of legacy on-premises data (SQL Server, Oracle) to Databricks, using Airflow for continuous ingestion and scheduled jobs for complex transformations.
Responsible for estimating cluster size and for monitoring and troubleshooting Spark Databricks clusters.
In-depth understanding of Apache Spark job execution components such as the DAG, lineage graph, DAG scheduler, task scheduler, and stages; worked on relational and NoSQL databases including PostgreSQL and MongoDB.
Skilled in creating and managing tables in Snowflake, ensuring proper indexing, partitioning, and clustering for optimized query performance and cost efficiency.
Constructed and enhanced data models and schemas utilizing Snowflake technologies to facilitate efficient storage and retrieval of data for analytics and reporting objectives.
Created ELT/ETL pipelines using Python and Snowflake SnowSQL to streamline data movement to and from the Snowflake data warehouse.
Implemented change data capture (CDC) and slowly changing dimension (SCD) techniques within Snowflake to maintain historical tracking of data changes over time.
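
A minimal PyDeequ verification sketch of the kind of quality gates used in the migration work above. The table and column names (`member_id`, `claim_amount`) are placeholders, and PyDeequ additionally expects the Deequ jar and a matching SPARK_VERSION environment variable on the cluster.

```python
import pydeequ
from pydeequ.checks import Check, CheckLevel
from pydeequ.verification import VerificationSuite, VerificationResult
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.jars.packages", pydeequ.deequ_maven_coord)
         .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
         .getOrCreate())

df = spark.read.table("curated.member_claims")  # placeholder table

check = (Check(spark, CheckLevel.Error, "membership and claims checks")
         .isComplete("member_id")       # no nulls in the business key
         .isUnique("member_id")         # one row per member
         .isNonNegative("claim_amount"))

result = (VerificationSuite(spark)
          .onData(df)
          .addCheck(check)
          .run())

# Surface the pass/fail status of each constraint for the pipeline to act on.
VerificationResult.checkResultsAsDataFrame(spark, result).show(truncate=False)
```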


Kafka Real-Time Streaming Application
Spearheaded the development of a robust Kafka Connect application in Docker using Java for various health insurance products, deployed on EC2 instances, achieving a 30% cost reduction while ensuring secure, faster processing of sensitive PII/PHI data from various sources.
Dramatically improved operational efficiency for the largest customer by providing real-time policy enrollment and usage for its employees using Kafka REST APIs, helping save $10 million in revenue.
Processed Avro-serialized streaming data with Databricks Spark Streaming, enabling seamless handling of member enrollment, policy card dispatch, and payments (a streaming sketch follows this section). Successfully integrated with multiple subscribers, including Salesforce CCI and the sales and marketing teams, ensuring timely access to critical information.
Optimized Snowflake data warehousing performance through clustering keys, materialized views, stream/task automation, and query optimization best practices.
Ensured data security and governance within Snowflake by implementing role-based access controls, row/column-level security policies, and dynamic data masking.
Developed reusable SQL functions, procedures, and views in Snowflake to encapsulate common data transformation logic.
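
A simplified sketch of the streaming consumption path described above: Spark reads the Kafka topic and decodes Avro payloads before landing them for downstream subscribers. Broker addresses, the topic name, the schema, and the target table are placeholders; payloads framed by a Confluent schema registry would also need the registry header handled before `from_avro`.

```python
from pyspark.sql import SparkSession
from pyspark.sql.avro.functions import from_avro
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("enrollment-stream").getOrCreate()

# Placeholder Avro schema for the enrollment events.
enrollment_schema = """
{
  "type": "record",
  "name": "Enrollment",
  "fields": [
    {"name": "member_id", "type": "string"},
    {"name": "policy_id", "type": "string"},
    {"name": "event_type", "type": "string"},
    {"name": "event_ts", "type": "long"}
  ]
}
"""

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder brokers
       .option("subscribe", "member-enrollment")           # placeholder topic
       .option("startingOffsets", "latest")
       .load())

events = (raw.select(from_avro(col("value"), enrollment_schema).alias("e"))
             .select("e.*"))

# Land the parsed stream in a table that downstream subscribers can query.
query = (events.writeStream
         .format("delta")
         .option("checkpointLocation", "/chk/member_enrollment")  # placeholder path
         .outputMode("append")
         .toTable("bronze.member_enrollment"))

query.awaitTermination()
```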


Tata Consultancy Services, Dallas, TX Sep 2018 - Aug 2021
Tech Stack: Spark, Hadoop, AWS EMR, Lambda, Python, SQL, Airflow, Elasticsearch, Firehose, Flume, QuickSight
On-prem Datastores to AWS Cloud Migration Project
Architected and deployed a Python serverless application on AWS, utilizing Lambda, SNS, and SQS to address monitoring and alerting requirements across API endpoints, file transfers, and application logging (a handler sketch follows this section). Integrated with Elasticsearch for enhanced observability and Kibana for intuitive dashboard visualization.
Orchestrated ETL-related data pipelines using Apache Airflow, leveraging a variety of Airflow operators for seamless and automated data processing workflows. This automation led to a 40% reduction in manual intervention and a 25% improvement in overall ETL process efficiency.
Maintained data quality at landing tables, involving data cleaning, transformation, and integrity checks within a relational environment. Achieved a 98% accuracy rate in data quality, reducing errors by 20%.
Designed and set up an enterprise data lake supporting various use cases, including analytics, processing, storage, and reporting of voluminous, rapidly changing data.
Responsible for maintaining quality reference data at the source by performing cleaning and transformation and ensuring integrity in a relational environment, working closely with stakeholders and the solution architect.
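
A hedged sketch of the alerting piece of that serverless application: a Lambda handler that forwards non-OK monitoring events to SNS. The event shape, the `ALERT_TOPIC_ARN` environment variable, and the field names are assumptions for illustration.

```python
import json
import os

import boto3

sns = boto3.client("sns")
ALERT_TOPIC_ARN = os.environ["ALERT_TOPIC_ARN"]  # placeholder env var

def handler(event, context):
    """Forward failed API / file-transfer health events to the alert topic."""
    status = event.get("status", "UNKNOWN")
    if status != "OK":
        sns.publish(
            TopicArn=ALERT_TOPIC_ARN,
            Subject=f"[ALERT] {event.get('source', 'unknown')} reported {status}",
            Message=json.dumps(event, default=str),
        )
    return {"forwarded": status != "OK"}
```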


Hadoop Project
Developed Hive scripts for complex data processing pipelines, designed data models, and orchestrated workflows using Oozie.
Implemented Hadoop ecosystem technologies such as HDFS, MapReduce, YARN, Hive, and HBase to enable scalable and efficient processing of large volumes of healthcare data.
Worked with Hive data warehouse infrastructure: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HQL queries (a table-layout sketch follows this section).
Developed ETL processes (Data Stage Open Studio) to load data from multiple data sources into HDFS using Flume and Sqoop, and performed structural modifications using MapReduce and Hive.
Wrote multiple MapReduce jobs using the Java API, Pig, and Hive for data extraction, transformation, and aggregation across multiple file formats, including Parquet, Avro, XML, JSON, CSV, and ORC, with compression codecs such as gzip, Snappy, and LZO.
Strong understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
Developed Pig UDFs for manipulating data according to business requirements and developed custom Pig loaders.
Orchestrated ETL pipelines into and out of the data warehouse using a combination of Python and Snowflake's SnowSQL, writing SQL queries against Snowflake.
Prepared Tableau dashboards to visualize insights derived from product-specific data, engagement metrics, and sales performance, facilitating data-driven decision-making.
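
An illustrative sketch of the partition-and-bucket table layout described above, issued here through PySpark with Hive support; the database, column, and path names are placeholders, and the actual data loads ran through Hive/Oozie scripts.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("claims-hive-layout")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("CREATE DATABASE IF NOT EXISTS healthcare")

# External claims table, partitioned by year/month and bucketed on the join key.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS healthcare.claims (
        claim_id      STRING,
        member_id     STRING,
        provider_id   STRING,
        claim_amount  DECIMAL(12, 2)
    )
    PARTITIONED BY (claim_year INT, claim_month INT)
    CLUSTERED BY (member_id) INTO 32 BUCKETS
    STORED AS ORC
    LOCATION '/data/warehouse/healthcare/claims'
""")

# Partition pruning keeps a typical monthly aggregation cheap.
monthly = spark.sql("""
    SELECT provider_id, SUM(claim_amount) AS total_amount
    FROM healthcare.claims
    WHERE claim_year = 2020 AND claim_month = 6
    GROUP BY provider_id
""")
monthly.show()
```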

EDUCATION

Master's in Computer Science, University of Central Florida, Orlando, FL
Bachelor of Engineering in Computer Science, University of Mumbai, India