MOURYA CHANDRA MITTAPALLI
Data Engineer
South Bend, Indiana, USA
+1 (872) 225-2669
[email protected]

PROFESSIONAL SUMMARY
Data Engineer with 5+ years of experience designing, building, and maintaining large-scale data platforms and pipelines for financial services, supply chain, and e-commerce companies.
Worked extensively with AWS (S3, Glue, EMR, Redshift, Kinesis, Lambda, Step Functions, SageMaker, Lake Formation, MWAA), Databricks, and GCP for multi-cloud data solutions.
Palantir Foundry specialist experienced with Ontology SDK, Pipeline Builder, Contour, Quiver, Code Repositories, Data Lineage, and Workshop for building semantic data layers and operational applications.
Built ETL/ELT pipelines in PySpark and Spark SQL on medallion architectures (bronze/silver/gold) with Delta Lake, Unity Catalog, Delta Live Tables, and Autoloader for incremental ingestion.
Set up CI/CD pipelines using Jenkins, AWS CodePipeline, GitHub Actions, Docker, and Terraform for Glue jobs, Foundry transforms, Databricks notebooks, and infrastructure-as-code deployments.
Built real-time streaming pipelines with Kafka, Kinesis, and NiFi for payment processing, inventory tracking, market data feeds, and event-driven microservice architectures.
Orchestrated hundreds of Airflow DAGs on MWAA and Cloud Composer with SLA monitoring, retry policies, cross-system dependency management, and alerting integrations.
Worked with Snowflake, Redshift, BigQuery, PostgreSQL, and DynamoDB. Tuned queries, set up materialized views, clustering keys, distribution strategies, and row-level security policies.
Built dbt transformation layers on Snowflake and Redshift with model dependencies, incremental builds, automated testing, and self-service documentation for analytics teams.
Designed enterprise data lakes on S3 with multi-zone storage, schema evolution, Lake Formation access controls, and Glue Data Catalog for governed discovery and analytics.
Experienced with Step Functions for orchestrating multi-step serverless workflows with parallel branches, error handling, retry logic, and dead-letter queue integration.
Implemented data quality frameworks using Great Expectations and custom validation logic in Glue and Foundry, blocking bad data before it reached downstream consumers.
Built ML feature pipelines with SageMaker Pipelines, Spark MLlib, and Scikit-learn for demand forecasting, fraud detection, risk scoring, and anomaly detection use cases.
Proficient in PySpark performance tuning including partition pruning, broadcast joins, AQE, skew handling, caching strategies, and Spark UI-based bottleneck analysis.
Deployed containerized workloads on Kubernetes (EKS, GKE) with auto-scaling, resource quotas, health checks, and rolling deployments for data and ML serving applications.
Built monitoring and alerting with CloudWatch custom metrics, SNS notifications, PagerDuty integrations, and Foundry health checks for end-to-end pipeline observability.
TECHNICAL SKILLS
Cloud (AWS): S3, EC2, EMR, Lambda, Glue, Athena, Redshift, DynamoDB, CloudWatch, CloudFormation, Step Functions, IAM, CodePipeline, SageMaker, Lake Formation, MWAA, SNS, SQS
Cloud (GCP): BigQuery, Dataflow, Dataproc, Cloud Composer, Pub/Sub
Palantir Foundry: Pipeline Builder, Ontology SDK, Ontology Manager, Code Repositories, Data Lineage, Workshop
Big Data: Apache Spark (PySpark, Spark SQL), Databricks, Delta Lake, Kafka, Airflow, Data Lake Architecture
Databases: Snowflake, Redshift, PostgreSQL, DynamoDB, MongoDB, Oracle, MySQL, Star/Snowflake Schema
ETL / ELT: AWS Glue, Airflow, dbt, Google Dataflow, Apache Beam
AI / ML: SageMaker Pipelines, MLflow, Scikit-learn, Spark MLlib, TensorFlow, PyTorch
Languages: Python, PySpark, SQL, Java
DevOps & CI/CD: Docker, Kubernetes, Jenkins, Git/GitHub, Terraform, AWS CodePipeline, GitHub Actions
Visualization: Tableau, Power BI, QuickSight
Methodologies & Compliance: Agile/Scrum, SOX, GDPR, HIPAA, PCI-DSS, ISO 20022, Data Governance
PROFESSIONAL EXPERIENCE
Amazon, South Bend, IN | October 2025 – Present
Senior Data Engineer
Leading data engineering for a Palantir Foundry-integrated supply chain visibility platform on AWS, unifying fulfillment, inventory, and 3PL data across dozens of warehouses into a single trusted source for operations teams.
Designed end-to-end ETL pipelines in AWS Glue and PySpark to ingest warehouse shipment logs, purchase orders, and inventory snapshots from 40+ sources, transforming through bronze/silver/gold medallion layers on S3 with Delta Lake.
Built Palantir Foundry ontology models connecting warehouses, shipments, SKUs, vendors, and purchase orders into a semantic graph using the Ontology SDK, enabling ops teams to run analyses on supply chain data without writing SQL.
Set up real-time streaming with Kafka and Kinesis for stock movements, shipment status updates, POS events, and warehouse capacity signals, with Palantir Foundry connectors syncing ontology objects within seconds of source changes.
Migrated legacy Glue pipelines to Databricks Workflows with Delta Live Tables and Autoloader for incremental S3 ingestion, reducing pipeline authoring time by 40% and eliminating manual dependency management across teams.
Wrote PySpark transformations with window functions for rolling inventory metrics (7-day, 30-day, 90-day), custom UDFs for vendor name normalization, and multi-level array flattening on nested 3PL JSON payloads across hundreds of data sources (a rolling-window sketch appears at the end of this role).
Deployed anomaly detection for inventory discrepancies using Palantir Foundry AIP Logic linked to SageMaker endpoints, surfacing predictions and confidence scores in Workshop apps so ops managers could act on alerts in real time.
Used NiFi to ingest data from legacy vendor systems that only supported FTP, SFTP, and flat-file exports, building flow pipelines that routed, validated, and transformed records before landing them into the S3 data lake.
Configured Lake Formation column and row-level access controls on the S3 data lake, mirrored in Palantir Foundry markings so permissions stayed consistent whether teams queried through Athena, Redshift, or Foundry Contour.
Deployed Spark batch and ML serving workloads on Kubernetes (EKS) with Horizontal Pod Autoscaler, resource quotas, liveness probes, and namespace isolation to keep inference latency low without overspending on compute.
Created CI/CD pipelines with Jenkins and AWS CodePipeline for all pipeline code, Glue jobs, and Palantir Foundry transforms, with Great Expectations quality gates, unit tests, and automatic rollback on validation failures.
Built monitoring dashboards with CloudWatch custom metrics, SNS alerts, and Palantir Foundry health checks covering pipeline failures, data freshness SLAs, data quality drops, and ontology sync issues across the entire platform.
Optimized Redshift with distribution keys, sort keys, late-binding views, and materialized views on high-traffic supply chain tables, cutting dashboard query times by 65% on multi-billion-row datasets.
Built Palantir Foundry Workshop apps giving ops managers real-time visibility into supply chain KPIs, anomaly alerts, fulfillment bottlenecks, and vendor performance scorecards, replacing fragmented spreadsheet reporting entirely.
Designed Step Functions Express Workflows for the document processing pipeline: S3 event trigger, Lambda text extraction, Glue schema validation, and Redshift loading with retry logic, timeout handling, and dead-letter queues.
Implemented Databricks Unity Catalog across all workspaces for centralized governance, data discovery, access auditing, and lineage tracking, ensuring consistent security policies across the entire Databricks environment.
Designed a slowly changing dimension (SCD Type 2) framework in PySpark for tracking vendor and warehouse attribute changes over time, enabling historical analysis of supply chain configuration changes (an SCD Type 2 merge sketch appears at the end of this role).
Built parameterized Glue jobs with bookmark-based incremental processing for high-volume source tables, reducing daily processing costs by 50% compared to full-table scans.
Implemented data compaction jobs on Delta Lake tables using OPTIMIZE and Z-ORDER on frequently filtered columns, reducing Athena and Redshift Spectrum query times by 40% on the supply chain data lake.
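A minimal sketch of the Delta Lake compaction job described in the bullet above, assuming an illustrative table name and Z-ORDER columns rather than the production schema:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta_compaction").getOrCreate()

# Compact small files and co-locate rows on frequently filtered columns
# (supply_chain.shipments_gold and the column list are placeholder names).
spark.sql("""
    OPTIMIZE supply_chain.shipments_gold
    ZORDER BY (warehouse_id, ship_date)
""")

# Drop files no longer referenced by the table, using the default retention window.
spark.sql("VACUUM supply_chain.shipments_gold")
```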
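A sketch of the rolling-window metrics and vendor-name normalization referenced earlier in this role; the table, column names, and cleanup rules are assumptions for illustration:

```python
from pyspark.sql import SparkSession, Window, functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("inventory_rolling_metrics").getOrCreate()
df = spark.table("inventory_silver")  # hypothetical silver-layer table

def rolling(days):
    # Range-based window over epoch seconds so gaps between dates are handled correctly.
    return (Window.partitionBy("sku")
                  .orderBy(F.col("event_date").cast("timestamp").cast("long"))
                  .rangeBetween(-days * 86400, 0))

df = (df.withColumn("qty_7d",  F.sum("qty").over(rolling(7)))
        .withColumn("qty_30d", F.sum("qty").over(rolling(30)))
        .withColumn("qty_90d", F.sum("qty").over(rolling(90))))

@F.udf(StringType())
def normalize_vendor(name):
    # Illustrative cleanup rules only: trim, uppercase, collapse whitespace.
    if name is None:
        return None
    return " ".join(name.strip().upper().replace(",", " ").split())

(df.withColumn("vendor_name_clean", normalize_vendor("vendor_raw"))
   .write.mode("overwrite").format("delta")
   .saveAsTable("inventory_gold_rolling_metrics"))
```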
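And a sketch of the SCD Type 2 merge pattern referenced earlier in this role, using a Delta Lake merge; the dimension table, business key, and attr_hash change-detection column are assumed names:

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("scd2_vendor_dim").getOrCreate()

updates = spark.table("staging_vendor_updates")   # incoming attribute changes (assumed)
dim = DeltaTable.forName(spark, "dim_vendor")     # existing SCD2 dimension (assumed)

# Step 1: close out the current row when attributes changed.
(dim.alias("d")
    .merge(updates.alias("u"), "d.vendor_id = u.vendor_id AND d.is_current = true")
    .whenMatchedUpdate(
        condition="d.attr_hash <> u.attr_hash",
        set={"is_current": "false", "end_date": "current_date()"})
    .execute())

# Step 2: append new versions for changed or brand-new vendors.
current = spark.table("dim_vendor").filter("is_current = true").select("vendor_id", "attr_hash")
new_rows = (updates.join(current, ["vendor_id", "attr_hash"], "left_anti")
                   .withColumn("start_date", F.current_date())
                   .withColumn("end_date", F.lit(None).cast("date"))
                   .withColumn("is_current", F.lit(True)))
new_rows.write.format("delta").mode("append").saveAsTable("dim_vendor")
```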

UBS, Jersey City, NJ | January 2024 – September 2025
Data Engineer
Built the data engineering layer for UBS's Volante VolPay cross-border payments platform, delivering real-time ISO 20022 pipelines on AWS with SWIFT message validation and Step Functions orchestration for payment lifecycle management.
Developed ETL pipelines with Glue and PySpark ingesting data from trading platforms, VolPay message queues, compliance screening systems, and client databases into Redshift and Snowflake for regulatory reporting and audit readiness.
Integrated Palantir Foundry into the payments ecosystem, building ontology models linking trades, payment messages, KYC records, counterparty profiles, and regulatory filings so compliance teams could run audits without support.
Engineered a PySpark fraud scoring pipeline on EMR processing 50M+ daily payment transactions, computing 200+ behavioral features over rolling windows, with outputs sent to SageMaker Feature Store and Palantir Foundry.
Built Step Functions workflows for multi-step payment validation covering XML parsing, OFAC screening, AML rule checks, duplicate detection, and routing, with parallel branches, retry logic, and DynamoDB error state logging for audit trails.
Created a config-driven ingestion framework where new payment message types could be onboarded by updating a YAML configuration file with schema mappings and validation rules, cutting integration time from weeks to hours.
Used dbt on Snowflake for the transformation layer with model dependencies, incremental strategies, automated tests on every merge, and generated documentation that the analytics and compliance teams could browse independently.
Implemented Great Expectations validation suites in Glue and Palantir Foundry Pipeline Builder for schema drift detection, null checks, referential integrity, and value range validation, blocking bad data from the ontology and sending Slack alerts.
Used Spark to build a transaction clustering model using K-means on payment amount, frequency, and counterparty features, pushing assignments into Palantir Foundry as ontology properties to help the AML team prioritize investigations (a K-means sketch appears at the end of this role).
Engineered PySpark pipelines on EMR for ISO 20022 message transformation with recursive XML flattening, namespace-aware parsing, custom partitioning by message type and settlement date, and Delta Lake schema merge.
Migrated batch reconciliation from legacy scripts to Databricks with Delta Live Tables for declarative pipeline definitions, Delta Lake ACID transactions for exactly-once processing guarantees, and Unity Catalog for fine-grained data governance.
Established a full Palantir Foundry SDLC across dev, staging, and production branches with transform promotion gates requiring peer review, data quality checks, and regression testing before changes reached the production ontology.
Set up Palantir Foundry Data Lineage tracking across the full payment lifecycle, giving auditors and regulators a clickable map from raw VolPay messages through every transformation step to final regulatory reports and filings.
Built Palantir Foundry Workshop apps for compliance and payments ops teams, surfacing payment flow health, SLA breach alerts, sanctions screening results, and KYC status with one-click drill-down to individual transaction details.
Created Airflow DAGs on MWAA for end-of-day payment reconciliation across VolPay, Redshift, and Snowflake, with cross-system dependency sensors, SLA monitoring, and automated notifications to treasury operations on reconciliation breaks (a trimmed DAG sketch appears at the end of this role).
Implemented Snowflake time-travel and fail-safe policies for critical payment tables, enabling point-in-time recovery for regulatory audits and providing a safety net against accidental data modifications by downstream processes.
Designed Redshift stored procedures for end-of-day settlement aggregation across currencies, counterparties, and corridors, with performance tuning through workload management queues and concurrency scaling for reporting hours.
Built a data archival pipeline that moved aged payment records from Redshift hot storage to S3 Glacier based on retention policies, with Athena external tables maintaining query access to archived data for regulatory lookback requests.
Created PySpark data quality scorecards that computed completeness, accuracy, timeliness, and consistency metrics for each payment data source daily, publishing scores to Palantir Foundry dashboards for data stewards to monitor.
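A simplified sketch of the daily data quality scorecard described in the bullet above; the source table, critical columns, and rules are illustrative assumptions:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq_scorecard").getOrCreate()

df = spark.table("payments_silver")   # hypothetical payment source table
total = df.count()
critical_cols = ["payment_id", "amount", "currency", "settlement_date"]

# Completeness: share of non-null values across critical columns.
non_null = df.select([F.count(c).alias(c) for c in critical_cols]).first()
completeness = sum(non_null[c] for c in critical_cols) / (total * len(critical_cols))

# Accuracy proxy: rows passing basic domain checks (positive amount, ISO currency code).
accuracy = df.filter((F.col("amount") > 0) & F.col("currency").rlike("^[A-Z]{3}$")).count() / total

# Timeliness: rows ingested within one day of the settlement date.
timeliness = df.filter(F.datediff("ingested_at", "settlement_date") <= 1).count() / total

scorecard = (spark.createDataFrame(
    [("payments_silver", round(completeness, 4), round(accuracy, 4), round(timeliness, 4))],
    ["source", "completeness", "accuracy", "timeliness"])
    .withColumn("run_date", F.current_date()))

scorecard.write.format("delta").mode("append").saveAsTable("dq_scorecards")
```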
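A sketch of the Spark MLlib K-means approach referenced in the transaction-clustering bullet above; the feature table, columns, and k are assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler, StandardScaler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("txn_clustering").getOrCreate()

# Hypothetical per-counterparty aggregates built upstream.
features = spark.table("counterparty_features")

pipeline = Pipeline(stages=[
    VectorAssembler(inputCols=["avg_amount", "txn_frequency", "distinct_counterparties"],
                    outputCol="features_raw"),
    StandardScaler(inputCol="features_raw", outputCol="features"),
    KMeans(featuresCol="features", predictionCol="cluster_id", k=8, seed=42),
])

model = pipeline.fit(features)
scored = model.transform(features).select("counterparty_id", "cluster_id")

# Cluster assignments are then synced into Foundry as ontology properties.
scored.write.format("delta").mode("overwrite").saveAsTable("counterparty_clusters")
```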
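And a trimmed Airflow sketch of the end-of-day reconciliation DAG mentioned above; the schedule, SLA values, and task body are placeholders, not the production DAG:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def reconcile_volpay_vs_warehouse(**context):
    # Placeholder: compare VolPay settlement totals against Redshift/Snowflake
    # aggregates and raise on any reconciliation break so alerting fires.
    pass

with DAG(
    dag_id="eod_payment_reconciliation",
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 22 * * 1-5",       # assumed weekday end-of-day run
    catchup=False,
    default_args={
        "retries": 2,
        "retry_delay": timedelta(minutes=10),
        "sla": timedelta(hours=1),
        "email_on_failure": True,
    },
) as dag:
    PythonOperator(
        task_id="reconcile_volpay_vs_warehouse",
        python_callable=reconcile_volpay_vs_warehouse,
    )
```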

McKesson, Irving, TX | January 2023 – December 2023
Data Engineer
Built an investment analytics platform on AWS and GCP giving portfolio managers access to fund performance, risk metrics, and ML-driven insights across mutual fund and ETF products spanning multiple asset classes.
Designed ETL pipelines with Glue and PySpark for daily fund transactions, holdings snapshots, NAV calculations, and benchmark comparisons through medallion layers on S3, loading into Snowflake and Redshift for reporting.
Developed Spark Streaming jobs with Kafka and GCP Pub/Sub for real-time market data ingestion and portfolio valuations, storing results in Redshift and BigQuery for live dashboards consumed by portfolio managers and risk analysts.
Used Palantir Foundry Code Repositories and Ontology SDK to build a semantic layer across fund accounting, holdings, compliance rules, and benchmark data with typed object types, link types, and action types for self-service analytics.
Designed ML feature pipelines with Spark MLlib and Scikit-learn for rolling risk metrics, volatility scores, Sharpe ratios, beta calculations, and factor exposures, publishing features to Palantir Foundry for Contour analysis and portfolio construction.
Designed cross-cloud orchestration between AWS (MWAA) and GCP (Cloud Composer) with shared DAG libraries in Git, Pub/Sub and SNS event triggers, unified SLA monitoring dashboards, and cross-cloud failure alerting.
Used BigQuery and Dataflow for serverless analytics workloads, building Apache Beam pipelines for market data transformations, corporate action processing, and dividend calculations feeding the portfolio analytics layer (a Beam sketch appears at the end of this role).
Migrated legacy Hive batch jobs from on-prem Hadoop to PySpark on EMR, rewriting HiveQL to optimized Spark SQL, cutting processing times by 70%, and removing all HDFS/YARN maintenance overhead.
Stored fund metadata, user preferences, and portfolio configuration in MongoDB, choosing a document store over relational databases to handle the constantly evolving schema as new fund types and asset classes were added.
Deployed ML model serving containers on GKE with Horizontal Pod Autoscaler, serving real-time risk predictions and portfolio optimization results to the dashboard with P99 latency under 200ms.
Implemented Snowflake secure data sharing between the analytics database and the compliance team, replacing nightly CSV exports with live read-only access, row-level security, and dynamic data masking on sensitive fields.
Provisioned all GCP infrastructure with Terraform modules, managing Dataproc clusters, BigQuery datasets, GKE clusters, and Cloud Composer environments as code with separate dev and prod workspaces and plan-based drift detection.
Wrote complex SQL and stored procedures in Snowflake and BigQuery for performance attribution calculations, time-weighted return computations, and regulatory reporting, keeping response times under a second on multi-year datasets.
Implemented row-level security and dynamic data masking in Snowflake so different compliance tiers (internal audit, fund managers, external auditors) could access the same tables with appropriate field-level redaction.
Set up Palantir Foundry Data Lineage and Quiver for the investment team, giving analysts a clickable graph from raw custodian files through every transform to final NAV calculations and regulatory submissions.
Built a fund onboarding automation pipeline in PySpark that ingested custodian setup files, validated fund attributes against reference data, created schema entries, and initialized backfill jobs, reducing onboarding from days to hours.
Implemented Delta Lake on Databricks for the fund analytics layer with ACID transactions, time travel for point-in-time NAV queries, and Autoloader for incremental ingestion of daily custodian files from S3.
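A minimal Databricks Autoloader sketch for the incremental custodian-file ingestion in the bullet above; bucket paths, checkpoint location, and the target table are illustrative:

```python
from pyspark.sql import functions as F

# `spark` is the ambient Databricks session; paths below are placeholders.
source_path = "s3://custodian-landing/daily/"
checkpoint = "s3://fund-analytics/_checkpoints/custodian_bronze/"

stream = (spark.readStream
          .format("cloudFiles")                           # Databricks Autoloader
          .option("cloudFiles.format", "csv")
          .option("cloudFiles.schemaLocation", checkpoint)
          .option("header", "true")
          .load(source_path)
          .withColumn("ingested_at", F.current_timestamp()))

(stream.writeStream
       .format("delta")
       .option("checkpointLocation", checkpoint)
       .option("mergeSchema", "true")
       .trigger(availableNow=True)                        # incremental, batch-style run
       .toTable("fund_analytics.custodian_positions_bronze"))
```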
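And a small Apache Beam sketch of the Dataflow-style market-data transform referenced earlier in this role; the file layout and bucket paths are assumptions:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def parse_tick(line):
    # Assumed CSV layout: symbol,price,volume,trade_date
    symbol, price, volume, _trade_date = line.split(",")
    return symbol, (float(price) * int(volume), int(volume))

def vwap(values):
    # Volume-weighted average price from (notional, volume) pairs.
    notional = sum(v[0] for v in values)
    volume = sum(v[1] for v in values)
    return notional / volume if volume else 0.0

options = PipelineOptions()  # on Dataflow: add runner, project, region, temp_location

with beam.Pipeline(options=options) as p:
    (p
     | "ReadTicks" >> beam.io.ReadFromText("gs://market-data-landing/ticks/*.csv")
     | "Parse" >> beam.Map(parse_tick)
     | "VWAP" >> beam.CombinePerKey(vwap)
     | "Format" >> beam.MapTuple(lambda symbol, price: f"{symbol},{price:.4f}")
     | "Write" >> beam.io.WriteToText("gs://portfolio-analytics/vwap/daily"))
```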

Tetra Soft, Hyderabad, India | January 2021 – June 2022
Data Analyst
Led migration of legacy on-prem data workflows to AWS (S3, Glue, Athena), cutting infrastructure costs and making data directly accessible to business users for self-service analytics and ad-hoc querying.
Developed ETL workflows in Python and SQL pulling from Oracle, MySQL, and PostgreSQL source systems, applying business rules, data cleansing logic, and referential integrity checks before loading clean data into a centralized data warehouse.
Built Tableau and Power BI dashboards for sales trends, customer segments, operational KPIs, and inventory metrics, replacing manual Excel reporting and saving teams hours of effort each week.
Ran exploratory analysis with Pandas and NumPy to find revenue trends, seasonal patterns, and customer behavior insights that directly shaped quarterly sales strategy and marketing spend allocation.
Designed star schema models with fact and dimension tables, indexing strategies, and date-based partitioning, and set up automated pipelines with Python and CRON for daily feature extraction and data refresh.
Built ML prototypes with Scikit-learn for churn prediction and sales forecasting using gradient boosting and logistic regression, validated model accuracy with cross-validation, and documented results before handing off (a scikit-learn sketch appears at the end of this role).
Performed database tuning on PostgreSQL and MySQL with index optimization, query plan analysis, slow query log review, and table partitioning that cut average report query times in half.
Worked closely with business stakeholders to translate vague reporting requests into concrete SQL queries, KPI definitions, and dashboard designs, often iterating through several versions before finalizing requirements.
Set up a basic CI/CD workflow with Git and shell scripts for the ETL codebase, replacing manual script copying between dev and production servers and introducing code review practices for the analytics team.
Automated daily and weekly report generation with Python scripts scheduled via CRON, generating formatted Excel and PDF exports and distributing to stakeholders by email without manual intervention.
Built a Flask-based internal data catalog backed by PostgreSQL where analysts could search datasets, view column descriptions, check data freshness timestamps, and submit access requests.
Built PySpark prototypes on local Spark clusters for large CSV and JSON data exports, applying window functions, UDFs, and aggregation logic that became the basis for later EMR pipelines after the AWS migration.
Created a centralized logging framework for all ETL jobs that captured run metadata (start time, duration, row counts, errors) into a tracking table, enabling trend analysis on pipeline health and SLA compliance.
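A simplified sketch of the run-metadata logging framework in the bullet above, writing one record per ETL run into a PostgreSQL tracking table; the table name, columns, and connection details are placeholders:

```python
import time
import traceback
from contextlib import contextmanager

import psycopg2  # assumed driver for the PostgreSQL tracking database

@contextmanager
def track_run(job_name, conn_params):
    """Capture start time, duration, row count, and error for one ETL job run."""
    start = time.time()
    stats = {"rows": 0, "error": None}
    try:
        yield stats                      # the job updates stats["rows"] as it loads
    except Exception:
        stats["error"] = traceback.format_exc()
        raise
    finally:
        with psycopg2.connect(**conn_params) as conn, conn.cursor() as cur:
            cur.execute(
                """INSERT INTO etl_run_log (job_name, started_at, duration_sec, row_count, error)
                   VALUES (%s, to_timestamp(%s), %s, %s, %s)""",
                (job_name, start, round(time.time() - start, 2),
                 stats["rows"], stats["error"]))

# Usage inside an ETL script (connection parameters and loader are hypothetical):
# with track_run("daily_sales_load", {"dbname": "analytics", "user": "etl"}) as stats:
#     stats["rows"] = load_sales()
```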
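And a sketch of the churn-prediction prototype referenced earlier in this role, using gradient boosting with cross-validation; the feature list and input file are illustrative:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical training extract produced by the ETL workflows above.
df = pd.read_csv("churn_features.csv")
features = ["tenure_months", "orders_90d", "avg_order_value", "support_tickets"]
X, y = df[features], df["churned"]

model = make_pipeline(StandardScaler(), GradientBoostingClassifier(random_state=42))

# 5-fold cross-validated ROC AUC, the validation step described in the bullet.
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"ROC AUC: {scores.mean():.3f} +/- {scores.std():.3f}")

model.fit(X, y)  # final fit on the full extract before handoff
```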

EDUCATION
Master of Science in Computer Science | Rowan University, Glassboro, NJ, USA | 2022 – 2023

Bachelor of Technology in Computer Science | ICFAI Foundation for Higher Education, India | 2018 – 2022

CERTIFICATIONS & RESEARCH
AWS Certified Data Engineer – Associate
Salesforce Certified AI Associate
Palantir Foundry Data Engineering Certification
Research Paper: Measuring LLM Generalization via Contextual Behavior Prediction: A Personalization-Centric Evaluation Framework
