
Charitha - Data Engineer
[email protected]
Location: Herndon, Virginia, USA
Relocation:
Visa:
Professional Summary:
Senior Data Engineer with 10+ years of experience designing, building, and optimizing large-scale cloud data platforms, with strong expertise in the AWS ecosystem, including S3, Redshift, Glue, EMR, Lambda, Kinesis, IAM, and CloudWatch. Proven ability to develop high-performance batch and streaming pipelines using Spark (PySpark, Spark SQL), Snowflake, Redshift, and Databricks to support enterprise analytics, machine learning, and business intelligence initiatives.
Extensive experience building AWS-based data lakes and cloud-native data warehouse architectures, migrating legacy on-premises systems to scalable AWS environments using services such as S3, Redshift, Glue, EMR, and Athena.
Owned end-to-end data lifecycle including governance, quality, security, and compliance across enterprise data platforms.
Implemented enterprise data governance frameworks ensuring data accuracy, consistency, and regulatory compliance.
Configured job dependencies, calendars, and resource constraints for batch processing.
Strong expertise in Snowflake and Amazon Redshift including advanced data modeling, workload management, query optimization, and cost management strategies.
Experienced in designing and implementing serverless and event-driven architectures using AWS Lambda, Kinesis, and Step Functions to support real-time data ingestion and analytics.
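For illustration, a minimal sketch of such an event-driven consumer: a Python Lambda handler decoding records delivered by a Kinesis event source mapping (the payload shape and downstream sink are assumptions for the example):
```python
import base64
import json

def lambda_handler(event, context):
    """Decode records delivered to Lambda by a Kinesis event source mapping."""
    processed = []
    for record in event["Records"]:
        # Kinesis payloads arrive base64-encoded inside the Lambda event.
        payload = base64.b64decode(record["kinesis"]["data"])
        processed.append(json.loads(payload))
    # A real handler would land results in S3/Firehose or hand off to Step Functions.
    return {"recordCount": len(processed)}
```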
Proficient in developing ETL/ELT pipelines using Python, PySpark, and SQL within distributed AWS environments such as EMR, Glue, and Databricks.
Worked with EBCDIC-encoded datasets, handling conversions between EBCDIC and ASCII/Unicode using standardized code pages (e.g., 037, 1047).
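A minimal Python sketch of this conversion using the built-in cp037 codec (the sample bytes spell "HELLO"; other code pages follow the same pattern where a codec is available):
```python
# "HELLO" encoded in EBCDIC code page 037.
ebcdic_bytes = b"\xC8\xC5\xD3\xD3\xD6"

# EBCDIC -> Unicode using Python's built-in cp037 codec.
text = ebcdic_bytes.decode("cp037")
assert text == "HELLO"

# Unicode -> EBCDIC for the return trip to a mainframe feed.
assert text.encode("cp037") == ebcdic_bytes
```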
Designed and managed Control-M job workflows including job creation, scheduling, and execution.
Processed fixed-width mainframe files, ensuring accurate field alignment and schema mapping.
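A short sketch of the slicing approach, with a hypothetical layout of the kind a COBOL copybook would define:
```python
# Hypothetical layout, as would be derived from a COBOL copybook:
# CUST-ID PIC X(8), CUST-NAME PIC X(20), BALANCE PIC 9(7)V99.
LAYOUT = [("cust_id", 0, 8), ("cust_name", 8, 28), ("balance", 28, 37)]

def parse_record(line: str) -> dict:
    """Slice one fixed-width record into named, trimmed fields."""
    row = {name: line[start:end].strip() for name, start, end in LAYOUT}
    # PIC 9(7)V99 carries an implied decimal point: the last two digits are cents.
    row["balance"] = int(row["balance"]) / 100
    return row

print(parse_record("CUST0001Jane Example        000123456"))
```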
Extensive hands-on experience in real-time data streaming using Apache Kafka and AWS Kinesis, enabling scalable event-driven data processing pipelines.
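As a sketch of the consuming side, a minimal confluent-kafka poll loop (broker, group, topic, and the process() stub are placeholders):
```python
from confluent_kafka import Consumer

def process(payload: bytes) -> None:
    """Placeholder for real pipeline logic (parse, enrich, land in S3, ...)."""
    print(payload)

consumer = Consumer({
    "bootstrap.servers": "broker:9092",
    "group.id": "analytics-pipeline",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["events"])

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue  # no message within the poll window
        if msg.error():
            continue  # real code would log and route the error
        process(msg.value())
finally:
    consumer.close()
```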
Processed and integrated location-based datasets (regional/customer segmentation) to enable geo-level analytics and reporting.
Designed data pipelines supporting region/state-level aggregations for business insights and downstream analytics.
Optimized data models to support geographic drill-down reporting (state, city, region-level metrics).
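A brief PySpark sketch of such a drill-down model, using rollup() to produce city-, state-, and region-level subtotals plus a grand total in one pass (paths and schema are illustrative):
```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("geo-drilldown").getOrCreate()

# Input schema (region, state, city, amount) and S3 paths are illustrative.
sales = spark.read.parquet("s3://example-bucket/curated/sales/")

# rollup() emits subtotals at every level of the hierarchy, which is
# exactly what geographic drill-down reports consume.
drilldown = (
    sales.rollup("region", "state", "city")
         .agg(F.sum("amount").alias("total_amount"))
)
drilldown.write.mode("overwrite").parquet("s3://example-bucket/marts/geo_drilldown/")
```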
Strong expertise in AWS-based big data ecosystems including EMR, S3, Glue, Athena, Redshift, and DynamoDB, supporting high-volume enterprise workloads across finance, healthcare, telecom, and retail domains.
Designed and developed ETL pipelines using Matillion for data ingestion and transformation in cloud environments.
Ingested and processed third-party/vendor datasets into enterprise data platforms.
Built pipelines to integrate data from external APIs, flat files, and partner systems.
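A minimal sketch of one such API-to-landing-zone step (the endpoint, bucket, and key are hypothetical placeholders for a vendor feed):
```python
import json
import boto3
import requests

# Endpoint, bucket, and key are hypothetical placeholders.
response = requests.get("https://api.example-vendor.com/v1/orders", timeout=30)
response.raise_for_status()

boto3.client("s3").put_object(
    Bucket="example-landing-zone",
    Key="vendor/orders/latest.json",
    Body=json.dumps(response.json()).encode("utf-8"),
)
```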
Integrated curated datasets with downstream applications and CRM systems to support business operations and reporting.
Developed and maintained modular data transformation pipelines using DBT, enabling scalable and reusable ELT workflows.
Designed and implemented data archiving strategies based on retention policies and business rules.
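For example, retention rules of this kind can be expressed as an S3 lifecycle configuration via boto3 (bucket name, prefix, and retention windows are placeholders):
```python
import boto3

# Bucket, prefix, and windows are placeholders; real values come from the
# retention policy agreed for each dataset.
boto3.client("s3").put_bucket_lifecycle_configuration(
    Bucket="example-data-lake",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-then-expire",
            "Filter": {"Prefix": "raw/"},
            "Status": "Enabled",
            # Transition cold objects to Glacier after 90 days...
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            # ...and delete them once a 7-year retention window lapses.
            "Expiration": {"Days": 2555},
        }]
    },
)
```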
Built purge frameworks to safely delete data (soft/hard deletes) while maintaining referential integrity.
Implemented data validation and transformation logic for external data sources.
Built and optimized Glue PySpark jobs to process large-scale structured and semi-structured data (JSON, Parquet, Avro, CSV) from Amazon S3.
Designed and implemented end-to-end ETL pipelines using AWS Glue, leveraging Glue Jobs, Crawlers, and Data Catalog for scalable data processing.
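A skeletal Glue job script in this style, reading a crawler-populated Data Catalog table and writing Parquet back to S3 (database, table, and path names are assumptions):
```python
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Database and table names refer to hypothetical Data Catalog entries
# populated by a crawler over raw JSON files in S3.
source = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="events_json"
)

# Land the data as Parquet for efficient downstream queries (e.g., Athena).
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://example-curated/events/"},
    format="parquet",
)
job.commit()
```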
Worked with Azure Synapse, Cosmos DB, and Azure Data Factory to build scalable enterprise data solutions.
Worked with VSAM and sequential flat files for data extraction and transformation.
Interpreted and utilized COBOL copybooks to define schema for mainframe data ingestion.
Exposure to Microsoft Fabric for unified analytics and lakehouse architecture.
Integrated Kubernetes clusters with AWS services such as S3, RDS, Lambda, and CloudWatch for data storage, event-driven processing, and real-time monitoring of container workloads.
Integrated Matillion with Snowflake and AWS S3 for efficient data loading and processing.
Well-versed in Infrastructure as Code (IaC) using Terraform and AWS CloudFormation, enabling automated provisioning and management of scalable cloud data infrastructure.
Experienced in implementing secure AWS data platforms using IAM, KMS encryption, VPC configurations, and role-based access controls to ensure compliance with industry regulations.
Hands-on experience implementing data pipeline orchestration using Apache Airflow integrated with AWS services to manage complex ETL workflows.
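A minimal Airflow DAG sketch of this orchestration pattern (task bodies are stubs; in practice, operators from the Amazon provider package would replace them):
```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract() -> None:
    ...  # stub: pull from source systems

def load() -> None:
    ...  # stub: write to the warehouse

with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task  # run extract before load
```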
Managed and tracked development tasks, incidents, and change requests using JIRA in Agile/Scrum environments.
Developed complex T-SQL stored procedures with robust transaction handling and error management.
Optimized query performance using indexing strategies, query tuning, and execution plan analysis.
Delivered analytics-ready datasets for Tableau and Power BI dashboards, enabling self-service reporting.
Designed and optimized data models to support high-performance BI queries and dashboards.
Automated data pipeline deployments using DBT in CI/CD workflows.
Skilled in collaborating with data scientists, analytics teams, DevOps engineers, and business stakeholders to deliver scalable AWS-based data solutions.

Core strengths include:
AWS Data Lake Architecture (S3, Glue, EMR, Athena)
Real-Time Data Streaming (Kafka, AWS Kinesis)
Data Warehouse Optimization (Redshift, Snowflake)
Distributed Data Processing (Spark / PySpark on EMR)
Infrastructure as Code (Terraform / CloudFormation)
Serverless Data Pipelines (Lambda, Step Functions)
ETL / ELT Pipeline Engineering
Cloud Data Security & Governance