| Sai Monica - Sr. Data Engineer / AI Engineer (Azure, Databricks, Snowflake, PySpark, LLMs) |
| [email protected] |
| Location: Austin, Texas, USA |
| Relocation: Yes |
| Visa: H1B |
|
MONICA RAMINENI
Data Engineer | AI Engineer | https://www.linkedin.com/in/monica-ramineni-440911135/ | Austin, TX

PROFESSIONAL SUMMARY
Data Engineer & AI Engineer with 8+ years of experience building scalable, high-performance data pipelines and analytics solutions that drive enterprise decision-making. Skilled in end-to-end ETL/ELT engineering using SSIS, Informatica PowerCenter/IICS, Azure Data Factory, and Azure Databricks, with deep proficiency in SQL, Python, PySpark, Snowflake, SQL Server, and Azure Synapse Analytics across cloud, on-premises, and hybrid environments. Delivers trusted, analytics-ready data products through Power BI, DAX, and Power Query, backed by a strong focus on data quality frameworks, governance, compliance standards, and end-to-end data lineage. On the AI side, currently engineering LLM evaluation pipelines, agentic workflows, and RAG-grounded AI systems at Hireko.ai, spanning multi-agent orchestration, MCP server integrations, A2A Protocol, and AI safety guardrails, bridging enterprise data engineering with production-grade AI.
CORE COMPETENCIES
Scalable Data Pipeline Design
Data Quality & Governance
ETL/ELT Architecture & Development
Data Warehousing & Cloud Architecture
High-Volume Data Processing
Pipeline Health & Performance Monitoring
Cloud & On-Premises Data Integration
Compliance & Security Controls
Data Modeling & Schema Design
Cross-Team Collaboration & Influence
Analytics & BI Delivery
Power BI Report Development & Optimization
Azure Synapse Analytics
Agentic AI, RAG & Multi-Agent Systems
LLM Evaluation & AI Guardrails
Enterprise Data Integration

TECHNICAL SKILLS
ETL & Data Integration: SSIS, Informatica PowerCenter, Informatica IICS/CDI, Azure Data Factory (ADF), Azure Databricks
Databases & Data Platforms: Snowflake, Azure SQL Database, SQL Server, Azure Data Lake Storage Gen2, Azure Synapse Analytics
Languages: SQL, T-SQL, Python, PySpark, SparkSQL, DAX
BI & Reporting: Power BI, DAX, Power Query, Power Automate, Pipeline Health Dashboards, KPI Monitoring, Microsoft Excel
Orchestration & Streaming: Apache Airflow, Apache Kafka, Control-M, SQL Server Agent
AI & Agent Frameworks: LangChain, LangGraph, LangSmith, OpenAI Agents SDK, MCP, A2A Protocol, OpenAI, LLMs, Prompt Engineering, Generative AI
RAG & Vector DBs: Qdrant, OpenAI Embeddings, Cohere Embeddings, BM25, Multi-Query Retrieval, Ensemble Retrieval
AI Safety & Evaluation: RAGAS-style metrics, LangSmith evaluators, PII detection, jailbreak prevention, guardrails
Tools & Practices: Git, SSMS, Postman, Jupyter, Visual Studio, ALM Toolkit, Agile/Scrum

PROFESSIONAL EXPERIENCE

AI / Data Engineer | Aug 2025 - Present
Hireko.ai | USA
Built an LLM-based candidate assessment pipeline combining semantic similarity, concept coverage, and factual validation, designing end-to-end data flows from grounding through scoring and structured storage in DynamoDB.
Integrated Google Vertex AI Search grounding to generate real-time reference material, improving scoring accuracy and reducing hallucinations in pipeline outputs.
Implemented skill-level scoring by aggregating question-level evaluations and benchmarking against human ratings using RMSE, MAE, and correlation, preparing data for prescriptive modeling use cases.
Applied a modular pipeline architecture (grounding, scoring, semantic analysis, coverage checks) with structured logging across all workflow steps for full observability, auditability, and reliability.
Monitored DynamoDB read/write patterns and tuned the pipeline for cost and latency efficiency; partnered with stakeholders to iteratively refine data models and scoring logic across multiple skills.
Architected a multi-agent system with specialized agents handling distinct pipeline stages (transcript processing, grounding retrieval, scoring, and orchestration), enabling modular, independently testable workflows with clean separation of concerns and reliable end-to-end assessment execution.
Designed and deployed Power BI dashboards sourcing structured assessment data from DynamoDB, surfacing pipeline usage trends, assessment quality metrics, and scoring performance so stakeholders can track AI system health and make data-driven decisions on model and rubric improvements.
Environment: Python, LLMs, Prompt Engineering, Google Vertex AI Search, AWS DynamoDB, REST APIs, Embeddings, Evaluation Metrics, Power BI, Git

Data Engineer / Analyst | Nov 2021 - Dec 2024
Farmers Insurance / Accenture | India
Built and delivered scalable Azure-native ETL/ELT pipelines using ADF, Azure Databricks, Azure Synapse Analytics, Azure SQL Database, and ADLS Gen2, processing high-volume data from ingestion through transformation to analytics-ready outputs.
Built scalable PySpark pipelines in Azure Databricks to process high-volume enterprise data, optimized using partitioning, caching, and broadcast joins for consistent performance and reliable downstream delivery.
Managed end-to-end data integration workflows across ingestion, transformation, and publishing layers, maintaining data lineage, transformation logic, and pipeline documentation for full traceability.
Enforced data quality through validation, profiling, reconciliation, and cleansing; established governance standards, stewardship practices, and compliance controls including Row-Level Security and role-based access.
Built Power BI semantic models, KPI dashboards, and pipeline health views with DAX, Power Query, and RLS; automated report refresh and distribution using Power Automate and implemented Incremental Refresh to reduce refresh times and improve performance.
Conducted ad hoc SQL and Power BI analysis on Snowflake to uncover trends, data anomalies, and operational inefficiencies, translating findings into actionable insights for business stakeholders.
Drove SIT/UAT/Production validation, root cause analysis, and Dev/Test/Prod pipeline promotions using Git and ALM Toolkit within Agile/Scrum delivery.
Environment: Azure Data Factory, Azure Data Lake Storage Gen2, Azure Databricks, Azure Synapse Analytics, Azure SQL Database, PySpark, Snowflake, Power BI, DAX, Power Query, Power Automate, Git, ALM Toolkit, Rally

Data Engineer | May 2018 - Nov 2021
XPO Logistics / Infosys | India
Designed and maintained end-to-end SSIS ETL workflows across heterogeneous sources (flat files, CSV, Excel, and relational databases), ensuring reliable, repeatable data movement with strong source-to-target integrity.
Scheduled, monitored, and troubleshot SSIS packages to ensure timely data delivery; proactively identified and resolved pipeline failures, data anomalies, and performance bottlenecks.
Developed and maintained Informatica PowerCenter assets (mappings, mapplets, sessions, and workflows), leveraging core transformations (Joiner, Lookup, Aggregator, Expression, Router) to implement complex business rules; contributed to the PowerCenter-to-IICS/CDI migration, covering mapping redesign, validation, and production cutover.
Authored and optimized SQL stored procedures, views, and complex queries to support ETL processing, data reconciliation, and analytics-ready datasets.
Orchestrated and monitored ETL jobs using Control-M and SQL Server Agent; performed root cause analysis on failures, implemented corrective actions, and tuned workflows to improve throughput and SLA adherence.
Built Power BI dashboards and reports for operational data analysis; investigated and resolved data discrepancies, traced issues back to source systems, and validated fixes end-to-end.
Applied data validation, profiling, unit testing, and cleansing across pipeline stages to ensure accuracy, completeness, and end-to-end data integrity from source to target; delivered within Agile/Scrum sprints.
Environment: SSIS, Informatica PowerCenter, Informatica IICS/CDI, SQL Server, T-SQL, Control-M, SQL Server Agent, SSMS, Power BI, Git, Agile/Scrum

EDUCATION
Bachelor's Degree, Computer Science and Engineering
Sri Venkateswara University College of Engineering

CERTIFICATIONS & AWARDS
Microsoft Certified: Azure Fundamentals (AZ-900)
Accenture Celebrates Excellence (ACE): The Extra Mile Award (Individual, FY22 Q2)
Accenture Celebrates Excellence (ACE): Shared Success Catalyst Award (Team, FY23 Q1)
Anthropic Academy Certified (17 courses, 2025): Claude, MCP, Agentic AI, Generative AI, Cloud Integrations
AI Makerspace Certified AI Engineer (AIE7, 2025)

KEY PROJECTS
CareBridge Healthcare AI Capstone (AI Makerspace AIE7): AI-powered health assistant delivering evidence-based medical guidance grounded in trusted sources (Mayo Clinic, FDA, CDC, NIH, WHO); features triage indicators, lab report ingestion, privacy-safe architecture (no PHI stored), and an enterprise edition with provider/patient dashboards and auditable point-of-care summaries.
Stack: LangGraph, LLMs, Multi-Agent Systems, Qdrant, Advanced RAG Retrieval | GitHub: https://github.com/Monica-Ramineni/CareBridge

Keywords: artificial intelligence, access management, business intelligence, active directory, Arizona, Texas |