| Sujitha C - AI / ML Engineer |
| [email protected] |
| Location: McLean, Virginia, USA |
| Relocation: Yes |
| Visa: Green Card |
| Resume file: Sujitha_C_Resume__1776862887277.docx |
Sujitha Cherukuthota
Senior AI / ML Software Engineer | Machine Learning | Applied AI | Python | Automation, Testing & Scalable Systems
+1 (757) 936-9318 | [email protected]

Summary:
- Senior AI/ML Software Engineer with 9+ years of experience in software engineering and data systems using Python, building scalable applications that integrate machine learning models into real-world enterprise workflows.
- Built and integrated machine learning models and LLM-based systems into applications, enabling intelligent automation, document processing, and data-driven decision making with reliable outputs across the healthcare and financial domains.
- Strong experience in software engineering and system design, developing microservices, APIs, and backend services that support reliable deployment and seamless integration of AI solutions into production environments.
- Hands-on experience with data science workflows, including data preparation, feature engineering, model training, and validation, ensuring high-quality inputs and stable performance of machine learning systems across structured datasets.
- Experienced in delivering solutions across AWS, Azure, and GCP, applying cloud-specific services within individual projects to support scalable data processing and production-level machine learning applications.
- Skilled in building AI-driven automation workflows and data pipelines, enabling requirement generation, validation logic, and system integration to support reliable AI-powered development and testing use cases.

Technical Skills:
Programming Languages: Python, SQL
Machine Learning & Data Science: Machine Learning, Supervised Learning, Regression, Classification, Time Series Analysis, Feature Engineering, Model Training, Model Evaluation, Data Analysis, Statistical Techniques
Artificial Intelligence & Generative AI: Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), Prompt Engineering, NLP, Embeddings, Semantic Search, Context Handling, LLM Integration
Software Engineering & Backend Development: Object-Oriented Programming, Data Structures, REST APIs, Microservices Architecture, Backend Services, Modular Design, System Design
Automation & Testing: Selenium, Test Automation, Test Case Generation, QA Automation, Python Scripting, Validation Logic, Automation Workflows, Business Rule Validation
Development Tools & Environments: Visual Studio Code, Git, GitHub, GitHub Copilot, JIRA, Confluence
Data Engineering & Processing: PySpark, ETL Pipelines, Data Ingestion, Data Transformation, Batch Processing, Distributed Data Processing, Spark SQL, Data Modeling, Data Aggregation
Cloud Platforms: AWS (S3, EMR, Glue, Redshift, EKS, Bedrock), Azure (Blob Storage, Data Factory, Synapse, AKS, Azure OpenAI), GCP (BigQuery, Dataflow, GKE, Cloud Storage)
DevOps & Deployment: Docker, Kubernetes, CI/CD Pipelines, Jenkins, GitHub Actions, Azure DevOps
Databases & Storage: Amazon Redshift, Azure Synapse, BigQuery, MySQL, Oracle, DynamoDB
Monitoring & Performance: Logging, Pipeline Monitoring, Debugging, Performance Optimization, Query Optimization
Engineering Practices: Agile Scrum, Data Lifecycle Management, Reusable Components, Scalable System Design

Experience:

HCA Healthcare
Senior AI / Machine Learning Engineer
Richmond, VA | Feb 2024 - Present
- Developed a machine learning-driven document intelligence system using Python and NLP techniques, enabling automated classification and extraction of structured clinical data from healthcare documents for downstream processing workflows.
- Built and integrated LLM-based systems using AWS Bedrock and contextual retrieval logic, enabling healthcare staff to query internal documents and retrieve accurate information without manual navigation across enterprise systems.
- Designed and implemented backend services using FastAPI and a microservices architecture, enabling seamless integration of machine learning models into production applications used by clinical and operational teams.
- Developed scalable data ingestion pipelines using Python and REST APIs, collecting structured and unstructured clinical documents and storing them in Amazon S3 for centralized processing and analytics workflows.
- Implemented data preprocessing pipelines using PySpark and AWS Glue, performing cleaning, normalization, and transformation of healthcare datasets to ensure high-quality inputs for machine learning and NLP models.
- Built feature engineering workflows using Pandas and domain-specific transformations, improving the representation of clinical data and enhancing the performance of downstream classification and prediction models.
- Developed embedding generation pipelines using transformer models and vector indexing techniques, enabling semantic representation of clinical documents for efficient similarity-based retrieval in RAG workflows.
- Implemented retrieval-augmented generation (RAG) pipelines using vector search, combining embedding-based retrieval with LLM response generation to provide context-aware, accurate answers for healthcare queries.
- Designed prompt engineering strategies using LLMs and structured context inputs, ensuring generated outputs are grounded in retrieved data and aligned with healthcare domain-specific requirements.
- Built a machine learning-based document classification model using scikit-learn and feature engineering, categorizing clinical documents into structured types and improving the organization and retrieval efficiency of healthcare datasets.
- Developed API-driven services using REST APIs and Python frameworks, enabling real-time access to machine learning predictions and AI-driven insights across enterprise healthcare applications.
- Collaborated with cross-functional teams in Agile environments and Scrum workflows, translating business requirements into scalable machine learning solutions aligned with operational and clinical use cases.
- Designed scalable storage architecture using Amazon S3 and Redshift, organizing raw, processed, and metadata layers to support efficient querying and integration with machine learning pipelines.
- Implemented validation workflows using data validation and business rules, ensuring consistency, accuracy, and reliability of machine learning outputs before deployment into production systems.
- Built reusable components using object-oriented programming and modular design principles, improving maintainability and enabling extension of machine learning and AI-driven workflows across applications.
- Containerized applications using Docker and deployed services on AWS EKS, enabling scalable and consistent execution of machine learning pipelines and backend services across environments.
- Implemented CI/CD pipelines using GitHub Actions and Jenkins, enabling automated testing, deployment, and version control of machine learning models and application services.
- Monitored system performance using CloudWatch and logging frameworks, ensuring visibility into pipeline execution, model behavior, and the stability of production AI systems.
- Optimized data processing pipelines through performance tuning and efficient resource utilization, improving execution efficiency and reducing latency in large-scale healthcare data workflows.
- Provided technical guidance and conducted code reviews for machine learning solutions and backend services, ensuring high-quality implementations and improving development cycle efficiency by 30%.

Environment: Python, FastAPI, REST APIs, Machine Learning, NLP, scikit-learn, Pandas, PySpark, AWS S3, AWS Glue, AWS Redshift, AWS Bedrock, AWS EKS, Docker, GitHub Actions, Jenkins, CloudWatch, Microservices Architecture, RAG, LLMs, Prompt Engineering, Vector Search, Data Validation, Agile, Scrum

Sallie Mae Bank
AI / Machine Learning Engineer
Newark, DE | May 2022 - Jan 2024
- Designed a machine learning-based reconciliation system using Python and anomaly detection models, identifying mismatches between loan records and transaction data and enabling analysts to investigate only flagged discrepancies.
- Developed classification pipelines using supervised learning and feature extraction techniques, enabling automated tagging of financial documents and improving routing accuracy across loan servicing workflows.
- Built backend processing services using Flask and a modular architecture, enabling seamless integration of machine learning outputs into financial systems used for reporting, auditing, and reconciliation processes.
- Engineered ingestion workflows using SQL and data pipelines, consolidating loan data, payment histories, and audit logs into Azure storage layers for unified processing and downstream analytical use.
- Implemented transformation logic using PySpark and distributed processing, resolving inconsistencies across datasets and standardizing financial records for reliable comparison and model input preparation.
- Applied feature engineering techniques using Pandas and domain-specific rules, capturing transactional patterns and temporal behavior to improve model sensitivity in detecting financial anomalies.
- Structured data storage using Azure Synapse Analytics and optimized schemas, enabling efficient access to processed datasets for both machine learning workflows and business reporting requirements.
- Developed predictive models using classification algorithms and statistical validation, supporting automated identification of irregular transaction patterns across large-scale financial datasets.
- Built evaluation pipelines using model validation techniques and historical benchmarks, ensuring model outputs aligned with analyst-reviewed results before deployment into operational workflows.
- Designed rule-based validation layers using data quality checks and threshold logic, ensuring accuracy of outputs and preventing propagation of incorrect results into financial reporting systems.
- Created API-based services using REST architecture and Python frameworks, enabling internal applications to consume model predictions and integrate outputs into reconciliation dashboards.
- Collaborated with finance and risk teams in Agile delivery models, translating reconciliation challenges into technical solutions and refining models based on real audit scenarios and feedback.
- Developed reusable pipeline components using object-oriented programming and modular design, enabling efficient onboarding of new datasets into the reconciliation and analytics workflows.
- Containerized services using Docker and deployed them on Azure Kubernetes environments, ensuring consistent runtime execution and scalability of machine learning and data processing pipelines.
- Automated deployment processes using Azure DevOps pipelines and version control systems, enabling controlled releases and reducing manual effort in managing application updates.
- Monitored pipeline health using logging mechanisms and execution tracking, identifying bottlenecks and ensuring stability of data processing and machine learning workflows in production.
- Improved performance of data workflows using query optimization and execution tuning, reducing processing delays and enabling faster turnaround for reconciliation analysis tasks.
- Integrated AI-assisted query interpretation using Azure OpenAI, enabling analysts to retrieve relevant financial insights through natural language inputs without writing complex queries.
- Designed combined retrieval approaches using structured filtering and contextual search, enabling more accurate lookup of transaction records and improving the efficiency of investigation workflows.
- Supported team development by reviewing implementations of machine learning pipelines and backend services, ensuring code quality, consistency, and maintainability across collaborative development efforts.

Environment: Python, Flask, REST APIs, Machine Learning, Supervised Learning, Classification Models, Anomaly Detection, scikit-learn, Pandas, PySpark, SQL, Azure Blob Storage, Azure Data Factory, Azure Synapse Analytics, Azure Kubernetes Service (AKS), Azure DevOps, Azure OpenAI, Docker, Microservices Architecture, Data Pipelines, Data Validation, Query Optimization, Agile, Scrum

State of California
Machine Learning Engineer / Data Scientist
San Francisco, CA | Feb 2020 - Apr 2022
- Built a data-driven analytics platform using Python and machine learning models, enabling large-scale analysis of public datasets to identify program usage patterns and regional trends and to support informed decision making across departments.
- Developed forecasting models using time series techniques and regression approaches, enabling prediction of program demand across regions and supporting long-term planning decisions for resource allocation and operational strategy.
- Designed backend data processing workflows using GCP services and a modular architecture, enabling structured handling of high-volume datasets for analytics, reporting, and integration with downstream machine learning applications.
- Engineered ingestion workflows using APIs and automated pipelines, collecting datasets from multiple public sources and storing them in Google Cloud Storage for centralized access, transformation, and analytical processing.
- Implemented transformation pipelines using Dataflow and distributed processing techniques, cleaning, merging, and standardizing datasets from different agencies to ensure consistency and usability across analytics workflows.
- Applied feature engineering techniques using Pandas and statistical methods, deriving meaningful attributes from raw datasets to improve accuracy and performance of the forecasting and classification models used in reporting.
- Structured analytical datasets using BigQuery and optimized schema design, enabling efficient querying and supporting both ad hoc analysis and scheduled reporting requirements across multiple public sector teams.
- Developed predictive models using supervised learning and regression algorithms, enabling estimation of participation trends and identification of key factors influencing outcomes across different public programs.
- Built validation workflows using model evaluation metrics and historical comparisons, ensuring predictions aligned with expected reporting patterns and met accuracy requirements before integration into reporting systems.
- Designed aggregation logic using data summarization techniques and grouping operations, enabling generation of metrics such as usage trends, growth patterns, and regional comparisons across large datasets.
- Developed lightweight query interfaces using Python services and API endpoints, enabling analysts to retrieve insights from datasets without directly interacting with complex data infrastructure and backend systems.
- Collaborated with policy analysts and reporting teams in Agile environments, translating analytical requirements into scalable data solutions aligned with public sector workflows and evolving reporting expectations.
- Created reusable data processing modules using object-oriented programming and structured pipelines, improving maintainability and enabling reuse across multiple analytics and reporting use cases.
- Containerized data services using Docker and deployed workloads on Google Kubernetes Engine, enabling consistent execution and scalability of data processing and analytics pipelines across environments.
- Automated deployment workflows using CI/CD pipelines and version control systems, enabling reliable updates and controlled releases of analytics and machine learning components across development stages.
- Monitored data processing jobs using logging frameworks and execution tracking tools, ensuring visibility into pipeline performance and stability of analytics workflows in production environments.
- Improved processing efficiency using query optimization and execution tuning, reducing runtime of batch jobs and enabling faster generation of reports and analytical outputs for business stakeholders.
- Implemented semantic retrieval using text indexing and search techniques, enabling analysts to locate relevant sections within large reports and datasets without manual exploration or scanning.
- Designed hybrid analytics workflows combining machine learning models and rule-based logic, enabling more accurate interpretation of patterns and anomalies in large-scale public datasets.
- Supported team development by reviewing implementations of data pipelines and analytical models, ensuring consistency, correctness, and maintainability across shared project components.

Environment: Python, REST APIs, Machine Learning, Supervised Learning, Regression, Time Series Analysis, Feature Engineering, Model Evaluation, Pandas, Google Cloud Storage (GCS), Dataflow, BigQuery, Google Kubernetes Engine (GKE), Docker, CI/CD Pipelines, Git, Data Pipelines, Data Modeling, Data Aggregation, Query Optimization, Logging & Monitoring, Text Indexing, Semantic Search, Agile, Scrum

Walmart Global Tech
Data Engineer (Python / Data Systems)
Bentonville, AR | Oct 2016 - Dec 2019
- Processed large-scale retail datasets using Python and PySpark, handling daily sales, inventory, and product data across multiple sources and ensuring consistent availability of structured datasets for reporting and business analysis workflows.
- Executed distributed batch processing jobs using AWS EMR and scalable data frameworks, enabling efficient handling of high-volume transactional data generated from store operations and ensuring timely data availability across regions.
- Maintained ingestion workflows using SQL and data connectors, extracting structured data from relational systems and organizing raw datasets in Amazon S3 for centralized storage and downstream processing use cases.
- Applied transformation logic using Spark SQL and data processing techniques, cleaning inconsistencies, standardizing product attributes, and preparing datasets for analytics and machine learning support workflows.
- Prepared structured datasets using data aggregation and feature preparation workflows, enabling generation of store-level metrics and historical data required for forecasting models and analytical reporting.
- Applied data quality checks using validation rules and consistency checks, identifying missing values, duplicate records, and anomalies to ensure reliability of processed datasets before consumption by reporting systems.
- Organized processed datasets using Amazon Redshift and data modeling practices, enabling efficient querying and supporting reporting dashboards used by business users and analytics teams.
- Automated recurring data processing tasks using Python scripting and batch workflows, reducing manual effort and improving consistency in execution of data pipelines across development and production environments.
- Improved execution performance using query optimization and resource tuning, reducing processing time of large-scale batch jobs and ensuring timely availability of updated datasets for reporting workflows.
- Monitored pipeline execution using logging mechanisms and debugging techniques, identifying failures, tracking execution status, and ensuring stability of data workflows supporting daily operational reporting.

Environment: Python, PySpark, Spark SQL, SQL, AWS EMR, Amazon S3, Amazon Redshift, ETL Pipelines, Data Ingestion, Data Transformation, Batch Processing, Distributed Data Processing, Data Aggregation, Data Modeling, Data Validation, Query Optimization, Python Scripting, Logging & Monitoring, Debugging, Agile, Scrum

Citibank
Python Developer
Hyderabad, India | Aug 2015 - Sep 2016
- Developed data processing applications using Python and SQL, handling financial transaction records and generating structured outputs to support internal reporting workflows and downstream operational systems used by business teams.
- Implemented ETL workflows using data extraction and transformation logic, pulling financial data from relational databases and converting raw records into structured datasets for validation and reporting purposes.
- Wrote optimized SQL queries using joins and aggregation functions, enabling consolidation of transaction data across multiple tables and ensuring accuracy in financial reporting and reconciliation workflows.
- Applied data cleaning techniques using Pandas and validation logic, handling missing values, duplicate entries, and inconsistent formats to ensure reliability of datasets used in reporting processes.
- Developed reusable modules using object-oriented programming in Python, improving maintainability and enabling consistent implementation of data processing logic across multiple scripts and applications.
- Built backend services using Flask and REST API design, enabling controlled access to processed financial data and supporting integration with internal reporting tools and downstream systems.
- Automated recurring reporting workflows using Python scripting and scheduled jobs, reducing manual effort and ensuring timely generation of financial reports required by business and compliance teams.
- Applied data validation rules using business logic checks and consistency verification, ensuring correctness of processed financial datasets before they were consumed by reporting systems.
- Supported debugging and issue resolution using log analysis and error handling techniques, identifying failures in data processing jobs and ensuring stable execution of workflows in production environments.
- Collaborated with analysts and developers in Agile environments, understanding reporting requirements and implementing data processing solutions aligned with financial systems and evolving business needs.

Environment: Python, SQL, Pandas, Flask, REST APIs, ETL Workflows, Data Extraction, Data Transformation, Data Validation, Data Cleaning, Object-Oriented Programming, Python Scripting, Query Optimization, Data Aggregation, Logging, Debugging, Error Handling, Agile, Scrum