
Sujitha C - AI/ML Engineer
[email protected]
Location: McLean, Virginia, USA
Relocation: Yes
Visa: Green Card
Sujitha Cherukuthota
Senior AI/ML Engineer | Generative AI | LLMs | RAG | MLOps | Python | PySpark
+1 (757) 936-9318
[email protected]


Summary:
Senior AI/ML Software Engineer with 10 years of experience designing and deploying scalable AI systems in Python, building enterprise-grade applications that integrate machine learning models into real-world workflows.
Architected and implemented machine learning and generative AI solutions including LLM-based systems, enabling intelligent automation, document processing, and predictive analytics across large-scale data-driven environments.
Strong expertise in software engineering and system design, developing microservices, REST APIs, and distributed systems that support reliable deployment and seamless integration of AI solutions into production applications.
Hands-on experience across the end-to-end AI lifecycle, including data ingestion, feature engineering, model training, validation, and deployment, ensuring high-quality inputs and stable performance of machine learning systems.
Proven experience building AI-driven automation and data platforms on cloud environments, leveraging AWS services, containerization, and CI/CD pipelines to deliver scalable, secure, and high-performance enterprise solutions.
Experienced in collaborating with cross-functional stakeholders, translating business requirements into technical solutions, and delivering AI implementations aligned with operational needs, compliance requirements, and real-world use cases.

Technical Skills:
Programming Languages: Python, SQL, Scala, Bash/Shell Scripting, Java (Basic), R (Basic)
Artificial Intelligence & Machine Learning: Machine Learning, Supervised Learning, Unsupervised Learning, Classification, Regression, Clustering, Ensemble Methods, Random Forest, XGBoost, Gradient Boosting, Feature Engineering, Model Evaluation, Cross-Validation, Hyperparameter Tuning, Anomaly Detection, Time Series Forecasting, Statistical Modeling
Generative AI & LLMs: Generative AI, Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), Prompt Engineering, Few-shot / Zero-shot Learning, LangChain, LlamaIndex, Hugging Face Transformers, OpenAI GPT, Anthropic Claude, Semantic Search, Embeddings, LLM Evaluation, Guardrails, Hallucination Detection, LLM Observability
Deep Learning Frameworks: PyTorch, TensorFlow, Keras, Neural Networks (CNN, RNN, LSTM), Sequence Modeling, Transfer Learning, Fine-Tuning, Model Optimization
Data Engineering & Big Data: PySpark, Apache Spark, Spark SQL, Hadoop, Hive, HDFS, ETL / ELT Pipelines, AWS Glue, Databricks, Apache Kafka, Data Ingestion, Data Transformation, Data Modeling, Parquet, Avro, ORC
Cloud Platforms: AWS (Primary): S3, SageMaker, Bedrock, Lambda, EKS, ECS, Glue, Redshift, Athena, Step Functions, CloudWatch | Azure: Azure OpenAI, Azure ML, Data Factory, Synapse | GCP: Vertex AI, BigQuery, Cloud Storage, Dataflow
MLOps / LLMOps / DevOps: MLflow, Docker, Kubernetes (EKS / AKS), CI/CD Pipelines, Jenkins, GitHub Actions, GitLab CI, Terraform, Model Serving, Model Versioning, Prompt Versioning, Monitoring Pipelines
APIs & Backend Development: FastAPI, Flask, REST APIs, Microservices Architecture, API Integration, Async Processing, Object-Oriented Programming (OOP), Design Patterns
Vector Databases & Search: Pinecone, FAISS, Weaviate, Chroma, Elasticsearch, OpenSearch, Hybrid Search, Semantic Retrieval
Automation & Enterprise Tools: Power Automate, RPA Concepts, UiPath (Conceptual), Automation Anywhere (Conceptual), ServiceNow, Splunk, SharePoint, Okta
Databases & Storage: PostgreSQL, MySQL, MongoDB, DynamoDB, Amazon Redshift, Snowflake
Monitoring & Observability: CloudWatch, Prometheus, Grafana, Datadog, Logging, Alerting, Model Monitoring, Data Drift Detection, Concept Drift Tracking
Data Analysis & Visualization: Pandas, NumPy, Matplotlib, Seaborn, Tableau, Power BI, Exploratory Data Analysis (EDA), Reporting Dashboards
Security & Governance: IAM (AWS), RBAC, HIPAA Compliance, Secure API Design, Data Privacy, Encryption Standards
Tools & Methodologies: Agile, Scrum, SDLC, JIRA, Confluence, Git, GitHub, Bitbucket, Linux / Unix Environments

Experience:
HCA Healthcare
Senior AI / Machine Learning Engineer | Richmond, VA | Feb 2024 - Present
Developed an AWS-based clinical risk prediction platform using Python, PySpark, and AWS SageMaker, enabling early identification of high-risk patients and improving patient risk stratification across healthcare systems.
Developed an AI-driven healthcare automation platform using Python, AWS services, and AWS Bedrock (Claude), enabling intelligent clinical workflow automation and reducing manual intervention across hospital operations.
Designed a microservices-based architecture integrating ML models and REST APIs, enabling seamless orchestration of healthcare workflows across cloud-based enterprise systems.
Collaborated with CIO, CISO, and cross-functional stakeholders in Agile environments to define automation workflows and deliver secure enterprise AI solutions aligned with compliance requirements.
Built automated data ingestion pipelines integrating FHIR APIs and Amazon S3, enabling continuous ingestion of structured and unstructured healthcare datasets for analytics workflows (a minimal ingestion sketch follows the Environment list below).
Developed scalable data transformation pipelines using PySpark and AWS Glue, enabling efficient processing of large healthcare datasets supporting machine learning and automation systems.
Designed healthcare data lake architecture using Amazon S3 and Redshift, enabling efficient storage, retrieval, and integration of datasets across AI-driven healthcare platforms.
Implemented workflow automation using Python scripting and Power Automate, automating repetitive healthcare processes and improving operational efficiency across enterprise clinical systems.
Integrated enterprise systems with ServiceNow and Splunk, enabling automated alert monitoring, incident detection, and intelligent ticket generation across healthcare infrastructure environments.
Optimized automation pipelines using distributed processing and performance tuning, improving scalability, reducing latency, and enhancing system reliability across AI workloads.
Integrated machine learning and NLP models into healthcare workflows, enabling intelligent document processing, anomaly detection, and predictive automation for clinical and operational use cases.
Developed feature engineering pipelines supporting patient risk prediction use cases, improving data quality and enabling more accurate decision-making across AI-driven healthcare systems.
Deployed scalable AI applications using Docker, Kubernetes (EKS), and CI/CD pipelines, enabling secure and reliable execution of healthcare AI workflows across distributed environments.
Implemented monitoring using CloudWatch, Splunk, and Prometheus, enabling real-time tracking of model performance, system health, and anomaly detection across production AI systems.
Ensured secure enterprise deployment by integrating Okta-based authentication and access controls, aligning healthcare AI systems with HIPAA and governance security standards.
Environment:
Python, PySpark, AWS SageMaker, AWS Bedrock (Claude), NLP, Machine Learning, Feature Engineering, FHIR APIs, REST APIs, Microservices Architecture, Amazon S3, AWS Glue, Amazon Redshift, Data Lake Architecture, ETL Pipelines, Data Ingestion, Data Transformation, Docker, Kubernetes (EKS), CI/CD Pipelines, GitHub Actions, Jenkins, CloudWatch, Splunk, Prometheus, ServiceNow, Okta, Data Validation, Performance Tuning, Distributed Processing, Agile, Scrum
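
As referenced in the FHIR ingestion bullet above, here is a minimal Python sketch of that kind of FHIR-to-S3 landing step. The FHIR endpoint, bucket name, and key layout are hypothetical placeholders, not HCA's actual systems; the sketch assumes a standard FHIR R4 search API and AWS credentials available in the environment.

# A sketch only: endpoint and bucket names below are made up for illustration.
import json

import boto3
import requests

FHIR_BASE = "https://fhir.example-hospital.org/r4"  # hypothetical FHIR server
BUCKET = "clinical-raw-zone"                        # hypothetical S3 bucket

def ingest_patients(page_url: str = f"{FHIR_BASE}/Patient?_count=100") -> None:
    """Page through a FHIR Patient search, landing each bundle in S3 as JSON."""
    s3 = boto3.client("s3")
    page = 0
    while page_url:
        resp = requests.get(
            page_url, headers={"Accept": "application/fhir+json"}, timeout=30
        )
        resp.raise_for_status()
        bundle = resp.json()
        s3.put_object(
            Bucket=BUCKET,
            Key=f"fhir/patient/bundle-{page:05d}.json",
            Body=json.dumps(bundle).encode("utf-8"),
        )
        # FHIR search results paginate via a "next" link in each bundle.
        page_url = next(
            (link["url"] for link in bundle.get("link", [])
             if link.get("relation") == "next"),
            None,
        )
        page += 1

if __name__ == "__main__":
    ingest_patients()

In a production pipeline a step like this would run under the platform's scheduling and retry machinery (for example Glue triggers or Step Functions, both listed above) rather than as a standalone script.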

Sallie Mae Bank
AI / Machine Learning Engineer | Newark, DE | May 2022 - Jan 2024
Engineered an Azure-based financial document intelligence platform using Python, Scikit-learn, and PyTorch, enabling automated classification and routing of loan documents across high-volume servicing workflows.
Built AI-powered automation workflows using Python, NLP, and Azure OpenAI GPT-4, enabling automated document processing and reducing manual handling across financial servicing systems.
Designed scalable data pipelines using PySpark and Azure Data Factory, enabling large-scale processing of financial datasets supporting automation and analytics workflows.
Integrated enterprise automation systems with ServiceNow and SharePoint, enabling automated document routing, ticket generation, and workflow orchestration across banking operations.
Applied BPM principles to redesign servicing workflows, improving turnaround time, reducing manual bottlenecks, and enhancing operational efficiency across enterprise financial systems.
Deployed automation pipelines using Docker, CI/CD workflows, and Azure Kubernetes Service (AKS), ensuring scalable, secure, and reliable execution across production environments.
Designed end-to-end document processing pipelines where customer files were ingested into Azure Blob Storage, followed by preprocessing, feature extraction, and model inference services for near real-time classification.
Collaborated in Agile/Scrum environments with product teams, compliance analysts, and backend engineers to deliver secure machine learning solutions aligned with regulatory and servicing requirements.
Built scalable ingestion pipelines integrating REST APIs, enterprise databases, and document repositories, enabling continuous intake of structured and unstructured financial data for downstream analytics.
Developed transformation workflows using Pandas, PySpark, and Azure Data Factory, standardizing raw financial documents into normalized datasets optimized for classification model training.
Designed scalable storage architecture using Azure Blob Storage and Synapse Analytics, enabling efficient indexing, retrieval, and querying of large financial document datasets.
Implemented advanced text feature engineering pipelines including tokenization, vectorization, and embedding generation, improving document representation and classification accuracy across multiple categories.
Developed machine learning models using Logistic Regression and Random Forest with Scikit-learn, supporting document classification, request categorization, and prioritization across servicing workflows (a minimal classifier sketch follows the Environment list below).
Applied NLP techniques such as text normalization and semantic representation, improving document understanding and enabling higher-quality insights across diverse financial document formats.
Built deep learning-based text classification models using PyTorch, improving contextual understanding and classification accuracy for complex document structures and unstructured text datasets.
Performed model optimization using hyperparameter tuning and cross-validation, improving classification accuracy, reducing variance, and ensuring stable performance across multiple document categories.
Built reusable machine learning pipelines using Scikit-learn and PyTorch, enabling reproducible training workflows and efficient experimentation across multiple modeling approaches.
Conducted comprehensive model evaluation using precision, recall, F1-score, and confusion matrix analysis, ensuring reliability before deploying models into production environments.
Containerized document processing services using Docker, packaging preprocessing workflows, trained models, and APIs into portable environments for consistent deployment across systems.
Deployed inference services through Azure Functions and AKS, enabling scalable, event-driven processing of document classification requests across distributed high-throughput environments.
Implemented CI/CD pipelines using Azure DevOps and GitHub Actions, enabling automated testing, deployment, and version control of machine learning models and document processing pipelines.
Environment:
Python, Scikit-learn, PyTorch, NLP, Machine Learning, Text Classification, Feature Engineering, Tokenization, Vectorization, Embeddings, Azure OpenAI (GPT-4), PySpark, Pandas, SQL, Azure Data Factory, Azure Blob Storage, Azure Synapse Analytics, Azure Kubernetes Service (AKS), Azure Functions, REST APIs, Microservices Architecture, ETL Pipelines, Data Ingestion, Data Transformation, Docker, CI/CD Pipelines, Azure DevOps, GitHub Actions, ServiceNow, SharePoint, Data Validation, Model Evaluation, Hyperparameter Tuning, Agile, Scrum
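
As referenced in the classification bullet above, here is a minimal sketch of a TF-IDF plus Logistic Regression document classifier using the Scikit-learn pipeline pattern, covering the tokenization/vectorization and evaluation steps the bullets describe. The toy corpus, labels, and hyperparameters are synthetic stand-ins, not Sallie Mae data.

# A sketch only: the documents and categories below are invented examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Toy corpus standing in for loan-servicing documents.
docs = [
    "request to defer monthly payment due to hardship",
    "application for loan forgiveness program",
    "update mailing address on account",
    "dispute of reported late payment",
] * 25
labels = ["deferment", "forgiveness", "account_update", "dispute"] * 25

X_train, X_test, y_train, y_test = train_test_split(
    docs, labels, test_size=0.2, random_state=42, stratify=labels
)

# Tokenization + vectorization + classifier in one reproducible pipeline.
clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),
    ("logreg", LogisticRegression(max_iter=1000)),
])
clf.fit(X_train, y_train)

# Precision / recall / F1 per category, as in the evaluation bullet above.
print(classification_report(y_test, clf.predict(X_test)))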

State of California
Data Scientist / Machine Learning Engineer | San Francisco, CA | Feb 2020 - Apr 2022
Developed a GCP-based public program analytics platform using Python, Scikit-learn, and PyTorch, enabling analysis of large administrative datasets and supporting data-driven planning across government systems.
Designed end-to-end analytics pipelines where agency data was ingested into Google Cloud Storage, followed by transformation workflows, feature engineering, model training, and batch inference processes for reporting use cases.
Worked in Agile/Scrum environments with analysts, engineers, and stakeholders to deliver secure machine learning solutions supporting forecasting, reporting, and operational decision-making across public sector programs.
Built scalable data ingestion pipelines integrating REST APIs, structured databases, and reporting systems, enabling centralized access to large government datasets for downstream analytics workflows.
Developed distributed transformation workflows using PySpark, GCP Dataflow, and Pandas, converting raw administrative data into standardized datasets optimized for analytics and forecasting models.
Designed scalable data storage systems using BigQuery and Google Cloud Storage, enabling efficient querying, retrieval, and management of large datasets across reporting and machine learning workflows.
Implemented advanced feature engineering pipelines extracting temporal, operational, and program-level indicators, improving data quality and enabling more accurate predictions across time-based analytics use cases.
Developed machine learning models using XGBoost and Scikit-learn, supporting regression, classification, and anomaly detection across structured government datasets.
Applied time-series forecasting techniques including trend analysis, seasonal decomposition, and rolling-window modeling to predict program demand and support long-term planning decisions (a minimal forecasting sketch follows the Environment list below).
Built deep learning models using TensorFlow and PyTorch, enabling pattern recognition and capturing complex relationships within structured administrative datasets.
Performed model optimization using hyperparameter tuning and cross-validation techniques, improving model performance, reducing overfitting, and ensuring stability across varying data distributions.
Built reusable machine learning pipelines using Scikit-learn and PyTorch, enabling standardized experimentation workflows and efficient comparison of multiple modeling approaches across use cases.
Implemented MLflow experiment tracking for model versioning and reproducibility, improving visibility into model training workflows across development and production environments.
Designed batch inference workflows using Vertex AI services, enabling scheduled model execution and integration of predictions into reporting systems and operational dashboards.
Containerized machine learning workflows using Docker, enabling portability and consistent deployment across development, testing, and production environments.
Deployed batch processing pipelines using Kubernetes (GKE) and GCP infrastructure, enabling scalable execution of machine learning workloads across distributed datasets.
Implemented CI/CD pipelines using Jenkins, enabling automated deployment and version control of ML pipelines and production model updates.
Managed infrastructure provisioning using Terraform, ensuring consistent cloud resource configuration and reproducible deployments across environments.
Monitored system performance using Cloud Monitoring and Prometheus, tracking pipeline execution, system health, and production issues impacting machine learning workflows.
Developed unit and integration tests using PyTest, ensuring correctness, stability, and reliability of data pipelines and model workflows across continuous development cycles.
Environment:
Python, Scikit-learn, XGBoost, TensorFlow, PyTorch, Machine Learning, Time Series Forecasting, Regression, Classification, Anomaly Detection, Feature Engineering, Pandas, PySpark, GCP Dataflow, Google Cloud Storage (GCS), BigQuery, Vertex AI, MLflow, Docker, Kubernetes (GKE), Terraform, Jenkins, CI/CD Pipelines, REST APIs, ETL Pipelines, Data Ingestion, Data Transformation, Batch Processing, Distributed Data Processing, Model Evaluation, Hyperparameter Tuning, PyTest, Logging & Monitoring, Google Cloud Monitoring, Prometheus, Agile, Scrum
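
As referenced in the forecasting bullet above, here is a minimal sketch of rolling-window demand modeling with XGBoost. The weekly series, feature names, and hyperparameters are synthetic illustrations, not state program data; the shift-before-roll pattern is what keeps future values out of the features.

# A sketch only: the demand series below is generated, not real.
import numpy as np
import pandas as pd
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
# Synthetic weekly program-demand series with trend and yearly seasonality.
n = 208
t = np.arange(n)
demand = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 52) + rng.normal(0, 3, n)
df = pd.DataFrame({"demand": demand})

# Rolling-window features: lags and a trailing mean, shifted so only
# past values are visible at prediction time (no target leakage).
for lag in (1, 2, 4):
    df[f"lag_{lag}"] = df["demand"].shift(lag)
df["roll_mean_8"] = df["demand"].shift(1).rolling(8).mean()
df = df.dropna()

split = int(len(df) * 0.8)
features = [c for c in df.columns if c != "demand"]
train, test = df.iloc[:split], df.iloc[split:]

model = XGBRegressor(n_estimators=300, max_depth=3, learning_rate=0.05)
model.fit(train[features], train["demand"])

preds = model.predict(test[features])
mae = np.mean(np.abs(preds - test["demand"].to_numpy()))
print(f"holdout MAE: {mae:.2f}")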

Walmart Global Tech
Data Engineer | Bentonville, AR | Oct 2016 - Dec 2019
Developed an AWS-based demand forecasting platform using Python, PySpark, and Scikit-learn, enabling data-driven merchandising decisions and improving product demand prediction across large-scale retail datasets.
Designed scalable data pipelines integrating transactional databases, product catalogs, and REST APIs, enabling high-volume ingestion of retail data for downstream analytics and machine learning workflows.
Worked in Agile/Scrum environments with product managers, data engineers, and analysts to deliver data-driven solutions supporting inventory planning, customer segmentation, and retail forecasting use cases.
Built distributed data ingestion pipelines using Apache Spark and Hadoop ecosystems, enabling efficient processing of large-scale retail datasets across multiple enterprise data sources.
Developed large-scale transformation workflows using PySpark, converting raw transactional data into structured datasets optimized for analytics, feature engineering, and machine learning model training.
Designed scalable data storage systems using Amazon S3 and Redshift, enabling efficient storage, querying, and retrieval of historical retail data across analytics workflows.
Implemented advanced feature engineering techniques extracting customer behavior, product-level, and transactional patterns, improving data quality and enabling more accurate forecasting models.
Developed machine learning models using Random Forest and Gradient Boosting, supporting demand forecasting, classification, and customer segmentation across retail datasets.
Built recommendation systems using collaborative filtering and machine learning algorithms, enabling personalized product suggestions and improving customer engagement across retail platforms (a minimal collaborative-filtering sketch follows the Environment list below).
Performed model optimization using cross-validation and hyperparameter tuning techniques, improving model accuracy, reducing overfitting, and ensuring stable performance across seasonal demand trends.
Built reusable data and machine learning pipelines using PySpark and Scikit-learn, enabling standardized workflows and efficient experimentation across multiple retail analytics use cases.
Environment:
Python, PySpark, Apache Spark, Hadoop, Scikit-learn, Machine Learning, Random Forest, Gradient Boosting, Recommendation Systems, Collaborative Filtering, Feature Engineering, Pandas, SQL, REST APIs, Amazon S3, Amazon Redshift, ETL Pipelines, Data Ingestion, Data Transformation, Batch Processing, Distributed Data Processing, Data Modeling, Data Aggregation, Model Evaluation, Hyperparameter Tuning, Docker, CI/CD Pipelines, Logging & Monitoring, Query Optimization
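
As referenced in the recommendation bullet above, here is a minimal sketch of item-item collaborative filtering: cosine similarity between item columns of a user-item matrix, then similarity-weighted scoring of items the user has not yet interacted with. Product IDs and interaction counts are synthetic, not Walmart data.

# A sketch only: the matrix below is a toy example.
import numpy as np
import pandas as pd

# Rows = users, columns = products, values = interaction counts.
ratings = pd.DataFrame(
    [[5, 3, 0, 1],
     [4, 0, 0, 1],
     [1, 1, 0, 5],
     [0, 0, 5, 4],
     [0, 1, 5, 4]],
    columns=["p1", "p2", "p3", "p4"],
)

# Cosine similarity between item (column) vectors.
item_vecs = ratings.to_numpy(dtype=float).T
norms = np.linalg.norm(item_vecs, axis=1, keepdims=True)
sim = (item_vecs @ item_vecs.T) / (norms @ norms.T)

def recommend(user_idx: int, k: int = 2) -> list[str]:
    """Score unseen items by similarity-weighted sums of the user's history."""
    user = ratings.iloc[user_idx].to_numpy(dtype=float)
    scores = sim @ user
    scores[user > 0] = -np.inf  # hide items the user already interacted with
    top = np.argsort(scores)[::-1][:k]
    return [ratings.columns[i] for i in top]

print(recommend(user_idx=0))

At retail scale the same idea would run on the distributed stack listed above (e.g., PySpark) rather than on a dense in-memory matrix.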

Citibank
Python Developer | Hyderabad, India | Aug 2015 - Sep 2016
Developed data processing applications using Python and SQL, supporting financial transaction analysis, reporting workflows, and regulatory data processing across high-volume banking systems.
Designed scalable ETL workflows and automation scripts to extract, transform, and load data from relational databases and external APIs, enabling structured reporting and analytics across enterprise banking platforms.
Built data ingestion and transformation pipelines using SQL and Pandas, performing data cleaning, validation, and normalization to ensure consistency of financial datasets used for downstream reporting workflows.
Developed reusable modules using object-oriented programming (OOP) in Python, improving maintainability, modularity, and scalability of ETL and financial data processing applications.
Optimized complex SQL queries and processing logic, improving execution performance, reducing runtime, and enabling efficient handling of large-scale financial transaction datasets.
Developed Flask-based REST APIs and implemented PyTest test cases, improving system reliability and maintainability and enabling secure integration of banking data workflows with internal applications (a minimal API-and-test sketch follows the Environment list below).
Environment:
Python, SQL, Pandas, Flask, REST APIs, PyTest, ETL Pipelines, Data Extraction, Data Transformation, Data Ingestion, Data Cleaning, Data Validation, Object Oriented Programming (OOP), Python Scripting, Query Optimization, Data Aggregation, Logging, Debugging, Error Handling, Agile, Scrum
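
As referenced in the API bullet above, here is a minimal sketch of a Flask endpoint with a PyTest case exercising it through Flask's test client. The route and payload are illustrative, not Citibank's actual interfaces.

# A sketch only: the endpoint and payload shape below are invented.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/transactions/summary", methods=["POST"])
def summarize_transactions():
    """Return the count and total amount for a batch of transactions."""
    txns = request.get_json(force=True).get("transactions", [])
    total = sum(t.get("amount", 0) for t in txns)
    return jsonify({"count": len(txns), "total": round(total, 2)})

# PyTest case exercising the endpoint without running a server.
def test_summarize_transactions():
    client = app.test_client()
    resp = client.post(
        "/transactions/summary",
        json={"transactions": [{"amount": 10.5}, {"amount": 4.5}]},
    )
    assert resp.status_code == 200
    assert resp.get_json() == {"count": 2, "total": 15.0}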

Education:
Sreyas Institute of Engineering and Technology
Bachelor of Technology in Computer Science | Hyderabad, India | Aug 2011 - June 2015