Rahul - Data Scientist
Location: Buffalo, New York, USA
Relocation: Open
Visa: OPT
Contact no.: 703-745-8917
Data Scientist with over 6 years of experience specializing in machine learning and artificial intelligence across various industries. I hold a master's degree in Computer Science and Engineering from the University at Buffalo and a BTech in Electronics Engineering from IIT (BHU) Varanasi. My expertise extends to developing and deploying scalable AI solutions, with a significant focus on generative AI, natural language processing, anomaly detection systems, and particularly large language models (LLMs). I have led cross-functional teams in designing and implementing comprehensive systems that improve operational efficiency and decision-making. With multiple granted patents and peer-reviewed conference publications, my career reflects a deep commitment to advancing AI research and its practical applications. I am adept at leveraging advanced AI technologies, including LLMs, to deliver impactful solutions that drive business growth and innovation. I am currently seeking opportunities to bring my technical leadership and strategic insights to a forward-thinking organization.

EDUCATION

University at Buffalo, Buffalo, New York, US
Master of Science in Computer Science and Engineering, Aug. 2022 - Dec. 2023
Coursework: NLP, Computational Linguistics, Pattern Recognition, Machine Learning, Algorithms Design
GPA: 3.82 / 4

Indian Institute of Technology (Banaras Hindu University), Varanasi, Uttar Pradesh, India
Bachelor of Technology in Electronics Engineering, Jul. 2014 - May 2018

SKILLS

Languages: Python, Java, C++, C, PL/SQL, SQL, MongoDB
Technologies: Retrieval Augmented Generation (RAG), Large Language Models (LLMs), Natural Language Processing (NLP), Machine Learning, Deep Learning, Web Services, Data Structures, Algorithms, Prompt Engineering, Indexing, Quantization
Cloud: Amazon Web Services (AWS), Azure, Google Cloud Platform (GCP), Oracle, Atlas, OpenAI
Frameworks and Libraries: PyTorch, TensorFlow, Apache Spark, AWS SageMaker, Azure MLOps, Keras, MLflow, Kafka, Sentence-Transformers (embeddings, re-rankers), Databricks, Snowflake, PySpark, LangChain, LlamaIndex, Vector DBs (ChromaDB, FAISS, Elasticsearch), Git, Jenkins, Big Data, Hadoop, Flask, Docker, Kubernetes, Pandas, Streamlit, Flask-RESTful, FastAPI, YOLO, Django, XGBoost, GAN, ActiveMQ, Spring Boot, Dask, spaCy, NLTK, Knowledge Graph

EXPERIENCE

Hilabs, Washington, D.C., Feb. 2024 - May 2024
Senior Data Scientist
GenAI-Driven Contract Analyzer: Streamlining Claim Processing
Skills: LayoutLM, OCR, AWS Cloud, Hugging Face, Python, OpenSearch, LLM, Mistral AI, AWS SQS, Private Data, Image Processing, AWS EKS, Docker, Generative AI, Retrieval, LangChain, POC, Claim Processing, Insurance
- Led development of a scalable end-to-end system for processing insurance providers' contract documents, emphasizing entity extraction to facilitate pricing configuration in claim processing applications.
- Engineered a custom document processing pipeline with LayoutLM, extracting entities from images, tables, and text elements.
- Developed a LangChain service for indexing extracted elements and managed metadata within OpenSearch to optimize retrieval.
- Implemented robust infrastructure leveraging a self-hosted, fine-tuned Mistral AI LLM and asynchronous processing via SQS to handle requests efficiently, while ensuring the security and privacy of sensitive legal contracts.
- Delivered successful POCs followed by integration for 3 clients, validating the solution's effectiveness and potential for widespread adoption.

Apexanalytix, Remote, New York, May 2023 - Feb. 2024
Data Scientist | ML Engineer Intern
Generative Knowledge-Specific Chatbot
Skills: Hugging Face, Python, Azure Cloud, Azure DevOps, Azure OpenAI, Generative AI, Retrieval, LangChain, ChromaDB, Retrieval Augmented Generation (RAG), Production, LLM, Chatbot, Re-Rankers
- Developed an advanced Retrieval Augmented Generation (RAG) chatbot with an LLM for intelligent knowledge access across teams.
- Engineered an efficient retrieval pipeline with parent-child document indexing using LangChain and the Chroma vector database.
- Designed an agile RAG pipeline, integrating MMR (maximal marginal relevance) scoring for chunk retrieval and an Azure OpenAI LLM for response generation.
- Implemented user feedback collection to monitor chatbot performance and gather data for iterative refinement and fine-tuning.
- Achieved 89% approval in human evaluations and integrated the technology into 12 internal and 18 external client applications.

Oracle, Bengaluru, India, Sept. 2020 - Aug. 2022
Senior Application Engineer (ML)
Preemptive Anomaly Prediction in Corporate Billing
Skills: Oracle Cloud, Oracle Financial Services, PL/SQL, SQL, Anomaly Detection, US Patent, Big Data, Parallel Processing
- Implemented an in-memory multivariate anomaly prediction system for corporate billing, addressing monthly billing challenges.
- Leveraged Oracle in-database ML for semi-supervised classification with both local and global model explainability.
- Optimized the service with indexing and parallelism, processing 1.2M bills and 5M segments in 20 minutes with 92% precision.
- Integrated the services with the Oracle Revenue Management and Billing (ORMB) product; USPTO patent granted [US17/710745].

Wipro Limited, Bengaluru, India, June 2018 - Sept. 2020
Project Engineer (AI)
Chatbot Services for Employee Helpline Portal's Ticketing System
Skills: PyTorch, Hugging Face, AWS Cloud, Python, Retrieval, BERT, FAISS, Chatbot, User Experience, Automation, Semantic Search
- Developed an Employee Helpline Portal chatbot with effective retrieval of historical ticket resolutions for enhanced user support.
- Implemented query intent classification and BERT-powered semantic search to deliver precise responses.
- Integrated the chatbot into the portal and achieved the targeted 70% reduction in human agent intervention.
- Achieved a 79% accuracy score in evaluation and decreased wait times from 18 to 4 minutes, improving overall user experience.

PATENTS and PUBLICATIONS

Paper Published: Virtual Conversation with Real-Time Prediction of Body Moments/Gestures, ICMLIP 2019 [Link]
US Patent Granted: Method And System For Multimodal Analysis Based Emotion Recognition, US16/795840 [Link]
US Patent Granted: Anomaly Detection for Bill Generation, US17/710745 [Link]
US Patent Granted: Technology System For Assisting Financial Institutions In Debt Collection, US17/659017 [Link]

PROJECTS

Generative Empathetic Chatbot (BabbleGo) [code] [report] [slides]
Skills: PyTorch, Hugging Face, Python, OpenAI, Generative AI, Retrieval, Haystack, Elasticsearch, Jupyter Notebook
- Deployed the application as a web service on Streamlit, using an Elasticsearch vector store for efficient storage and retrieval of conversation data.
- Implemented a versatile RAG chatbot capable of delivering information and engaging in emotion-aware casual conversations.
- Developed an intelligent dialog management system for effective user intera... by 35% while maintaining resource efficiency.

Network-based Intrusion Detection System (NIDS) [code] [results]
Skills: PyTorch, TensorFlow, Data Analysis, Deep Learning, Python, Flask
Research Assistant, Dr. Hongxin Hu
- Engineered an intrusion detection system using deep neural detectors to identify and respond to potential network security threats, mitigating the risk of data loss and downtime.
- Conducted a comprehensive analysis of 12 network attack datasets with 4 deep neural detectors and reported insights into the effectiveness and limitations of each method.

LLM-Powered SQL DB Agent [code] [slides]
Skills: LangChain, OpenAI, LLM, Python, SQL, APIs, SQLAlchemy
- Designed and implemented an LLM-powered SQL database agent, enabling intuitive natural language interaction with SQL databases.
- Integrated LangChain to extract comprehensive table descriptions and contextual information directly from SQL databases, then used this context to enhance the generative capabilities of the OpenAI model.
- Introduced a robust query execution layer within the agent that manages SQL query execution and handles database validation errors to ensure accurate, reliable query results.

Text Chat Application [Code] [Live Demo]
Skills: C++, Networking, Linux, Socket Programming, Protocols, Chat Application, Server, Client
- Developed a chat application following the conventional client-server model, allowing numerous clients to log in, establish their identity, and communicate with each other via the central server.
- Used socket programming to establish robust connections between the server and clients, ensuring reliable message transmission.
- Implemented a buffering system to manage message storage and retrieval, particularly for clients who were offline when a message arrived.
- Conducted comprehensive testing of the application in a live production environment to assess its accuracy and efficiency, verifying that it met performance requirements and functioned reliably under real-world conditions.

Intelligent Search for Offers [Code] [Live Demo]
Skills: Streamlit, Sentence Transformers, Rerankers, Python, Pandas
- Independently designed and developed the Offers Search Engine, implementing hybrid semantic and exact search techniques for improved offer retrieval.
- Spearheaded the integration of search and reranking pipelines, resulting in 40% more accurate search and a 70% reduction in retrieval times on extensive datasets.
- Deployed the service with a user-friendly Streamlit interface, enhancing user engagement and ensuring a seamless offer-search experience.