
Mrigonav - Data Scientist, Machine Learning Engineer
sumit@chabeztech.com
Location: Markleysburg, Pennsylvania, USA


Summary
Data Scientist with over 10 years of experience in developing predictive models, conducting advanced analytics, and driving data-driven decision-making. Expertise in utilizing big data technologies such as Spark, Hadoop, and Snowflake, alongside machine learning frameworks to optimize business performance and resilience. Proven ability to lead complex data projects, automate workflows, and deliver actionable insights that impact organizational growth. Skilled in cloud computing platforms like AWS and Azure, with a strong commitment to innovation and continuous professional development.
Technical Skills
Python
SQL
R
Scala
Linux Shell Scripting
Hadoop Ecosystem: HDFS, Hive, Pig, MapReduce, Sqoop, Oozie
Apache Spark
Kafka
NoSQL Databases: MongoDB, Cassandra, HBase
Data Lakes: Experience designing and managing data lakes
ETL Tools: Apache Airflow, Talend, Informatica, Pentaho
Data Warehousing: Snowflake, Redshift, BigQuery
Cloud Platforms: AWS (S3, EMR, Lambda, Redshift, RDS), Azure (Data Factory, Databricks), Google Cloud Platform (BigQuery, Dataflow)
Version Control: Git, Bitbucket
Monitoring: CloudWatch, Splunk
Workflow Orchestration: Jenkins, Control-M
Machine Learning Frameworks: Scikit-learn, TensorFlow, Keras, PyTorch
Algorithms: Regression, Clustering, Random Forest, Gradient Boosting, SVM, Neural Networks, Time Series Analysis
NLP: Text processing and modeling (e.g., spaCy, NLTK)
Model Deployment: Flask, FastAPI, Docker
Data Manipulation: Pandas, NumPy, SciPy, dplyr (R)
Visualization: Tableau, Power BI, Matplotlib, Seaborn, Plotly
Statistical Techniques: Hypothesis testing, A/B testing, Bayesian analysis
RDBMS: MySQL, PostgreSQL, MS-SQL Server, Oracle, Teradata
Data Modeling: Dimensional modeling, ERD design, Snowflake schemas
Containerization and CI/CD: Docker, Kubernetes, Jenkins
Project Management: Agile, Scrum
Professional Experience
Sr. Data Scientist 01/2023 to 01/2025
T-Mobile Plano, Texas, USA

Developed and deployed predictive and prescriptive models leveraging machine learning techniques to optimize pricing, inventory management, and customer segmentation strategies.
Conducted deep analysis of customer transaction data to identify trends, improve personalization, and enhance the customer shopping experience.
Built scalable solutions to optimize supply chain and logistics operations, reducing delivery time by 15% through data-driven decision-making.
Governed ML models in production, adhering to MPLC principles for security, fairness, and compliance with organizational policies.
Designed and trained deep neural networks (DNNs) using frameworks like TensorFlow and PyTorch for image recognition, NLP, and time-series forecasting tasks.
Developed and deployed end-to-end ML workflows in Databricks using PySpark, MLflow, and Delta Lake.
Optimized pipelines for efficiency and reliability by leveraging advanced MLOps tools and techniques.
Designed reward structures and policies for multi-agent reinforcement learning environments using OpenAI Gym.
Implemented explainability techniques such as SHAP and LIME to interpret ML models for business stakeholders.
Maintained documentation and reports for AI systems to ensure transparency and ease of maintenance.
Automated repetitive tasks using Python scripts and scheduling tools like Airflow or Cron.
Designed and implemented ETL pipelines for large-scale data processing using tools like Spark, Hive, and Snowflake, ensuring data availability and reliability for downstream analytics.
Managed the end-to-end Machine Learning Development Lifecycle (MDLC), including data collection, preprocessing, model training, validation, deployment, and monitoring.
Created dashboards and reports in Tableau and Power BI to communicate actionable insights to senior management, resulting in a 20% increase in data-driven decision-making.
Partnered with cross-functional teams, including engineering, product management, and marketing, to align business strategies with data-driven insights.
Developed and implemented Retrieval-Augmented Generation (RAG) pipelines to enhance the performance of generative AI models by incorporating real-time external data retrieval.
Designed and conducted experiments to evaluate the effectiveness of promotional campaigns, pricing strategies, and UI/UX enhancements, delivering a 10% increase in conversion rates.
Mentored junior data scientists and analysts, providing guidance on best practices in statistical analysis, machine learning, and data engineering.
Developed internal tools for automated data extraction, feature engineering, and model evaluation to streamline the data science lifecycle.
Ensured compliance with Walmart's data governance policies and ethical standards while handling customer and transactional data.

Sr. Data Scientist 03/2020 to 12/2022
JPMorgan Chase & Co. Plano, Texas, USA

Led the migration of a document summarization tool to AWS Lambda, enhancing scalability and reducing deployment time by 30%, enabling efficient large-scale document processing.
Developed and deployed a machine learning-based content classification system, increasing categorization accuracy by 25%, streamlining data intake processes across departments.
Partnered with the Business Intelligence team to create predictive models, improving sales forecast accuracy by 15% and driving data-driven product strategies.
Applied Named Entity Recognition (NER) and text classification techniques to analyze customer feedback, boosting sentiment analysis accuracy by 20% and uncovering actionable insights.
Implemented real-time model monitoring and drift detection using AWS CloudWatch, ensuring robust performance through automatic retraining.
Built and deployed supervised and unsupervised ML (machine learning) models for predictive analytics and clustering.
Achieved a 40% reduction in resource consumption by optimizing inference times through model quantization and pruning, enhancing efficiency in cloud environments.
Spearheaded the integration of NLP and AI tools into enterprise workflows, improving operational processes and enabling data-driven decision-making.
Mentored junior data scientists and provided guidance on best practices for Python programming and data analysis.
Optimized cluster performance and job scheduling in Databricks for cost efficiency and processing speed.
Conducted experiments to evaluate RAG system performance, fine-tuning embeddings and similarity metrics to improve precision and recall.
Deployed machine learning models into production environments following MPLC protocols to ensure scalability and efficiency.
Collaborated with data scientists to ensure effective validation and deployment of machine learning models.
Conducted regular audits of AI systems to evaluate performance and compliance with organizational standards.
Designed and deployed a scalable recommendation engine using collaborative filtering, increasing user engagement by 30% through personalized content delivery.
Conducted A/B testing on recommendation algorithms, enhancing content relevance and boosting user satisfaction scores by 10%.
Designed and trained deep neural networks (DNNs) using frameworks like TensorFlow and PyTorch for image recognition, NLP, and time-series forecasting tasks.
Collaborated with UX/UI teams to improve the recommendation engine dashboard, increasing user adoption and satisfaction with personalized features.
Built a robust data pipeline using Apache Airflow, ensuring seamless integration of models, databases, and cloud services, reducing time-to-insight by 40%.
Leveraged advanced feature engineering and dimensionality reduction techniques to improve model accuracy by 10%, reducing overfitting and enhancing operational efficiency.
Developed real-time dashboards in Power BI and Tableau to present model predictions and trends, enabling stakeholders to derive actionable insights effortlessly.
Streamlined data ingestion workflows, reducing errors by 25% and improving consistency for analytics and reporting.
Monitored and maintained RAG pipelines in production environments, ensuring uptime, reliability, and continuous model improvement.
Facilitated performance reviews and retrospectives using Agile Scrum, fostering continuous team improvement and reducing cycle times by 20%.
Created an automated reporting solution to deliver daily performance summaries and metrics, empowering senior management with actionable insights.

Data Scientist 10/2018 to 03/2020
Discover Riverwoods, USA
Built a recommendation engine leveraging customer preferences and behavioral data, driving a 15% increase in product cross-selling through personalized financial product offerings.
Designed and implemented a real-time feedback loop using AWS Lambda and AWS S3, enhancing customer satisfaction prediction accuracy and improving response times for customer service teams by 20%.
Developed and deployed reinforcement learning models to optimize decision-making processes, such as dynamic pricing or inventory management.
Deployed machine learning models to forecast customer behavior and satisfaction with 92% accuracy, enabling proactive adjustments to marketing and sales strategies.
Developed advanced anomaly detection models to identify outliers in customer feedback, addressing negative sentiment quickly and improving customer retention by 10%.
Utilized NLP techniques to analyze unstructured customer feedback, extracting actionable insights and trends, empowering customer service teams to resolve recurring issues effectively.
Led the migration of a customer satisfaction prediction system to a microservices architecture, increasing scalability and reducing service downtime by 30%.
Partnered with marketing teams to design campaigns based on predictive customer satisfaction scores, increasing engagement by 18% through targeted financial solutions.
Improved customer segmentation models using unsupervised learning and feature selection techniques, resulting in a 20% boost in targeted marketing campaigns and customer engagement.
Implemented clustering algorithms like DBSCAN and hierarchical clustering for refined customer segmentation, enhancing targeted offers and increasing conversion rates by 12%.
Created custom metrics and KPIs for monitoring customer satisfaction and sentiment trends in real time, enabling data-driven decisions by senior management to enhance service quality.
Conducted feature engineering and optimized model selection processes, reducing error rates of customer satisfaction models by 18% and enhancing predictive performance.
Deployed a sentiment analysis tool across customer service channels, enabling real-time sentiment monitoring and reducing customer churn by 15%.
Developed real-time analytics dashboards to track customer sentiment trends, providing actionable insights to operational teams and increasing efficiency by 20%.
Data Scientist 01/2017 to 09/2018
Capital One Richmond, USA

Led the design and implementation of a unified data pipeline to streamline front-to-back-office processes for financial products, enabling efficient processing of credit card and loan data.
Automated data validation and processing workflows using Python and SQL, reducing manual intervention by 40% and improving system accuracy and efficiency.
Developed and tested predictive models for customer credit risk assessment and fraud detection, ensuring compliance with regulatory standards and enhancing model accuracy by 20%.
Partnered with product and engineering teams to define requirements, establish project milestones, and ensure alignment with Capital One's data-driven business objectives.
Enhanced the performance of machine learning pipelines by implementing advanced hyperparameter tuning and optimization techniques, reducing runtime by 25%.
Designed and tested APIs for seamless integration of machine learning models into enterprise systems using tools like Postman and Swagger, ensuring reliable performance.
Established robust data quality frameworks and implemented monitoring mechanisms to ensure compliance with Capital One's data governance policies.
Guided a team of junior data scientists in developing and deploying scalable data solutions, providing technical mentorship and fostering professional growth.
Built advanced analytics dashboards using Tableau and Power BI to visualize credit card usage trends and customer insights, empowering stakeholders with actionable data.
Conducted deep-dive analysis on critical system issues, identifying root causes and implementing solutions that reduced resolution time by 25%.
Created reproducible machine learning pipelines using tools like Airflow and MLflow, improving collaboration and deployment efficiency.
Utilized clustering algorithms to refine customer segmentation strategies, enabling personalized product recommendations and improving customer satisfaction by 15%.
Designed an A/B testing framework to evaluate marketing strategies and optimize customer engagement, driving a 10% increase in campaign success rates.
Ensured adherence to financial data security and compliance standards in all modeling and data processing activities.

ETL/Data Analyst 01/2014 to 07/2015
GGK Tech Hyderabad, India

Designed and implemented ETL workflows using tools like Informatica, Talend, and SSIS to extract, transform, and load data from diverse sources into centralized data warehouses, ensuring data integrity and accuracy.
Maintained and optimized relational databases (SQL Server, Oracle, MySQL), improving query performance and reducing data processing times by 20%.
Conducted comprehensive data validation, cleansing, and transformation, ensuring high-quality datasets for analytics and reporting.
Developed interactive dashboards and reports using tools like Tableau, Power BI, and Excel, enabling stakeholders to gain actionable insights and make informed decisions.
Collaborated with business users to gather requirements, analyze data needs, and design solutions that aligned with business objectives.
Streamlined data ingestion processes from multiple internal and external sources, ensuring seamless integration and reducing data inconsistencies by 25%.
Fine-tuned ETL jobs and database queries to enhance system performance and ensure timely data delivery for critical business operations.
Provided ad hoc data analyses to support business units, delivering insights into trends and metrics that drove process improvements.
Automated repetitive data extraction and reporting tasks, reducing manual efforts by 30% and improving overall team productivity.

Education
M.S: IT & Cybersecurity 12/2024
New England College Henniker, NH
M.S: Data Science & Analytics 12/2022
New England College Henniker, NH
M.S: Systems & Engineering Management 12/2017
Texas Tech University Lubbock, TX
B.Tech 05/2014
National Institute of Technology Silchar, India
Certifications
Data Science A-Z Hands-on Exercises and ChatGPT Prize [2024]
Advanced Python: Working With Data
Advanced Snowflake: Deep Dive Cloud Data Warehousing and Analytics
Machine Learning A-Z: Hands-on Python & R in Data Science
Amazon Web Services: Data Services
Apache Spark Essential Training: Big Data Engineering
Data Engineering Pipeline Management with Apache Airflow
