
Sunil Laudari - Data Scientist
[email protected]
Location: Haddonfield, New Jersey, USA
Relocation: Yes
Visa: GC-EAD
https://www.linkedin.com/in/sunillaudari/
https://github.com/sunil7634



PROFESSIONAL SUMMARY:
Data Scientist with 7+ years of experience working with vast datasets to break down information, gather relevant data points, and solve advanced business problems. Skilled in designing and implementing solutions for the analytical, engineering, and visualization needs of data. Advanced proficiency in machine learning, data mining with structured and unstructured data, data validation, and predictive modeling.
Experienced in structured and unstructured data analytics, working with large datasets on distributed databases, and developing machine learning algorithms to gain operational insights and present them to leadership.
Extensively involved in data preparation, exploratory analysis, and feature engineering using supervised and unsupervised modeling.
Well versed in linear and non-linear regression and classification predictive algorithms.
Actively involved in model selection, statistical analysis, and analyzing correlations and similarities.
Expert at working with statistical tests: two-way independent and paired t-tests, one-way and two-way ANOVA, along with non-parametric tests such as the chi-squared test.
Handled imbalanced datasets, exploring under-sampling, over-sampling, and SMOTE.
Proficient in ensemble learning using bagging, boosting (LightGBM, XGBoost), and random forests.
Created dashboards as part of Data Visualization using Tableau, Seaborn, Matplotlib, ggplot2.
Performed preliminary data analysis using descriptive statistics, introduced dummy variables, and handled anomalies such as duplicates and missing values using various imputation methods.
Strong experience in data visualization with Tableau, creating line and scatter plots, bar charts, histograms, pie charts, box plots, time series, error bars, multiple chart types, multiple axes, subplots, etc.
Validated consolidated data and developed the models that best fit it; interpreted data from multiple sources, consolidated it, and performed data cleansing using RStudio.
Applied multiple data mining techniques to derive new insights from the data.
Working Knowledge of Cloud services like Microsoft Azure for building, training, and deploying scalable models.
Strong decision-making ability aided by data analysis and expert judgment; a quick learner.
Team player with good logical reasoning ability, coordination, and interpersonal skills.
Team builder with excellent communication, time and resource management, and continuous client-relationship development skills.
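The imbalance-handling techniques named above (under-sampling, over-sampling, SMOTE) can be illustrated with a minimal random over-sampling sketch. This is toy NumPy code, not the production approach; in practice SMOTE (e.g. from the imbalanced-learn package) synthesizes new minority samples rather than duplicating existing ones.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Balance a dataset by resampling each minority class with
    replacement until every class matches the majority-class count."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = []
    for c in classes:
        c_idx = np.flatnonzero(y == c)
        # Draw extra indices (with replacement) to reach n_max per class.
        extra = rng.choice(c_idx, size=n_max - len(c_idx), replace=True)
        idx.extend(c_idx)
        idx.extend(extra)
    idx = np.asarray(idx)
    return X[idx], y[idx]

# Toy imbalanced dataset: 8 majority vs 2 minority samples.
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)
Xb, yb = random_oversample(X, y)
```

After resampling, both classes contribute 8 rows, so a downstream classifier no longer sees an 80/20 skew.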


SKILLS:
Programming: Python, R, SQL, Bash
Machine Learning: Cross Validation, PCA, Logistic Regression, KNN, Random Forest, Gradient Boosting, SVM, text analysis (NLTK)
Data Science Tools: Pandas, NumPy, Scikit-learn, Keras, TensorFlow, PyTorch, Spacy
Visualization: Matplotlib, Seaborn, Plotly, Power BI
DevOps Tools: Jira, Confluence, Jenkins, ELK, Git, GitHub, SourceTree, Postman, PyCharm, AWS EC2/S3, Snowflake, Azure Databricks








PROFESSIONAL EXPERIENCE:
Client: Comcast Corporation Sep 2022 - Current
Data Scientist/Python Developer Philadelphia, PA
Customized channel distribution using SQLite, increasing solver speed by 3x and reducing compute costs for Wi-Fi mesh network.
Spearheaded predictive analytics initiatives, saving $1M in workforce expenses by accurately forecasting Wi-Fi radio channels.
Architected an ensemble model integrating Scikit-learn random forest and XGBoost, achieving 97% accuracy in predicting pipe seam type.
Optimized ML models via cross-validation, boosting accuracy and efficiency through feature engineering. Achieved 95% info retention by reducing data dimensionality from 27 to 15 features for 30k points with PCA.
Engineered optimization model for home Wi-Fi devices, eliminating reliance on third-party software and slashing costs by over 40%.
Delivered actionable insights to senior management through compelling data visualization and comparative analysis of 1M+ observations using Power BI.
Implemented data models and machine learning algorithms using Python, AWS SageMaker, and Jupyter notebooks as a member of the data science team.
Worked with staged gene-expression test data (before the test, after the first test, and during the test) for lung cancer typing, and analyzed the results.
Built gene vs. lung cancer KPI dashboards, including per-medicine views, to understand how tissue and cancer change in response to medication.
Performed data manipulation with PySpark to create new variables, revalue existing variables, and treat missing values, making the data ready for the modeling stage.
Performed statistical analysis on the data, applying significance (alpha) testing, t-tests, ANOVA, and correlation analysis between dependent and independent features.
More than 1,000 forms per day were being sent out manually after signature extraction; automating this process saved $300K each quarter.
Worked on a Gen AI chatbot project using LangChain and the ChatGPT LLM, with Streamlit as the UI.
Worked on fine-tuning with RAG on Google Vertex AI, comparing results before and after fine-tuning; used a RAG pipeline in the Vertex AI service and ran models on GPU.
Developed Gen AI RAG POCs on subset data for stakeholders and business partners, explaining development, evolution, and deployment on Google Cloud using Vertex AI with JSON responses as input data.
Used supervised data to develop LSTM-based review systems for patients and physicians, applying NLP techniques such as word-to-vector conversion, TF-IDF vectorization, and stop-word removal.
Analyzed financial and accounting information across entities and cross-functional teams to track all intercompany transactions by product, and prepared Tableau dashboards to draw insights and automate validations.
Performed data transformation for rescaling by normalizing variables.
Delivered interactive visualizations and Tableau dashboards, using ggplot, Matplotlib, and Seaborn to present analysis outcomes in terms of patterns, anomalies, and predictions.
Developed time-series forecasting models in Python using ARIMA and LSTM to determine unique patterns and observe trends across different stages of patient data.
Explored the power of different machine learning algorithms, finding that multiple regression and random forest predicted the most accurate results.
Deployed models using Flask, AWS Elastic Beanstalk, EC2, and AWS SageMaker.
Applied regression and classification models on different POCs, validated them, and discussed results with clients and stakeholders.
Monitored deployed models using AWS CloudTrail, logs, and visualizations, tracking model performance and behavior on new data.
Performed A/B testing of new solutions and hypothesis testing using Python and PySpark to assess organizational impact.
Developed modular code in the VS Code IDE and created pipelines for each stage of CI/CD, including Dockerfiles and containers.
Participated in scrum meetings and communicated with the Delivery Lead, migration team, and business SPOC.
Tracked data flow in both Production and Acceptance dashboards.
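The PCA step described above (27 features reduced to 15 while retaining 95% of the variance) follows a standard scikit-learn pattern. The sketch below runs on synthetic correlated data, not the actual dataset; the point is the `n_components=0.95` idiom, which keeps the smallest number of components whose cumulative explained variance reaches 95%.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in: 300 points with 27 correlated features driven by
# 15 latent factors plus a little noise (the real data had ~30k points).
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 15))
mixing = rng.normal(size=(15, 27))
X = latent @ mixing + 0.1 * rng.normal(size=(300, 27))

# A float in (0, 1) asks PCA for the fewest components whose cumulative
# explained variance ratio reaches that threshold.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)
```

Because only 15 latent factors drive the 27 observed features, far fewer than 27 components are needed to hit the 95% threshold.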



University of Alabama in Huntsville (UAH) Aug 2020 - May 2022
Research Aide/Research Assistant Huntsville, AL
Featured on the cover page of Nature Astronomy, showcasing a significant galaxy mosaic crafted with Python libraries including Pandas, Seaborn, and Plotly, leveraging 100 Gigabytes of Hubble Space Telescope Data.
Built a machine learning model (Laplacian Edge Detection Algorithm) to remove Cosmic rays and artifacts from Hubble Space telescope data, reducing computation time by 10 min per filter (image).
Developed a tracking system for nearby galaxies with Astroquery (like SQL) to improve catalogue accuracy, reducing error by 20%.
Enhanced data quality by 17% through cleaning and preprocessing using a comprehensive suite of Python libraries, including NumPy, Pandas, Scikit-learn, and additional tools.
Formulated predictive models to forecast product-category-wise order volumes and season-wise color and style choices, enabling departmental buyers to make educated, data-driven decisions, using Python and/or PySpark.
Worked on image classification using CNNs and computer vision; implemented hyperparameter tuning to scale performance and achieved over 87% accuracy in image identification.
Applied various machine learning algorithms and statistical modeling techniques, such as decision trees, Naïve Bayes, principal component analysis, regression models, artificial neural networks, clustering, and SVM, to identify volume using scikit-learn packages in Python and PySpark.
Developed keyword extraction models using a range of tools including TF-IDF, word2vec, NLTK, and other NLP packages.
Implemented parallelized data processing operations using the Dask framework to clean and filter text data with Python and/or PySpark.
Implemented multiple Time Series Forecasting models (ARIMA) to predict trends of fuel consumptions for different flight engines
Responsible for SQL Server Reporting Services Planning, Architecture, Training, Support, and Administration in Development, Test and Production Environments
Applied various artificial intelligence (AI)/machine learning algorithms and statistical modeling techniques, including decision trees, text analytics, image and text recognition using OCR tools, natural language processing (NLP), supervised and unsupervised learning, and regression models.
Proficient in SQL databases like MySQL, MS SQL, and PostgreSQL
Created data processing pipelines for training, testing, validation using Pyspark
Participated in scrum meetings and communicated with the Delivery Lead, migration team, and business SPOC.
Tracked data flow in both Production and Acceptance dashboards.


Client: BBVA Bank Feb 2017 - May 2019
Junior Data Scientist Birmingham, AL
Automated classifier models like Random Forest, SVM for specific segments of a customer base, saving 22 hours of labor per month.
Constructed operational reporting and data visualization tools, reducing contractor scheduling costs by 10% in the annual budget.
Deployed Auto-Sklearn to automate machine learning model selection, reducing modeling time by 2 hours per session.
Devised scalable solutions for Azure Databricks cloud environments, boosting storage efficiency by 20% and accelerating data analysis tools processing speed by 10%.
Adapted configurations to align with client requirements, resulting in a positive increment in system functionality and a 7% improvement in overall performance.
Conducted structured data preprocessing and analysis in R, Python.
Collaborated with a team of 4 to create a rule-based recommendation engine recommending mutual funds and ETFs for a trillion-dollar asset management client.
Performed A/B testing and post-hoc analysis of a system, improving the existing algorithm's accuracy by 40%; the reason for each recommendation is also shown to the user, increasing the system's reach by 30%.
Developed collaborative-filtering and content-based recommendation systems with Python to suggest mutual funds to financial advisors, increasing click-through rate by 150% and average sales by $80K per advisor.
Designed an insights-generation model leveraging Python and SQL to find similarities and unique propositions of a mutual fund in the US market by clustering historical and current data, doubling searches.
Developed and deployed a portfolio-size estimation model using boosting algorithms (XGBoost, CatBoost).
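The content-based half of the recommendation work above boils down to ranking items by feature similarity. The sketch below is a minimal cosine-similarity illustration on a toy fund-feature matrix; the attribute names and values are invented for the example and do not reflect the client's actual schema.

```python
import numpy as np

# Toy item-feature matrix: rows = funds, columns = normalized attributes
# (e.g. risk score, bond share, equity share). Values are illustrative only.
funds = np.array([
    [0.9, 0.2, 0.8],   # fund A: high risk, equity-heavy
    [0.8, 0.3, 0.9],   # fund B: profile similar to A
    [0.1, 0.9, 0.1],   # fund C: low risk, bond-heavy
])

def cosine_sim(a, b):
    """Cosine similarity between two feature vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def most_similar(idx):
    """Index of the fund whose features are closest to funds[idx]."""
    sims = [cosine_sim(funds[idx], funds[j]) if j != idx else -1.0
            for j in range(len(funds))]
    return int(np.argmax(sims))
```

A content-based recommender would surface `most_similar(0)` (fund B) to a user holding fund A, since their feature vectors point in nearly the same direction.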


Pragmatic Institute May 2022 - July 2022
Data Science Fellow Huntsville, AL
Employed NLTK on thousands of scraped Reddit posts to train classification models, reaching 92% accuracy with the top-performing model (Naïve Bayes with CountVectorizer).
Forecasted the success of bank marketing campaigns using various machine learning techniques. The best model (Logistic regression) achieved 92% accuracy, 93% precision, and 97% recall.
Developed multiple ML models for predicting customer churn in the European banking industry, with the Random Forest model demonstrating the best performance (F1 = 87%, recall = 83%, precision = 91%).
Achieved an average classification accuracy of 90% using Natural Language Processing (NLP) techniques, including CountVectorizer/HashingVectorizer, Term Frequency-Inverse Document Frequency (TF-IDF), tokenizing/stemming, and Multinomial Naïve Bayes for categorizing text into various genres.
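The CountVectorizer + Multinomial Naïve Bayes pipeline used in the projects above is a standard scikit-learn idiom. The sketch below uses a tiny invented corpus (four posts, two made-up categories) rather than the actual Reddit or banking data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny stand-in corpus; the actual projects used thousands of documents.
posts = [
    "new gpu benchmark results",
    "best budget graphics card",
    "sourdough starter tips",
    "how long to proof bread dough",
]
labels = ["hardware", "hardware", "baking", "baking"]

# Bag-of-words counts feed a Multinomial Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(posts, labels)

pred = model.predict(["graphics card overheating"])[0]
```

Words unseen during training ("overheating") are simply absent from the vocabulary and ignored, while "graphics" and "card" pull the prediction toward the hardware class.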


EDUCATION
University of Alabama in Huntsville Aug 2017 - May 2022
Ph.D. in Physics (Astrophysics) Huntsville, AL
G.P.A. 4.0/4.0
Relevant coursework: Data Analysis Math I & II
