Sucheta V S - Data Scientist/Analyst
469-663-0081 | [email protected]
Location: Chicago, Illinois, USA | Relocation: Yes | Visa: H1B
SUMMARY:
Data Scientist/Analyst with 9 years of experience in data analysis, data cleansing, data modeling, ETL, and Python development, working with cross-functional teams.
Experienced in data analysis, reporting, statistical analysis, machine learning, time series analysis, and predictive modeling; creates meaningful data visualizations and dashboards and predicts future trends from data.

TECHNICAL SKILLS:

Programming: Python (Pandas, NumPy, Matplotlib, Scikit-learn, Seaborn, TensorFlow, NLTK, Keras, Flask, SciPy, Statsmodels, BeautifulSoup), R, SAS, Scala, PySpark, Java, C, C++, HTML, XML, CSS, JavaScript
Statistical skills: Data Analysis, Statistics, Machine Learning (Supervised and Unsupervised), Data Mining, Predictive Modeling, Statistical Analysis, Hypothesis Testing, Time Series Analysis and Forecasting, Natural Language Processing, Deep Learning, Inferential Statistics, Descriptive Statistics, Experimental Design, REST APIs and JSON processing
Big Data: Spark, Hadoop (HDFS, MapReduce), Hive, Sqoop, Spark MLlib
Visualization: Tableau, Plotrix, ggplot2, Plotly, Power BI
IDE tools & cloud: Jupyter Notebook, RStudio, Docker, Kubernetes, AWS SageMaker, MS Azure, JIRA
Databases: SQL, NoSQL, MySQL, SQL Server, MongoDB, MS Access, SSIS
Version Control: Git, GitHub

EXPERIENCE:

TIAA Financial Services, Chicago, IL Feb 2021 - Apr 2023
Data Scientist/Analyst

Identified customers for retention and deletion under CCPA requirements for all customers across all 50 states; impacted the business by avoiding heavy penalty charges.
Analyzed customer behavior in python and deleted customer records from the database when no longer active in the system.
Extracted data from different sources such as Teradata and Oracle; cleaned and analyzed it, created dashboards, and presented insights to business leaders.
Participated in all phases of project life cycle including data collection, data mining, data cleaning, developing models, validation and creating reports.
Performed data cleaning on a huge dataset which had missing data and extreme outliers from Hadoop workbooks and explored data to draw relationships and correlations between variables.
Performed data-preprocessing on messy data including imputation, normalization, scaling, and feature engineering using Scikit-Learn.
Conducted exploratory data analysis using Python Matplotlib and Seaborn to identify underlying patterns and correlations between features.
Built classification models based on Logistic Regression, Decision Trees, Support Vector Machines, and XGBoost to predict the probability of a customer using the application.
Employed ensemble learning techniques such as Random Forests, AdaBoost, and Gradient Boosting to improve model performance by 10%.
Deployed ML models into production using AWS

Environment: Python, libraries - Pandas, NumPy, Scikit-learn, Seaborn, Random Forests, XGBoost, SVM, PCA, Docker, AWS, Teradata, Oracle

NetObjex Inc / Remote Sep 2020 - Jan 2021
Data Analyst

Worked on NetObjex's AI platform, an automated tool for the ML process.
Contributed to data preparation/wrangling from different data sources such as RDBMS and unstructured data, and to data manipulation in Python.
Built data pipelines and improved existing statistical and ML models' performance by applying hyperparameter tuning and cross-validation techniques.
Created data visualization dashboards to provide insights for a COVID application called COVID PreAuth.
Built a machine learning model (decision trees) to automate responses by the application.
Applied hypothesis testing (A/B) to the app UI to improve user experience.
Collaborated with product teams to analyze key requirements and business problems and to maintain documentation.

WeWork (Flatiron School), Chicago, IL Aug 2019 - July 2020
Data Scientist

Performed statistical data analysis to understand customer behavior - cleaned structured and unstructured data, and analyzed and reported destination patterns of 7 years of alumni by department.
Analyzed historical, demographic, and behavioral data as features to understand customer behavior for marketing that offers the right product to the right person.
Performed data profiling to learn about behavior across features such as location, date, and time, integrating with external data sources and APIs to discover interesting trends.
Built Machine Learning models to identify fraudulent applications for loan pre-approvals and to identify fraudulent credit card transactions using the history of customer transactions with supervised learning methods.
Performed Data Cleaning, features scaling, featurization, features engineering.
Used Pandas, NumPy, SciPy, Matplotlib, Seaborn, Scikit-learn in Python at various stages for developing machine learning model and utilized machine learning algorithms such as Logistic regression, Random Forests, Decision Trees, XG Boost to build predictive models.
Involved in various pre-processing phases of text data, such as tokenizing, stemming, and lemmatization, converting raw unstructured text into structured data, and implemented Natural Language Processing mechanisms for text analysis.
Segmented customers based on behavior and specific characteristics such as age, region, income, and geographic location, applying clustering algorithms to group customers with similar behavior patterns.
Used the segmentation results to learn the Customer Lifetime Value of each segment, discover high-value and low-value segments, and improve customer service to retain customers.
Implemented Principal Component Analysis (PCA) in feature engineering to analyze high dimensional data.
Used confusion matrix and log loss to validate the model performance.
Addressed overfitting and underfitting by tuning the hyperparameters of the algorithm and by using L1 and L2 regularization.
Used Spark's machine learning library to build and evaluate different models.
Led the Data Science program and impacted 41 students' lives, transforming careers through education.
Designed and delivered end-to-end, full-time, in-person data science immersive programs featuring Python and SQL.
Conducted hands-on, application-based coding lectures on Machine Learning, Statistics, Deep Learning, NLP, Big Data (MapReduce, PySpark), SQL, NoSQL, MongoDB, Data Engineering, and Time Series Analysis.

Environment: Python, Seaborn, Sci-kit learn, Keras, TensorFlow, PyTorch, Machine learning libraries, NLP, Linux, Google cloud, Flask, Docker, PySpark, Jupyter Notebook, Statistical Analysis.

Artha Solutions, Scottsdale, AZ Jan 2019 - July 2019
Data Analyst

Built a decision tree and random forest using Entropy, Information Gain, and Gini Impurity as split criteria.
Collaborated with the product manager on the requirements of the project and explored the data from the database querying (SQL) search techniques, web services etc.
Prepared data using techniques like dimensionality reduction (PCA) for reducing features, and cleaned the data using Python libraries.
Applied advanced statistical techniques while running machine learning algorithms on heterogeneous data.
Used advanced analytical tools and programming languages such as Python (NumPy, Pandas, SciPy) for data analysis.
Constructed and evaluated various datasets by building machine learning models with algorithms and statistical modeling techniques such as classification, regression, and text mining from Python libraries (Scikit-learn).
Performed predictive analytics with machine learning algorithms, especially supervised methods (SVM, Logistic Regression, Boosting), ensemble methods (Random Forests), and Neural Networks.
Obtained better predictive performance of 81% accuracy using ensemble methods like Bootstrap aggregation (Bagging) and Boosting (Adaboost, Gradient).
Used regularization techniques to solve the overfitting problem by adding a penalty term (LASSO or Ridge) to the loss function and by performing cross-validation.
Iterated as required to improve the scalability, reliability, and performance of our streaming data pipelines, which were built on top of Spark.
Productionized models using Docker containers and Kubernetes.
Deployed NLP and ML model APIs on Amazon cloud; used Tableau for data visualizations.
Communicated results with team members, other data science teams, and marketing operations teams to support the best decisions.

Environment: MySQL Workbench, Python, Jupyter Notebook, Apache Spark, Tableau, Hive, Hadoop, Pandas, Matplotlib, Seaborn, Scikit-learn, SQL, Linux, Git, Microsoft Excel, PySpark-ML, Random Forests, XGBoost, SVM, PCA, TensorFlow, Keras, PyTorch, NLTK, MongoDB, Docker, Kubernetes

Artha Solutions, Scottsdale, AZ May 2018 - Dec 2018
Data Science Intern

Applied Artificial Intelligence (AI) and Time Series Analysis for product development. Automated the process of model building, model selection and forecasting based on the data. Contributed towards model building and the addition of AI layer
Performed transformations in Spark for the data ingested from different database sources into Hadoop cluster in-order-to compare the source and target files
ETL data manipulation: Used Sqoop to migrate data from different RDBMS sources to the Hadoop Distributed File System (HDFS). Created Hive tables, performed data transformation, and loaded the data back into different Hive tables.
Developed a web-based user interface for RFP Engine, an NLP-based question-and-answer tool, using Python Flask.
Implemented machine learning algorithms in R for a customer churn POC. Trained logistic regression and decision tree models to classify customers with a best accuracy of 75%, and identified possible factors resulting in customer churn.

Indian Institute of Technology Bombay, Mumbai, India Feb 2014 - Aug 2017
Software Engineer - Python developer

Prepared legacy data by performing data cleansing and various data transformations.
Established relationships between the tables using primary and foreign key constraints using SQL triggers.
Performed ETL processes to extract, transform, and load data from OLTP tables into staging tables and the data warehouse; merged datasets, cleaned and constructed datasets, produced summary statistics, and conducted difference-in-means tests.
Prepared and analyzed data, including locating, profiling, cleansing, extracting, mapping, and importing.
Communicated status, issues and impacts to business team leaders, relevant client sponsors and process owners (stakeholders) and provided Post Go-live support.
Collaborated with a cross-functional team of engineers, business managers to resolve business related system issues.

EDUCATION:
Michigan Technological University, Houghton, MI December 2018
Master of Science, Data Science (major - Statistics)