
Minnervir - Data Scientist/Data Analyst
[email protected] | 817-678-0872
Location: Remote, USA
Relocation: Yes
Visa: OPT EAD
Only Corp to Corp
PROFESSIONAL SUMMARY:
6+ years of experience in the IT industry as a Data Scientist/Data Analyst, with strong experience in Data Science (Machine Learning) and Data Analysis.
Proficient with Python and core libraries for statistical/econometric modeling such as scikit-learn, pandas, and NumPy; data analysis using complex, optimized SQL over structured and unstructured data; and Data Acquisition, Data Validation, Predictive Modeling, Data Visualization, Web Crawling, Web Scraping, Statistical Modeling, Data Mining, and Natural Language Processing (NLP).
Strong experience in Software Development Life Cycle (SDLC) including Requirements Analysis, Design Specification and Testing as per Cycle in both Waterfall and Agile methodologies.
Proficient in managing the entire Data Science project life cycle and actively involved in all phases, including data acquisition, data cleaning, feature scaling and feature engineering, statistical modeling (Linear and Logistic Regression, XGBoost, Decision Trees, Time Series models, Neural Networks - CNN, RNN, GNN, Support Vector Machines (SVM), K-Means clustering, KNN), dimensionality reduction using Principal Component Analysis and Factor Analysis, testing and validation using ROC plots and K-fold cross validation, and Data Visualization.
Experienced in applying Deep Learning to NLP and other natural language problems.
Experienced in using Python analysis tools and libraries (Pandas, NumPy, Matplotlib, Seaborn, Bokeh, scikit-learn, SciPy, NLTK, OpenCV, Keras, and TensorFlow) for Natural Language Processing and Deep Learning.
Strong theoretical foundations and practical hands-on projects related to Supervised Learning (Neural Networks, Random Forest, Gradient Boosting), Unsupervised Learning (Clustering, Dimensionality Reduction), Recommender System, Probability & Statistics, and Data Structures
Experienced in Python and R packages such as ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, twitteR, NLP, reshape2, rjson, plyr, pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, and Beautiful Soup.
Experience in building models with deep learning frameworks like TensorFlow, PyTorch, and Keras.
Proficient in data transformations using log, square-root, reciprocal, cube-root, and square transforms, as well as the full Box-Cox transformation, depending on the dataset.
Adept at handling missing data by exploring causes such as MAR, MCAR, and MNAR, analyzing correlations and similarities, introducing dummy variables, and applying various imputation methods (a brief sketch follows this list).
Experience in statistical methodologies such as A/B test, experiment design, hypothesis test, ANOVA.
Experienced in developing Supervised Deep Learning algorithms, including Artificial Neural Networks, Convolutional Neural Networks, Recurrent Neural Networks, LSTM, and GRU, and Unsupervised Deep Learning techniques such as Self-Organizing Maps (SOMs) in Keras and TensorFlow.
Hands-on experience with deployment activities on MS Azure and Amazon Web Services (AWS) cloud for production environments.
Experience in developing applications using Amazon Web Services such as EC2, Virtual Private Clouds (VPCs), Elastic Load Balancers (ELBs), and storage models (EBS, S3, and instance storage).
Working experience with normalization and de-normalization techniques for both OLTP and OLAP systems, with strong knowledge of relational and multidimensional modeling concepts.
Worked on modeling and full-scale implementation of both relational and multidimensional data warehouses.
Experienced in T-SQL programming (DDL, DML, and DCL), including creating stored procedures, user-defined functions, constraints, queries, joins, keys, indexes, data import/export, triggers, tables, views, and cursors.
Experience in creating SQL queries for a variety of RDBMSs, including Microsoft SQL Server, MySQL, PostgreSQL, and Teradata, as well as NoSQL databases, including HBase, MongoDB, and Cassandra, to handle complex and very large data.
Experience in version control tools like Git and build tools like Apache Maven/Ant.
Hands-on experience with data visualization tools like Tableau, Power BI, Matplotlib, Seaborn, ggplot2, and Plotly.
Experience in designing visualizations using Tableau software and Storyline on web and desktop platforms, publishing and presenting dashboards.
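A minimal sketch of the imputation and Box-Cox workflow referenced in the bullets above, using pandas and SciPy; the DataFrame and column names are hypothetical placeholders, not project data:

import numpy as np
import pandas as pd
from scipy import stats

# Toy frame standing in for a real dataset (hypothetical columns)
df = pd.DataFrame({"income": [52000.0, 61000.0, np.nan, 48000.0, 75000.0],
                   "age": [34.0, 41.0, 29.0, np.nan, 52.0]})

# Mean imputation for a column assumed to be missing completely at random (MCAR)
df["age"] = df["age"].fillna(df["age"].mean())

# Median imputation is more robust when extreme outliers are present
df["income"] = df["income"].fillna(df["income"].median())

# Box-Cox requires strictly positive values; it returns the transformed
# column and the fitted lambda
df["income_bc"], fitted_lambda = stats.boxcox(df["income"])
print(fitted_lambda)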

TECHNICAL SKILLS:
Operating Systems Linux, Windows
Programming Languages R, SQL, Python, Shell scripting, Java, Scala
Methodologies SDLC - Agile, Waterfall
Machine Learning Feature scaling, Linear and Logistic Regression, K-NN, Decision Trees, K-Means clustering, Support Vector Machine (SVM), Random Forest, Naïve Bayes, Hierarchical clustering, NLP, CNN, ANN
Python Packages Pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, Beautiful Soup, Rpy2, TensorFlow, PyTorch
R Packages ggplot2, caret, dplyr, RWeka, gmodels, RCurl, wordcloud, kernlab, neuralnet, twitteR, NLP, reshape2, rjson, plyr
Statistics (Supervised Learning) Linear/Logistic Regression, Lasso, Ridge, Elastic Nets, Decision Trees, Ensemble Methods, Random Forests, Support Vector Machines, Gradient Boosting, Deep Neural Networks, Bayesian Learning
Statistics (Unsupervised Learning) Principal Component Analysis, Association Rules, Factor Analysis, K-Means, Hierarchical Clustering, Gaussian Mixture Models, Market Basket Analysis, Collaborative Filtering and Low-Rank Matrix Factorization
Sampling Methods Bootstrap sampling and Stratified sampling
Statistical Analysis Descriptive analytics, hypothesis testing, t-tests, ANOVA
Cloud Amazon Web Services (AWS), MS Azure
AWS Services EC2, S3, Glacier, Lambda, DynamoDB, Redshift, ECS, Fargate, Glue, Data Pipeline, SageMaker, CloudWatch, CloudTrail, CodePipeline, CodeBuild
Databases Oracle, SQL Server, MS Access, MySQL, MongoDB, Cassandra, PL/SQL, Teradata
Data Warehouse Snowflake, Star Schema
IDE R Studio, Jupyter Notebook, PyCharm, Atom
Visualization/BI Tools Tableau
ETL SSIS, Informatica Power Center

PROFESSIONAL EXPERIENCE:

Client: Charter Healthcare Group, Cucamonga, CA June 2022 - Present
Role: Data Scientist
Description: Charter is committed to enriching lives through the delivery of safe, compassionate care that improves quality outcomes, reduces costs, and creates a better experience for patients and families. Charter Healthcare's main service offerings are private duty nursing, skilled home health care, palliative care, complex care management (CCM), hospice, and acute/hospital-based care.

Responsibilities:
Involved in gathering, analyzing and translating business requirements into analytic approaches.
Applied Python libraries such as NumPy, Pandas, SciPy, Matplotlib, Scikit-Learn, NLTK, and seaborn.
Used Spyder (Python) and R Studio (R) open-source tools for statistical analysis and machine learning design. Involved in creating business rules, data definitions, and source-to-target data mappings.
Partnered with Data Science Platform and business stakeholders to address issues related to security, reliability, performance, and data quality.
Built, tuned, implemented, and scheduled machine learning models both on premises and on the AWS Cloud.
Deployed Machine Learning (ML) models in a scalable and secure environment using Amazon Elastic Container Service (ECS) and AWS Fargate.
Built S3 buckets, managed their policies, and used S3 and Glacier for storage and backup on AWS.
Performed exploratory Data Analysis like calculation of descriptive statistics, detection of outliers, assumptions testing, factor analysis, etc., in Python and R.
Utilized Pandas, SciPy, Beautiful Soup, and NumPy to perform data cleaning and ensure data quality, consistency, and integrity.
Extracted and analyzed internal and external Data sources to help answer key business problems related to risk assessment.
Performed exploratory data analysis and data cleaning along with data visualization using Seaborn and Matplotlib.
Monitored and managed the performance of Machine Learning (ML) models using Amazon CloudWatch and AWS CloudTrail.
Trained several machine learning models like Logistic Regression, Random Forest and Support vector machines (SVM) on selected features to predict Customer churn.
Improved model accuracy by 5% by introducing ensemble techniques: Bagging, Gradient Boosting, Extreme Gradient Boosting (XGBoost), and Adaptive Boosting (AdaBoost); a brief sketch of this modeling approach follows this list.
Used AWS S3, DynamoDB, AWS Lambda, and AWS EC2 for data storage and model deployment.
Built a classification model to classify customers for promotional deals to increase likelihood of purchase using Logistic Regression and Decision Tree Classifier.
Collected unstructured data from MongoDB and completed data aggregation.
Extracted Data from the Database using Excel/ Access, SQL procedures and created Python and R Datasets for statistical analysis, validation and documentation.
Created and maintained data pipelines for Machine Learning (ML) training and inference using AWS Glue and AWS Data Pipeline.
Worked with numerous data visualization tools in python like matplotlib, seaborn, ggplot, and pygal.
Analyzed the SQL scripts, designed a solution implemented with PySpark, and developed scripts as per the requirements.
Developed interactive executive dashboards using Power BI to provide a reporting tool that facilitates organizational metrics and data.
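A minimal sketch of the churn-classification approach in the bullets above (Logistic Regression against a boosted ensemble, validated with K-fold ROC AUC); the dataset is synthetic and the settings are illustrative assumptions, not the client's actual pipeline:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic, imbalanced stand-in for a churn dataset
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.85, 0.15], random_state=42)

for model in (LogisticRegression(max_iter=1000),
              GradientBoostingClassifier(random_state=42)):
    # 5-fold cross-validated ROC AUC, in line with the validation noted above
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(type(model).__name__, round(scores.mean(), 3))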

Environment: NumPy, Pandas, SciPy, Matplotlib, Scikit-Learn, NLTK, Seaborn, Spyder (Python), R Studio (R), CloudWatch, CloudTrail, Elastic Container Service (ECS), AWS Fargate, AWS S3, Glacier, DynamoDB, Lambda, EC2, Glue, Data Pipeline, Beautiful Soup, Excel/Access, SQL, Power BI, MongoDB, Logistic Regression, Random Forest, ggplot, pygal, PySpark

Client: Patagonia, Ventura, CA Jan 2021 - May 2022
Role: Data Scientist
Description: Patagonia is an outdoor apparel company based in Ventura, California. Patagonia operates stores in more than 10 countries globally, as well as factories in 16 countries. A certified B-Corporation, Patagonia's mission is to save our home planet. Patagonia is a designer of outdoor clothing and gear for the silent sports: climbing, surfing, skiing and snowboarding, fly fishing, and trail running.

Responsibilities:
Implemented complete data science project involving data acquisition, data wrangling, exploratory data analysis, model development and model evaluation.
Worked on data discovery, handling structured and unstructured data, cleaning and performing descriptive analysis, and preparing data sets.
Performed data collection, data cleaning, data profiling, data visualization, and report creation.
Worked with Data Engineers and Data Analysts in a cross-functional team to deploy models and deliver projects.
Performed data integrity checks, data cleaning, exploratory analysis, and feature engineering using Python and R.
Tested the performance of classifiers such as Logistic Regression, Naïve Bayes, Decision Trees, and Support Vector classifiers.
Designed data profiles for processing, including running SQL and PL/SQL queries, and used Python and R for data acquisition and data integrity work consisting of dataset comparison and dataset schema checks.
Used Pandas, NumPy, SciPy, Matplotlib, Seaborn, Scikit-learn, and NLTK in Python at various stages of developing machine learning models.
Performed data cleaning on a medical dataset with missing data and extreme outliers using PySpark data frames, and explored the data to draw relationships and correlations between variables.
Implemented data pre-processing using Scikit-Learn; steps included imputation of missing values, scaling, logarithmic transforms, and one-hot encoding.
Identified the business problem, selected an appropriate performance metric, developed the data science modeling/algorithmic approach, and evaluated the success of that approach within the constraints of the data and timeline.
Performed data pre-processing and cleaning to prepare the data sets for further statistical analysis.
Implemented Amazon SageMaker for deploying and managing Machine Learning (ML) models in production.
Worked on statistical methods like data-driven Hypothesis Testing and A/B Testing to draw inferences, determined significance levels and derived p-values, and evaluated the impact of various risk factors (see the hypothesis-testing sketch after this list).
Furthered hypothesis testing by evaluating Type I and Type II errors to eliminate skewed inferences.
Implemented and tested the model on AWS EC2 and collaborated with development team to get the best algorithms and parameters.
Created and maintained CI/CD pipelines for Machine Learning (ML) models using AWS CodePipeline and AWS CodeBuild.
Integrated pre-built AI capabilities into models using Amazon Comprehend, Rekognition, and Transcribe.
Extracted data from HTML and XML files by web-scraping customer reviews using Beautiful Soup, and pre-processed raw data from the company's data warehouse.
Performed data post-processing using NLP techniques like TF-IDF, Word2Vec, and BOW to identify the most pertinent Product Subject Headings terms that describe items (a TF-IDF sketch follows this list).
Performed data visualization in RStudio, using ggplot2, lattice, highcharter, Leaflet, Plotly & Cufflinks, sunburstR, and RGL to make informative, interactive plots.
Performed Naïve Bayes, K-NN, Logistic Regression, Random Forest, SVM, and K-Means to categorize customers into groups.
Performed Linear Regression on the customer clusters derived from K-NN and K-Means clustering.
Designed data-visualization dashboards with Tableau and generated complex reports, including summaries and graphs, to interpret the findings for the team.
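An illustrative sketch of the two-sample hypothesis test referenced above (significance level and p-value); the conversion data is synthetic, and Welch's t-test is one reasonable choice, not necessarily the exact test used on the project:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.binomial(1, 0.10, size=5000)    # baseline variant
treatment = rng.binomial(1, 0.12, size=5000)  # new variant

# Welch's t-test; reject the null at alpha = 0.05 when p < 0.05
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
print(t_stat, p_value)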
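A small sketch of the TF-IDF vectorization mentioned in the NLP bullet above, using scikit-learn; the review strings are invented for illustration:

from sklearn.feature_extraction.text import TfidfVectorizer

reviews = ["great jacket, very warm",
           "zipper broke after one week",
           "warm and light, great for climbing"]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(reviews)  # sparse document-term matrix
print(vectorizer.get_feature_names_out())
print(tfidf.toarray().round(2))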

Environment: Python, R, SQL, Pandas, NumPy, SciPy, Matplotlib, Seaborn, Scikit-Learn, NLTK, PySpark, AWS, Amazon SageMaker, AWS EC2, AWS CodePipeline, AWS CodeBuild, Beautiful Soup, NLP, Word2Vec, BOW, RStudio, ggplot2, lattice, Leaflet, Plotly, Cufflinks, sunburstR, RGL.

Client: Punjab National Bank, New Delhi, India Aug 2017 - Nov 2020
Role: Data Scientist/ Data Analyst
Description: Punjab National Bank (abbreviated as PNB) is an Indian public sector bank. The bank has broadened its offerings and grown into a technology-driven bank with products and services that meet the aspirations of every customer segment, making it an ideal destination for all banking needs. PNB offers a wide range of personal banking services, including loans, credit cards, Retail Banking, Corporate Banking, International Banking, Rural Banking, Digital Banking, and Merchant Banking.

Responsibilities:
Actively involved in data analysis, unit testing of the data, and delivery assurance of user stories in an Agile environment.
Extensively involved in all phases of data acquisition, data collection, data cleaning, model development, data interpretation, model validation, and visualization to deliver data science solutions.
Responsible for data cleaning, feature scaling, and feature engineering using NumPy and Pandas in Python.
Used Pandas, NumPy, Seaborn, Matplotlib, and Scikit-learn in Python to build predictive machine learning models such as Logistic Regression, Gradient Boosted Decision Trees, and Neural Networks.
Performed text analytics on unstructured email data using the Natural Language Toolkit (NLTK).
Involved in various pre-processing phases of text data, such as tokenizing, stemming, lemmatization, and converting raw text to structured data (a brief NLTK sketch follows this list).
Performed feature engineering and NLP using techniques such as Word2Vec, BOW (Bag of Words), TF-IDF, and Doc2Vec.
Used AWS S3, AWS Lambda, and AWS EC2 for data storage and model deployment.
Performed Logistic Regression, Random Forest, Decision Tree, and SVM to classify whether a package would be delivered on time on a new route.
Used Convolutional Neural Network (CNN) to perform image classification and object detection.
Acquired image dataset of products from different data sources and aggregated into one dataset on Amazon Redshift.
Converted unstructured pure text consumer comments data to structured dataset using NLP techniques and feature engineering.
Built predictive models including Support Vector Machines, Decision Trees, XGBoost, Naïve Bayes classifiers, and Neural Networks, plus ensembles of these models, using Python Scikit-learn to evaluate how the likelihood-to-recommend of customer groups would change under different sets of services.
Implemented a Python-based distributed random forest via PySpark and MLlib (see the sketch after this list).
Used Tableau dashboards to convey results to team members and to other data science, marketing, and engineering teams.
Created data models, tools, unique visualizations, and dashboards in Tableau that inform customers about outcomes, crafted a compelling narrative from the data, and improved their performance.
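A brief sketch of the text pre-processing steps listed above (tokenizing, stemming, lemmatization) with NLTK; the sample sentence is invented for illustration:

import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("wordnet", quiet=True)

text = "The loans were approved after the documents had been verified"
tokens = word_tokenize(text.lower())

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
print([stemmer.stem(t) for t in tokens])         # crude stems, e.g. "approv"
print([lemmatizer.lemmatize(t) for t in tokens]) # dictionary-form lemmas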
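A minimal sketch of a distributed random forest with PySpark and MLlib, as in the bullet above; the schema and toy rows are placeholders, not bank data:

from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rf-sketch").getOrCreate()

# Toy frame with a binary label and two numeric features (hypothetical)
df = spark.createDataFrame(
    [(0.0, 1.2, 3.4), (1.0, 0.4, 1.1), (0.0, 2.2, 0.3), (1.0, 0.1, 2.5)],
    ["label", "f1", "f2"])

# MLlib expects the features assembled into a single vector column
assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
model = RandomForestClassifier(numTrees=50).fit(assembler.transform(df))
model.transform(assembler.transform(df)).select("label", "prediction").show()
spark.stop()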

Environment: Python, NumPy, Pandas, Seaborn, Matplotlib, Scikit-learn, Machine Learning, Word2Vec, BOW, Tf-Idf, Doc2Vec, AWS S3, AWS Lambda, AWS EC2, Amazon Redshift, Support Vector Machine, Decision Tree, XGBoost, Neural Network, PySpark, MLlib, Tableau.

EDUCATION:
Master of Science (MS) in Robotics in Electrical and Computer Engineering, Northeastern University | Boston, MA
Bachelor of Engineering (B.E) in Electrical Engineering, Thapar University | Patiala, India