
Bhagya - Sr Data Scientist
[email protected]
Location: Frederick, Maryland, USA
Relocation: Yes
Visa: H1B
Phone: +1 669-842-3737

________________________________________
CAREER SUMMARY
9+ years of IT experience in data science, data analysis, and data engineering.
Expertise in transforming business requirements into analytical models, developing data mining and reporting solutions, and designing algorithms that scale across massive volumes of structured and unstructured data.
Skilled in data preparation, exploratory analysis, feature engineering, and hyperparameter tuning for supervised machine learning models.
Proficient at building robust machine learning and deep learning models, including Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and LSTMs, using TensorFlow and Keras. Adept at analyzing large datasets using Apache Spark, PySpark, Spark ML, and Amazon Web Services (AWS).
Experience working in the Azure cloud, including Azure Data Lake Storage Gen2 for data storage, Azure Data Factory, Azure DevOps, and Azure Databricks.
Experienced in implementing linear and logistic regression, classification modeling, decision trees, clustering, time series analysis, NLP, dimensionality reduction, CNN, ANN, random forest, XGBoost, Naive Bayes, SVM, and association rule mining in Python.
Strong knowledge of and skills in statistical methodologies such as experiment design, hypothesis testing, Z-tests, t-tests, the chi-square test of independence, and ANOVA.
Experience in text mining, including cleaning and manipulating text, and in building text features and topic models using TF-IDF, Word2Vec, GloVe, lemmatization, stop-word removal, and n-grams.
Experienced in using Python packages such as Pandas, NumPy, SciPy, Scikit-learn, and Matplotlib.
Solid ability to write and optimize diverse SQL queries; working knowledge of RDBMS such as MySQL.
Developed data visualizations using Python and R, and created dashboards using tools such as Tableau and Power BI.
Effective team player with strong communication and interpersonal skills, and a strong ability to quickly learn new technologies and new business lines.
Good industry knowledge, analytical and problem-solving skills, and the ability to work well both within a team and individually.
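The chi-square test of independence listed above can be illustrated with a minimal, hypothetical sketch in plain Python (not code from any of the projects below). It is restricted to a 2x2 contingency table so the p-value has a closed form: with 1 degree of freedom, the chi-square survival function equals erfc(sqrt(x/2)). No Yates continuity correction is applied, so it corresponds to scipy.stats.chi2_contingency(table, correction=False) rather than to that function's default for 2x2 tables. The counts in the example are made up.

```python
import math


def chi_square_independence_2x2(table):
    """Pearson chi-square test of independence for a 2x2 contingency table.

    Returns (statistic, p_value). No Yates continuity correction is applied.
    """
    if len(table) != 2 or any(len(row) != 2 for row in table):
        raise ValueError("this sketch only handles 2x2 tables")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / grand total.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    # A 2x2 table has 1 degree of freedom, whose chi-square survival
    # function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows = customer group, columns = responded / did not respond.
stat, p = chi_square_independence_2x2([[30, 10], [20, 40]])
print(f"chi2={stat:.3f}, p={p:.2e}")
```

A small p-value here suggests the response rate depends on the customer group; a general r x c version would need the regularized incomplete gamma function for the p-value.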

TECHNOLOGY STACK
Programming: Python, R, SQL
Machine Learning: Linear Regression (Ridge, Lasso) & Logistic Regression, Classification (KNN, SVM), Decision Trees, Time Series Analysis, Hierarchical Clustering, K-Means Clustering, Ensemble Methods (Random Forest, AdaBoost, Gradient Boosting, XGBoost), Association Rule Learning, Dimensionality Reduction Techniques (PCA, LSA), ANN, Deep Learning
Tools and Utilities: Pandas, Scikit-learn, NumPy, Keras, PySpark, dplyr, PyMySQL
Big Data: Spark Core, Spark SQL, Hive, HDFS, Sqoop
Reporting & Visualization: Power BI, Tableau, Matplotlib & Seaborn (Python packages)
Database: MySQL, SQLite, MongoDB
Cloud Computing: Microsoft Azure

PROFESSIONAL EXPERIENCE
Sr Data Scientist
Legal & General America, Frederick, MD March 2022 - Present
Responsibilities:
Involved in data ingestion into Azure Data Lake and Azure Databricks by building pipelines in Azure Data Factory.
Analyzed data using SQL, PySpark, and Python and presented analytical reports to management and technical teams.
Collected data from different sources and analyzed business results, including by setting up and managing new schemas.
Transformed data into formats more suitable for analysis, created new experimental frameworks for data collection, and built tools to automate data collection.
Visualized the data for exploratory analysis, including data profiling and data wrangling.
Transformed the raw data into more useful and efficient formats.
Prepared documentation in R Markdown.
Performed exploratory data analysis (visualization).
Performed data munging (aggregated the data and added new features).
Joined the required tables in SQL Server on a unique identifier used as the primary key.
Created additional features (based on conditional statements) for the combined dataset to identify the response variable.
Performed ETL processes to extract, transform, and load data from multiple sources into the database.
Applied post-pruning techniques using scikit-learn to reduce the complexity of the final classifier, improving predictive performance by reducing overfitting.
Aggregated and summarized tables using SQL queries.
Prepared the training data after performing these data operations.
Developed and executed machine learning procedures on the training data.
Acted as a representative of the business area to identify and refine its information requirements, informing the creation of data transformation and prediction models.
Analyzed existing systems and related processes and presented findings on topics such as utilization, exceptions, and modifications.
Made recommendations for business decisions and process changes based on data analysis.
Developed classification models such as Gradient Boosting, XGBoost, and Random Forest, tuning multiple hyperparameters to classify outcomes.
Used evaluation metrics such as accuracy, precision, recall, and F1-score to select the best classification model.
Identified procedural areas for improvement using customer data to help improve profitability.
Using various clustering techniques in Python, identified groups of states where the national underwriting models were underperforming and made improvements to increase their productivity.
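Choosing between candidate classifiers by accuracy, precision, recall, and F1-score can be sketched in plain Python. This is a hypothetical illustration for binary labels, not code from the project: the model names, hold-out labels, and predictions are made up, and in practice scikit-learn's accuracy_score, precision_score, recall_score, and f1_score compute the same quantities.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def select_best_model(y_true, predictions_by_model, metric="f1"):
    """Pick the candidate whose predictions score highest on the chosen metric."""
    scores = {name: binary_metrics(y_true, preds) for name, preds in predictions_by_model.items()}
    best = max(scores, key=lambda name: scores[name][metric])
    return best, scores


# Hypothetical hold-out labels and predictions from three candidate classifiers.
y_test = [1, 1, 1, 0, 0, 0]
candidates = {
    "random_forest": [1, 1, 0, 1, 0, 0],
    "gradient_boosting": [1, 1, 1, 1, 0, 0],
    "xgboost": [1, 0, 0, 0, 0, 0],
}
best, scores = select_best_model(y_test, candidates)
print(best, scores[best])
```

F1 is used as the default selection metric here because it balances precision and recall; "xgboost" in this toy example has perfect precision but poor recall, which F1 penalizes.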

Data Scientist
AuSuM Systems, Miami Beach, FL July 2020 - Feb 2022
Responsibilities:
Developed predictive analytics using PySpark and Spark SQL on Databricks to extract, transform, and uncover insights from raw data.
Built, trained, and executed machine learning procedures, along with validation documentation of process efficiency.
Responsible for estimating cluster size and for monitoring and troubleshooting the Spark cluster on Databricks.
Acted as a representative of the business area to identify and refine its information requirements, informing the creation of data transformation and prediction models.
Analyzed existing systems and related processes and presented findings on topics such as utilization, exceptions, and modifications.
Extracted transaction data using PySpark and analyzed it to forecast revenue by area with a 95% accuracy rate (scikit-learn/MLlib).
Made recommendations for business decisions and process changes based on data analysis.
Performed exploratory data analysis using descriptive statistics and data visualization to establish baseline machine learning algorithms.
Built and automated a robust, high-accuracy model for the given customer base.
Assessed customer purchasing behavior and customer value using RFM (recency, frequency, monetary) analysis; applied customer segmentation with clustering algorithms such as K-Means and hierarchical clustering.
Coordinated the execution of A/B tests to measure the effectiveness of a personalized recommendation system.
Applied the Wilcoxon signed-rank test in R to pre-acquisition and post-acquisition stock performance data across sectors to assess statistical significance.
Recommended and evaluated marketing approaches based on analysis of customer purchasing behavior.
Implemented various machine learning algorithms on very large datasets in PySpark using MLlib.
Used SQL and HiveQL to query and manipulate data from various sources, including Oracle and HDFS, while maintaining data integrity.
Performed data cleaning, data preparation, and feature engineering in Python using NumPy, SciPy, Pandas, Matplotlib, Seaborn, and Scikit-learn.
Predicted claim severity to estimate future losses and ranked feature importance.
Used Python and Spark to implement machine learning algorithms including Generalized Linear Models, SVM, Random Forest, and Boosting.
Identified internal and external information sources and built effective working relationships with subject matter experts across research groups within the firm and in the external marketplace.
Developed logistic regression models to predict subscription response rate based on customer variables such as past transactions, response to prior mailings, promotions, demographics, and interests.
Performed data preparation using tasks such as:
Data reduction - obtaining a representation that is much smaller in volume but produces the same or similar analytical results.
Data discretization - transforming quantitative data into qualitative data.
Data cleaning - filling in missing values, smoothing noisy data, identifying or removing outliers, and resolving inconsistencies.
Data integration - integrating multiple databases, data cubes, or files.
Data transformation - normalization, standardization, and aggregation.
Designed dashboards with Tableau and D3.js and provided complex reports, including summaries, charts, and graphs, to interpret findings for the team and stakeholders.
Developed classification models such as Gradient Boosting, XGBoost, and Random Forest, tuning multiple hyperparameters to classify outcomes.
Used evaluation metrics such as accuracy, precision, recall, and F1-score to select the best classification model.
Identified procedural areas for improvement using customer data to help improve profitability.
Using various clustering techniques, identified groups of states where the national underwriting models were underperforming and made improvements that increased their profitability by 5%.
Used Python 3 (NumPy, SciPy, pandas, scikit-learn, seaborn) to develop a variety of models and algorithms for analytical purposes.
Visualized datasets with Matplotlib, Seaborn, and pandas to identify missing values, outliers, and correlations between features for analytical models.
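RFM analysis scores each customer on recency (days since the last purchase), frequency (number of purchases), and monetary value (total spend). A minimal, hypothetical sketch of that aggregation step in plain Python, assuming transactions arrive as (customer_id, purchase_date, amount) tuples; the customer ids, dates, and amounts are made up. The resulting features would typically be scaled and then clustered, for example with scikit-learn's KMeans, which is not shown here.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, as_of):
    """Aggregate (customer_id, purchase_date, amount) tuples into RFM features.

    Returns {customer_id: (recency_days, frequency, monetary)}.
    """
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, purchased_on, amount in transactions:
        # Keep the most recent purchase date seen for each customer.
        if customer not in last_purchase or purchased_on > last_purchase[customer]:
            last_purchase[customer] = purchased_on
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        customer: ((as_of - last_purchase[customer]).days, frequency[customer], monetary[customer])
        for customer in last_purchase
    }


# Hypothetical transaction log.
transactions = [
    ("a", date(2021, 1, 1), 50.0),
    ("a", date(2021, 3, 1), 25.0),
    ("b", date(2021, 2, 15), 100.0),
]
for customer, (recency, freq, spend) in sorted(rfm_table(transactions, date(2021, 3, 31)).items()):
    print(customer, recency, freq, spend)
```

Lower recency and higher frequency and monetary values indicate more valuable customers, which is why the three features are usually standardized before clustering so that spend does not dominate the distance metric.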


Data Scientist/Analyst
KEMET Electronics Corporation
Fort Lauderdale, FL Jun 2019 - Aug 2019
Responsibilities:
Developed key business metrics used to evaluate performance, compare results, and track relevant data to improve business outcomes.
Worked on customer segmentation using machine learning algorithms.
Performed data wrangling with Python scripts on large volumes of data to build key datasets. Performed exploratory data analysis and derived key insights in discussion with domain experts to support decision making.
Selected features and built and optimized classifiers using machine learning techniques.
Enhanced data collection procedures to include information relevant to building analytic systems.
Collaborated with 4 team members in a cross-functional setting to improve machine learning model precision from 70% to 85%.
Worked closely with the Vice President to discuss methodical and logical approaches.
My role was to create an automated R script to determine the segment of unknown customers. In this project, we used Microsoft Azure ML Studio to predict the segments of unknown customers.
The objective of customer segmentation was to determine how to identify the customers in each segment in order to maximize the value of each customer to the business.
First, we imported the data into Azure ML, split it into training and testing sets, applied a multi-class decision tree algorithm to the training set of known customers, and predicted the testing set of known customers, which achieved good accuracy with some false positives.
Second, we built the same model in RStudio: my team and I created decision rules based on company revenue, cost price, units, etc., trained the model, and predicted the testing set using the decision tree algorithm, which achieved approximately the same accuracy as Azure ML Studio.
Finally, we attached the R script to Azure ML Studio and applied the Multi-Class Decision Tree algorithm.
This combination gave better results than the earlier approaches. The final model was used to predict the segments of 189,000 new, unknown customers.
Auto-scaling decisions were made based on upgrade/downgrade predictions from a model built on past utilization metrics.
Evaluated classification models using metrics such as accuracy, precision, recall, log loss, and ROC curves.
Developed correlation heatmaps, distribution/modality analyses using density plots and histograms, outlier analyses using box plots, and scatter plots (cluster visualization).
Addressed overfitting and underfitting by tuning the algorithm's hyperparameters.
Ensured data accuracy and treated missing values.
Used model repositories to maintain and deploy models, and saved tuning parameters for multiple machine learning models for reuse.
Worked with a cross-functional team to integrate machine learning models into the ongoing business process without causing delays to existing job flows and downstream applications.
Delivered detailed and accurate reports with key performance indicators for sales partners, helping regulate and balance business operations to increase revenue.
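Of the metrics used to evaluate the segmentation models above, log loss is the least self-explanatory: it averages the negative log of the probability the model assigned to each customer's true segment, so confident wrong predictions are penalized heavily. A minimal, hypothetical sketch in plain Python, assuming each row of predicted probabilities already sums to 1; the segment predictions are made up, and scikit-learn's log_loss computes the same quantity with more input validation.

```python
import math


def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true class.

    y_true: class indices (e.g. customer segment ids 0..k-1).
    y_prob: one list of k predicted probabilities per sample, each summing to 1.
    """
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        # Clip so a predicted probability of exactly 0 does not produce log(0).
        p = min(max(probs[label], eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)


# Hypothetical predictions for three customer segments.
uncertain = multiclass_log_loss([0, 2], [[0.5, 0.25, 0.25], [0.25, 0.25, 0.5]])
confident_wrong = multiclass_log_loss([0, 2], [[0.01, 0.98, 0.01], [0.98, 0.01, 0.01]])
print(round(uncertain, 4), round(confident_wrong, 4))
```

Unlike accuracy, log loss distinguishes a model that is narrowly wrong from one that is confidently wrong, which is useful when predicted segment probabilities drive downstream decisions.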

Associate Data Scientist/Analyst
V2Value Biz Solutions Pvt. Ltd
Hyderabad, India Jan 2017 - July 2018
Collected data from different sources and analyzed business results, including by setting up and managing new schemas.
Transformed data into formats more suitable for analysis, created new experimental frameworks for data collection, and built tools to automate data collection.
Visualized the data for exploratory analysis, including data profiling and data wrangling.
Transformed the raw data into more useful and efficient formats.
Prepared documentation in R Markdown.
Performed exploratory data analysis (visualization).
Performed data munging (aggregated the data and added new features).
Joined the required tables in SQL Server on a unique identifier used as the primary key.
Created additional features (based on conditional statements) for the combined dataset to identify the response variable.
Performed ETL processes to extract, transform, and load data from multiple sources into the database.
Aggregated and summarized tables using SQL queries.
Prepared the training data after performing these data operations.
Developed and executed machine learning procedures on the training data.
Acted as a representative of the business area to identify and refine its information requirements, informing the creation of data transformation and prediction models.
Analyzed existing systems and related processes and presented findings on topics such as utilization, exceptions, and modifications.
Made recommendations for business decisions and process changes based on data analysis.
Developed classification models such as Gradient Boosting, XGBoost, and Random Forest, tuning multiple hyperparameters to classify outcomes.
Used evaluation metrics such as accuracy, precision, recall, and F1-score to select the best classification model.
Identified procedural areas for improvement using customer data to help improve profitability.
Using various clustering techniques in Python, identified groups of states where the national underwriting models were underperforming and made improvements to increase their productivity.

Data Analyst
V2Value Biz Solutions Pvt. Ltd
Hyderabad, India Jan 2015 - Dec 2016
Collected data from different sources and analyzed business results, including by setting up and managing new studies.
Transformed data into formats more suitable for analysis, created new experimental frameworks for data collection, and built tools to automate data collection.
Developed and executed machine learning procedures, along with validation documentation of process efficiency.
Acted as a representative of the business area to identify and refine its information requirements, informing the creation of data transformation and prediction models.
Analyzed existing systems and related processes and presented findings on topics such as utilization, exceptions, and modifications.
Made recommendations for business decisions and process changes based on data analysis.
Developed classification models such as Gradient Boosting, XGBoost, and Random Forest, tuning multiple hyperparameters to classify outcomes.
Used evaluation metrics such as accuracy, precision, recall, and F1-score to select the best classification model.
Identified procedural areas for improvement using customer data to help improve profitability.
Using various clustering techniques in Python, identified groups of states where the national underwriting models were underperforming and made improvements that increased their profitability by 5%.

CAD/CAM Engineer
Techno soft Solutions
Visakhapatnam, AP, India Jan 2014 - Sep 2014
Imported customer data into various CAM systems.
Performed design rule checks and edited data to comply with manufacturing guidelines.
Created array configurations, routing and test programs, panelization, and output data for production use.
Worked with process engineers to evaluate and provide strategy for advanced processing as needed.
Itemized design issues and communicated them to customers.
Read and interpreted blueprints, sketches, drawings, routing instructions, manuals, and specifications.
Understood GD&T and applied it during the programming process.
Used CAD to create part models and geometry as needed.
Performed CAM programming for multi-axis mill-turn machines.
Used CGTech VERICUT for NC file verification prior to manufacturing release.
Determined optimal tooling and cutting parameters based on material and setup.
Implemented workholding for manufacturing processes.
Planned and executed manufacturing processes individually and as part of a team.
Contributed individually and as part of a team to continuous improvement efforts.
Provided engineering and management information affecting part quality or schedule.
Maintained revision and quality control on programs, tool sheets, and setup documentation.

PUBLICATIONS
Published a paper at IEEE SoutheastCon 2020: "Combinatorial Formulas Related to an Efficient Representation of Permutations"
Presented a poster at the 33rd International FLAIRS Conference 2020: "Analyzing Deep Learning Image Classification of High-Performance Liquid Chromatography Chromatograms with Metabolomics"


EDUCATION
St. Thomas University, Miami, FL, 2020
Master of Science in Big Data Analytics
Raghu Engineering College, India, 2013
Bachelor of Technology in Mechanical Engineering