Kunal - Data Scientist
[email protected]
Location: Remote, USA
Relocation: Yes
Visa: OPT EAD

KUNAL
[email protected] | 817-678-0872 | Only Corp to Corp

PROFESSIONAL SUMMARY:
- 5+ years of experience in Data Analysis, Machine Learning, and Data Mining with large structured and unstructured datasets, spanning Data Acquisition, Data Validation, Predictive Modeling, Classification, Data Visualization, and the discovery of meaningful business insights.
- Expertise in all aspects of the Data Science Life Cycle (DSLC), from requirement analysis and model design through development, coding, testing, implementation, and maintenance.
- Proficient in Machine Learning algorithms and Predictive Modeling, including Regression Models, Decision Trees, Random Forests, Sentiment Analysis, Naïve Bayes Classifiers, SVM, and Ensemble Models.
- Experience in analyzing data and providing insights with R and Python.
- Expertise in utilizing various Python packages, including Pandas, NumPy, SciPy, Scikit-learn, and Matplotlib.
- Hands-on experience implementing LDA and Naïve Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis.
- Hands-on experience with R packages and libraries such as ggplot2, Shiny, h2o, dplyr, reshape2, plotly, RMarkdown, ElemStatLearn, and caTools.
- Experience in Python with a focus on developing, validating, evaluating, deploying, and optimizing machine learning models that support many aspects of the business.
- Expertise in working with cloud technologies such as MS Azure, Google Cloud Platform (GCP), and Amazon Web Services (AWS).
- Experience in Artificial Intelligence (AI) and Deep Learning techniques such as Convolutional Neural Networks (CNN) for Computer Vision, Recurrent Neural Networks (RNN), and Deep Neural Networks, with applications of Backpropagation, Stochastic Gradient Descent (SGD), Long Short-Term Memory (LSTM), Continuous Bag of Words, and Text Analytics.
- Expertise in leveraging AI techniques, including Linear and Logistic Regression, Classification Modeling, Decision Trees, Principal Component Analysis (PCA), and Cluster and Segmentation analyses.
- Proficient in leveraging GCP services such as BigQuery, Dataflow, and Dataproc for large-scale data processing and analysis.
- Proficient in utilizing Google Cloud's ML APIs for tasks such as natural language processing, image analysis, and sentiment analysis.
- Experienced in integrating GCP services with other Google Cloud products, such as Google Workspace, for seamless collaboration and productivity.
- Hands-on experience with the AWS Cloud platform and its features, including EC2, ECR, VPC, RDS, EBS, S3, CloudWatch, CloudTrail, CloudFormation, and Auto Scaling.
- Cloud computing implementation experience using HDInsight, Azure Data Lake (COSMOS), Azure Data Factory, Azure Machine Learning, and PowerShell scripting.
- Skilled in designing and implementing machine learning models on GCP using TensorFlow, AutoML, and AI Platform.
- Expertise in building Supervised and Unsupervised Machine Learning experiments in Microsoft Azure, utilizing multiple algorithms to perform detailed predictive analytics and building Web Services models for all types of data: continuous, nominal, and ordinal.
- Profound understanding of GCP's storage solutions, including Cloud Storage and Bigtable, for efficient data storage and retrieval.
- Hands-on experience with Google Cloud's managed services, such as Pub/Sub for real-time messaging and Cloud Functions for serverless computing.
- Experience in text mining and topic modeling using NLP and Neural Networks; tokenizing, stemming, lemmatizing, and tagging parts of speech using TextBlob, the Natural Language Toolkit (NLTK), and spaCy while building Sentiment Analysis (see the sketch following this summary).
- Experience working in the MS Azure cloud, utilizing services such as Azure Data Lake Gen2, Azure Data Factory, Azure DevOps, and Azure Databricks.
- Knowledge of Natural Language Processing (NLP), the Multi-Layer Perceptron algorithm, and Text Mining.
- Demonstrated expertise in utilizing Google Analytics to analyze website traffic, user behavior, and engagement metrics.
- Proficient in setting up and configuring Google Analytics accounts, including the creation of goals, custom dimensions, and tracking codes.
- Experience in creating ETL mappings using Informatica to move data from multiple sources, such as flat files and Oracle, into a common target area such as a Data Warehouse.
- Solid understanding of Data Modeling, Data Collection, Data Cleansing, Data Warehouse/Data Mart design, ETL, BI, OLAP, and Client/Server applications.
- Experience in writing PL/SQL statements: Stored Procedures, Functions, Triggers, and Packages.
- Involved in creating database objects such as tables, views, procedures, triggers, and functions using T-SQL to provide definition and structure and to maintain data efficiently.
- Experience in designing Star and Snowflake schemas for Data Warehouse and ODS architectures using tools such as Erwin Data Modeler, Power Designer, Embarcadero ER/Studio, and Microsoft Visio.
- Hands-on experience with different ETL tools to get data into shape where it could be connected to Tableau through Tableau Data Extracts.
- Hands-on experience with database design, relational integrity constraints, OLAP, OLTP, Cubes, Normalization (3NF), and De-normalization of databases.
- Experience in creating impactful data visualizations using Python and R; developed dashboards using tools such as Tableau and Power BI.
- Possesses excellent industry knowledge and analytical and critical-thinking skills, with the ability to excel both in team collaborations and individual initiatives.
- A resolute collaborator with effective communication and interpersonal skills, capable of adapting to and quickly learning modern technologies and business lines.
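A minimal sketch of the kind of NLP preprocessing described in the summary, using NLTK; the sample sentence and the specific pipeline choices are illustrative assumptions, not taken from any project listed here:

```python
# Illustrative NLTK sketch: tokenize, tag parts of speech, stem, and
# lemmatize a sample sentence. The text is hypothetical.
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

# One-time downloads of the required NLTK data packages.
for pkg in ("punkt", "averaged_perceptron_tagger", "wordnet", "omw-1.4"):
    nltk.download(pkg, quiet=True)

text = "The customers were loving the new colorful collections."
tokens = nltk.word_tokenize(text)                            # tokenizing
tags = nltk.pos_tag(tokens)                                  # POS tagging
stems = [PorterStemmer().stem(t) for t in tokens]            # stemming
lemmas = [WordNetLemmatizer().lemmatize(t) for t in tokens]  # lemmatizing

print(tags)
print(stems)
print(lemmas)
```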
TECHNICAL SKILLS:
Operating Systems: Windows, Unix, Linux
Languages: R, Python, SQL, UNIX shell scripting
Python Libraries: TensorFlow, OpenCV, PyTorch, NumPy, SciPy, Scikit-learn, Matplotlib, Pandas, Keras
Data Science Tools: MATLAB, Jupyter Notebook, VS Code, MLOps, Deep Learning, RStudio
Machine Learning: Linear Regression, Logistic Regression, Gradient Boosting, Random Forests, Maximum Likelihood Estimation, Clustering, Classification, Association Rules, K-Nearest Neighbors (KNN), K-Means Clustering, Decision Trees (CART & CHAID), Neural Networks, Principal Component Analysis, Weight of Evidence (WOE) and Information Value (IV), Factor Analysis, Sampling Design, Time Series Analysis
Data Modeling Tools: Erwin, ER/Studio, Star-Schema modeling, Snowflake Schema modeling
Database and Big Data: Oracle, MS Access, SQL Server, Sybase, DB2, Teradata, Hive, Cassandra, MongoDB, Hadoop, Spark, Databricks
Cloud Technologies: Amazon Web Services (AWS), MS Azure, Google Cloud Platform (GCP)
AWS: EC2, VPC, ECR, RDS, EBS, S3, CloudWatch, CloudTrail, CloudFormation, Auto Scaling
GCP: TensorFlow, AutoML, AI Platform, Cloud Storage, Bigtable, Dataflow, Pub/Sub, Cloud Functions, Data Studio
MS Azure: HDInsight, Azure Data Lake (COSMOS), Azure Data Factory, Azure Machine Learning, Azure Data Lake Gen2, Azure DevOps, Azure Databricks
BI Tools: Tableau, Tableau Server, Tableau Reader, Power BI, Crystal Reports
DB Applications: Toad for Oracle, Oracle SQL Developer, MySQL, SQL Server, MS Word, MS Excel, MS PowerPoint, Teradata
Version Control Tools: Git, GitLab, GitHub, MLflow, Kubeflow, DVC
Methodologies: SDLC - AGILE, SCRUM, DevOps, TDD

PROFESSIONAL EXPERIENCE:

Client: Charming Charlie, Houston, TX                                  Jan 2023 - Till Date
Role: Data Scientist
Description: Charming Charlie is a one-of-a-kind source of style that has been inspiring women to live more colorfully. Charming Charlie specializes in retail, fashion, jewelry, accessories, and trends, and offers a wide array of women's apparel and fashion accessories, beauty, gifts, and more, all ingeniously arranged by color, making that perfect look fun and easy to find.
Responsibilities:
- Built predictive machine learning, simulation, and statistical models using Python.
- Used R and SQL to create statistical algorithms involving Multivariate Regression, Linear Regression, Logistic Regression, PCA, Random Forest models, Decision Trees, and Support Vector Machines for estimating risks.
- Created advanced Artificial Intelligence (AI) and ML models, leveraging innovative techniques and algorithms to extract actionable insights from complex datasets.
- Performed data cleaning, feature scaling, and feature engineering using the Pandas and NumPy packages in Python.
- Integrated and automated execution runs/workflows seamlessly across multiple platforms such as Amazon Web Services (AWS) and Google Cloud Platform (GCP).
- Involved in designing and deploying multi-tier applications using AWS services such as EC2, S3, DynamoDB, SNS, SQS, and IAM.
- Applied AI-driven methodologies to mitigate risk factors by conducting thorough analyses of financial and statistical data.
- Managed end-to-end data processing pipelines using AWS services such as Amazon S3, AWS Glue, and Amazon EMR.
- Worked with Google Cloud IAM (Identity and Access Management) to ensure data security and access control.
- Conducted post-pruning techniques in machine learning, optimizing classifier complexity and improving predictive analysis using Python libraries such as scikit-learn.
- Worked on AWS S3 buckets and secure intra-cluster file transfer between PNDA and S3.
- Implemented predictive analytics and machine learning algorithms to forecast key metrics in the form of designed dashboards on AWS (S3/EC2) and the Django platform for the company's core business.
- Deployed and managed data lakes on Google Cloud Platform (GCP) using services such as Cloud Storage and BigQuery for scalable data storage and analysis.
- Developed and executed machine learning algorithms, including Gradient Boosting, XGBoost, and Random Forest, with meticulous parameter tuning for precise classification outcomes.
- Built Continuous Integration/Continuous Deployment (CI/CD) pipelines utilizing GitHub Actions to streamline and automate the testing and deployment workflow.
- Created deep learning models using TensorFlow and Keras by combining all tests as a single normalized score.
- Used the XGBoost classifier when the target variable was categorical and the XGBoost regressor for continuous targets, combining feature branches with scikit-learn's FeatureUnion and FunctionTransformer methods in Natural Language Processing pipelines (see the sketch after this list).
- Performed data collection, data cleaning, data visualization, and feature engineering using Python libraries such as Pandas and NumPy; performed deep feature synthesis and extracted key statistical findings to develop business strategies.
- Created S3 buckets, managed policies for them, and utilized S3 and Glacier for storage and backup on AWS.
- Conducted A/B tests and conversion rate optimization based on insights from Google Analytics (see the A/B-test sketch after this section).
- Worked on Natural Language Processing with the NLTK module in Python to develop an automated customer-response application.
- Designed normalized 3NF data models for Operational Data Stores (ODS) and Online Transaction Processing (OLTP) systems, as well as dimensional data models using Star and Snowflake schemas, incorporating AI-driven optimizations for improved data processing and decision-making.
- Implemented several statistical methodologies, including classification models (K-Nearest Neighbors (KNN), Support Vector Machines, Decision Trees, Naïve Bayes classifier), regression models (multiple regression, regression trees, SVR), and k-means clustering in Python and R.
- Created monitors, alarms, and notifications for EC2 hosts using CloudWatch, CloudTrail, and SNS.
- Leveraged AWS Lambda functions to automate data transformation tasks, resulting in streamlined data ingestion and improved data quality.
- Performed data visualization, designed dashboards with Power BI, and generated complex reports including charts and summaries.
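A minimal sketch of the FeatureUnion/FunctionTransformer pattern referenced above; the column names, toy data, and XGBClassifier settings are illustrative assumptions, not details from the actual project:

```python
# Route categorical and continuous features through separate branches and
# stack them with scikit-learn's FeatureUnion before an XGBoost classifier.
import pandas as pd
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler
from xgboost import XGBClassifier  # requires `pip install xgboost`

CATEGORICAL = ["channel"]           # hypothetical feature names
CONTINUOUS = ["price", "quantity"]

features = FeatureUnion([
    # branch 1: select categorical columns, then one-hot encode them
    ("categorical", Pipeline([
        ("select", FunctionTransformer(lambda df: df[CATEGORICAL])),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])),
    # branch 2: select continuous columns, then scale them
    ("continuous", Pipeline([
        ("select", FunctionTransformer(lambda df: df[CONTINUOUS])),
        ("scale", StandardScaler()),
    ])),
])

model = Pipeline([("features", features), ("clf", XGBClassifier(n_estimators=50))])

# Toy fit on made-up data, purely to show the pipeline executing end to end.
df = pd.DataFrame({"channel": ["web", "store", "web", "store"],
                   "price": [10.0, 12.5, 9.0, 11.0],
                   "quantity": [1, 3, 2, 5]})
model.fit(df, [0, 1, 0, 1])
```

For a continuous target, the XGBClassifier stage would be swapped for an XGBRegressor while the feature union stays unchanged.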
Environment: Python, R, SQL, Pandas, NumPy, Amazon Web Services (AWS), Google Cloud Platform (GCP), Google Cloud IAM, Cloud Storage, BigQuery, VPC, TensorFlow, Keras, scikit-learn, NLTK, Power BI, Machine Learning, Statistical Modeling, Artificial Intelligence (AI), AWS Services (EC2, S3, DynamoDB, SNS, SQS, IAM, Glue, EMR), Data Lakes, Deep Learning, A/B Testing, NLP, Data Modeling (3NF, Star Schema, Snowflake Schema).
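As a worked illustration of the A/B testing mentioned in the responsibilities above, here is a minimal sketch of a chi-square test on conversion counts; the visitor and conversion numbers are invented purely for the example:

```python
# Hypothetical A/B test: did variant B convert better than variant A?
from scipy.stats import chi2_contingency

visitors = {"A": 10_000, "B": 10_000}     # made-up traffic counts
conversions = {"A": 420, "B": 495}        # made-up conversion counts

# 2x2 contingency table: [converted, did not convert] per variant.
table = [
    [conversions["A"], visitors["A"] - conversions["A"]],
    [conversions["B"], visitors["B"] - conversions["B"]],
]

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"conversion A: {conversions['A'] / visitors['A']:.2%}, "
      f"B: {conversions['B'] / visitors['B']:.2%}, p = {p_value:.4f}")
```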
Client: Logility, Atlanta, GA                                          Sep 2021 - Dec 2022
Role: Data Scientist
Description: Logility is a trusted innovation partner committed to enabling its customers to operate the most resilient and sustainable supply chains, making them leaders in their industry year after year. The Logility Digital Supply Chain Platform continues this history of industry-leading innovation, bringing the latest developments in artificial intelligence and machine learning to supply chain organizations around the world.
Responsibilities:
- Involved in distinct phases of data acquisition, data collection, data cleaning, model development, model validation, model monitoring, and visualization to deliver solutions.
- Developed various machine learning models, such as Logistic Regression, KNN, and Gradient Boosting, with Pandas, NumPy, Seaborn, Matplotlib, and Scikit-learn in Python.
- Worked with the Azure SDK to make HTTP calls to MS Azure services such as Azure Storage, Azure Key Vault, and Azure Service Bus.
- Developed classification models such as Gradient Boosting, XGBoost, and Random Forest with multiple parameter tunings, utilizing model evaluation metrics for optimal selection.
- Collaborated with MLOps engineers to integrate new models and algorithms.
- Used TensorFlow to create deep convolutional and recurrent neural networks.
- Worked with Big Data using Azure Databricks, Azure Data Lake Storage, and Azure HDInsight.
- Leveraged big data technologies, Hadoop and Spark, for distributed data processing, enabling scalable analytics solutions and improving data processing speed and efficiency in large-scale projects.
- Integrated R scripts with Azure ML Studio, applying multi-class decision tree algorithms for improved results.
- Analyzed data and performed data preparation by applying a historical model to the dataset in Azure ML.
- Extracted the data required for building models from Azure SQL Database.
- Performed data cleaning, including transforming variables and dealing with missing values, and ensured data quality, consistency, and integrity using Pandas and NumPy.
- Involved in automating the end-to-end model refresh process using COSMOS, Azure Data Factory (ADF) pipelines, Azure ML, and PowerShell scripting.
- Performed data cleaning, feature scaling, and feature engineering using the Pandas and NumPy packages in Python, and built models using deep learning frameworks.
- Automated R script creation for segmenting unknown customers using Microsoft Azure ML Studio, achieving accurate predictions while effectively maintaining a minimal rate of false positives.
- Enhanced Python scripts to incorporate Artificial Intelligence (AI) capabilities for matching training data with the database stored in MS Azure Cloud Search.
- Leveraged statistical learning and regression analysis, alongside data manipulation tools such as Pandas and NumPy, to forecast product lifecycle duration across various domains, integrated with machine learning libraries such as Scikit-learn for enhanced predictive accuracy and operational efficiency.
- Implemented decision rules based on company revenue, cost price, units, etc., and trained models in both Azure ML Studio and RStudio.
- Collaborated with software engineers to integrate ML models into product workflows.
- Created data pipelines using big data technologies such as Azure Data Factory and Azure Databricks.
- Worked with different data formats, such as JSON and XML, and implemented machine learning algorithms in Python.
- Used RStudio for scripting and programming for statistical and data analysis tasks.
- Worked on various machine learning algorithms, including linear regression, logistic regression, decision trees, random forests, K-means clustering, support vector machines, and XGBoost, based on client requirements.
- Developed machine learning models using recurrent neural networks (LSTM) for time series predictive analytics (see the sketch after this list).
- Conducted exploratory data analysis and data visualization using Azure Data Explorer, Power BI, and Python libraries such as Matplotlib and Seaborn to gain insights and identify patterns in large-scale datasets.
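A minimal sketch of an LSTM for next-step time series forecasting, as mentioned in the bullets above; the window length, layer sizes, and synthetic sine-wave data are illustrative assumptions:

```python
# Next-step forecasting with an LSTM on a windowed univariate series.
import numpy as np
import tensorflow as tf

WINDOW = 12  # hypothetical look-back length

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(WINDOW, 1)),  # (timesteps, features)
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),                  # one-step-ahead forecast
])
model.compile(optimizer="adam", loss="mse")

# Toy data: a sine wave sliced into (window, next-value) supervised pairs.
series = np.sin(np.linspace(0, 20, 500)).astype("float32")
X = np.stack([series[i:i + WINDOW] for i in range(len(series) - WINDOW)])[..., None]
y = series[WINDOW:]
model.fit(X, y, epochs=2, verbose=0)
```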
Environment: Machine Learning Models (Logistic Regression, KNN, Gradient Boosting), Python, Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn, MS Azure, Azure SDK, Azure Storage, Azure Key Vault, Azure Service Bus, Azure Databricks, Azure Data Lake Storage, Azure HDInsight, Hadoop, Spark, PySpark, Azure ML Studio, TensorFlow, Deep Convolutional Neural Networks (CNN), R, Azure SQL Database, COSMOS, Azure Data Factory (ADF), PowerShell Scripting, RStudio, Power BI.

Client: NKGSB Bank, Ahmedabad, Gujarat                                 Jan 2019 - Aug 2021
Role: Data Scientist
Description: The NKGSB Co-operative Bank is a Multi-State Scheduled Bank established in India. The bank serves its customers through 99 branches across the states of Maharashtra, Goa, Gujarat, and Karnataka. NKGSB Bank provides services such as deposits, demand draft facilities, fund transfers, forex, and loans, among many others.
Responsibilities:
- Performed data analysis, data migration, data cleansing, transformation, integration, data import, and data export using Python.
- Used RStudio as an integrated development environment (IDE) for the R programming language, statistical computing, and graphics.
- Worked with Amazon Web Services (AWS) services including EC2, S3, RDS, Lambda, IAM, CloudFormation, and CloudWatch.
- Devised SQL scripts: Stored Procedures, Functions, Triggers, Views, and Packages; made use of indexing, aggregation, and materialized views to optimize query performance.
- Set up and managed Virtual Private Clouds (VPCs) for secure networking in AWS.
- Generated reports and visualizations based on insights using Tableau and developed dashboards for the company's insight teams.
- Leveraged Python and R to analyze customer data, applying collaborative filtering and content-based filtering techniques to uncover correlations in customer behavior; this enabled precise user segmentation and informed targeted process and product improvements, optimizing user engagement and satisfaction.
- Implemented disaster recovery and backup strategies using AWS services such as S3, Glacier, and AWS Backup.
- Applied various machine learning algorithms and statistical modeling techniques, such as Decision Trees, Text Analytics, Sentiment Analysis, Naïve Bayes, Logistic Regression, and Linear Regression, using Python to determine the accuracy rate of each model.
- Worked with dimensional data modeling to deliver multi-dimensional STAR schemas; developed dimensions and fact tables for data marts, such as Monthly Summary and Inventory, with various dimensions such as Time, Services, Customers, and Policies.
- Involved in setting up a virtual environment in Linux and writing Linux scripts for process automation.
- Validated the machine learning classifiers using ROC curves and lift charts (see the sketch after this list).
- Used Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn, SciPy, and NLTK in Python to develop various machine learning algorithms.
- Worked on machine learning algorithms for classification and regression with the KNN, Decision Tree, Naïve Bayes, Logistic Regression, SVM, and Latent Factor models.
- Performed data cleaning, feature scaling, and feature engineering using the Pandas and NumPy packages in Python.
- Created and maintained Tableau reports to display the status and performance of deployed models and algorithms.
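A minimal sketch of the ROC-curve validation mentioned above, using scikit-learn; the synthetic dataset and the LogisticRegression stand-in classifier are illustrative assumptions:

```python
# Validate a binary classifier with a ROC curve and AUC on held-out data.
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]    # positive-class probabilities

fpr, tpr, _ = roc_curve(y_te, scores)
auc = roc_auc_score(y_te, scores)

plt.plot(fpr, tpr, label=f"AUC = {auc:.3f}")
plt.plot([0, 1], [0, 1], linestyle="--")  # chance line
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```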
Environment: Python, Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn, SciPy, NLTK, RStudio, R, PL/SQL, AWS, EC2, S3, RDS, Lambda, IAM, CloudFormation, CloudWatch, VPC, VPN, Glacier, Tableau, Machine Learning Algorithms, Statistical Modeling, STAR Schemas, KNN Model, Support Vector Machine (SVM).

EDUCATION:
Master of Science in Computer Science, The University of Texas at Arlington, Arlington, TX.
Bachelor of Engineering in Computer Engineering, L. D. College of Engineering, Ahmedabad, Gujarat.