
Krishna M
Sr. Data Scientist / GenAI / ML
[email protected]
+1 (940) 226-2019
Location: Denton, Texas, USA
Relocation: Yes
Visa: GC


Professional Summary:

Senior Data Scientist and Data Analyst with 10+ years of professional experience in statistical modeling, data mining, machine learning (ML), artificial intelligence (AI), Tableau, and data exploration and visualization of structured and unstructured datasets, building deep learning models grounded in business understanding to deliver insights that drive key business decisions and add value to the business.
Experience applying artificial intelligence (AI) techniques such as natural language processing (NLP), large language models (LLMs), computer vision, and machine learning algorithms to process and clean raw data.
Integrated Docker containers with Kubernetes to orchestrate and scale a high-availability web application.
Utilized GCP resources, namely BigQuery, Cloud Composer, Compute Engine, Kubernetes clusters, and GCP storage buckets, for building the production ML infrastructure.
Expertise in building data pipelines that automate data processing and ensure consistency.
Solid understanding of Data Modelling, Data Collection, Data Cleansing, Data Warehouse/Data Mart Design, Snowflake, ETL, BI, OLAP, Client/Server applications.
Proficient in handling data with libraries such as Pandas, NumPy, NLTK, TextBlob, Gensim, and scikit-learn.
Familiar with the JVM foundations of big data tooling: Hadoop is implemented in Java and Spark runs on the JVM, so Java skills carry over to both.
Proficient in visualizing data with Python's Matplotlib, Seaborn, and Plotly packages.
Expertise in statistical analysis using hypothesis-testing methods such as one-sample Z-tests, t-tests, two-sample hypothesis tests, chi-square tests, and ANOVA (see the sketch at the end of this list).
Hands-on exposure to the AWS cloud environment, having dealt with S3, DynamoDB, EC2, and IAM.
Designed, developed, and implemented GenAI, LLM, and NLP models on ML infrastructure.
Sound expertise in web scraping using Python packages such as requests, bs4, lxml, and Selenium.
Ability to write optimized, complex SQL queries for extracting data using GROUP BY, window functions, CASE statements, joins, etc.
Understanding of AWS CloudFormation, RDS, SNS, SQS, Lambda functions, and VPC.
Experienced with Apache Spark, an open-source distributed computing system that provides high-level APIs for large-scale data processing.
Utilized GenAI models to generate new creative text formats and answer user questions.
Quick to learn new technologies, with sound knowledge of deep learning frameworks and packages such as Keras, PyTorch, and TensorFlow.
Understanding of various machine learning algorithms such as Linear Regression, Lasso and Ridge, Logistic Regression, KNN, Naive Bayes, Decision Tree, Random Forest, K-means, DBSCAN, XGBoost, AdaBoost, Gradient Descent, Isolation Forest, Gaussian Mixture Models, SVM/SVC, matrix factorization, etc.
MLOps covers deploying machine learning models into production environments, whether on-premises, in the cloud, or at the edge.
Understanding of the AWS CLI and the Python boto3 package for connecting to AWS from Python.
Ability to handle natural language processing (NLP) tasks with NLTK, Gensim, large language models (LLMs), Hugging Face, TextBlob, etc.
Ability to write Python programs in an OOP style, applying inheritance, polymorphism, encapsulation, and abstraction.
Good understanding of and experience with deep learning frameworks such as TensorFlow and PyTorch, especially for tasks involving complex data like images, text, or time series.
Basic Understanding of Docker, Kubernetes, and Jenkins (CI/CD).
Experienced with open-source database systems like PostgreSQL and MySQL, which provide powerful SQL capabilities.
Familiar with deploying pre-trained LLMs in production environments using frameworks such as TensorFlow, PyTorch, and Hugging Face's Transformers library, and integrating them into larger applications and pipelines.
Familiar with GCP's managed serving options, which provide auto-scaling, monitoring, and integration with other GCP services.
Integrated Kafka with Spark frameworks for real-time data processing.
Skilled in data parsing, data manipulation, and data preparation, including methods for describing data contents.
Strong experience in the analysis, design, development, testing, and implementation of Business Intelligence solutions using data warehouse/data mart design, ETL, Scala, and client/server applications.
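As a quick illustration of the hypothesis tests named above, a minimal sketch using scipy.stats on synthetic samples (all data and values are illustrative only):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    a = rng.normal(loc=5.0, scale=1.0, size=100)   # synthetic sample A
    b = rng.normal(loc=5.3, scale=1.0, size=100)   # synthetic sample B

    t1, p1 = stats.ttest_1samp(a, popmean=5.0)     # one-sample t-test
    t2, p2 = stats.ttest_ind(a, b)                 # two-sample t-test
    c2, p3 = stats.chisquare([18, 22, 20, 40])     # chi-square goodness of fit
    f, p4 = stats.f_oneway(a, b, rng.normal(5.1, 1.0, 100))  # one-way ANOVA

    print(f"one-sample p={p1:.3f}, two-sample p={p2:.3f}, "
          f"chi-square p={p3:.3f}, ANOVA p={p4:.3f}")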
Technical Skills:

Programming Languages: Python (3.x), C, R, HTML, CSS, AngularJS, Java
Algorithms: Linear Regression, Lasso & Ridge, Logistic Regression, KNN, Naive Bayes, Decision Tree, Random Forest, Apriori, K-means, DBSCAN, XGBoost, AdaBoost, Gradient Descent, Isolation Forest, Gaussian Mixture Models, SVM, Deep Learning Neural Networks, Simulated Annealing
Natural Language Packages: NLTK, Gensim, Hugging Face, TextBlob, googletrans, natural language processing (NLP), large language models (LLMs)
Data Science Packages: Pandas, NumPy, SciPy, scikit-learn, XGBoost, Keras, TensorFlow, PyTorch, SHAP, MLOps
Visualization: Matplotlib, Seaborn, Plotly
Misc: Flask, Bottle, Streamlit, requests, bs4, lxml, Selenium
Cloud: AWS (EC2, IAM, VPC, CloudFormation, RDS, Lambda, SNS, SQS, S3), Azure, GCP
Version Control: Git
Big Data: Hive, PySpark
Operating Systems: Windows, Linux (Ubuntu, Red Hat)
Reporting Tools: Tableau, Power BI, SSIS, SSRS, SSAS
Databases: SQL Server, Oracle, MySQL, SQLite, HBase, MongoDB, Cassandra, PostgreSQL, DynamoDB

Professional Experience

Client: US Bank, Dallas, TX Jan 2023 - Present
Role: Sr. Data Scientist/GenAI/ML

Responsibilities:
Created parameters, action filters and calculated sets for preparing dashboards and worksheets in Tableau.
Developed and trained artificial intelligence (AI), machine learning, and deep learning models to solve various predictive and classification tasks.
Interacted with architects on solutions for data visualization using Tableau and Python packages.
Applied GenAI models and techniques to unlock creativity and solve complex problems.
Successfully delivered multiple NLP solutions, including a chatbot that helps customers troubleshoot claim issues and recommends actions.
Built and published customized interactive reports and dashboards, with report scheduling, using Tableau Server.
Leveraged MLOps CI/CD pipelines to automate the process of building, testing, and deploying machine learning models.
Used Java to work more effectively across the stack, from writing custom MapReduce jobs in Hadoop to developing Spark applications.
Worked with PyTorch and TensorFlow, the most popular open-source deep learning frameworks for AI.
Able to work in both GCP and Azure clouds in parallel and coherently.
Utilizing Google SQL and MySQL to extract, manipulate, and analyze large datasets for actionable insights.
Developed and deployed GPT-based LLM and NLP models to automate legal document analysis, reducing manual review time by 30%.
Used LLMs in AI systems designed to understand the intricacies of human language and generate intelligent, creative responses to queries.
Developed data pipelines, transformations, and models using ETL to migrate data to Snowflake data warehouse.
Good understanding of and proficiency in NumPy for numerical computing and Pandas for manipulating, handling, cleaning, and analyzing large datasets.
Responsible for operations and support of the big data analytics platform, Splunk, and Tableau visualization.
Used the Cloud Shell SDK in GCP to configure services such as Dataproc, Cloud Storage, and BigQuery.
Involved in Data ingestion to Azure Data Lake, Azure Databricks by building pipelines in Azure Data Factory.
Continuously exploring and mastering new technologies within the GCP and GenAI ecosystems.
Experience working with generative (GenAI) and discriminative machine learning algorithms.
Used Pandas, NumPy, Seaborn, Matplotlib, scikit-learn, SciPy, and NLTK in Python to develop various machine learning (ML) algorithms.
Communicated insights and findings from data analysis effectively through data visualization.
Audited AI models for bias so they do not perpetuate societal biases present in the training data.
Proficient in using open-source LLMs such as GPT-2 and BERT for various NLP tasks, including text generation, text classification, sentiment analysis, and named entity recognition.
Utilized Apache Spark SQL with Python to develop and execute big data analytics and machine learning applications; executed machine learning use cases under Spark ML and MLOps.
Designed and developed NLP models for sentiment analysis.
Used SQL to interact with relational databases like PostgreSQL and MySQL.
Worked in enterprise environments where many existing systems and applications are built in Java.
Designed and provisioned the platform architecture to execute Hadoop and machine learning use cases on ML infrastructure, including AWS EMR and S3.
Optimized query performance and Snowflake architecture.
Used PyTorch, whose Pythonic, intuitive API makes code easier to write and debug.
The TensorFlow and PyTorch communities are growing, with rich ecosystems of libraries and resources for deep learning research and development.
Worked on machine learning over large datasets using Spark SQL, MySQL, and MapReduce.
Applied various machine learning algorithms and statistical modeling to identify volume, using the scikit-learn package in Python and MATLAB.
Skilled in fine-tuning pre-trained LLMs to adapt them to specific tasks and domains, leveraging techniques such as transfer learning and domain adaptation (see the sketch at the end of this list).
Strong foundation in natural language processing (NLP) with generative models and text generation.
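A minimal sketch of the fine-tuning/transfer-learning workflow referenced above, assuming a Hugging Face checkpoint (distilbert-base-uncased as a stand-in) and toy labeled texts; this is illustrative, not the client's actual model or data:

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    name = "distilbert-base-uncased"                 # stand-in checkpoint
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

    # Transfer learning: freeze the pre-trained encoder, train only the new head
    for param in model.base_model.parameters():
        param.requires_grad = False

    texts = ["Approve the claim.", "Reject the claim."]   # toy domain data
    labels = torch.tensor([1, 0])
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

    optimizer = torch.optim.AdamW(
        [p for p in model.parameters() if p.requires_grad], lr=5e-5)
    model.train()
    for _ in range(3):                               # a few toy steps
        out = model(**batch, labels=labels)
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    print(f"toy training loss: {out.loss.item():.4f}")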

Environment: SQL Server, Oracle, MS Office, LLMs, TensorFlow, GenAI, NLP, Snowflake, PyTorch, Java, Google SQL, scikit-learn, Machine Learning, Pandas, Artificial Intelligence, MySQL, PostgreSQL, GCP, Azure, NumPy, ML Infrastructure, Teradata, GPT, MLOps, Informatica, ER Studio, XML, R connector, Python, R, Tableau 9.2.

Client: Terumo Heart Inc., Michigan Oct 2020 - Nov 2022
Role: Sr. Data Scientist /Sr. Data Analyst
Project: Product Claim Analysis / Competitor Product Analysis (E-Commerce)

Responsibilities:
Worked with large amounts of structured and unstructured data.
Knowledge of machine learning concepts (generalized linear models, regularization, Random Forest, time series models, etc.).
Experience with GenAI monitoring systems that track the performance of deployed models, detecting anomalies or drift in the data distribution that may affect model accuracy.
Responsible for building an Azure cloud enterprise data platform, including establishing connections between Azure resources (ADF, Databricks, ADLS Gen2, and storage-layer access for ADF).
Worked with business intelligence and visualization tools such as BusinessObjects, Tableau, and Chartio.
Strong knowledge of deploying GenAI models into production environments to make real-time predictions or automate decision-making.
NumPy is a fundamental package for numerical computing in Python and Pandas offers data structures and functions for manipulating structured data, making it indispensable for data preprocessing and analysis.
Monitored the performance of deployed models in real time, tracking key metrics such as accuracy, latency, and throughput.
Worked with Java's own set of libraries and frameworks for data analysis and machine learning.
Deployed GUI pages using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript, AJAX, and Python.
Utilized GCP's machine learning libraries, such as TensorFlow, for advanced feature engineering tasks.
Utilized GPT models to develop natural language processing (NLP) solutions, including chatbots and text tools.
PostgreSQL and MySQL provide powerful capabilities for data preprocessing tasks such as cleaning, filtering, and joining.
Experience with NLTK, the leading platform for building Python programs that work with human language data.
Specialized in large language models (LLMs), which artificial intelligence (AI) programs use to recognize and generate text, among other tasks.
Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs. Implemented Elasticsearch on the Hive data warehouse platform, worked with Elastic MapReduce, and set up a Hadoop environment on AWS EC2 instances.
Extensive knowledge of Snowflake Database, including database schema and table structures.
Experienced in evaluating the performance of LLMs using metrics such as accuracy, precision, recall, F1-score, and perplexity, and adept at interpreting model outputs to assess their quality (see the sketch at the end of this list).
The ability to preprocess and clean messy data, handle outliers, and perform data wrangling tasks efficiently using Python libraries is crucial for real-world data analysis.
Evaluated the performance of machine learning models using various metrics and techniques; cross-validation, hyperparameter tuning, and model comparison helped select the best-performing model for a given task.
Collaborated with software developers who predominantly use Java for application development.
Optimized resource usage and costs by leveraging GCP's pricing models and cost management tools.
Addressed the challenges of scaling machine learning workloads to handle increasing volumes of data and requests.
Experience in fact and dimensional modeling (star schema, snowflake schema), transactional modeling, and SCD.
Python's Pandas library or R can directly interface with PostgreSQL and MySQL databases.
Experienced in documenting LLM development processes, experiments, and results, and proficient in communicating technical concepts and findings to both technical and non-technical stakeholders.
Combined expertise in data analysis, machine learning, and cloud computing on GCP to derive insights from data, build predictive models, and deploy them at scale to drive business value.
Used artificial intelligence (AI) algorithms to explore and visualize the characteristics of the data.
Strong programming skills in Python, R, and experience with AI frameworks like TensorFlow and PyTorch.
Developed Predictive Analytics using Pyspark and Spark SQL on Databricks to extract, transform and uncover insights from the raw data.
Experienced in scaling LLM training and inference tasks using parallel and distributed computing frameworks such as TensorFlow's distributed training, PyTorch's Distributed Data Parallel, or distributed computing platforms like Apache Spark and Spark SQL.
Implemented the project in Linux environment.
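A toy sketch of the evaluation metrics mentioned above; the labels, predictions, and per-token losses are illustrative, not real model outputs:

    import math
    from sklearn.metrics import accuracy_score, precision_recall_fscore_support

    y_true = [1, 0, 1, 1, 0, 1]   # gold labels (illustrative)
    y_pred = [1, 0, 0, 1, 0, 1]   # model predictions (illustrative)

    acc = accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary")
    print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")

    # Perplexity of a language model is exp(mean per-token negative log-likelihood)
    token_nlls = [2.1, 1.7, 2.4, 1.9]   # illustrative per-token NLL values
    print(f"perplexity={math.exp(sum(token_nlls) / len(token_nlls)):.2f}")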
Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, GenAI, LLMs, Python, Snowflake, PostgreSQL, TensorFlow, PyTorch, Google SQL, Machine Learning (ML Infrastructure), Artificial Intelligence, Azure, GPT, GCP, NumPy, Pandas, QlikView, MLOps, PL/SQL, HDFS, Teradata 14.1, JSON, Hadoop (HDFS), MapReduce, Pig, Spark, RStudio, Mahout, Java, Hive.

Client: Verizon, Irving, Texas Jun 2018 - Sep 2020
Role: Data Scientist/Data Analyst
Project: Production Scheduling and Optimization

Responsibilities:
Designed and developed a call quality control system using Python and Machine learning (ML Infrastructure) techniques such as Random Forest regressor using Scikit-learn.
Conducted feature selection using GridSearchCV to identify and select the most important features for predicting call quality (see the sketch at the end of this list).
Conducted data cleaning and preprocessing using Pandas and NumPy to ensure the data were suitable for analysis.
Used standardization techniques to normalize the data and ensure consistency across all call quality metrics.
Applied MLOps version-control principles to manage changes to machine learning models, datasets, and codebases.
Developed and utilized Scikit-Learn modeling pipelines to automate the entire modeling process and increase efficiency.
Utilized the SciPy library for statistical analysis and hypothesis testing to identify potential correlations and causations between call quality metrics.
Conducted regular testing and maintenance to ensure the call quality control system was always running optimally.
Conducted data exploration to identify potential outliers and anomalies in the call quality data.
Created documentation and training materials to enable stakeholders to understand and use the call quality control system effectively.
Involved in the development of real time streaming applications using PySpark, Apache Flink, Kafka, Hive on distributed Hadoop Cluster.
Developed ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake SQL.
Designed and developed Flink pipelines to consume streaming data from Kafka, applying business logic to massage, transform, and serialize the raw data.
Successfully loaded files into Hive and HDFS from Oracle and SQL Server using Sqoop.
Configured, monitored, and automated Amazon Web Services, and was involved in deploying the content cloud platform on AWS using EC2, S3, and EBS. Used AWS Lambda to perform data validation, filtering, and transformations for every data change in a DynamoDB table and loaded the transformed data into another data store.
Worked on Google Cloud Platform (GCP) services such as Compute Engine, Cloud Load Balancing, Cloud Storage, Cloud SQL, Stackdriver Monitoring, and Cloud Deployment Manager.
Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on Azure.
Utilized Spark, Hadoop, HBase, Kafka, Spark Streaming, Caffe, TensorFlow, MLOps, and Python with a broad variety of machine learning methods, including classification, regression, dimensionality reduction, etc.
Experience applying LLMs to legal-domain work, including intellectual property rights related to data and algorithms, data governance, compliance, and advising on the legal implications of data.
Performed time series analysis using Tableau Desktop; created detail-level summary reports and dashboards using KPIs and visualized trend analysis.
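A minimal sketch of the call-quality modeling approach described in this section (standardization, Random Forest regressor, GridSearchCV); the features, target, and hyperparameter grid are synthetic stand-ins, not the production configuration:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))        # e.g. jitter, latency, packet loss...
    y = 2 * X[:, 0] - X[:, 2] + rng.normal(scale=0.1, size=200)  # toy target

    pipe = Pipeline([
        ("scale", StandardScaler()),                    # standardization step
        ("rf", RandomForestRegressor(random_state=0)),
    ])
    grid = GridSearchCV(
        pipe,
        param_grid={"rf__n_estimators": [100, 300], "rf__max_depth": [None, 10]},
        cv=5, scoring="neg_mean_squared_error")
    grid.fit(X, y)
    print(grid.best_params_)
    # Feature importances from the tuned forest guide feature selection
    print(grid.best_estimator_.named_steps["rf"].feature_importances_)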
Environment: Python, RStudio, Oracle, Snowflake, Machine Learning (ML Infrastructure), Regression, KNN, SVM, SQL, Decision Tree, Random Forest, XGBoost, Azure, Collaborative Filtering, Ensembles, Pandas, GCP, NLP, R, MLOps, Flink, Spark, Hive, MapReduce, Hadoop, scikit-learn, Keras, TensorFlow, Seaborn, NumPy, SciPy, MySQL, Tableau.

Client: Norfolk Southern Corporation, Atlanta, GA Oct 2015 - Nov 2017
Role: Data Analyst
Project: Audience Profiling (Marketing, Media Tech)

Responsibilities:
Responsible for analyzing data from Hive tables, extracting it with PySpark SQL.
Responsible for analyzing data in RDDs using Apache Spark and the PySpark Python API.
Preprocessed the data such as removing duplicates, handling missing values, and feature engineering using PySpark.
Segmented the customers by taking different features like demographics, purchase behavior, browsing behavior, customer lifetime value, etc.
Built a regression model to predict customer lifetime value from historical data, using regression algorithms such as Linear Regression, Lasso and Ridge, and Random Forest regression.
Responsible for creating data pipelines, collaborating with data engineering on ETL pipelines.
Performed the elbow method to find the optimal K for building the K-means clustering model (see the sketch at the end of this list).
Built a clustering model to segment customers using algorithms such as K-means and Gaussian Mixture Models.
Evaluated the clusters with silhouette scores to measure the quality of the clustering results.
The model helped identify which customers to target with different products, increasing sales.
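A toy sketch of the segmentation workflow above, combining the elbow method (inertia) with silhouette scores; the customer features are synthetic stand-ins:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    rng = np.random.default_rng(42)
    # Three synthetic customer groups standing in for real attributes
    X = np.vstack([rng.normal(loc=c, size=(50, 3)) for c in (0, 5, 10)])

    for k in range(2, 8):
        km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
        # Elbow: look for the bend in inertia; silhouette: higher is better
        print(k, round(km.inertia_, 1), round(silhouette_score(X, km.labels_), 3))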
Environment: Hive, PySpark, MLOps, Azure, K-means, Gaussian Mixture Models, Linear Regression, Python, Lasso, Ridge, Random Forest, SQL.

Client: Intuit, Bangalore, India Oct 2013 - Sep 2015
Role: Data Analyst
Project: Financial Data Analysis (Finance)

Responsibilities:
Proficient in understanding and analyzing financial-domain data; met with stakeholders to understand and frame business problems.
Extracted data from various sources which include flat files, and databases such as Oracle using SQL.
Analyzed the data for gaps and performed exploratory data analysis, running univariate and bivariate analysis with the help of Pandas.
Identified gaps in the data and analyzed the various patterns in the financial transactions.
Classified transactions using clustering algorithms such as K-means, analyzed the resulting clusters, and created profiles by labeling the clusters.
Identified anomalous/fraudulent transactions using the Isolation Forest algorithm and flagged them for further investigation (see the sketch at the end of this list).
Checked for fraudulent transactions based on transaction type, amount, reason, account type, transaction frequency, location, time, age, balance, rolling balance, etc., and created a dashboard using Power BI.
Used Git for version control.
Deployed the analysis results and created data pipelines to classify future transactions in an AWS EC2 environment, exposing the model via Flask and Docker.
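A minimal sketch of the Isolation Forest fraud-flagging step described above, on synthetic transaction features (amount and frequency are illustrative stand-ins):

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(7)
    normal = rng.normal(loc=[50, 3], scale=[10, 1], size=(500, 2))  # amount, freq
    fraud = rng.normal(loc=[400, 12], scale=[50, 2], size=(5, 2))   # outliers
    X = np.vstack([normal, fraud])

    iso = IsolationForest(contamination=0.01, random_state=7).fit(X)
    flags = iso.predict(X)              # -1 marks suspected anomalies
    print("flagged transaction rows:", np.where(flags == -1)[0])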

Environment: AWS (EC2, ECS), K-means, Flask, Git, Power BI, Isolation Forest, Oracle SQL, EDA (univariate and bivariate analysis), Python, Pandas, scikit-learn.

Education:

B.Tech in Computer Science Engineering from Amrita University, India, 2013.