Shashank K
Data Scientist
[email protected] | 469-476-0484
Location: Remote, USA
Relocation: Any
Visa: H1B

Professional Summary:
Senior Data Scientist and Business Intelligence professional with 9+ years of professional experience performing Statistical Modelling, Data Mining, Data Exploration, and Data Visualization of structured and unstructured datasets, and implementing Machine Learning and Deep Learning models based on business understanding to deliver insights that drive key business decisions and provide value to the business.
Integrated Docker containers with Kubernetes to orchestrate and scale a high-availability web application.
Experienced in leveraging AI and ML to extract insights, solve complex problems, and drive data-driven decision-making; seeking a challenging role in a progressive organization to apply data-driven insights and advanced AI techniques to drive business success.
Utilized Docker Compose to define and manage multi-container environments for development and testing, enhancing team collaboration.
Performed predictive Modelling, Pattern Discovery, Market Basket Analysis, Segmentation Analysis, Regression Models, and Clustering.
Utilized GCP resources, namely BigQuery, Cloud Composer, Compute Engine, Kubernetes clusters, and Cloud Storage buckets, for building the production ML pipeline.
Experience in creating ETL mappings using Informatica to move data from multiple sources, such as flat files and Oracle, into a common target area such as a Data Warehouse.
Solid understanding of Data Modelling, Data Collection, Data Cleansing, Data Warehouse/Data Mart Design, ETL, BI, OLAP, Client/Server applications
Analysed data and provided insights with R Programming and Python Pandas
Experience in writing PL/SQL statements - Stored Procedures, Functions, Triggers and packages.
Have good knowledge of LLM (Large Language Model) modules.
Knowledge of and experience with social media analytics tools such as Microsoft Social Engagement and Watson Analytics for Social Media.
Deep analytics and understanding of Big Data and algorithms using Hadoop, MapReduce, NoSQL and distributed computing tools.
Involved in creating database objects like tables, views, procedures, triggers, and functions using T-SQL to provide definition, structure and to maintain data efficiently.
Expert in Data Science process life cycle: Data Acquisition, Data Preparation, Modelling (Feature Engineering, Model Evaluation) and Deployment.
Experienced in A/B testing design and execution, and in deploying machine learning models into production for the teams.
Expertise in synthesizing Machine learning, Predictive Analytics and Big data technologies into integrated solutions
Worked on Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and sequence-to-sequence models.
Worked with AWS Cloud platform and its features which includes EC2, VPC, RDS, EBS, S3, CloudWatch, Cloud Trail, CloudFormation and Auto scaling etc.
Experience implementing BI solutions using Analysis Services (SSAS), dashboards, scorecards using Reporting Services (SSRS), Power BI, Tableau & Excel PowerPivot
Strong Experience in ER & Dimensional Data Modelling to deliver Normalized ER & STAR/SNOWFLAKE schemas using Erwin r7.2, ER Studio 10.0, EA Studio 1.5.1, Sybase power designer 12.1, SQL Server Enterprise manager and Oracle designer.
Experience working in Azure Cloud, including Azure Data Lake Gen2 for data storage, Azure Data Factory, Azure DevOps, and Azure Databricks.
Experience in text mining and topic modelling using NLP and Neural Networks: tokenizing, stemming, and lemmatizing text and tagging parts of speech using TextBlob, the Natural Language Toolkit (NLTK), and spaCy while building Sentiment Analysis models (see the preprocessing sketch after this list).
Extensive knowledge in various reporting objects like Facts, Attributes, Hierarchies, Transformations, filters, prompts, Calculated fields, Sets, Groups, Parameters etc., in Tableau.
Hands on learning with different ETL tools to get data in shape where it could be connected to Tableau through Tableau Data Extract.
Experienced with all major databases: Oracle, SQL Server, and Teradata in large data warehouse (OLAP) environments.
Expertise in using QlikView
Skilled in Tableau Desktop 10.x for data visualization, reporting, and analysis.
Developed reports, dashboards using Tableau for quick reviews to be presented to Business and IT users.
Worked in Production support team for maintaining the mappings, sessions and workflows to load the data in Data warehouse.
Experience in working with SAS Enterprise Guide Software for reporting and analytical tasks.
Experience in utilizing SAS Procedures, Macros, and other SAS application for data extraction using Oracle and Teradata.
Expertise in writing complex SQL queries, made use of Indexing, Aggregation and materialized views to optimize query performance
Involved in System Integration Testing (SIT), Regression Testing, GUI Testing, Performance Testing & User Acceptance Testing (UAT).
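
Illustrative of the NLP preprocessing described above (tokenizing, lemmatizing, and part-of-speech tagging for sentiment analysis), a minimal sketch assuming NLTK and its standard corpora are installed; the sample sentence and function name are hypothetical.

    import nltk
    from nltk.tokenize import word_tokenize
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer
    from nltk import pos_tag

    # One-time download of the required NLTK corpora and models
    for pkg in ("punkt", "stopwords", "wordnet", "averaged_perceptron_tagger"):
        nltk.download(pkg, quiet=True)

    def preprocess(text):
        # Tokenize, drop stopwords and punctuation, lemmatize, then POS-tag
        tokens = word_tokenize(text.lower())
        stop = set(stopwords.words("english"))
        words = [t for t in tokens if t.isalpha() and t not in stop]
        lemmatizer = WordNetLemmatizer()
        return pos_tag([lemmatizer.lemmatize(w) for w in words])

    print(preprocess("The claims portal was slow but the support agent resolved my issue quickly."))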

Technical Skills:
Languages: Python, R, SQL, PySpark, Java, C++, C, MATLAB
Databases: SQL Server, Oracle, SQLite, HBase, MongoDB, Cassandra, PostgreSQL, DynamoDB
Operating Systems: Windows, Linux, Unix, macOS
Web Technologies: React, Angular, Redux, Node, Express, JavaScript, HTML5, CSS3, DOM
IDEs: Eclipse, IntelliJ, NetBeans, VS Code, PyCharm, Jupyter Notebook, Google Colab
Cloud: AWS, GCP, Azure
Reporting Tools: Tableau, Power BI, MSBI (SSIS, SSRS, SSAS)
Big Data Ecosystem: Hadoop, HDFS, MapReduce, Hive, Pig, Spark, Sqoop, Spark SQL
Machine Learning: RNN, CNN, Regression (Linear and Logistic), Decision Trees, Random Forest, SVM, KNN, PCA
Version Control: GitHub, GitLab, SVN, Bitbucket




PROFESSIONAL EXPERIENCE:
Intact Insurance, Chicago, IL Jan 2022 - Present
Sr Data Scientist/GCP Engineer
Responsibilities:
Involved in extensive ad hoc reporting, routine operational reporting, and data manipulation to produce routine metrics and dashboards for management.
Created parameters, action filters and calculated sets for preparing dashboards and worksheets in Tableau.
Proven track record of leveraging data science and AI to solve complex business problems
Interacted with other data scientists and architects to build custom data visualization solutions using tools like Tableau and packages in Python.
Built data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators (see the DAG sketch after this list).
Successfully delivered multiple NLP projects, such as building a chatbot that assists customers in troubleshooting claim issues and recommends actions. The bot could also handle questions asked in natural language about common customer issues, e.g. when is my premium due, what is my plan deductible, and what is my copay for a sick visit.
Involved in running MapReduce jobs for processing millions of records.
Wrote complex SQL queries using joins and OLAP functions such as COUNT, CSUM, and RANK.
Built and published customized interactive reports and dashboards, and scheduled reports, using Tableau Server.
Able to work in parallel across both GCP and Azure clouds.
Utilized Google SQL to extract, manipulate, and analyze large datasets for actionable insights.
Conducted finite element analysis to assess structural integrity, stress distribution, and vibration characteristics of mechanical systems.
Created custom Docker images for a data pipeline, optimizing image size and improving deployment efficiency.
Uploaded and managed Docker images on Docker Hub, ensuring easy access and distribution for project stakeholders.
Worked on different data formats such as JSON, XML and performed Machine Learning algorithms in Python.
Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster.
Developed and deployed GPT-based NLP models to automate legal document analysis, saving 30% of manual review time.
Responsible for operations and support of the big data analytics platform, Splunk, and Tableau visualizations.
Used the Cloud Shell SDK in GCP to configure services such as Dataproc, Cloud Storage, and BigQuery.
Involved in Data ingestion to Azure Data Lake, Azure Databricks by building pipelines in Azure Data Factory.
Experience in working with Generative AI (GAI) as well as Discriminative machine learning algorithms
Developed and maintained BI dashboards for real-time monitoring of key performance indicators to increase operational efficiency.
Designed and implemented OLAP cubes for multidimensional analysis, enabling the identification of market trends and customer behavior patterns.
Utilized IoT sensors to collect warehouse health information and built a streaming data pipeline into GCP's BigQuery.
Developed Data Mapping, Data Governance, Transformation and Cleansing rules for the Master Data Management Architecture involving OLTP, ODS and OLAP
Used pandas, NumPy, seaborn, matplotlib, scikit-learn, SciPy, NLTK in Python for developing various Machine Learning algorithms.
Utilized Apache Spark with Python to develop and execute Big Data analytics and Machine Learning applications; executed Machine Learning use cases under Spark ML and MLlib.
Designed and developed NLP models for sentiment analysis.
Designed and provisioned the platform architecture to execute Hadoop and Machine Learning use cases under Cloud infrastructure, AWS, EMR, and S3.
Worked on Machine Learning on large size data using Spark and MapReduce.
Applied various Machine Learning algorithms and statistical models such as decision trees, regression models, neural networks, SVM, and clustering to identify volume, using the scikit-learn package in Python and MATLAB.
Developed Python programs to manipulate data read from various Teradata tables and consolidate it into a single CSV file.
Practical understanding of the Data modeling (Dimensional & Relational) concepts like Star-Schema Modeling, Snowflake Schema Modeling, Fact and Dimension tables.
Strong foundation in natural language processing (NLP) with generative models and text generation.
Performing statistical data analysis and data visualization using Python.
Worked on creating filters and calculated sets for preparing dashboards and worksheets in Tableau.
Created data models in Splunk using pivot tables, analyzing vast amounts of data and extracting key information to suit various business requirements.
Created new scripts for Splunk scripted inputs to collect system CPU and OS data.
Implemented data refreshes on Tableau Server for biweekly and monthly increments based on business change to ensure that the views and dashboards were displaying the changed data accurately.
Developed normalized Logical and Physical database models for designing an OLTP application.
Knowledgeable in the AWS environment for loading data files from on-premises systems to a Redshift cluster.
Performed SQL Testing on AWS Redshift databases
Developed Teradata SQL scripts using OLAP functions such as RANK and RANK OVER to improve query performance while pulling data from large tables.
Developed and implemented SSIS, SSRS and SSAS application solutions for various business units across the organization.
Designed the DataMarts in dimensional data modelling using star and snowflake schemas.
Analyzed Data Set with SAS programming, R and Excel.
Published interactive dashboards and scheduled automatic data refreshes.
Maintained large data sets, combining data from various sources using Excel, SAS Enterprise Guide, SAS Grid, Access, and SQL queries.
Created Hive queries that helped market analysts spot emerging trends by comparing incremental data with Teradata reference tables and historical metrics.
Design and development of ETL processes using Informatica ETL tools for dimension and fact file creation.
Develop and automate solutions for a new billing and membership Enterprise data Warehouse including ETL routines, tables, maps, materialized views, and stored procedures incorporating Informatica and Oracle PL/SQL toolsets.
Performed analysis for implementing Spark using Scala and wrote sample Spark programs using PySpark.
Collaborative team player with excellent communication skills and a commitment to responsible AI development.
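
A minimal sketch of the Airflow ETL pipeline on GCP referenced above, assuming the apache-airflow-providers-google package; the bucket, dataset, and table names are hypothetical placeholders, not the production configuration.

    from datetime import datetime
    from airflow import DAG
    from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
    from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

    with DAG(
        dag_id="claims_daily_etl",  # hypothetical pipeline name
        start_date=datetime(2022, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # Land raw claim extracts from Cloud Storage into a BigQuery staging table
        load_raw = GCSToBigQueryOperator(
            task_id="load_raw_claims",
            bucket="example-claims-bucket",  # hypothetical bucket
            source_objects=["raw/claims_{{ ds }}.csv"],
            destination_project_dataset_table="analytics.staging_claims",
            source_format="CSV",
            write_disposition="WRITE_TRUNCATE",
        )

        # Transform staged rows into a reporting table with a BigQuery SQL job
        transform = BigQueryInsertJobOperator(
            task_id="transform_claims",
            configuration={
                "query": {
                    "query": "SELECT * FROM analytics.staging_claims WHERE status != 'VOID'",
                    "useLegacySql": False,
                    "destinationTable": {
                        "projectId": "example-project",
                        "datasetId": "analytics",
                        "tableId": "claims_reporting",
                    },
                    "writeDisposition": "WRITE_TRUNCATE",
                }
            },
        )

        load_raw >> transform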

Environment: SQL Server, Oracle, MS Office, Google SQL, Teradata, GPT, Informatica, ER Studio, XML, R connector, Python, R, Tableau 9.2

Humana, Louisville, KY Sep 2020 - Dec 2021
Data Scientist
Responsibilities:
Worked with large amounts of structured and unstructured data.
Knowledge in Machine Learning concepts (Generalized Linear models, Regularization, Random Forest, Time Series models, etc.)
Responsible for building an Azure Cloud Enterprise Data Platform, including establishing connections between Azure resources (ADF, Databricks, ADLS Gen2, and storage-layer access for ADF).
Worked with Business Intelligence and visualization tools such as BusinessObjects, Tableau, and Chartio.
Deployed GUI pages by using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript, and AJAX.
Configured the project on WebSphere 6.1 application servers.
Implemented the online application by using Core Java, JDBC, JSP, Servlets and EJB 1.1, Web Services, SOAP, WSDL.
Handled end-to-end project from data discovery to model deployment.
Monitoring the automated loading processes.
Communicated with other healthcare information systems using web services with the help of SOAP, WSDL, and JAX-RPC.
Utilized Google SQL to build and optimize databases, improving query efficiency and reducing processing time.
Used the Singleton, Factory, and DAO design patterns based on the application requirements.
Utilized GPT models to develop natural language processing solutions, including chatbots and text generation tools.
Used SAX and DOM parsers to parse the raw XML documents
Used RAD as Development IDE for web applications.
Strong programming skills in Python, R, and experience with AI frameworks like TensorFlow and PyTorch.
Developed predictive analytics using PySpark and Spark SQL on Databricks to extract, transform, and uncover insights from raw data (see the sketch after this list).
Used Log4J logging framework to write Log messages with various levels.
Involved in fixing bugs and minor enhancements for the front-end modules.
Used Microsoft Visio and Rational Rose to design the use case diagrams, class model, sequence diagrams, and activity diagrams for the SDLC process of the application.
Performed functional and technical reviews.
Supported the testing team for system testing, integration testing, and UAT.
Ensured quality in deliverables.
Conducted Design reviews and technical reviews with other project stakeholders.
Was part of the complete project life cycle, from requirements through production support.
Created test plan documents for all back-end database modules
Implemented the project in Linux environment.
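
A minimal sketch of the PySpark/Spark SQL predictive-analytics work on Databricks referenced above; the table name, feature columns, and label are hypothetical stand-ins for the actual curated data.

    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("member-risk-model").getOrCreate()

    # Hypothetical curated member table; replace with the real source table
    df = spark.table("curated.member_claims").select("age", "num_visits", "total_paid", "high_risk")

    # Assemble raw numeric columns into the single feature vector Spark ML expects
    assembler = VectorAssembler(inputCols=["age", "num_visits", "total_paid"], outputCol="features")
    train, test = assembler.transform(df).randomSplit([0.8, 0.2], seed=42)

    # Fit a simple logistic regression baseline and check holdout accuracy
    model = LogisticRegression(featuresCol="features", labelCol="high_risk").fit(train)
    print(f"holdout accuracy: {model.evaluate(test).accuracy:.3f}")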

Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, Google SQL, GPT, QlikView, MLlib, PL/SQL, Teradata 14.1, JSON, Hadoop (HDFS), MapReduce, Pig, Spark, RStudio, Mahout, Java, Hive, AWS.

JVR Systems, Plano, TX Dec 2019 - Aug 2020
Data Scientist/Data Engineer
Responsibilities:
Applied various machine learning algorithms and statistical modeling techniques, including decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, regression models, social network analysis, neural networks, deep learning, SVM, and clustering, to identify volume using the scikit-learn package in Python.
Developed NLP models for topic extraction and sentiment analysis; identified and assessed available machine learning and statistical analysis libraries (decision trees, regression models, neural networks, Support Vector Machines (SVM), clustering) and worked with the NLTK library for NLP data processing and pattern discovery.
Conducted experiments to validate and improve mechanical designs.
Used AWS SageMaker to quickly build, train, and deploy machine learning models. Worked with the spaCy library for deep learning.
Using spaCy, prepared text for deep learning and connected it to statistical models and the rest of the application.
Experience in moving data between GCP and Azure using Azure Data Factory.
Conducted exploratory data analysis, cleaning, and feature engineering to prepare datasets for AI model development.
Built an Artificial Neural Network using TensorFlow in Python to estimate a customer's probability of cancelling their connection (churn rate prediction); see the sketch after this list.
Generated graphs and reports using matplotlib, NumPy, Scikit-Learn, Seaborn and pandas packages in python for analytical models and cross data validation.
Involved in the development of real-time streaming applications using PySpark, Apache Flink, Kafka, and Hive on a distributed Hadoop cluster.
Developed predictive models using Decision Tree, Random Forest, and Naive Bayes.
Cleansed and analyzed financial data by creating SAS macros and preparing reports using SAS report procedures such as PROC PRINT, PROC REPORT, PROC TABULATE, PROC FREQ, PROC MEANS, and PROC TRANSPOSE.
Collaborated with cross-functional teams to integrate AI solutions into existing systems and processes.
Designed and developed Flink pipelines to consume streaming data from Kafka, applying business logic to massage, transform, and serialize the raw data.
Successfully loaded files to Hive and HDFS from Oracle and SQL Server using Sqoop. Configured, monitored, and automated Amazon Web Services, and was involved in deploying the content cloud platform on AWS using EC2, S3, and EBS. Used AWS Lambda to perform data validation, filtering, sorting, and other transformations for every data change in a DynamoDB table and to load the transformed data into another data store.
Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs. Implemented Elasticsearch on the Hive data warehouse platform, worked with Elastic MapReduce, and set up a Hadoop environment on AWS EC2 instances.
Worked on Google Cloud Platform (GCP) services such as Compute Engine, Cloud Load Balancing, Cloud Storage, Cloud SQL, Stackdriver Monitoring, and Cloud Deployment Manager.
Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS.
Utilized Spark, Hadoop, HBase, Kafka, Spark Streaming, Caffe, TensorFlow, MLlib, and Python with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction.
Performed time series analysis using Tableau Desktop, created detail-level summary reports and dashboards using KPIs, and visualized trend analyses.
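
A minimal sketch of the TensorFlow churn-prediction network referenced above; the feature count, layer sizes, and synthetic training data are illustrative assumptions, not the production model.

    import numpy as np
    import tensorflow as tf

    # Hypothetical preprocessed inputs: 10 numeric customer features, binary churn label
    X_train = np.random.rand(1000, 10).astype("float32")
    y_train = np.random.randint(0, 2, size=(1000,))

    # Small feed-forward network producing a churn probability per customer
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(10,)),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X_train, y_train, epochs=5, batch_size=32, validation_split=0.2)

    # Probability of cancellation for a few customers
    print(model.predict(X_train[:5]).ravel())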

Environment: Python, RStudio, Oracle, Machine Learning (Regression, KNN, SVM, Decision Tree, Random Forest, XGBoost, Collaborative Filtering, Ensemble), NLP, R, Flink, AWS, Spark, Hive, MapReduce, Hadoop, Scikit-Learn, Keras, TensorFlow, Seaborn, NumPy, SciPy, MySQL, Tableau

JDA Software, India Oct 2017 - Oct 2019
Data Analyst
Responsibilities:
Explored insurance claims data to find patterns, groups, and regions where claims were higher, and compared them using pie charts and bar graphs.
Involved in designing and developing a predictive model that predicts false claims from historical claims data with 85% accuracy; implemented this in R and performed A/B testing.
Designed a model to predict potential claimants (those who claim more than a specific amount) from the company's claims data using Logistic Regression, Decision Trees, and Random Forest (see the sketch after this list).
Performed Credit Risk Predictive Modelling by using Decision Trees and Regressions to get the risk involved by giving individual scores to the customers
Addressed overfitting and underfitting by tuning the hyperparameters of the algorithms and by using L1 and L2 regularization.
Key member of the Wholesale Credit Risk Team responsible for generating the wholesale exposure data for building the Accounting View.
Contributed to defining the directory structure and the Wholesale Credit Risk data model, including participation in the initial CRP architectural and design meetings.
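
A minimal sketch of the claims-prediction modelling described above. The original work was done in R; this Python/scikit-learn version illustrates the same idea, and the file name, feature columns, and target are hypothetical.

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import classification_report

    # Hypothetical historical claims extract with a binary high-claim target
    df = pd.read_csv("claims_history.csv")
    X = df[["claim_amount", "policy_age_years", "prior_claims", "customer_age"]]
    y = df["high_claim"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )

    # Random forest baseline; logistic regression and decision trees were compared similarly
    clf = RandomForestClassifier(n_estimators=200, random_state=42)
    clf.fit(X_train, y_train)
    print(classification_report(y_test, clf.predict(X_test)))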

Environment: R, SQL, Logistic Regression, Hadoop, Hive, Random Forest, SVM, JSON, Tableau, XML, AWS.

Telstra, India June 2014 - Sep 2017
Data Analyst
Responsibilities:
Involved in migration projects to migrate data from data warehouses on Oracle/DB2 to Teradata.
Used Microsoft Visio and Rational Rose to design the use case diagrams, class model, sequence diagrams, and activity diagrams for the SDLC process of the application.
Worked with other teams to analyse customers and marketing parameters.
Conducted Design reviews and technical reviews with other project stakeholders.
Was part of the complete project life cycle, from requirements through production support.
Created test plan documents for all back-end database modules
Used MS Excel, MS Access, and SQL to write and run various queries.
Used a traceability matrix to trace the organization's requirements.
Recommended structural changes and enhancements to systems and databases.
Supported the testing team for system testing, integration testing, and UAT.
Ensured quality in deliverables.

Environment: Teradata SQL Assistant, Teradata loading utilities (BTEQ, FastLoad, MultiLoad), Python, UNIX, Tableau, MS Excel, MS PowerPoint, BusinessObjects, Oracle.