
Arvind Sagi
Data Scientist / Data Analyst
Email: [email protected]
Phone: 281-769-1263
LinkedIn: linkedin.com/in/arvinds42
Location: Boston, Massachusetts, USA
Relocation: Yes
Visa: H1B

Summary:

Around 7 years of total IT experience in the analysis, design, modeling, development, implementation, and testing of Data Warehouse applications, with excellent analytical, coordination, and interpersonal leadership skills.
Data Warehouse developer, data modeler, and data analyst with more than 5 years of experience in all phases of the Software Development Life Cycle (SDLC), including system analysis, design, data modeling, implementation, and support and maintenance of applications in both OLTP and OLAP systems.
Strong documentation and knowledge-sharing skills: conducted data modeling sessions for different user groups, facilitated common data models between applications, and participated in requirement sessions to identify logical entities.
Extensive experience working with business users as well as senior management. Strong understanding of the principles of data warehousing, fact tables, dimension tables, and Star and Snowflake schema modeling. Strong experience with database performance tuning and optimization, including query optimization, index tuning, caching, and buffer tuning.
Extensive experience in relational Data modeling, Dimensional data modeling, logical/Physical Design, ER Diagrams and OLTP and OLAP System Study and Analysis.
Extensive experience in Enterprise Information Management and Architecture technologies including Information Lifecycle, Master Data Management and Business Intelligence.
Experienced in writing SQL Stored Procedures, Triggers and Functions.
Experience in building data intensive applications and creating pipelines using python and shell scripting with extensive knowledge on amazon web services (AWS).
Built data warehouses using Star and Snowflake schemas.
Experience with Amazon Web Services (Amazon EC2, Amazon S3, Amazon RDS, Amazon SQS, AWS CloudWatch, Amazon EBS, Glue, Lambda, and Redshift).
Experience using Python IDEs such as PyCharm, Jupyter, and VS Code.
Working knowledge of Kubernetes to deploy, scale, load-balance, and manage Docker containers.
Good knowledge of data Extraction, Transformation, and Loading (ETL) using tools such as SQL Server Integration Services (SSIS) and Data Transformation Services (DTS).
Ingested data into Azure services and processed it in Azure Databricks.
Creating and enhancing CI/CD pipeline to ensure Business Analysts can build, test, and deploy quickly.
Extensive knowledge of Exploratory Data Analysis, Big Data analytics using Spark, and predictive analysis using Linear and Logistic Regression models, with a good understanding of supervised and unsupervised algorithms.
Worked on different statistical techniques like Linear/Logistic Regression, Correlational Tests, ANOVA, Chi-Square Analysis, K-means clustering.
Hands-on experience visualizing data using Power BI, Tableau, R, and Python.
Integrating Azure Databricks with Power BI and creating dashboards.
Good knowledge of writing Data Analysis Expressions (DAX) in Tabular data models.
Hands-on knowledge of designing normalized database schemas.
Well versed with Scrum methodologies.
Analyzed the requirements and developed Use Cases, UML Diagrams, Class Diagrams, Sequence and State Machine Diagrams.
Excellent communication and interpersonal skills with ability in resolving complex business problems.
Direct interaction with client and business users across different locations for critical issues.

TECHNICAL SKILLS:

Languages: Python, R, C, C++
Integrated Development Environments: Jupyter, RStudio, Google Colab, Visual Studio Code, PyCharm
Operating Systems: Unix, Linux, Windows
Libraries: spaCy, NLTK, scikit-learn, Beautiful Soup, TensorFlow, Keras, PyTorch, PySpark, Tidyverse, ggplot2, Lattice, Plotly
Cloud: Databricks, Azure Form Recognizer, AWS (S3, Glue, Redshift), Data Lake, BigQuery
Databases: MySQL, Microsoft SQL Server, SQLite, PostgreSQL
BI Tools: Microsoft Excel, Power BI, Tableau, Google Sheets
Other: Microsoft PowerPoint, SAP SD
Source Control: Git, GitHub
Cluster Management: Kubernetes
Build and Deploy Tools: Jenkins, Docker
Software Methodologies: Agile-Scrum, Waterfall
Ticketing Tools: JIRA, ServiceNow
Big Data / Machine Learning: machine learning, natural language processing, data engineering, data pipelines, predictive modeling, deep learning, pre-trained transformers, clustering, classification, regression, decision trees
Data Manipulation: data wrangling, data cleaning, data visualization, dashboards, analytics

Education

Master of Science in Data Science, Northeastern University

Experience:

Mastercard, New York, NY May 2022 – Present
Data Scientist

Responsibilities:

Collaborated with data engineers and the operations team to implement ETL processes; wrote and optimized SQL queries to extract data that fit the analytical requirements.
Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from Redshift.
Performed data profiling to understand behavior across features such as traffic pattern, location, date, and time.
Tackled a highly imbalanced fraud dataset using undersampling, oversampling with SMOTE, and cost-sensitive algorithms with Python scikit-learn.
Analyzed data and performed data preparation by applying historical models to the dataset in Azure ML.
Implemented statistical and deep learning models (Logistic Regression, XGBoost, Random Forest, SVM, RNN, CNN).
Deployed Models using Docker containers by creating docker image on EC2 instance.
Performed fraud detection predictions by hosting multiple models on GCP Kubernetes cluster.
Used AWS S3 and EC2 via the AWS CLI and the Boto3 client.
Designed rich data visualizations to model data into human-readable form with Tableau and Matplotlib.
Performed data cleaning and feature selection using the MLlib package in PySpark, and worked with deep learning frameworks such as Caffe and Neon.
Built data platforms for analytics and advanced analytics in Azure.
Experienced in using supervised, unsupervised and regression techniques in building models.
Employed various metrics such as RMSE, MAE, Confusion Matrix, ROC and AUC to evaluate the performance of each model.
Worked with the NLTK library for NLP data processing and pattern discovery.
Performed sentiment and text analysis on data from different social networking sites, categorizing comments into positive and negative clusters.
Developed interactive dashboards, created various Ad Hoc reports for users in Tableau by connecting various data sources.
Used Pandas, NumPy, Seaborn, Matplotlib, and scikit-learn in Python to develop various machine learning models, utilizing algorithms such as Linear Regression, Logistic Regression, Gradient Boosting, SVM, and KNN.
Used PCA and other feature engineering techniques for high dimensional datasets while maintaining the variance of most important features.
Created transformation pipelines for preprocessing large amounts of data with methods such as imputation, scaling, and feature selection.
Built interactive dashboards using Dash and Plotly, and developed interactive executive dashboards in Tableau to provide a reporting tool for organizational metrics and data.
Conducted data preparation and outlier detection using Python.
Environment: Python 3.x, R, HDFS, Hadoop 2.3, Hive, Linux, Docker, GCP, Kubernetes cluster, Spark, Tableau Desktop, SQL Server 2012, Microsoft Excel, MATLAB, Spark SQL, PySpark.

Chubb Insurance, Jersey City, NJ Apr 2021 – Apr 2022
Artificial Intelligence Engineer / Data Scientist

Responsibilities:

Automated insurance cancellation with a Hugging Face QnA model for a chatbot serving a wide customer base across 2 countries, validating with a triplet evaluator to preserve the semantic structure of queries.
Built a Named Entity Recognition (NER) model on severance packages for automatic insurance approval, saving 150+ man-hours of work per month.
Utilized Docker and Kubernetes to containerize, ship, run, and deploy models securely.
Developed NER and rule-based models to extract entities from emails and forms, automating Mexico auto-insurance approval and increasing projected annual revenue by 1.25 million USD through decreased costs and increased period lengths.
Implemented RNN and LSTM models to analyze risk factors, assess customer behavior patterns, and develop personalized pricing strategies that reward positive behaviors, resulting in lower premiums for customers.
Revamped coverage options by collaborating with product development teams to design and implement flexible plans tailored to diverse customer segments, ensuring affordability and accessibility.
Environment: PyCharm, Databricks, Azure, GIT, Python, Kubernetes, Docker

Accenture, Hyderabad, India Jun 2018 - Dec 2020
Application Development Analyst / Data Scientist

Responsibilities:

Pitched strategic sales and logistics development for client portfolio expansion using advanced reporting, data modeling, analytics, and visualization dashboards, winning 3 of these clients.
Coalesced efforts with clients on BOM tool and inventory model creation to track supply chain overorders, resulting in $650,000 in operational savings with no customer service disruption.
Integrated logistic regression model to help the SEO team decide on which keywords to target, resulting in a 15% lift in YoY site visitors.
Initiated and conducted systematic requirement gathering and data-source identification with stakeholders across cross-functional teams, collaborating on status reports and accelerating operational efficiency by ~30%.
Ensured data accuracy, consistency, and integrity by developing and implementing data cleansing and validation procedures.
Assisted management in gathering insights on complex business issues to develop advanced analytical tools, providing backend support in analyzing reports and management models to improve business performance based on results.
Helped to translate data complexity to business understanding to standardize the flow of communication among internal teams and with clients.
Environment: Jupyter, Python, MySQL, MS Excel, Power BI, AWS, Git, GitHub

Accenture, Pune, India May 2016 – Jun 2018
Application Development Associate

Responsibilities:

Streamlined postdated invoices for local tax brackets, summarizing the net amount payable per cycle and improving credit management and inventory maintenance on the aviation line, adding ~2 million Euros/quarter in cost optimization.
Built 3 periodic chains for invoice writing cycles to reduce delay time in airport holdings and to track fiscal loss minimization.
Revamped existing Trans-codification tables & centralized them on a Bill-to basis, accumulating all business accruals in one place.
Supervised new hires' performance and hosted collaborative knowledge-transfer sessions as a Lead for 18 months.
Partnered with stakeholders to define key performance indicators (KPIs) and develop data-driven metrics, like CAC, for monitoring and evaluating business performance.

Environment: Jupyter, Python, MS Excel, Tableau