
Swetha R
Role: Data Scientist
Email: [email protected]
Location: Stephens City, Virginia, USA
Relocation: Yes
Visa: H1B



TECHNICAL SKILLS:

Data Science Tools: Machine Learning, Deep Learning, Data Mining, Data Analysis, Big Data, Visualization, Data Modeling
Databases: Microsoft SQL, PostgreSQL, MongoDB, Cosmos DB
Reporting Tools: Business Objects, MS Excel Reports, MS Access Reports, Tableau Reports
Operating Systems: Windows, Linux
Languages: SQL, Python (Pandas, SciPy, Matplotlib, OpenCV, PyTesseract)
Skills: Regression, Clustering, Random Forest, SVM
Cloud Technology: Microsoft Azure

EXPERIENCE: 12+ years

Client: The Boeing Company, Seattle, WA    Nov 2020 - Present
Role: Lead Data Scientist

Responsibilities:
Gathered, documented, and implemented business requirements for analysis and for long-term document/report generation. Analyzed large volumes of data and provided results to technical and managerial staff.
Worked with various data pools and DBAs to obtain access to data. Have working knowledge of NLP, NLTK, and text mining.
Have programming knowledge in SQL and Python.
Used K-means clustering to group similar records and documented the results (see the clustering sketch below).
Extracted, transformed, and loaded data into a PostgreSQL database using Python scripts.
Data analysis and visualization: working knowledge of numerical optimization, anomaly detection and estimation, A/B testing, statistics, NumPy, SciPy, Pandas, and scikit-learn.
Researched and developed statistical learning models for data analysis. Collaborated with product management and engineering departments.
Used SAS for analyzing client business needs, managing large data sets, and storing and extracting information. Performed feature engineering to convert arbitrary data into well-behaved data, handling categorical features, text features, image features, and missing data.
Used Splunk ES for application management, security, performance management, and analytics for the public APIs.
Worked on data generation, machine learning for anti-fraud detection, data modeling, operational decision-making, and loss forecasting, such as product-specific fraud and buyer-vs.-seller fraud.
Used Monte Carlo simulation to obtain numerical results, running repeated simulations to estimate probabilities alongside machine learning models. Analyzed data for fraud analysis and direct fraud.
Used K-fold cross-validation to improve model performance and to test models on sample data before finalizing them (see the cross-validation sketch below).




Worked with public/private cloud computing technologies (IaaS, PaaS, and SaaS) on Microsoft Azure for customer analytics and predictions.
Used Kibana and Tableau as business intelligence tools to visually analyze data and show trends, variations, and density in the form of graphs and charts.
Formulated procedures for integrating R programming plans with data sources and delivery systems.
Used query languages such as SQL and have experience with NoSQL databases such as MongoDB.
Worked with both unstructured and structured data using machine learning algorithms such as linear and logistic regression, decision trees, random forests, support vector machines, neural networks, KNN, and time series analysis.
Used Keras, with numerical computation backends such as Theano and TensorFlow, to develop and evaluate deep neural network models (see the Keras sketch below).
Used Tableau to analyze data and show trends, variations, and density in the form of graphs and charts; connected Tableau to files, relational, and big data sources to acquire and process data.
Built and maintained SQL scripts, indexes, and complex queries for data analysis and extraction.
Created and executed complex SQL statements in both production and development SQL environments.
Used scikit-learn, Pandas, and the statsmodels Python libraries to build predictive forecasts for time series analysis using AR (autoregressive), MA (moving average), and ARIMA (autoregressive integrated moving average) models (see the ARIMA sketch below).
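
Clustering sketch: a minimal K-means grouping example with scikit-learn, illustrating the technique referenced above; the input file and column names are hypothetical placeholders, not the actual project data.

    # Minimal K-means sketch (illustrative; file and columns are hypothetical).
    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    df = pd.read_csv("parts_metrics.csv")           # hypothetical input file
    features = df[["flight_hours", "defect_rate"]]  # hypothetical numeric columns
    X = StandardScaler().fit_transform(features)

    kmeans = KMeans(n_clusters=4, random_state=42, n_init=10)
    df["cluster"] = kmeans.fit_predict(X)           # assign each record to a group
    print(df.groupby("cluster").mean(numeric_only=True))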
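
Cross-validation sketch: a minimal K-fold cross-validation example with scikit-learn on synthetic data; the model choice and fold count are illustrative assumptions.

    # Minimal K-fold cross-validation sketch (synthetic data, illustrative settings).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import KFold, cross_val_score

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    model = RandomForestClassifier(n_estimators=200, random_state=0)

    cv = KFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    print("fold accuracies:", scores, "mean:", scores.mean())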
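
Keras sketch: a minimal deep neural network built and evaluated with Keras on synthetic data; the architecture and training settings are illustrative assumptions, not the production model.

    # Minimal Keras sketch for a binary classifier (synthetic data, arbitrary layer sizes).
    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    X = np.random.rand(500, 10)
    y = (X.sum(axis=1) > 5).astype(int)

    model = Sequential([
        Dense(32, activation="relu", input_shape=(10,)),
        Dense(16, activation="relu"),
        Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X, y, epochs=10, batch_size=32, validation_split=0.2, verbose=0)
    print(model.evaluate(X, y, verbose=0))   # [loss, accuracy]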
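
ARIMA sketch: a minimal time-series forecast with the statsmodels ARIMA model; the series values and model order are illustrative placeholders.

    # Minimal ARIMA forecasting sketch (illustrative series and order).
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    series = pd.Series(
        [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118],
        index=pd.date_range("2020-01-01", periods=12, freq="MS"),
    )
    model = ARIMA(series, order=(1, 1, 1))   # AR, differencing, MA terms
    fitted = model.fit()
    print(fitted.forecast(steps=3))           # forecast the next three months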


Client: Parker Hannifin, Sunnyvale, CA    Apr 2017 - Oct 2020
Role: Sr. Data Scientist

Responsibilities:
Implemented various machine learning algorithms - linear regression, logistic regression, decision trees, SVM, Naive Bayes, KNN, K-means, random forest, gradient boosting, and AdaBoost - on datasets from the UCI Machine Learning Repository (see the classifier-comparison sketch below).
Built a data pipeline framework in Python for data extraction, data wrangling, and data loading.
Involved in data analysis for data conversion, including data mapping from source to target database schemas and specifying and writing data extract scripts for data conversion in test and production environments.
Data warehouse: designed and programmed ETL and aggregation of data in the target database, working with staging, de-normalized, and star schemas and dimensional reporting.
Developed predictive and historical business analyses and performed data mining/text mining using Python with pandas.
Integrated new tools and developed technology frameworks/prototypes to accelerate data integration and enable the deployment of predictive analytics, developing Spark Scala modules alongside R.
Developed and implemented predictive analyses in R to support management and business users' decision-making.
Wrote several Teradata SQL queries using Teradata SQL Assistant for ad hoc data pull requests.
Developed Python programs to manipulate data read from various Teradata tables and consolidate it into single CSV files.
Performed statistical data analysis and data visualization using Python and R.
Worked on creating filters, parameters, and calculated sets for preparing dashboards and worksheets.


Interacted with other data scientists and architected custom data visualization solutions using tools such as Tableau, R packages, and R Shiny.
Implemented data refreshes on Tableau Server in biweekly and monthly increments, based on business changes, to ensure that views and dashboards displayed the changed data accurately.
Worked with Spark SQL and DataFrames via SQLContext for faster execution of Hive queries (see the Spark SQL sketch below).
Designed and developed ETL processes using the Informatica ETL tool for dimension and fact file creation.
Responsible for business case analysis, requirements gathering, use case documentation, prioritization, product/portfolio strategic roadmap planning, high-level design, and data modeling.
Responsible for end-to-end solution delivery, including sprint planning and execution, change management, project management, operations management, and UAT.
Primary liaison between customer and engineering groups; served as the key unifying force for all BI platform activities, enabling better communication across teams, proactively identifying gaps, and ensuring the successful delivery of capabilities.
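
Classifier-comparison sketch: a minimal example comparing several scikit-learn classifiers with cross-validation. It uses sklearn's bundled breast-cancer dataset (which originates from the UCI repository) as a stand-in; the models run with default, untuned parameters.

    # Minimal classifier comparison sketch (bundled UCI-origin dataset, default parameters).
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                                  RandomForestClassifier)
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    models = {
        "logistic": LogisticRegression(max_iter=5000),
        "naive_bayes": GaussianNB(),
        "knn": KNeighborsClassifier(),
        "svm": SVC(),
        "decision_tree": DecisionTreeClassifier(random_state=0),
        "random_forest": RandomForestClassifier(random_state=0),
        "adaboost": AdaBoostClassifier(random_state=0),
        "gradient_boost": GradientBoostingClassifier(random_state=0),
    }
    for name, model in models.items():
        print(name, cross_val_score(model, X, y, cv=5).mean())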
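
Spark SQL sketch: a minimal example of registering a DataFrame as a temporary view and querying it with Spark SQL; the file path, table name, and columns are hypothetical.

    # Minimal Spark SQL sketch (hypothetical path and columns).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hive-query-sketch").getOrCreate()
    orders = spark.read.parquet("/data/orders")      # hypothetical source
    orders.createOrReplaceTempView("orders")

    daily = spark.sql("""
        SELECT order_date, COUNT(*) AS order_count, SUM(amount) AS total_amount
        FROM orders
        GROUP BY order_date
        ORDER BY order_date
    """)
    daily.show()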


Client: Tata Consultancy Services, India    Oct 2013 - Apr 2017
Role: Data Scientist

Responsibilities:
Worked with business requirements analysts/subject matter experts to identify and understand requirements. Conducted user interviews and data analysis review meetings.
Defined key facts and dimensions necessary to support the business requirements along with Data Modeler.
Created draft data models to aid understanding and to assist the Data Modeler.
Resolved data-related issues such as assessing data quality, consolidating data, and evaluating existing data sources.
Manipulated, cleansed, and processed data using Excel, Access, and SQL.
Responsible for loading, extracting, and validating client data.
Coordinated with the front-end design team to provide them with the necessary stored procedures and packages and the necessary insight into the data.
Participated in requirements definition, analysis, and the design of logical and physical data models.
Led data discovery discussions with the business in JAD sessions and mapped business requirements to logical and physical modeling solutions.
Conducted data model reviews with project team members and captured technical metadata through data modeling tools.
Coded standard Informatica ETL routines and developed standard Cognos reports.
Collaborated with ETL teams to create data landing and staging structures as well as source-to-target mapping documents.
Ensured data warehouse database designs efficiently supported BI and end-user requirements.
Collaborated with application and services teams to design databases and interfaces that fully meet business and technical requirements.
Maintained expertise and proficiency in the various application areas.
Maintained current knowledge of industry trends and standards.

Environment: Informatica, SQL Developer, PL/SQL, MS Access, MS Excel






Client: Textron India Private Limited, Bangalore, India    Sep 2010 - Oct 2013
Role: Python Developer

Responsibilities:
Involved in building database models, APIs, and views using Python to build an interactive web-based solution.
Developed views and templates with Python and Django's view controller and templating language to create a user-friendly website interface.
Wrote procedural Python code alongside object-oriented C# components.
Worked on Python OpenStack APIs.
Developed tools using Python, shell scripting, and XML to automate routine tasks.
Carried out various mathematical operations for calculation purposes using Python libraries.
Created a unit test/regression test framework for existing and new code.
Used the Subversion version control tool to coordinate team development.
Developed SQL queries, stored procedures, and triggers using SQL and PL/SQL.
Used GitHub for version control.
Created PyUnit test cases for unit testing (see the PyUnit sketch below).
Developed Python batch processors to consume and produce various feeds.
Managed large datasets using Pandas data frames and MySQL (see the chunked-read sketch below).
Generated property lists for every application dynamically using Python.
Wrote validation scripts in SQL to validate data loading.
Utilized an Agile process and JIRA issue management to track sprint cycles.
Supported user groups by handling target-related software issues/service requests and identifying/fixing bugs.
Used data types such as dictionaries and tuples, along with object-oriented features such as inheritance, to implement complex network algorithms.
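
PyUnit sketch: a minimal unittest example; parse_feed_line is a hypothetical helper standing in for the batch-processor code under test.

    # Minimal PyUnit (unittest) sketch; parse_feed_line is a hypothetical function.
    import unittest

    def parse_feed_line(line):
        """Split a pipe-delimited feed record into (id, amount)."""
        record_id, amount = line.strip().split("|")
        return record_id, float(amount)

    class ParseFeedLineTest(unittest.TestCase):
        def test_valid_line(self):
            self.assertEqual(parse_feed_line("A123|42.50\n"), ("A123", 42.5))

        def test_malformed_line_raises(self):
            with self.assertRaises(ValueError):
                parse_feed_line("A123")

    if __name__ == "__main__":
        unittest.main()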
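
Chunked-read sketch: a minimal example of processing a large MySQL table in chunks with Pandas; the connection string, table, and columns are hypothetical.

    # Minimal sketch for chunked reads from MySQL with pandas (hypothetical connection/table).
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("mysql+pymysql://user:password@localhost/feeds")

    totals = []
    for chunk in pd.read_sql("SELECT * FROM transactions", engine, chunksize=50_000):
        totals.append(chunk.groupby("account_id")["amount"].sum())

    summary = pd.concat(totals).groupby(level=0).sum()   # combine per-chunk totals
    print(summary.head())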

Environment: Python, XML, JSON, REST, GitHub, Jira, SQL, MySQL, Agile, and Windows.