
Rahul Mogillapalli - Data Engineer
Phone: (707) 383-6730
Email: [email protected]
Location: Remote, USA
Visa: H1B

PROFESSIONAL SUMMARY:

Over 8 years of experience spanning data analysis, statistical analysis, machine learning, and deep learning with large structured and unstructured data sets in the travel-services and manufacturing industries, with strong functional knowledge of business processes and current market trends.
Developed predictive models using Decision Tree, Random Forest, Naive Bayes, Logistic Regression, and Cluster Analysis.
Involved in the entire data science project life cycle and actively involved in all the phases including data extraction, data cleaning, statistical modeling, and data visualization with large data sets of structured and unstructured data, created ER diagrams and schema.
Expert in the entire Data Science process life cycle including Data Acquisition, Data Preparation, Data Manipulation, Feature Engineering, Machine Learning Algorithms, Validation, and Visualization.
Strong knowledge in statistical methodologies such as Hypothesis Testing, Principal Component Analysis (PCA), Sampling Distributions, and Time Series Analysis.
Extensively worked on Data preparation, exploratory analysis, Feature engineering using supervised and unsupervised modeling.
Experienced in Machine Learning, Data mining with large datasets of Structured and Unstructured Data, Data Acquisition, Data Validation, and Predictive Modeling.
Performed preliminary data analysis using descriptive statistics and handled anomalies such as removing duplicates and imputing missing values using the Talend tool.
Proficient in Python and its libraries such as NumPy, Pandas, Scikit-Learn, Matplotlib and Seaborn.
Expert in preprocessing data in Pandas using visualization, data cleaning, and engineering methods such as checking correlations, imputation, scaling, and handling categorical variables.
Experience in building various machine learning models using algorithms such as Linear Regression, Gradient Descent, Support Vector Machines (SVM), Logistic Regression, KNN, Decision Tree, and ensembles such as Random Forest, AdaBoost, and Gradient Boosted Trees.
Experienced across the full software development lifecycle (SDLC) using Agile and Scrum methodologies.
Strong SQL programming skills, with experience in working with functions, packages, and triggers.
Excellent understanding of machine learning techniques and algorithms such as k-NN, Naive Bayes, SVM, and Decision Forests.
Experienced in Data Integration Validation and Data Quality controls for ETL process and Data Warehousing using MS Visual Studio SSIS, SSAS, SSRS.
Expert in developing data conversions/migrations from legacy systems and various sources (flat files, Oracle, and non-Oracle databases) to Oracle using SQL*Loader, external tables, and the appropriate interface tables and APIs via Informatica.
Drove core insights from available data to suggest A/B tests that improve site execution and experience.
Good knowledge of and experience with deep learning algorithms such as Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and LSTM/RNN-based speech recognition using TensorFlow.
Working knowledge on Azure cloud components (HDInsight, DataBricks, DataLake, Blob Storage, Data Factory, Storage Explorer, SQL DB, SQL DWH, CosmosDB).
Experienced in developing data pipelines and datasets in Azure Data Factory during ETL processes from Azure SQL, Blob Storage, and Azure SQL Data Warehouse.
Possesses hands-on experience with Cloudera Hadoop, various ETL tools, Cassandra, and various Confidential IaaS/PaaS services.
Excellence in handling Big Data Ecosystems like Apache Hadoop, MapReduce, Spark, HDFS Architecture, Cassandra, HBase, Sqoop, Hive, Pig, MLlib, ELT.
Expertise in writing Spark RDD transformations, actions, DataFrames, and case classes for the required input data, and in performing data transformations using Spark Core and Spark SQL (a minimal PySpark sketch follows this summary).
Expertise in building PySpark and Spark-Scala applications for analysis, batch processing, and stream processing.
Good understanding of Big Data Hadoop and YARN architecture along with the various Hadoop daemons such as JobTracker, TaskTracker, NameNode, DataNode, and Resource/Cluster Manager, as well as Kafka.
Experienced in writing Spark scripts in both Python and Scala for development and data analysis.
Experience in analyzing data from multiple sources and creating reports with Interactive Dashboards using power BI, Tableau and Matplotlib.
Experience importing and exporting data between HDFS and databases such as MS SQL Server, Oracle, Cassandra, Teradata, and PostgreSQL using Sqoop.
Experience in Developing ETL workflows using Informatica PowerCenter 9.X/8.X and IDQ. Worked extensively with Informatica client tools- Designer, Repository Manager, Workflow Manager, and Workflow Monitor.
Well versed with the design and development of presentation layer for web applications using technologies like HTML, CSS, jQuery, and JavaScript.
Experience in software methodologies like Agile, Waterfall model.
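To make the Spark bullet above concrete, here is a minimal PySpark sketch of the kind of DataFrame and RDD transformations described; the input file and column names (bookings.csv, city, amount) are hypothetical, for illustration only.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("rdd-dataframe-demo").getOrCreate()

# Hypothetical input: a CSV of booking events with city and amount columns.
df = spark.read.csv("bookings.csv", header=True, inferSchema=True)

# DataFrame transformations: filter, then aggregate per city.
summary = (
    df.filter(F.col("amount") > 0)
      .groupBy("city")
      .agg(F.sum("amount").alias("total_amount"), F.count("*").alias("bookings"))
)

# Equivalent low-level RDD transformations plus an action.
totals = (
    df.rdd.map(lambda row: (row["city"], row["amount"]))
          .reduceByKey(lambda a, b: a + b)
          .collect()   # action: materializes the result on the driver
)

summary.show()
spark.stop()
```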

TOOLS & TECHNOLOGIES:

Big Data Ecosystem: HDFS, MapReduce, Yarn, Spark, Kafka, Airflow, Hive, Pig
Hadoop Distributions: Apache Hadoop 2.x/1.x, AWS (EMR, EC2, EBS, RDS, S3, Athena, Glue, Elasticsearch, Lambda, DynamoDB, Redshift, ECS, QuickSight), Azure (HDInsight, DataBricks, Data Lake, Blob Storage, Data Factory (ADF), SQL DB, SQL DWH, CosmosDB, Azure AD)
Machine Learning: Linear Regression, Logistic Regression, Naive Bayes, Decision Trees, Random Forest, SVM, KNN
Programming Languages: Python, Scala, Shell Scripting, Pig Latin, HiveQL
NoSQL Databases: MongoDB 3.x, Hadoop HBase 0.98
Databases: Snowflake, AWS RDS, Teradata, Oracle 9i/10g, MySQL 5.5/5.6/8.0, Microsoft SQL Server, PostgreSQL
ETL/BI: Snowflake, Informatica, Tableau
Reporting & Visualization: Tableau 9.x/10.x, Matplotlib, Power BI
Web Development: JavaScript, HTML, CSS, Postman, Flask
Operating Systems: Linux (Ubuntu, RedHat), Windows (XP/7/8/10)



PROFESSIONAL WORK EXPERIENCE:

EVERBANK (TIAA) April 2022 - Present
Data Engineer/ Data Analyst

Description: We feel privileged every day to work with such an amazing team and to be able to strive for creative excellence. With so many backgrounds, locations, and styles, we truly bring a wide range of ideas to the table.

Responsibilities:
Performed data manipulation, data preparation, normalization, and predictive modeling; improved efficiency and accuracy by evaluating models in Python and R.
Used Python and R to refine the models and upgraded existing models to improve the product.
Wrote complex SQL queries for validating the data against different kinds of reports generated by Cognos.
Filtered the discovered boundaries by implementing a non-max suppression (NMS) algorithm to achieve an optimal bounding box per identified object (see the sketch following this list).
Evaluated models using cross-validation, the log-loss function, and ROC curves, used AUC for feature selection, and worked with Elastic technologies such as Elasticsearch and Kibana.
Implemented statistical modeling with the XGBoost machine learning package in Python to determine the predicted probabilities of each model.
Created master data for modeling by combining various tables and derived fields from client data, students' LORs, essays, and various performance metrics.
Used NumPy, SciPy, pandas, NLTK (the Natural Language Toolkit), and Matplotlib to build the model.
Worked in a Hadoop environment using Pig, Sqoop, Hive, and HBase, with a detailed understanding of MapReduce programs.
Involved in integration of various relational and non-relational sources such as Oracle, XML and Flat Files.
Designed and implemented scalable Cloud Data and Analytical architecture solutions for various public and private cloud platforms using Azure.
Created a data pipeline with processor groups and multiple processors using Apache NiFi for flat-file and RDBMS sources, as part of a POC on Amazon EC2.
Formulated several graphs to show the performance of the students by demographics and their mean score in different USMLE exams.
Applied various artificial intelligence (AI)/machine learning algorithms and statistical models, including decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, and regression models, for predictive analysis of data.
Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python, and built models using deep learning frameworks.
Created deep learning models using TensorFlow and Keras by combining all tests as a single normalized score and predict residency attainment of students.
Used the XGBoost classifier when the target was a categorical variable and the XGBoost regressor for continuous variables, combining features using the FeatureUnion and FunctionTransformer methods in the NLP pipeline.
Created data layers as signals to Signal Hub to predict on new, unseen data with performance no worse than the static model built using a deep learning framework.
Worked with the Data Governance group in creating a custom data dictionary template to be used across the various business lines.
Created statistical models using distributed and standalone approaches to build various diagnostic, predictive, and prescriptive solutions.
Interfaced with other technology teams to extract, transform, and load (ETL) data from a wide variety of data sources.
Provided input and recommendations on technical issues to business and data analysts, BI engineers, and data scientists.
Developed SQL procedures to synchronize the dynamic data generated from GTID systems with the Azure SQL Server.
Expertise in building Azure native enterprise applications and migrating applications from on-premises to Azure environments
Responsible for maintenance and monitoring of production/test/dev systems running on MS Azure
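As referenced in the non-max suppression bullet above, here is a minimal NumPy sketch of the algorithm; the IoU threshold of 0.5 is an illustrative default, not necessarily the value used on the project.

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box per object; drop boxes that overlap it too much.

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences.
    """
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]   # indices sorted by descending confidence
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the best remaining box with all other candidates.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Keep only boxes whose overlap with the chosen box is below the threshold.
        order = order[1:][iou < iou_threshold]
    return keep
```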

Environment: Python 2.x/3.x, Hive, AWS, Linux, Tableau Desktop, Microsoft Excel, NLP, deep learning frameworks such as TensorFlow and Keras, boosting algorithms, DB2, R, Visio, HP ALM, Agile.

CGI, Boston, MA Dec 2020 - March 2022
Data Engineer

Description: CGI is among the largest IT and business consulting services firms in the world. We are insights-driven and outcomes-based to help accelerate returns on your investments.

Responsibilities:
Extensively involved in all phases of data acquisition, data collection, data cleaning, model development, model validation and visualization to deliver data science solutions.
Extracted data from a SQL Server database, copied it into the HDFS file system, and used Hadoop tools such as Hive and Pig Latin to retrieve the data required for building models.
Performed data cleaning, including transforming variables and dealing with missing values, and ensured data quality, consistency, and integrity using Pandas and NumPy.
Developed predictive models on large scale datasets to address various business problems through leveraging advanced statistical modeling, machine learning.
Implemented public segmentation using unsupervised machine learning by building a K-means model in PySpark after data munging (see the sketch after this list); analyzed and solved business problems and found patterns and insights within structured and unstructured data.
Implemented advanced computer vision techniques like distortion correction, thresholding techniques, and the sliding window method to identify the lane markings to highlight the entire lane.
Tested the algorithm in a video to ensure that the lane boundaries are accurately identified.
Utilized a diverse array of technologies and tools as needed, to deliver insights such as R, SAS, MATLAB, Tableau and more.
Detected near-duplicated news by applying NLP methods and developing machine learning models like label spreading and clustering.
Employed the output of the semantic segmentation to perform drivable space estimation in 3D, lane estimation and to filter errors in the output of the 2D object detectors.
Prototyping and experimenting with ML algorithms and integrating into a production system for different business needs.
Implemented the Porter Stemmer (from the Natural Language Toolkit) and an NLP bag-of-words model (CountVectorizer) to prepare the data.
Implemented a number of customer clustering models and visualized the resulting clusters in Tableau with legends for higher management.
Developed SQL procedures to synchronize the dynamic data generated from GTID systems with the Azure SQL Server.
Automated processes using Python/R scripts with an Oracle database to generate and write results in the production environment on a weekly basis.
Used Data Quality validation techniques to validate Critical Data elements (CDE) and identified various anomalies.
Performed data validation and data reconciliation between disparate source and target systems for various projects.
Wrote complex SQL queries for validating the data against different kinds of reports generated by Cognos.
Provided input and recommendations on technical issues to business and data analysts, BI engineers, and data scientists.
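A minimal sketch of the K-means customer segmentation described above, using the PySpark ML API; the input file, feature columns (recency_days, frequency, monetary), and k=5 are assumptions made for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler, StandardScaler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("customer-segmentation").getOrCreate()

# Hypothetical input: one row per customer with numeric behaviour columns.
customers = spark.read.parquet("customers.parquet")

assembler = VectorAssembler(
    inputCols=["recency_days", "frequency", "monetary"], outputCol="raw_features"
)
scaler = StandardScaler(inputCol="raw_features", outputCol="features")

assembled = assembler.transform(customers)
scaled = scaler.fit(assembled).transform(assembled)

# Unsupervised segmentation; k is chosen here purely for illustration.
kmeans = KMeans(k=5, seed=42, featuresCol="features", predictionCol="segment")
model = kmeans.fit(scaled)

segments = model.transform(scaled)
segments.groupBy("segment").count().orderBy("segment").show()
```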

Environment: SAS, R, MLlib, Python, Data Governance, MDM, MATLAB, Tableau, Azure SQL Server.

DealerSocket - Dallas, TX Aug 2019 - Nov 2020
Data Engineer

Description: DealerSocket transforms the automotive experience with innovations and unparalleled service that help its customers grow and serve their own customers.

Responsibilities:
Analyzed data using SQL, R, Python, Apache Spark, PySpark and presented analytical reports to management and technical teams.
Worked with different datasets that included both structured and unstructured data, and participated in all phases of data mining: data cleaning, data collection, variable selection, feature engineering, model development, validation, and visualization.
Led discussions with users to gather business-process and data requirements and to develop a variety of conceptual, logical, and physical data models, with ETL using Informatica and Talend.
Expertise in Business intelligence and Data Visualization tools like Tableau.
Handled importing data from various data sources, performed transformations using Hive, Map Reduce and loaded data into HDFS.
Designed and implemented a recommendation system that leverages statistical analytics data and machine learning models, utilizing collaborative filtering techniques to recommend policies for different customers.
Created Data Quality Scripts using SQL and Hive (HQL) to validate successful data load and quality of the data.
Participated in feature engineering such as feature generation, PCA, feature normalization, and label encoding with scikit-learn preprocessing, and performed data imputation using various methods in the scikit-learn package in Python (see the sketch after this list).
Utilized Informatica toolset (Informatica Data Explorer and Data Quality) to inspect legacy data for data profiling.
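A small scikit-learn sketch of the feature-engineering steps named above (imputation, feature normalization, label encoding, PCA); the toy DataFrame and its column names are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.decomposition import PCA

# Hypothetical frame: numeric usage metrics plus a categorical label.
df = pd.DataFrame({
    "visits": [10, 12, np.nan, 7, 25],
    "avg_spend": [120.0, 80.5, 95.0, np.nan, 210.0],
    "tenure_months": [4, 36, 18, 2, 60],
    "plan": ["basic", "pro", "basic", "basic", "pro"],
})

X = df[["visits", "avg_spend", "tenure_months"]]
y = LabelEncoder().fit_transform(df["plan"])                     # label encoding

X_imputed = SimpleImputer(strategy="median").fit_transform(X)    # imputation
X_scaled = StandardScaler().fit_transform(X_imputed)             # feature normalization
X_pca = PCA(n_components=2).fit_transform(X_scaled)              # dimensionality reduction

print(X_pca.shape, y)
```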

Environment: SQL Server, Hive, Hadoop Cluster, ETL, Tableau, Teradata, Machine Learning (Logistic Regression/Random Forests/Collaborative Filtering), GitHub, MS Office suite, Agile.

Citi Financial - Tampa, FL Aug 2018 - July 2019
Data Engineer

Description: Citigroup Inc. (Citi) is an American multinational investment bank and financial services corporation.

Responsibilities:
Participated in all phases of data acquisition, data cleaning, developing models, validation, and visualization to deliver data science solutions.
Retrieved data from a SQL Server database by writing SQL queries including stored procedures, temp tables, and views.
Worked with the DBA group to create a Best-Fit Physical Data Model from the Logical Data Model using Forward engineering using Erwin.
Connected Database with Jupyter notebook for Modeling and Tableau for visualization and reporting.
Worked on fraud-detection analysis of loan applications, using loan history with supervised learning methods.
Used Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn in Python for developing various machine learning models and utilized algorithms such as Logistic Regression, Random Forest, Gradient Boost Decision Tree, and Neural Network.
Experienced in performing feature engineering such as PCA for high-dimensional datasets and important-feature selection with tree-based models.
Performed model tuning and selection using cross-validation and parameter tuning to prevent overfitting (a cross-validation sketch follows this list).
Used ensemble methods with different bagging and boosting techniques to increase the accuracy of the training model.
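A minimal sketch of the cross-validated tuning and boosting described above; it uses a synthetic dataset and an illustrative parameter grid rather than the project's actual data or settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic, imbalanced stand-in for a loan-application dataset (illustration only).
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}

# 5-fold cross-validated grid search to tune the boosting model and limit overfitting.
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",
    cv=5,
)
search.fit(X_train, y_train)

probs = search.best_estimator_.predict_proba(X_test)[:, 1]
print(search.best_params_, roc_auc_score(y_test, probs))
```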

Environment: SQL Server 2008, Python 2.x (NumPy/Pandas/Scikit-Learn), GitHub, Scrum, JIRA.

Capgemini Technologies, Hyderabad, India Nov 2016 - July 2018
Data Engineer

Description: We are a global leader in partnering with companies to transform and manage their business by harnessing the power of technology.

Responsibilities:
Experience in working with Azure cloud platform (HDInsight, DataBricks, DataLake, Blob Storage, Data Factory, SQLDB, SQL DWH and Data Storage Explorer).
Experienced with AWS and Azure services to smoothly manage applications in the cloud.
Involved in building and creating HDInsight cluster and Storage Account with End-to-End environment.
Created pipelines in Azure Data Factory using linked services, datasets, and pipelines to extract, transform, and load data between different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back in the reverse direction.
Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process data using the Cosmos activity to load data from on-premises systems to Azure cloud storage and databases.
Analyzed large and critical datasets using Cloudera, HDFS, MapReduce, Hive, Hive UDF, Pig, Sqoop and Spark.
Developed Spark applications using Scala and Spark-SQL for data extraction, transformation, and aggregation from multiple file formats for analyzing & transforming the data to uncover insights into the customer usage patterns.
Extensively coded applications in Scala for better performance and code optimization, and wrote UDFs.
Consumed XML messages from Kafka and processed the XML files using Spark Streaming to capture UI updates.
Wrote live real-time processing and core jobs using Spark Streaming with Kafka as the data-pipeline system (a streaming sketch follows this section's Environment line).
Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
Performed full loading of data from AWS S3 to Azure Data Lake and SQL Server using Azure Data Factory.
Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
Created resource groups containing tags, Key Vault, automation accounts, and RBAC assignments using PowerShell, ARM templates, and DevOps pipelines.
Developed Python scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into the RDBMS through Sqoop.
Developed Python code for tasks, dependencies, SLA watchers, and time sensors for each job, for workflow management and automation using Airflow (see the DAG sketch after this list).
Performed data cleansing operations and applied Spark transformations using Azure DataBricks.
Worked in a Snowflake environment to remove redundancy and loaded real-time data from various data sources into HDFS using Kafka.
Loaded data from web servers using Flume and the Spark Streaming API; used a Flume sink to write directly to indexers deployed on the cluster, allowing indexing during ingestion.
Scheduled Airflow DAGs to run multiple Hive and Pig jobs, which independently run with time and data availability.
Worked on importing and exporting data from Snowflake, Oracle, and MySQL databases into HDFS and Hive using Sqoop for analysis, visualization, and report generation.
Enhanced and optimized product Spark code to aggregate, group & run data mining tasks using the Spark framework.
Developed Power BI reports & effective dashboards after gathering and translating end-user requirements.
Monitored system life cycle deliverables and activities to ensure that procedures and methodologies were followed and that complete documentation was captured.
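A minimal Airflow DAG sketch of the workflow automation described above; the DAG id, schedule, and script paths are hypothetical, and the BashOperator import shown uses the Airflow 1.x path (Airflow 2.x moves it to airflow.operators.bash).

```python
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # Airflow 1.x import path

default_args = {
    "owner": "data-eng",
    "retries": 1,
    "retry_delay": timedelta(minutes=10),
    "sla": timedelta(hours=2),   # flags the run if a task has not finished in time
}

dag = DAG(
    dag_id="nightly_hive_pig_pipeline",   # hypothetical DAG name
    default_args=default_args,
    start_date=datetime(2018, 1, 1),
    schedule_interval="0 2 * * *",        # run every night at 02:00
    catchup=False,
)

load_hive = BashOperator(
    task_id="load_hive_partitions",
    bash_command="hive -f /opt/jobs/load_partitions.hql",   # hypothetical script path
    dag=dag,
)

aggregate_pig = BashOperator(
    task_id="aggregate_with_pig",
    bash_command="pig -f /opt/jobs/aggregate.pig",           # hypothetical script path
    dag=dag,
)

load_hive >> aggregate_pig   # the Pig aggregation waits for the Hive load
```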

Environment: Azure (HDInsight, DataBricks, DataLake, Blob Storage, Data Factory, SQL DB, SQL DWH, AD), AWS, Scala, Python, Hadoop 2.x (HDFS, MapReduce, Yarn), Spark v2.0.2, Airflow v1.8.2, Hive v2.0.1, Sqoop v1.4.6, HBase, Oozie, CosmosDB, Cassandra, MySQL, MongoDB, Ambari, Flume, Power BI, Azure DevOps, Ranger, Git.
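As referenced in the Kafka/Spark Streaming bullet above, a hedged sketch of consuming Kafka messages with Spark, written against the Structured Streaming Kafka source (a newer API than the DStream-based jobs described); the broker address, topic name, and output paths are assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Requires the spark-sql-kafka connector on the classpath (added via --packages).
spark = SparkSession.builder.appName("xml-event-stream").getOrCreate()

# Read raw messages from a hypothetical Kafka topic carrying XML payloads.
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "ui-events")
    .load()
)

# Kafka delivers key/value as binary; cast the value to a string for parsing.
events = raw.select(F.col("value").cast("string").alias("xml_payload"))

# Write the micro-batches out; a real job would parse the XML and update downstream stores.
query = (
    events.writeStream.format("parquet")
    .option("path", "/data/ui_events")            # hypothetical output path
    .option("checkpointLocation", "/chk/ui_events")
    .start()
)
query.awaitTermination()
```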

Ola Cabs, Bangalore, India March 2015 - Oct 2016
Business Analyst

Description: Ola is India's largest mobility platform and one of the world's largest ride-hailing companies, serving 250+ cities across India, Australia, New Zealand, and the UK. The Ola app offers mobility solutions by connecting customers to drivers and a wide range of vehicles across bikes, auto-rickshaws, metered taxis, and cabs, enabling convenience and transparency for hundreds of millions of consumers and over 1.5 million driver-partners.

Responsibilities:
Worked on customer-data issues and on project resolution in collaboration with development teams.
Created complex SQL queries and scripts to extract, aggregate, and validate data from MS SQL, Oracle, and flat files using Informatica, and loaded it into a single data warehouse repository.
Wrote SQL queries using joins, nested sub-queries, grouping, and aggregation depending on data needed from various relational databases.
Developed Stored Procedures in SQL Server to consolidate everyday DML transactions such as insert, update, and delete from the database.
Used SQL Server and MS Excel daily to manipulate the data for business intelligence reporting needs.
Developed the stored procedures as required, and user-defined functions and triggers as needed using T-SQL.
Involved in writing Python scripts to extract data from different APIs (see the sketch after this list).
Designed data reports in Excel, for easy sharing, and used SSRS for report deliverables to aid in statistical data analysis and decision making.
With pivot tables, VLOOKUPs, and macros in Excel, developed ad-hoc reports and recommended solutions to improve business decision-making.
Designed Ad-hoc reports using SQL and Tableau dashboards, facilitating data-driven decisions for business users.
Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
Collected, analyzed, and interpreted complex data for reporting and performance-trend analysis.
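A minimal sketch of the kind of Python API-extraction script mentioned above; the endpoint, field names, and paging scheme are hypothetical, standing in for the internal APIs actually used.

```python
import csv
import requests

# Hypothetical endpoint and fields; real scripts pulled from several internal APIs.
BASE_URL = "https://api.example.com/v1/rides"
PAGE_SIZE = 500

def fetch_all(session):
    """Page through the API until no more records are returned."""
    page = 1
    while True:
        resp = session.get(BASE_URL, params={"page": page, "per_page": PAGE_SIZE}, timeout=30)
        resp.raise_for_status()
        records = resp.json()
        if not records:
            break
        yield from records
        page += 1

with requests.Session() as session, open("rides.csv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=["ride_id", "city", "fare"])
    writer.writeheader()
    for record in fetch_all(session):
        writer.writerow({k: record.get(k) for k in ("ride_id", "city", "fare")})
```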

Environment: Python v3.x/2.x, MS SQL SERVER, T-SQL, SQL Server Management Studio, Oracle, Excel.

EDUCATION:

Bachelor of Computer Science Engineering.
