Deepthi - Sr. Data Engineer / Python Developer |
[email protected] |
Location: Houston, Texas, USA |
Relocation: Yes |
Visa: GC-EAD |
Deepthi N
Sr. Data Engineer / Python Developer | [email protected] | 903-246-9273

Professional Summary:
- Highly skilled Data Engineer / Python Developer with 9+ years of experience in the IT industry in Data Analysis, Statistical Analysis, Machine Learning, Deep Learning, and Data Mining with large structured and unstructured data sets and Big Data.
- Results-driven Python Data Engineer with extensive experience in designing, developing, and optimizing data processing solutions.
- Experience in Cloud (Azure, AWS, GCP), DevOps, configuration management, infrastructure automation, and Continuous Integration and Delivery (CI/CD).
- Hands-on experience with Unified Data Analytics on Databricks, the Databricks Workspace user interface, managing Databricks notebooks, Delta Lake with Python, and Delta Lake with Spark SQL.
- Strong experience using the Spark RDD API, Spark DataFrame/Dataset API, advanced SQL, Spark SQL, ANSI SQL, SQL database tuning, and Spark ML frameworks for building end-to-end data pipelines.
- Built ETL data pipelines on Hadoop/Teradata using Hadoop, Pig, Hive, and UDFs.
- Created and developed custom Angular components, directives, services, and pipes.
- Experience in writing distributed Scala code for efficient big data processing.
- Strong expertise in development of web-based applications using Python, MongoDB, PyCharm, Flask, Git, Dojo, Pyramid, XML, CSS, DHTML, JavaScript, JSON, jQuery, PostgreSQL, and J2EE.
- Experience in the complete Software Development Life Cycle, including analysis, design, development, testing, and implementation, using Python, Django, and Flask.
- Developed complex SQL queries, stored procedures, and SSIS packages.
- Good experience in developing web applications implementing Model View Controller (MVC) architecture using Django, Flask, Pyramid, and other Python web frameworks.
- Experienced in developing API services in Python/Tornado and Node.js while leveraging AMQP and RabbitMQ for distributed architectures.
- Experience in bi-directional data pipelines between HDFS and relational databases with Sqoop.
- Created and wrote shell scripts (Bash), Ruby, Python, and PowerShell for automating tasks.
- Experience in building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP.
- Migrated data into the RV Data Pipeline using Databricks, Spark SQL, and Scala.
- Experience working with relational databases (Teradata, Oracle) with advanced SQL programming skills.
- Worked on advanced SQL coding, writing complex stored procedures, functions, CTEs, triggers, views, and dynamic SQL for interactive reports.
- Good experience in automating end-to-end data pipelines using the Oozie workflow orchestrator.
- Constructed AWS data pipelines using VPC, EC2, S3, Auto Scaling Groups (ASG), EBS, Snowflake, IAM, CloudFormation, Route 53, CloudWatch, CloudFront, and CloudTrail.
- Optimization and performance tuning of complex and advanced SQL scripts, with a good understanding of dynamic SQL query performance optimization.
- Wrote SQL queries to identify and validate data inconsistencies in the data warehouse against source systems.
- Strong experience using and integrating AWS cloud services such as S3, EMR, Glue Metastore, Athena, and Redshift into data pipelines.
- Experience with Angular network performance concepts such as lazy loading of resources, AoT compilation, compression, and caching.
- Experience in writing UNIX shell scripts and automating ETL processes using UNIX shell scripting and SQL.
- Experienced in integrating various relational and non-relational sources such as DB2, Oracle, SQL Server, NoSQL (MongoDB), XML, and flat files into a Netezza database.
Technical Summary:
Programming Languages: Python, SQL, Java, JavaScript, PHP, C, C++, MATLAB, Swift, Mathematica
Machine Learning: Classification, Regression, Clustering, NLP, Deep Learning
Statistical Methods: Regression Models, Hypothesis Testing, PCA
Python Packages: Scikit-learn, NumPy, Pandas, Django, Flask, Matplotlib, PySpark, NLTK, TensorFlow, PyTorch, MATLAB
Big Data Technologies: Hadoop, PySpark, MapReduce, Apache Kafka, Spark, Pig, Hive, YARN, Sqoop
Databases: MySQL, SQL Server, MongoDB, HBase, Teradata, Redis
Cloud Technologies: Azure; AWS (EC2, S3, Redshift, RDS, EMR, Step Functions, CloudWatch, DynamoDB, Glue, Lambda, AWS Batch, QuickSight); Google Cloud Platform (BigQuery, Dataproc)
Data Visualization & BI Tools: Tableau, GCP, Power BI, SSIS, SSRS, ELT, ETL
Robotic Control / Simulation: ROS, Multisim, Gazebo
Frameworks: Spring Boot, RESTful APIs, Microservices
Version Control & CI/CD: Git, internal CI/CD, Airflow, GitHub, GitLab, SVN, Bitbucket
Streaming & Message Brokers: Kafka, Flink, Kinesis, Cloud Pub/Sub
ETL Tools: AWS Glue, Apache NiFi, Dataflow, DataStage

Educational Qualification: Master's in Computer Science.

Professional Experience:

Humana - Louisville, KY    Dec 2021 - Till Date
Sr. Data Engineer / Python Developer
Responsibilities:
- Designed and developed ETL processes in AWS Glue to migrate payments data from external sources such as S3 (Parquet/text files) into AWS Redshift.
- Implemented dbt (Data Build Tool) in an AWS environment to streamline and automate data transformation processes, enabling efficient and scalable data pipeline management for analytics and reporting.
- Implemented user interface guidelines and standards throughout the development and maintenance of the website using HTML, Pyramid, Flask, Docker, Ajax, CSS, Cassandra, and JavaScript.
- Developed and debugged Single Page Applications (SPA) using AngularJS, Angular 6/7/8, ES6, and Redux.
- Experienced with the HUE UI for accessing HDFS files and data.
- Wrote complex SQL queries to implement the business logic.
- Built data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators, both legacy and newer operators.
- Used Amazon Simple Workflow Service (SWF) for automating and scheduling our data pipelines.
- Implemented an end-to-end data ingestion, enrichment, and accessibility solution on AWS, utilizing services such as AWS Glue, AWS Lambda, and Amazon Redshift.
- Wrote simple and advanced SQL queries and scripts to create standard and ad hoc reports for senior managers.
- Created pipelines to extract data from on-premises source systems to Azure Data Lake Storage; worked extensively on Copy activities and implemented copy behaviors such as flatten hierarchy, preserve hierarchy, and merge hierarchy; implemented error handling through the Copy activity.
- Designed, created, revised, and managed reports generated from operational and analytical systems using SSRS, Tableau, Power BI, and Crystal Reports.
- Developed a data pipeline using Python, Pandas, Oracle, SQLite, and Tornado that collects data from an Oracle database, performs quantitative and statistical analysis using Pandas, stores the computed results in a SQLite database, and uses Tornado and Gramex to render data visualizations.
- Created an entire application using Python, Django, Ajax, Flask, MySQL, PostgreSQL, MongoDB, Pyramid, and Linux.
- Created several XML schemas and developed stored PL/SQL procedures and packages to automatically create and drop table indexes.
- Generated Python Django forms to record data from online users and used pytest for writing test cases.
- Built scalable and robust data pipelines for the Business Partners Analytical Platform to automate their reporting dashboard using Spark SQL and PySpark, and scheduled the pipelines.
- Wrote build/integration/installation scripts in Python and Bash as needed.
- Assisted with development of web applications in Flask, Pyramid, Django, and Plone.
- Worked on data modeling and advanced SQL with columnar databases on AWS.
- Converted legacy reports from SAS, Looker, Access, Excel, and SSRS into Power BI and Tableau.
- Set up incremental loads that merge separate blob partition files using Delta Lake within an Azure Databricks notebook (see the sketch after this section).
- Wrote BTEQ scripts to load the 10 staging tables needed to process the ETL load into Vertex.
- Integrated Pig with Hadoop ecosystem tools like Hive, HBase, Presto, and Spark to enable advanced analytics and data modeling.
- Used Databricks for encrypting data using server-side encryption.
- Worked with CI/CD tools such as Jenkins and version control tools Git and Bitbucket.
- Performed data cleansing and applied changes using Databricks and Spark data analysis.
- Performed performance tuning and optimization of Databricks notebooks and clusters by tuning Python code.
- Worked with container-based technologies like Docker, Kubernetes, and OpenShift.
- Developed RESTful APIs using Python with the Flask and Django frameworks and integrated various data sources including Java, JDBC, RDBMS, Unix and shell scripting, spreadsheets, and text files.
- Loaded data into Spark DataFrames and used Spark SQL to explore data insights.
Environment: AWS, ETL, YARN, HDFS, Hive, Eclipse, Sqoop, Bash, Spark, Spark-SQL, Spark Streaming, Scala, Linux file system, Python, Django, Pyramid, Flask, Linux shell scripting, Oozie, Domo, Tableau, NoSQL, MongoDB, Cassandra.
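Illustrative example: the incremental Delta Lake merge described in the Databricks bullet above can be sketched in PySpark roughly as follows. The storage paths and the join key (event_id) are placeholder assumptions, not the actual project code.

    # Minimal sketch: merge newly landed blob partition files into a Delta table (Azure Databricks)
    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Partition files that landed in blob storage since the last run (placeholder path)
    updates_df = spark.read.format("parquet").load("/mnt/datalake/raw/events/incremental/")

    # Target Delta table in the curated zone (placeholder path)
    target = DeltaTable.forPath(spark, "/mnt/datalake/curated/events")

    # Incremental upsert: update rows that already exist, insert the new ones
    (target.alias("t")
        .merge(updates_df.alias("s"), "t.event_id = s.event_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())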
Volkswagen Credit - Libertyville, IL    Sep 2020 - Nov 2021
Sr. Data Engineer / Django
Responsibilities:
- Involved in the project life cycle, including design, development, implementation, and verification and validation.
- Extensively utilized Python frameworks like Django, Flask, and PyUnit, and libraries like Matplotlib.
- Gathered design requirements from the client, analyzed them, provided solutions, and met the design requirements.
- Coded and managed data ingestion into Solr using Spark jobs to achieve near-real-time processing (Spark Streaming) as well as batch processing (Spark SQL).
- Used the Python, Django, and Pyramid frameworks to develop applications; responsible for both back-end programming and front-end functionality using JavaScript, Ajax, Docker, React.js, and other technologies.
- Extracted data from MySQL into HDFS using Sqoop export/import, handled importing data from various data sources, performed transformations using Pig, and loaded data into HDFS.
- Experience building distributed high-performance systems using Spark and Scala.
- Updated and manipulated content and files using Python scripts.
- Administered Tableau Server, including creating a user rights matrix for permissions and roles, monitoring report usage, and creating sites for various departments.
- Implemented business logic using Python; used HTML, Pyramid, MongoDB, AWS, jQuery, PostgreSQL, CSS, Ajax, Flask, Node.js, and JavaScript; and implemented a Continuous Delivery pipeline with Docker, Jenkins, and GitHub.
- Developed C# scripts to validate the columns of flat files and created temporary tables using dynamic SQL queries to load data from multiple data files into a single table.
- Wrote installation scripts in Python and Bash as needed.
- Experience in building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation.
- Created and ran Makefiles to build the code using Tornado.
- Extensively utilized Databricks notebooks for interactive analysis using Spark APIs.
- Used advanced SQL methods to code, test, debug, and document complex database queries.
- Used Pandas, OpenCV, NumPy, Seaborn, TensorFlow, Keras, Matplotlib, scikit-learn, and NLTK in Python for developing data pipelines and various machine learning algorithms.
- Used Git/GitLab for version control while managing project tasks and issues through Jira.
- Wrote complex SQL queries to build KPI tables using aggregations, analytical functions, and window functions.
- Maintained DevOps pipelines for integrating Salesforce code and managed continuous and manual deployments, using ANT and SFDX, to the lower and higher environments.
- Developed data pipelines using Python for medical image pre-processing, training, and testing.
- Worked on building data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators.
- Participated in the requirement gathering and analysis phase of the project, documenting business requirements by conducting workshops and meetings with various business users.
- Maintained source file, cross-reference table, and output file locations and parameters in a YAML file for the Kedro pipeline to retrieve and upload data files.
- Performed performance tuning of stored procedures and SQL queries using SQL Profiler and the Index Tuning Wizard in SSIS.
- Responsible for creating data pipeline flows, scheduling jobs programmatically as DAGs in the Airflow workflow engine, and providing support for the scheduled jobs.
- Used Jupyter Notebook and IPython for testing design concepts and solutions.
- Used GitHub for version control and creation of repositories; responsible for updating project status in Jira.
- Scheduled and ran Airflow DAG jobs to sync and update cross-reference tables from Google Drive to AWS S3 (see the sketch after this section).
- Updated stakeholders on the results of the conversion program in daily stand-up meetings and logged status in Jira.
Environment: Python, PyCharm, Hadoop, Django, Pyramid, Flask, Bash, AWS S3, AWS Redshift, Pandas, NumPy, Spark, GitHub, SQL, Jira, Agile, and macOS.
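Illustrative example: the Airflow work mentioned above (scheduling jobs as DAGs and syncing cross-reference tables to S3) can be sketched as a minimal daily DAG. The DAG id and the sync logic are placeholder assumptions rather than the actual pipeline.

    # Minimal sketch of a daily Airflow DAG that stages a cross-reference table in S3
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def sync_crossref_to_s3(**context):
        # Placeholder body: fetch the cross-reference file (e.g. exported from Google Drive)
        # and upload it to S3 with boto3 or an Airflow S3 hook.
        pass

    with DAG(
        dag_id="crossref_sync",
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id="sync_crossref_to_s3",
            python_callable=sync_crossref_to_s3,
        )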
KeyBank - Cincinnati, OH    Sep 2019 - Aug 2020
Data Engineer / Flask
Responsibilities:
- Used the Pandas API to put the data into time series and tabular format for easy timestamp-based manipulation and retrieval.
- Worked on data mapping and logical data modeling, created class diagrams and ER diagrams, and used SQL queries to filter data within the Oracle database.
- Created a user review data classification pipeline using Apache Spark and Apache Airflow.
- Created a data extraction pipeline for migrating user review data from a PostgreSQL DB to AWS Redshift.
- Deployed AWS Lambda functions to ingest user purchase data on a daily basis and stage it in S3 (see the sketch after this section).
- Designed a REST API backend using Django to provide access to the marketplace trends dataset, which was used by the product management team.
- Used AWS Lambda to run code without managing servers, triggered by S3 and SNS events.
- Set up storage and data analysis tools in the Amazon Web Services (AWS) cloud computing infrastructure.
- Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that we could assign each document a response label for further classification.
- Enhanced and modified Pyramid reports according to business needs.
- Performed troubleshooting and deployed many Python bug fixes for the two main applications that were the primary data sources for both customers and the internal customer service team.
- Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
- Used Python scripts to update content in the database and manipulate files.
- Generated Python Django forms to record data from online users.
- Knowledge of GitOps (CI/CD) pipelines and some exposure to writing and executing CI/CD pipelines.
- Used Python and Django for creating graphics, XML processing, data exchange, and business logic implementation.
- Migrated key data pipeline components from MATLAB to Python, deployed them using Lambda, and used CloudWatch for monitoring, reducing firmware maintenance costs for the product.
- Implemented web applications in the Flask and Spring frameworks following MVC architecture.
- Added support for Amazon S3 and RDS to host static/media files and the database in the AWS cloud.
- Designed and developed the UI of the website using HTML, XHTML, AJAX, CSS, and JavaScript.
- Used Django configuration to manage URLs and application parameters.
- Built various graphs for business decision making using the Python Matplotlib library.
Environment: AWS, Python, API, PySpark, Databricks, Pyramid, Django, Java, Oracle, Snowflake, Teradata, Tableau, Unix/Linux, Oracle/SQL & DB2, Agile, Apache Airflow, EMR, JavaScript.
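Illustrative example: an S3-triggered AWS Lambda handler of the kind described above (daily ingestion of user purchase data staged in S3) might look roughly like this; the staging bucket and prefix are placeholder assumptions.

    # Minimal sketch: Lambda handler triggered by S3 object-created events
    import boto3

    s3 = boto3.client("s3")

    def lambda_handler(event, context):
        records = event.get("Records", [])
        for record in records:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            # Copy each newly landed file into a staging prefix for downstream processing
            s3.copy_object(
                Bucket="purchase-data-staging",  # placeholder bucket name
                Key=f"staged/{key}",
                CopySource={"Bucket": bucket, "Key": key},
            )
        return {"status": "staged", "count": len(records)}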
KinderCare - Portland, OR    May 2016 - Aug 2019
Data Engineer / Python
Responsibilities:
- Analyzed the pre-existing predictive model developed by the advanced analytics team and the factors considered during model development.
- Analyzed metadata and processed data to get better insights into the data.
- Analyzed and prepared data, identifying patterns in datasets by applying historical models.
- Designed and developed a data management system using MySQL.
- Built application logic using Python; Angular.js was used for the client web application.
- Used Python to extract information from XML files.
- Worked on development of SQL and stored procedures on MySQL.
- Designed and developed horizontally scalable APIs using Python Flask.
- Used supervised machine learning techniques such as Logistic Regression and Decision Tree classification.
- Worked on datasets containing large volumes of structured and unstructured data.
- Performed data cleaning, using forward-filling methods on datasets to handle missing values.
- Created initial data visualizations in Tableau to provide basic insights into the data for project stakeholders.
- Performed extensive exploratory data analysis using Teradata to improve the quality of the datasets.
- Experienced with various Python libraries such as Pandas and one- and two-dimensional NumPy arrays.
- Worked with the Analytics team to update regular reports and provide solutions.
- Created visualizations for the extracted data with the help of Tableau.
- Identified patterns and meaningful insights from data by analyzing it.
- Tuned SQL queries to improve performance.
- Performed data modeling in Tableau Desktop.
Environment: MS Access, SQL Server, data modeling, Python (Pandas, NumPy, scikit-learn), PyTorch, Flask, Tableau, Excel.

Harvard Pilgrim Healthcare - Boston, MA    Sep 2014 - Apr 2016
Python Developer
Responsibilities:
- Wrote Python scripts to parse XML documents as well as JSON-based REST web services and load the data into the database for building the pipeline (see the sketch at the end of this section).
- Experienced in converting unstructured data in CSV and JSON formats into structured data in Parquet form, saved in AWS S3 buckets.
- Experienced in creating data zones that included AWS Glue databases and the AWS Glue Catalog, giving the business easy access and simplifying Tableau connection setup.
- Performed SQL optimization for the creation of AWS views in the Glue DB.
- Wrote ORMs for generating complex SQL queries and built reusable code and libraries in Python for future use.
- Worked closely with software developers to debug software and system problems.
- Profiled Python code for optimization and memory management and implemented multithreading functionality.
- Involved in creating stored procedures that retrieve data and help analysts spot trends.
- Designed and developed the server module, resolved issues, and was responsible for its enhancements.
- Worked on the Django ORM module for designing complex queries.
- Used WPF to create user interfaces for the Windows operating system.
- Implemented business logic in Python to prevent, detect, and claim duplicate payments.
- Rewrote existing Python/Django modules to deliver data in specific formats.
- Used Django database APIs to access database objects.
- Developed tools using Python, shell scripting, and XML to automate menial tasks.
- Developed scalable and effective applications, managing technology tradeoffs.
- Maintained web servers and platforms in the cloud in collaboration with outside vendors.
- Used GitHub for version control.
- Performed troubleshooting and deployed many Python bug fixes for the two main applications that were the primary data sources for both customers and the internal customer service team.
- Implemented automation using its Python API for test case scenarios.
Environment: Python, Oracle, JSON, XML, Django, API, SQL, REST, AWS.
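Illustrative example: the kind of ingestion script described in the first Harvard Pilgrim bullet (parsing JSON from a REST service and XML documents, then loading the rows into a database) can be sketched as below; the endpoint URL, file name, and table layout are placeholder assumptions.

    # Minimal sketch: load JSON from a REST endpoint and records from an XML file into SQLite
    import json
    import sqlite3
    import urllib.request
    import xml.etree.ElementTree as ET

    conn = sqlite3.connect("staging.db")
    conn.execute("CREATE TABLE IF NOT EXISTS claims (id TEXT, amount REAL)")
    conn.execute("CREATE TABLE IF NOT EXISTS members (id TEXT, name TEXT)")

    # JSON records from a REST web service (placeholder URL and fields)
    with urllib.request.urlopen("https://example.com/api/claims") as resp:
        claims = json.load(resp)
    conn.executemany("INSERT INTO claims VALUES (?, ?)",
                     [(c["id"], c["amount"]) for c in claims])

    # Records from an XML document (placeholder structure)
    root = ET.parse("members.xml").getroot()
    conn.executemany("INSERT INTO members VALUES (?, ?)",
                     [(m.findtext("id"), m.findtext("name")) for m in root.findall("member")])

    conn.commit()
    conn.close()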