
Rakshita Tarigopala - Data Engineer
Email: [email protected]
Location: Naperville, Illinois, USA
Relocation: Yes
Visa: GC


Sr. Data Engineer

PROFESSIONAL SUMMARY

IT professional with 9+ years of experience and a strong background in end-to-end enterprise data warehousing and big data projects.
Experience in installing, configuring, managing, supporting, and monitoring Hadoop clusters using distributions such as Apache Hadoop, Cloudera, and Hortonworks, as well as cloud services such as AWS and GCP.
Experience installing and configuring Hadoop stack components: MapReduce, HDFS, Hive, Pig, Sqoop, Flume, Oozie, and Zookeeper.
Proficient with workflow orchestration and provisioning tools, namely Oozie, Airflow, Data Pipelines, Azure Data Factory, CloudFormation, and Terraform.
Implemented data warehouse solutions involving ETL and on-premises-to-cloud migration; strong expertise in building and deploying batch and streaming data pipelines in cloud environments.
Worked on Airflow 1.8 (Python 2) and Airflow 1.9 (Python 3) for orchestration; familiar with building custom Airflow operators and orchestrating workflows with dependencies spanning multiple clouds.
Orchestration experience using Azure Data Factory, Airflow 1.8, and Airflow 1.10 on multiple cloud platforms, with a solid understanding of how to leverage Airflow operators.
Developed and deployed AWS Lambda functions using built-in AWS Lambda libraries, and deployed Lambda functions in Scala with custom libraries.
Experience in extraction, transformation, and loading of data from multiple sources into target databases using Azure Databricks, Azure SQL, PostgreSQL, SQL Server, and Oracle.
Experienced in Databricks for processing and transforming large volumes of data, using Databricks Machine Learning and managing the entire machine learning lifecycle with MLflow.
Experience with newly released Azure features, reproducing and troubleshooting Azure end-user issues and providing solutions to mitigate them.
Extensive experience with Azure cloud technologies including Azure Data Lake Storage, Azure Data Factory, Azure SQL, Azure Data Warehouse, Azure Synapse Analytics, Azure Analysis Services, Azure HDInsight, and Databricks.
Developed an automated process in the Azure cloud that ingests data daily from a web service and loads it into Azure SQL DB.
Implemented CI/CD solution using Git to configure big data systems on Amazon Web Services cloud.
Developed streaming pipelines using Azure Event Hubs and Stream Analytics to analyze data for dealer efficiency and open table counts from IoT-enabled poker and other pit tables.
Analyzed data where it lives by mounting Azure Data Lake and Blob storage to Databricks.
Used Logic Apps to take decision-based actions within workflows.
Developed custom alerts using Azure Data Factory, SQL DB, and Logic Apps.
Developed Databricks ETL pipelines using notebooks, Spark DataFrames, Spark SQL, and Python scripting.
Knowledge in automated deployments leveraging Azure Resource Manager Templates, DevOps, and Git repository for Automation and usage of Continuous Integration (CI/CD).
Experienced in data processing and analysis using Spark, HiveQL, and SQL.
Extensive experience in Writing User-Defined Functions (UDFs) in Hive and Spark.
Worked on Apache Sqoop to import and export data between HDFS and RDBMS/NoSQL databases.
Working experience on NoSQL databases like HBase, Azure, MongoDB, and Cassandra with functionality and implementation.
Worked extensively with semi-structured data (fixed-length and delimited files) for data sanitization, report generation, and standardization. Designed and developed ETL jobs using the DataStage tool to load the data warehouse and data marts.
Designed and developed Security Framework to provide access to objects in AWS S3 using AWS Lambda, and DynamoDB.
Worked with business owners to identify key data elements that will be monitored on an ongoing basis to ensure a high level of data quality in the data warehouse.
Designed and created Data quality baseline flow diagrams which includes error handling and test plan flow data.
Hands-on experience with Databricks Unified Data Analytics, the Databricks Workspace user interface, managing Databricks notebooks, Delta Lake with Python, and Delta Lake with Spark SQL.
Wrote complex SQL queries to analyze data and communicated data quality issues to the business.
Performed data analysis, created gap analysis documents, and performed data quality checks.
Designed a resilient AWS framework that reads/streams data from producers through Apache Kafka to ingest it into S3, with an in-house validation tool integrated to handle data quality checks.
Experience with AWS- and Kubernetes-based container deployments to create self-service environments for dev teams and containerized environment delivery for releases. Managed containers with Docker by writing Dockerfiles, setting up automated builds on Docker Hub, and installing and configuring Kubernetes.
Extensive experience working with AWS Cloud services and AWS SDKs to work with services like AWS API Gateway, Lambda, S3, IAM, and EC2.
Also worked on a POC to check the compatibility issues before migrating a few services to GCP.
Developed and deployed solutions using Spark and Scala code on a Hadoop cluster running on GCP.
Experience with Apache Beam for building data processing pipelines and working with GCP Dataflow for stream and batch processing.
Experience with GCP Data Catalog for metadata management and ensuring data governance and data lineage.
Skilled in building and optimizing ETL/ELT processes to extract, transform, and load data into GCP data storage and processing systems.
Used Google Cloud Functions with Python to load data into BigQuery for CSV files arriving in a GCS bucket (a minimal sketch follows this summary).
Wrote a program to download a SQL dump from the equipment maintenance site and load it into a GCS bucket.
Excellent understanding of Zookeeper for monitoring and managing Hadoop jobs.
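
A minimal sketch of the GCS-to-BigQuery load pattern mentioned above, assuming a background Cloud Function (1st gen) triggered on object finalize; the project, dataset, and table names are hypothetical placeholders, not the actual production names.

# Hedged sketch: load an arriving CSV file from a GCS bucket into BigQuery.
from google.cloud import bigquery

def load_csv_to_bq(event, context):
    """Triggered by a new object finalized in the configured GCS bucket."""
    bucket, name = event["bucket"], event["name"]
    if not name.endswith(".csv"):
        return  # only process CSV arrivals

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,          # assume a header row
        autodetect=True,              # infer the schema on load
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    load_job = client.load_table_from_uri(
        f"gs://{bucket}/{name}",
        "my-project.landing_dataset.equipment_events",  # hypothetical table
        job_config=job_config,
    )
    load_job.result()  # block until the load job completes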


TECHNICAL SKILLS

Big Data Technologies: Hadoop, MapReduce, Spark, HDFS, Sqoop, YARN, Oozie, Hive, Impala, Zookeeper, Apache Flume, Apache Airflow, Cloudera, HBase
Programming Languages: Python, PL/SQL, SQL, Scala, C, C#, C++, T-SQL, PowerShell Scripting, JavaScript
Cloud Services: Azure Data Lake Storage Gen 2, Azure Data Factory, Blob Storage, Azure SQL DB, Databricks, Azure Event Hubs, AWS RDS, Amazon SQS, Amazon S3, AWS EMR, Lambda, AWS SNS, GCP BigQuery, Dataproc, Dataflow, Cloud Composer, Cloud Functions
Databases: MySQL, SQL Server, Oracle, MS Access, Teradata, and Snowflake
NoSQL Databases: MongoDB, Cassandra DB, HBase
Development Strategies: Agile, Lean Agile, Pair Programming, Waterfall, and Test-Driven Development
Visualization & ETL Tools: Tableau, Informatica, Talend, SSIS, and SSRS
Version Control & Containerization Tools: Jenkins, Git, and SVN
Operating Systems: Unix, Linux, Windows, Mac OS
Monitoring Tools: Apache Airflow, Control-M

PROFESSIONAL EXPERIENCE

Kaiser Permanente, Oakland, CA
Duration: Feb 2022 - present
Role: Azure Data Engineer

Responsibilities:
Design and implement data solutions in Azure for Kaiser Permanente, considering scalability, security, and compliance requirements.
Develop and maintain data pipelines using Azure Data Factory, ensuring efficient and reliable data movement and transformation.
Implement data integration and orchestration workflows using Azure Logic Apps, ensuring seamless data flow across systems.
Design and optimize data storage and retrieval using Azure Blob Storage, Azure Data Lake Storage, or Azure SQL Database.
Developed custom alerts using Azure Data Factory, SQL DB, and Logic Apps.
Developed Databricks ETL pipelines using notebooks, Spark DataFrames, Spark SQL, and Python scripting.
Developed complex SQL queries using stored procedures, common table expressions (CTEs), and temporary tables to support Power BI reports.
Develop and deploy Azure Databricks or HDInsight clusters for big data processing and analytics.
Implement data transformation and manipulation using Azure Data Lake Analytics, Azure SQL Data Warehouse, or Azure Synapse Analytics.
Develop and manage Azure Stream Analytics jobs for real-time data processing and insights.
Implement data governance and security measures, ensuring compliance with healthcare industry regulations (e.g., HIPAA) and Kaiser Permanente's data governance policies.
Implement data quality assurance processes, including data validation, data cleansing, and anomaly detection.
Collaborate with cross-functional teams to gather data requirements and design data models and schemas.
Develop and maintain data pipelines for ingesting and processing electronic health records (EHR) and other healthcare-related data sources.
Implement data encryption and access control mechanisms to protect sensitive data in Azure.
Collaborate with data scientists and analysts to operationalize machine learning models in Azure for predictive analytics and insights.
Develop and deploy Azure Functions or Azure Logic Apps for serverless data processing and automation.
Implement data monitoring and alerting using Azure Monitor, Azure Log Analytics, or Azure Application Insights.
Used Azure Cloud, Slack, Azure Data Factory, Azure DevOps, Azure Databricks, MLflow, Spark, and VS Code.
Optimize data processing performance by tuning and optimizing Azure resources, such as VM sizes, scaling configurations, and parallelization techniques.
Collaborate with data visualization experts to design and implement Power BI dashboards and reports for data visualization and insights.
Develop and maintain Azure DevOps pipelines for CI/CD (continuous integration/continuous deployment) of data solutions.
Implement data archiving and retention strategies using Azure storage tiers and lifecycle management policies.
Collaborate with Azure solution architects to design and implement hybrid cloud architectures for securely connecting on-premises and Azure data environments.
Troubleshoot and resolve data-related issues in Azure environments, collaborating with Azure support and other teams as needed.
Implement disaster recovery and business continuity measures for Azure data solutions, including backup and restore strategies.
Developed Python APIs using FastAPI and Flask to serve ML models as consumable endpoints; automated ML model deployments using MLflow and GitLab CI/CD (a minimal serving sketch follows this section).
Stay updated with the latest Azure data services, features, and best practices, actively researching and exploring new technologies and trends.
Conduct knowledge sharing sessions and provide mentorship to junior data engineers and team members.
Adhere to Kaiser Permanente's project management methodologies and quality standards, ensuring timely and high-quality delivery of data engineering projects.
Environment: Azure SQL, Azure Storage Explorer, Azure Storage, Azure Blob Storage, Azure Backup, Azure Files, Azure Data Lake Storage, SQL Server Management Studio 2016, Visual Studio 2015, VSTS, Azure Blob, Power BI, PowerShell, C# .NET, SSIS, DataGrid, ETL (Extract, Transform, Load), Business Intelligence (BI).
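
An illustrative sketch of the FastAPI model-serving pattern referenced above; the MLflow registry URI and feature fields are hypothetical, not an actual production schema.

# Hedged sketch: serve an MLflow-registered model behind a FastAPI endpoint.
import mlflow.pyfunc
import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
# Load a model from the MLflow model registry (hypothetical name and stage).
model = mlflow.pyfunc.load_model("models:/patient_risk_model/Production")

class Features(BaseModel):
    age: float
    prior_visits: int
    avg_glucose: float

@app.post("/predict")
def predict(features: Features):
    # Wrap the single record in a DataFrame, since pyfunc models expect tabular input.
    frame = pd.DataFrame([features.dict()])
    prediction = model.predict(frame)
    return {"prediction": float(prediction[0])}

A service like this would typically be run with uvicorn and containerized for deployment through the GitLab CI/CD pipeline mentioned above.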


Client: Walmart Inc, Bentonville, AR
Duration: Jan 2020 - Jan 2022
Sr. Data Engineer

Responsibilities:
Design and implement scalable and efficient data solutions in AWS for Walmart, utilizing services such as AWS Glue, AWS Athena, AWS S3, Redshift, and Databricks.
Develop and maintain data ingestion and transformation pipelines using AWS Glue, ensuring reliable and automated data processing and integration.
Implement data cataloging and metadata management using AWS Glue, facilitating data discovery and governance across Walmart's data ecosystem.
Proficient in implementing and managing data pipelines using Databricks on AWS platform.
Extensive experience in designing and optimizing Spark jobs for big data processing in Databricks.
Extensive experience in designing, implementing, and managing data pipelines on AWS using services such as AWS Glue, AWS Data Pipeline, or AWS Step Functions.
Strong proficiency in SQL and data modeling, with expertise in designing and optimizing data schemas for relational and non-relational databases on AWS, such as Amazon RDS, Amazon Redshift, and Amazon DynamoDB.
Design and optimize data storage and retrieval using AWS S3, ensuring durability, scalability, and cost-effectiveness for Walmart's data assets.
Develop and maintain data lake architectures in AWS S3, organizing and structuring data for easy access, analysis, and reporting.
Develop and optimize ETL processes using AWS Glue, transforming and cleaning data for downstream analytics and reporting (see the Glue sketch after this section).
Implement data transformation and aggregation using AWS Glue and Databricks, enabling efficient data processing and analytics.
Strong knowledge of data warehousing concepts and experience in building and maintaining data warehouses on AWS using tools like Amazon Redshift.
Developed and optimized data ingestion processes from various sources into AWS data lakes using AWS Glue, AWS Kinesis, or AWS DataSync.
Utilize AWS Athena for interactive and ad-hoc query analysis on data stored in AWS S3, enabling self-service analytics and exploration for Walmart's data users.
Design and implement scalable and performant data warehouse solutions using AWS Redshift, supporting large-scale analytics, and reporting needs.
Develop and optimize data pipelines using AWS Glue, Redshift, and Databricks, ensuring efficient data movement, processing, and integration across systems.
Collaborate with cross-functional teams to gather data requirements, design data models, and implement data solutions aligned with Walmart's business needs.
Implement data security and access controls using AWS IAM, ensuring compliance with regulatory requirements and Walmart's data governance policies.
Optimize query performance in AWS Athena and Redshift by implementing appropriate data partitioning, compression, and indexing strategies.
Collaborate with data scientists and analysts to deploy and operationalize machine learning models in AWS Databricks, enabling advanced analytics and insights for Walmart.
Design and implement scalable and cost-effective data archiving strategies using AWS S3 Glacier or AWS Glacier Deep Archive, ensuring data retention compliance.
Implement data monitoring and alerting using AWS CloudWatch, proactively identifying and resolving data-related issues to minimize disruptions.
Collaborate with data visualization experts to design and implement interactive dashboards and reports using tools like Amazon QuickSight or Tableau.
Develop and maintain infrastructure-as-code using AWS CloudFormation or AWS CDK, enabling version-controlled and automated provisioning of data engineering resources.
Implement disaster recovery and business continuity strategies for AWS data solutions, including data replication, backup, and recovery mechanisms.
Stay updated with the latest AWS data services, features, and best practices, actively researching and exploring new technologies and trends in the data engineering space.
Conduct performance tuning and optimization of AWS data solutions, employing techniques such as query optimization, resource scaling, and data caching.
Troubleshoot and resolve data-related issues in AWS environments, collaborating with AWS support and other teams as needed.
Developed a REST API endpoint for Azure MLflow Model Serving.
Conduct knowledge sharing sessions and provide mentorship to junior data engineers and team members, fostering a culture of continuous learning and growth.
Adhere to Walmart's project management methodologies and quality standards, ensuring timely and high-quality delivery of data engineering projects.
Collaborate with AWS solution architects to design and implement secure, scalable, and cost-efficient cloud architectures tailored to Walmart's data needs.
Environment: AWS (EC2, S3, Redshift, Lambda, AWS Glue, EMR, EBS, ELB, RDS, SNS, SQS, VPC, IAM, CloudFormation, CloudWatch, ELK Stack), Ansible, Python, Shell Scripting, PowerShell, Git, Jira, Docker, Unix/Linux, DynamoDB, Kinesis, CodeDeploy, CodePipeline, CodeBuild, CodeCommit, Splunk.
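
A minimal AWS Glue (PySpark) sketch of the partitioned ETL pattern referenced above; the database, table, column, and bucket names are hypothetical placeholders.

# Hedged sketch: a Glue PySpark job that reads a catalog table, filters bad rows,
# and writes date-partitioned Parquet to S3 so Athena/Redshift Spectrum can prune partitions.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw data registered in the Glue Data Catalog (hypothetical database/table).
raw = glue_context.create_dynamic_frame.from_catalog(
    database="retail_raw", table_name="daily_sales"
)

# Light cleansing, then convert the DynamicFrame to a Spark DataFrame.
df = raw.toDF().filter("store_id IS NOT NULL")

# Write date-partitioned Parquet back to S3 for downstream Athena queries.
(df.write.mode("append")
   .partitionBy("sale_date")
   .parquet("s3://example-curated-bucket/daily_sales/"))

job.commit()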

Citi Bank, Charlotte, NC
Duration: Aug 2018 - Dec 2019
Role: Hadoop Developer

Responsibilities:
Developed and maintained data pipelines using Hadoop ecosystem technologies, such as HDFS, MapReduce, Hive, and Spark, to enable efficient data processing and analysis.
Implemented data ingestion processes to acquire, validate, and transform structured and unstructured data from various sources into the Hadoop cluster.
Designed and optimized data storage and retrieval using HDFS and Hive, ensuring high performance and scalability for large-scale data sets.
Developed Hive and Spark SQL queries to extract and transform data for reporting, analytics, and machine learning initiatives.
Utilized Spark for distributed data processing, implementing complex transformations, aggregations, and analytics on large-scale datasets.
Implemented data partitioning and bucketing techniques in Hive and Spark to optimize query performance and improve data processing efficiency (a PySpark sketch follows this section).
Designed and developed data models and schemas in Hive for structured data storage, enabling efficient querying and analysis.
Developed and maintained ETL workflows using Apache Airflow or other workflow management tools, orchestrating data processing tasks across the Hadoop ecosystem.
Utilized AWS services such as EMR (Elastic MapReduce), S3 (Simple Storage Service), and Glue to leverage cloud-based data processing and storage capabilities.
Implemented data integration and synchronization between on-premises Hadoop clusters and AWS cloud environments, ensuring seamless data movement and accessibility.
Developed and deployed Spark-based machine learning models using libraries like MLlib or TensorFlow, enabling advanced analytics and predictive insights.
Collaborated with data scientists to design and implement data preprocessing and feature engineering pipelines using Spark, facilitating machine learning model development.
Implemented data quality checks and validation processes within the data pipelines to ensure data accuracy, consistency, and integrity.
Conducted performance tuning and optimization of Hadoop and Spark jobs, leveraging techniques like data compression, partition pruning, and memory management.
Designed and implemented data archiving and backup strategies using Hadoop and AWS storage services, ensuring data retention and disaster recovery capabilities.
Collaborated with cross-functional teams to understand data requirements and design scalable data solutions that align with Citi Bank's business objectives.
Implemented data security measures, including encryption, access controls, and auditing, to protect sensitive data assets and comply with regulatory requirements.
Conducted data profiling and data lineage analysis using tools like Apache Atlas or AWS Glue, ensuring data governance and traceability.
Collaborated with business stakeholders to identify and address data engineering challenges and provide data-driven solutions that enhance operational efficiency and decision-making processes.
Actively stayed updated with the latest advancements in Hadoop, Hive, Spark, and AWS technologies, exploring new tools, frameworks, and best practices to enhance data engineering capabilities.
Environment: Hadoop, Big Data, HDFS, MapReduce, Sqoop, Hive, Spark, AWS.
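
A hedged PySpark sketch of the Hive partitioning pattern noted above; the database, table, and column names are illustrative only.

# Hedged sketch: read a raw Hive table, aggregate, and write a date-partitioned curated table.
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("daily_txn_rollup")
         .enableHiveSupport()
         .getOrCreate())

txns = spark.table("raw_db.card_transactions")  # hypothetical source table

daily = (txns
         .filter(F.col("amount") > 0)
         .groupBy("txn_date", "merchant_category")
         .agg(F.sum("amount").alias("total_amount"),
              F.count("*").alias("txn_count")))

# Partitioning the output by txn_date allows partition pruning at query time.
(daily.write.mode("overwrite")
      .partitionBy("txn_date")
      .saveAsTable("curated_db.daily_merchant_rollup"))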


Smart Edge Solutions, Hyderabad, India
Duration: Apr 2017 - Jun 2018
Talend / ETL Developer

Responsibilities:

Worked on SSAS in creating data sources, data source views, named queries, calculated columns, cubes, dimensions, roles and deploying of analysis services projects.
SSAS Cube Analysis using MS-Excel and PowerPivot.
Implemented SQL Server Analysis Services (SSAS) OLAP cubes with dimensional data modeling using Star and Snowflake schemas.
Developed standards for ETL framework for the ease of reusing similar logic across the board.
Analyze requirements, create design and deliver documented solutions that adhere to prescribed Agile development methodology and tools.
Responsible for creating fact, lookup, dimension, staging tables and other database objects like views, stored procedure, function, indexes, and constraints.
Monitored data quality and generated weekly/monthly/yearly statistics reports on production processes (success/failure rates) for causal analysis as part of maintenance, and enhanced existing production ETL processes.
Developed complex Talend ETL jobs to migrate data from flat files to the database.
Implemented custom error handling in Talend jobs and worked on different methods of logging.
Followed the organization-defined naming conventions for flat file structures, Talend jobs, and the daily batches executing the Talend jobs.
Exposure to ETL methodology supporting data extraction, transformation, and loading in a corporate-wide ETL solution using the Talend open-source data integration suite.
Worked on real-time big data integration projects leveraging Talend data integration components.
Analyzed and performed data integration using Talend open integration suite.
Wrote complex SQL queries to ingest data from various sources and integrated them with Talend.
Worked on Talend Administration Console (TAC) for scheduling jobs and adding users.
Worked on context variables and defined contexts for database connections and file paths to easily migrate between different environments in a project.
Developed mappings to extract data from different sources, such as DB2 and XML files, and load it into the data mart.
Created complex mappings using different transformations like Filter, Router, Lookup, Stored Procedure, Joiner, Update Strategy, Expression, and Aggregator to pipeline data to the data mart.
Involved in designing logical/physical data models and reverse engineering for the entire subject area across the schema.
Scheduled and automated ETL processes with scheduling tools such as Autosys and TAC.
Scheduled the workflows using shell scripts.
Used the most common Talend components (tMap, tDie, tConvertType, tFlowMeter, tLogCatcher, tRowGenerator, tSetGlobalVar, tHashInput & tHashOutput, and many more).

Environment: Talend 5.x,5.6, XML files, DB2, Oracle 11g, SQL server 2008, SQL, MS Excel, MS Access, UNIX Shell Scripts, TOAD, Autosys.

SEANERGY DIGITAL INDIA PVT LTD, Hyderabad, India
Duration: Jun 2014 - Mar 2017
Python Developer

Responsibilities:

Used a test-driven approach for developing the application and implemented unit tests using the Python unit test framework.
Successfully migrated the Django database from SQLite to MySQL and then to PostgreSQL with complete data integrity.
Worked on report writing using SQL Server Reporting Services (SSRS), creating various report types such as table, matrix, and chart reports, as well as web reporting by customizing URL access.
Developed views and templates with Python and Django's view controller and templating language to create a user-friendly website interface.
Performed API testing using the Postman tool for various request methods such as GET, POST, PUT, and DELETE on each URL to check responses and error handling.
Created Python and Bash tools to increase the efficiency of the retail management application system and operations: data conversion scripts, AMQP/Go MQ, REST, JSON, and CRUD scripts for API integration.
Performed debugging and troubleshooting of web applications using Git as a version-control tool to collaborate and coordinate with team members.
Developed and executed various MySQL database queries from Python using the Python MySQL connector and MySQL database package.
Designed and maintained databases using Python and developed a Python-based RESTful API (web service) using SQLAlchemy and PostgreSQL (a minimal sketch follows this section).
Created a web application using Python scripting for data processing, MySQL for the database, and HTML, CSS, jQuery, and Highcharts for data visualization of the served pages.
Generated property lists for every application dynamically using Python modules such as math, glob, random, itertools, functools, NumPy, matplotlib, seaborn, and pandas.
Added navigation, pagination, column filtering, and the ability to add and remove columns in views using Python-based GUI components.

Environment: SQLite, MySQL, PostgreSQL, Python, Git, CRUD, Postman, RESTful web services, SOAP, HTML, CSS, jQuery, Django 1.4.
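
A minimal sketch of the SQLAlchemy/PostgreSQL REST endpoint pattern described above, assuming Flask as the web framework; the connection string, table, and columns are hypothetical placeholders.

# Hedged sketch: a small REST endpoint backed by SQLAlchemy and PostgreSQL.
from flask import Flask, jsonify
from sqlalchemy import create_engine, text

app = Flask(__name__)
engine = create_engine("postgresql://user:password@localhost:5432/retaildb")  # placeholder DSN

@app.route("/orders/<int:order_id>", methods=["GET"])
def get_order(order_id):
    # Parameterized query against a hypothetical orders table.
    with engine.connect() as conn:
        row = conn.execute(
            text("SELECT id, status, total FROM orders WHERE id = :id"),
            {"id": order_id},
        ).fetchone()
    if row is None:
        return jsonify({"error": "order not found"}), 404
    return jsonify({"id": row.id, "status": row.status, "total": float(row.total)})

if __name__ == "__main__":
    app.run(debug=True)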

EDUCATIONAL DETAILS:
Bachelor's in Computer Science and Engineering from Amrita Sai Institute of Science and Technology, 2014