
PRITHVI BELLAMKONDA
Data Engineer
siddartha.j@3ktechnologies.com
Location: Lincoln, Missouri, USA
Relocation: St. Louis, Missouri

Professional Summary

Skilled Data Engineer with over 7 years of experience providing IT development and data warehousing leadership across multiple industries. Extensive experience in all aspects of solution and data architecture, including client requirement elicitation, case management, implementation, logical and physical database design, project roadmaps, and resource allocation in IaaS and cloud environments. Consistently exceeds expectations by building valuable relationships and working well with people at all levels of an organization, including executive management, team members, and clients.
Skills

Enterprise Technology Infrastructure
Data Services & Data Warehousing
Solution & Data Architecture
Azure Databricks
Snowflake
Azure Data Factory
Application Upgrades & Migrations
Cloud Infrastructure Services
Logical & Physical Database Design
Task Plan Scheduling & Mapping
Project-Specific Resource Allocation
Google Cloud Architecture & Data Engineering
Excellent Verbal & Written Communication
Information Systems Design & Implementation
Work History
Senior Data Engineer 07/2024 to Current
3K Technologies LLC
Client: Optum

Enhanced pipeline robustness by introducing conditional logic and dynamic parameterization for varying data sources and formats.
Automated CSV file ingestion and processing workflows using Python and Spark, improving data migration efficiency and reducing manual effort.
Designed and implemented Azure Data Factory (ADF) pipeline monitoring solutions to track data flow and ensure successful execution.
Configured and managed Azure Key Vault for secure storage of secrets, credentials, and connection strings used in ADF pipelines.
Utilized Databricks notebooks to perform advanced transformations, aggregations, and data cleansing tasks in Spark.
Optimized incremental data loads by implementing watermarking and delta processing to handle high-volume data changes efficiently (a simplified sketch follows this list).
Established data archival policies for aged datasets, ensuring efficient storage management and compliance with retention requirements.
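
A minimal sketch of the watermark-driven incremental load pattern referenced above, assuming a Delta-based staging area, a hypothetical etl_control.watermarks table, and illustrative column and table names; the actual project objects are not shown.

```python
# Hypothetical sketch of watermark-based incremental loading in PySpark on Databricks.
# Table names, column names, and paths are illustrative, not the actual project objects.
from pyspark.sql import SparkSession, functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.appName("incremental_load_sketch").getOrCreate()

# 1. Read the last successful watermark from a small control (Delta) table.
wm_rows = (spark.table("etl_control.watermarks")
                .filter(F.col("table_name") == "claims_raw")
                .select("last_loaded_ts")
                .collect())
last_loaded_ts = wm_rows[0]["last_loaded_ts"] if wm_rows else "1900-01-01 00:00:00"

# 2. Pull only rows changed since the last load from the staged extract.
incremental_df = (spark.read.format("delta").load("/mnt/staging/claims_raw")
                       .filter(F.col("last_modified") > F.lit(last_loaded_ts)))

# 3. Merge (upsert) the changes into the curated Delta table on the business key.
target = DeltaTable.forName(spark, "curated.claims")
(target.alias("t")
       .merge(incremental_df.alias("s"), "t.claim_id = s.claim_id")
       .whenMatchedUpdateAll()
       .whenNotMatchedInsertAll()
       .execute())

# 4. Advance the watermark to the max timestamp just processed.
new_wm = incremental_df.agg(F.max("last_modified").alias("wm")).collect()[0]["wm"]
if new_wm is not None:
    spark.sql(f"UPDATE etl_control.watermarks SET last_loaded_ts = '{new_wm}' "
              f"WHERE table_name = 'claims_raw'")
```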

Analyzed and resolved data discrepancies between source (Teradata) and target (Snowflake/Databricks) systems during migration.
Customized Snowflake table structures, clustering keys, and performance tuning for faster querying and reduced compute costs.
Developed Python-based scripts for pre-migration data validation and post-migration verification to ensure 100% data accuracy (a simplified validation sketch follows this list).
Implemented robust version control for ADF pipelines, maintaining a comprehensive repository of changes and updates.
Orchestrated data workflows across environments by leveraging Azure Integration Runtime and managed identities in ADF.
Conducted impact assessments for schema changes on downstream applications and implemented solutions to mitigate potential disruptions.
Assisted in designing and configuring Azure Blob Storage for staging data during migration, ensuring seamless integration with Databricks.
Collaborated with DevOps teams to deploy ETL pipelines in production using CI/CD pipelines and Azure DevOps tools.
Created and managed partitioned tables in Snowflake to optimize query performance on large datasets.
Troubleshot and resolved latency issues in data pipelines by analyzing execution logs and optimizing Spark cluster configurations.
Provided technical training and knowledge transfer to team members on ADF, Databricks, and Snowflake platforms to ensure seamless project continuity.
Analyzed historical data patterns to design effective ETL strategies for bulk data migration and incremental updates.
Collaborated with business analysts and stakeholders to define data migration requirements and ensure alignment with organizational goals.
Conducted performance benchmarking and load testing on migrated data pipelines to validate system reliability under peak loads.
Managed resource scaling and cost optimization strategies for Azure resources during migration to control project budgets.
Identified and implemented improvements in ETL processes by leveraging Azure-native features and industry best practices.
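
A simplified illustration of the pre/post-migration validation approach referenced above, comparing row counts between Teradata (source) and Snowflake (target) using the teradatasql and snowflake-connector-python packages; table names, schemas, and credentials are placeholders.

```python
# Illustrative migration validation: compare row counts per table between Teradata
# and Snowflake. Connection details and table names are hypothetical placeholders.
import teradatasql
import snowflake.connector

TABLES = ["member_claims", "provider_dim"]  # hypothetical table list

def source_counts(conn):
    counts, cur = {}, conn.cursor()
    for t in TABLES:
        cur.execute(f"SELECT COUNT(*) FROM edw.{t}")
        counts[t] = cur.fetchone()[0]
    return counts

def target_counts(conn):
    counts, cur = {}, conn.cursor()
    for t in TABLES:
        cur.execute(f"SELECT COUNT(*) FROM ANALYTICS.PUBLIC.{t.upper()}")
        counts[t] = cur.fetchone()[0]
    return counts

td = teradatasql.connect(host="td-host", user="svc_user", password="***")
sf = snowflake.connector.connect(account="acct", user="svc_user", password="***",
                                 warehouse="MIGRATION_WH")

src, tgt = source_counts(td), target_counts(sf)
for t in TABLES:
    status = "OK" if src[t] == tgt[t] else "MISMATCH"
    print(f"{t}: source={src[t]} target={tgt[t]} -> {status}")
```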

Senior Data Engineer 01/2021 to 06/2024
University of Missouri St. Louis, MO
Client: College of Nursing

Developed tailored data pipelines to support a variety of projects within the College of Nursing. This involved working with Azure tools like Databricks, Data Factory, Synapse Analytics, Azure Functions, Logic Apps, and ADLS Gen2, ensuring that both research and administrative teams could manage and analyze large volumes of healthcare data effectively.
Designed and implemented data transformation processes for key research projects, such as Patient Outcomes in Post-Surgical Care and Chronic Disease Management. By using Azure Databricks and Data Factory, I helped streamline data processing and ensured that research teams could access insights faster.
Created data pipelines specifically for nursing clinical studies and simulation training programs. Utilizing tools like Delta Lake, Delta Live Tables, and Data Catalogs, I improved how we handled research data, making it easier for teams to focus on what really matters: patient outcomes and student learning.
Led the development of comprehensive data integration pipelines using Azure Data Factory and Databricks, enhancing healthcare research outcomes.
Designed workflows with Logic Apps and Function Apps to automate the synchronization of clinical data, reducing manual intervention.

Built secure, cloud-based data lakes using Azure Data Lake Gen2 and Azure Key Vault to store and protect sensitive healthcare information.
Analyzed and processed clinical data using PySpark and Spark SQL, supporting the College's research initiatives on patient outcomes and treatment efficacy.
Developed SQL-based reporting solutions, providing faculty with detailed insights into student performance and resource allocation.
Managed data visualization projects using Power BI to provide insights into student performance and curriculum effectiveness.
Optimized healthcare data retrieval by partitioning and indexing Azure SQL databases, ensuring faster access to critical information.
Designed real-time streaming solutions to capture live patient data for ongoing research projects, integrating Azure Event Hubs and Stream Analytics.
Developed Python scripts for automating data extraction and transformation processes, significantly reducing manual data handling efforts in research projects.
Facilitated data-driven decision-making through Power BI dashboards, offering leadership insights into resource allocation, student outcomes, and research performance.
Worked on optimizing Spark jobs to handle the large datasets we work with, from student performance data to patient recovery records. This made it possible to process and analyze these datasets faster and more efficiently, supporting projects like Nursing Interventions in Geriatric Care and Mental Health Outcomes for Nurses.
Used Azure Synapse Analytics to integrate advanced analytics into our data workflows, helping faculty and students gain deeper insights into their research on topics like Telehealth Service Delivery and Nurse Burnout Prevention.
Developed serverless solutions using Azure Function Apps to streamline data processing for educational programs like Virtual Simulation Training for Nursing Students, ensuring the cloud infrastructure stayed cost-effective and responsive to our needs.
Automated several key workflows using Azure Logic Apps, simplifying how data flows between departments for projects like Predictive Analytics in Nursing Education and making processes more efficient across the board.
Built real-time streaming data pipelines using Azure Event Hubs and Databricks Auto Loader (see the Auto Loader sketch after this list). These pipelines are especially useful for live research data, such as real-time patient monitoring for studies in ICU Nursing Care.
Focused on data security by implementing Azure Key Vault to protect sensitive information, such as patient records and confidential research data, ensuring we stayed compliant with healthcare regulations like HIPAA.
Used Pandas DataFrames, Spark DataFrames, and RDDs to manipulate and analyze nursing research data, whether it was evaluating Student Performance in Clinical Rotations or analyzing Nurse-to-Patient Ratios.
Continuously tuned Spark jobs for better performance, making sure we could process large datasets, like those used in Emergency Care Workflow Studies, with speed and accuracy.
Maintained data quality by developing processes to validate, clean, and transform the data. This ensured the reliability of findings in research studies, such as those focused on Reducing Medication Errors in Hospitals.
Leveraged Big Data technologies like HDFS, YARN, and Spark to help optimize workflows for both research and administrative purposes, including projects like Faculty Workload Analysis and Student Graduation Outcomes.
Fine-tuned data systems using Hive for improved query performance, particularly when analyzing large datasets from healthcare research projects.
Implemented solutions to sync research data across systems using Apache Sqoop and Azure Data Factory. This helped automate workflows and ensure that researchers always had up-to-date information.
Used Apache Kafka and Spark Streaming to enable real-time data analysis, which proved invaluable for research teams working on live studies such as Post-Operative Recovery Monitoring.
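
A hedged sketch of the Databricks Auto Loader ingestion pattern mentioned above: new files landing in ADLS Gen2 are picked up incrementally and appended to a bronze Delta table. The storage paths, container name, and target table are illustrative assumptions, not the College's actual objects.

```python
# Illustrative Auto Loader stream on Databricks; all paths and table names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("autoloader_sketch").getOrCreate()

stream_df = (
    spark.readStream.format("cloudFiles")               # Auto Loader source
         .option("cloudFiles.format", "json")            # incoming files are JSON
         .option("cloudFiles.schemaLocation",             # where the inferred schema is tracked
                 "abfss://research@storageacct.dfs.core.windows.net/_schemas/vitals")
         .load("abfss://research@storageacct.dfs.core.windows.net/raw/patient_vitals/")
)

(stream_df.writeStream
          .format("delta")
          .option("checkpointLocation",
                  "abfss://research@storageacct.dfs.core.windows.net/_checkpoints/vitals")
          .trigger(availableNow=True)                     # process the backlog, then stop
          .toTable("research.patient_vitals_bronze"))
```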

Designed production-ready data pipelines using Azure Data Factory to support research projects like Data-Driven Nursing Education Strategies, ensuring smooth deployment and minimal downtime.
Managed NoSQL databases like Azure Cosmos DB for flexible, scalable data solutions, particularly in projects dealing with electronic health records and large-scale data storage.
Developed pipelines in Snowflake to handle large-scale research data analysis. By using SnowSQL, Snowpipe, and other Snowflake services, I made sure that our nursing researchers could access and analyze their data efficiently.
Applied advanced Snowflake features such as Zero-Copy Cloning and Time Travel to preserve data integrity for longitudinal studies (illustrated after this list), helping researchers keep track of changes and access historical data when needed.
Created interactive data visualizations using Power BI to help faculty and students better understand their data, whether it was related to research outcomes or student progress.
Worked with various data formats like ORC, Parquet, and Avro to meet the diverse needs of our nursing research teams and administrative departments.
Set up and managed cloud clusters on Azure VMs for large-scale research data processing, making sure the systems ran smoothly and stayed optimized using Azure Monitor and Log Analytics.
Built Spark applications and used Matillion ETL to handle large volumes of data, streamlining data flows for projects like Student Success Tracking and Clinical Simulation Evaluation.
Enabled real-time data movement using tools like Spark Structured Streaming, Kafka, and Elasticsearch, helping teams monitor real-time data in ongoing research projects, with Power BI dashboards kept up to date using Azure Functions.
Utilized Azure services like Databricks, Synapse Analytics, and Data Factory to transform and automate data processing for various nursing research and educational projects, ensuring a scalable and robust environment for both faculty and students.
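
An illustrative use of the Snowflake Zero-Copy Cloning and Time Travel features mentioned above, issued through the snowflake-connector-python driver; the account, warehouse, database, and table names are placeholders rather than actual project objects.

```python
# Hedged illustration of Snowflake Zero-Copy Cloning and Time Travel; names are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(account="acct", user="svc_user", password="***",
                                    warehouse="RESEARCH_WH", database="NURSING")
cur = conn.cursor()

# Zero-Copy Clone: snapshot a study table before a bulk reload; no extra storage is used
# until the clone and the source diverge.
cur.execute("CREATE OR REPLACE TABLE PUBLIC.OUTCOMES_PRE_RELOAD CLONE PUBLIC.OUTCOMES")

# Time Travel: query the table as it looked 24 hours ago to reconcile longitudinal changes.
cur.execute("SELECT COUNT(*) FROM PUBLIC.OUTCOMES AT(OFFSET => -60*60*24)")
print("Row count 24h ago:", cur.fetchone()[0])
```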


Data Engineer 08/2016 to 07/2019
IBM India Private Limited
Client: Telstra

Developed data pipelines that processed large volumes of telecom data using Azure Databricks and Azure Data Factory. This work was crucial for analyzing network performance and understanding customer behavior, making it easier for teams to make informed decisions.
Developed and optimized large-scale data pipelines using Azure Data Factory and Azure Databricks, improving the analysis of telecom datasets for real-time decision-making.
Leveraged Spark, Hadoop, and Hive to process and manage high-volume telecom data, enhancing network monitoring and customer usage analysis.
Handled 2,466 dockets with a 99.89% successful resolution rate; 1,378 of them were for business-critical incidents.
Streamlined workflows by automating data integration processes with Azure Logic Apps and Function Apps, reducing operational delays.
Built real-time streaming pipelines with Azure Event Hubs and Kafka to monitor network traffic, ensuring proactive issue resolution.
Developed efficient data pipelines using SQL and Python, automating the extraction, transformation, and loading (ETL) of telecom data for better analysis.
Managed data security by implementing encryption strategies using Azure Key Vault, safeguarding sensitive telecom data.
Utilized Spark Structured Streaming and Kafka to process real-time telecom data, improving network monitoring and performance analysis (as sketched below).
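
A minimal sketch of the Kafka plus Spark Structured Streaming pattern referenced above, aggregating per-site network latency in one-minute windows; the topic name, broker address, and event schema are assumptions for illustration only.

```python
# Illustrative Kafka -> Spark Structured Streaming aggregation; names and schema are hypothetical.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("network_monitoring_sketch").getOrCreate()

event_schema = StructType([
    StructField("site_id", StringType()),
    StructField("latency_ms", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", "broker1:9092")
            .option("subscribe", "network-events")
            .load())

events = (raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
             .select("e.*"))

# Average latency per cell site in one-minute windows, tolerating 5 minutes of late data.
agg = (events.withWatermark("event_time", "5 minutes")
             .groupBy(F.window("event_time", "1 minute"), "site_id")
             .agg(F.avg("latency_ms").alias("avg_latency_ms")))

(agg.writeStream
    .outputMode("update")
    .format("console")   # stand-in sink; production would write to a store or alerting topic
    .start()
    .awaitTermination())
```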

Optimized SQL queries for performance tuning and real-time analytics, ensuring quick access to critical telecom data in Azure SQL Database.
Enhanced data retrieval performance through advanced partitioning and bucketing techniques in Hive, reducing query execution times.
Applied hands-on expertise in AWS services like EC2, S3, Glue, and Lambda to create hybrid cloud solutions for telecom data processing.
Used Eclipse and IntelliJ IDEA to build and debug Spark applications. This hands-on coding experience allowed me to tackle real-world challenges like real-time network monitoring and customer analytics, giving me a chance to creatively solve problems.
Designed workflows in Azure Databricks that automated the transformation of telecom datasets. This meant less manual work for the team and improved data quality, which helped us focus more on insights rather than fixing data issues.
Optimized Spark jobs to enhance the efficiency of processing large telecom datasets. By fine-tuning our processes, we were able to quickly analyze customer usage and network performance, leading to faster, data-driven decisions.
Leveraged Azure SQL Data Warehouse to manage and analyze telecom data. This integration allowed us to quickly pull insights on customer trends and network performance, which was vital for improving our services.
Created serverless workflows using Azure Functions and Logic Apps to automate everyday tasks like syncing billing information. This not only streamlined our processes but also saved time and reduced errors.
Built real-time streaming pipelines using Azure Event Hubs and Spark Streaming. This was a game changer for monitoring network traffic live, allowing us to quickly respond to issues and keep our services running smoothly.
Ensured data security with Azure Key Vault, protecting sensitive customer information. This was crucial in maintaining trust with our clients and meeting regulatory requirements.
Developed Spark applications that processed large volumes of telecom data, enabling us to gain insights into customer behavior and network usage. This hands-on experience deepened my understanding of big data technologies and their applications.
Used Hive to optimize data storage and queries, which significantly improved how we retrieved and processed telecom data. Techniques like bucketing and partitioning were essential for handling large volumes of data efficiently (see the partitioning sketch after this list).
Worked with Apache Sqoop to synchronize data between our Hadoop environment and Azure SQL. This ensured that our billing and customer data were always up-to-date and reliable.
Created real-time monitoring solutions with Kafka and Spark Streaming. This setup helped us keep an eye on network performance and respond to any issues swiftly, enhancing our service reliability.
Designed data solutions using Snowflake for managing telecom datasets, which simplified our analytics processes. Using SnowSQL and Snowpipe made data ingestion a breeze and allowed us to get insights faster.
Built interactive dashboards in Power BI to visualize key metrics like customer churn and network performance. These visuals helped our team and executives make better decisions based on solid data.
Worked with various data formats like ORC, Parquet, and Avro to ensure we were storing and processing data efficiently. This attention to detail paid off in quicker access to the information we needed.
Managed databases on Azure SQL and Cosmos DB, ensuring they were optimized for performance. This meant our CRM and billing systems could run smoothly, providing a better experience for our customers.
Set up multi-node clusters on Azure VMs, which improved our ability to handle large telecom workloads. Monitoring these systems with Azure Monitor helped us catch issues before they became problems.
Developed Spark applications that tackled key tasks like analyzing CDRs and monitoring network logs. This hands-on experience not only improved our processes but also taught me a lot about big data technologies in a practical setting.
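
An illustrative example of the Hive-style partitioning and bucketing layout referenced above, expressed through Spark's writer API (which mirrors Hive's PARTITIONED BY / CLUSTERED BY layout); the CDR table and columns are hypothetical.

```python
# Hypothetical CDR table layout: partition by call date, bucket by subscriber id.
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder.appName("cdr_layout_sketch")
         .enableHiveSupport()
         .getOrCreate())

cdr_raw = spark.table("telecom.cdr_raw")  # assumed staging table of call detail records

(cdr_raw
    .withColumn("call_date", F.to_date("call_ts"))
    .write
    .partitionBy("call_date")               # prune scans by date
    .bucketBy(64, "subscriber_id")          # cluster rows for faster joins on subscriber
    .sortBy("subscriber_id")
    .format("orc")
    .mode("overwrite")
    .saveAsTable("telecom.cdr_bucketed"))

# Queries filtering on call_date now read only the matching partitions, e.g.:
spark.sql("""
    SELECT subscriber_id, SUM(duration_sec) AS total_sec
    FROM telecom.cdr_bucketed
    WHERE call_date = DATE'2019-03-01'
    GROUP BY subscriber_id
""").show(5)
```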

Recognized by the COO for exceptional work on a business-critical incident in which the 911 hotline was down early in the morning.

Training, Licenses & Certifications
Google Certified Professional Data Engineer 2022
Learned how to choose appropriate storage systems, including relational, NoSQL, and analytical databases.
Deployed machine learning models and applied multiple machine learning techniques to different use cases.
Monitored data pipelines and machine learning models and evaluated the quality of machine learning models.
Designed scalable, distributed, data-intensive applications and migrated data warehouses from on-prem to the cloud.
Explored machine learning concepts such as backpropagation, stochastic gradient descent, feature engineering, and overfitting/underfitting.
Gained an understanding of how to ingest data, create processing pipelines in Dataflow, and deploy relational databases.
Designed highly performant Bigtable, BigQuery, and Cloud Spanner databases, and queried Firestore databases.
Used Data Engineering services to design, deploy, and monitor data pipelines and advanced database systems.

Google Certified Professional Cloud Architect 2022
Gained a broad understanding of Google Cloud and its services, along with the purpose of each Google service.
Explored the variety of computation options available, including Compute Engine, Cloud Functions, and Kubernetes.
Determined how to decide which compute service should be used and examined various machine learning and AI offerings.
Examined data storage options like Persistent Disk, Local Disk, Cloud SQL, Cloud Datastore, and BigQuery.
Learned how to use load balancers in GCP and the various data storage options in the cloud for different use cases.
Captured logs for applications, monitored and debugged applications in the cloud, and used IaC to create resources.
Explored how to run big data processing pipelines in Google Cloud and how the CI/CD pipeline works in GCP.
Learned about the various offerings on the Google Cloud Platform and how to use them.
Reviewed best practices for storing and analyzing data and how to migrate existing big data applications to GCP.
Learned how to distribute request traffic among application backends using various load balancer options in GCP.




Microsoft Certified: Azure Data Engineer 5B585E4EC3034756
Microsoft Azure Infrastructure Solutions Certification Number: G365-7914
AWS Certified Solutions Architect Associate Validation Number: FNW3MDE1CNQEQZW7
Big Data Foundations 101 by IBM Certification Validation Number: BD0101EN

Technical Skills
Cloud Platforms & Services
Azure: Azure Data Factory, Azure Databricks, Azure Synapse Analytics, Azure Blob Storage, Logic Apps, Function Apps, Azure Data Lake Gen2, Azure SQL Database, Azure Key Vault, Azure DevOps
AWS: EC2, S3, Glue, Lambda Functions
Big Data Technologies
MapReduce, Hive, Tez, HDFS, YARN, PySpark, Hue, Kafka, Spark Streaming, Oozie, Sqoop, Zookeeper, Apache Airflow
Hadoop Distributions
Cloudera, Hortonworks
Programming Languages

SQL, PL/SQL, Hive Query Language, Python, Scala, Java, Azure Machine Learning
Web Technologies
JavaScript, JSP, XML, RESTful Services, SOAP, FTP, SFTP
Operating Systems
Windows (XP/7/8/10), UNIX, Linux (Ubuntu, CentOS)
Build Automation Tools
Apache Ant, Maven, AutoSys, Toad
Version Control Systems
GIT, GitHub
Development Environments & Design Tools
Eclipse, IntelliJ IDEA, Visual Studio, SSIS, Informatica, Erwin, Tableau, Power BI, SAP BusinessObjects
Databases
MS SQL Server (2016/2014/2012), Azure SQL Database, Azure Synapse, Oracle (11g/12c), Cosmos DB, MongoDB, Cassandra, HBase, MS Excel, MS Access
Education
University of Missouri, Master's Degree in Computer Science, 2020
Developed AI for converting complex mathematical problem descriptions into simple English
Developed a fast genetic algorithm in C++ for sentence prediction and for playing a good tic-tac-toe game


Veltech MultiTech Engineering College, Bachelor's Degree in Computer Science, Chennai, India, 2016