AMULYA KOLIPAKA
Senior Data Engineer | Dallas, Texas, USA
amulya.ak92@gmail.com | 405-698-0422 | https://www.linkedin.com/in/ammulya-koli-99503227b/

PROFESSIONAL SUMMARY:
- 10+ years of experience in Data Engineering, specializing in ETL, data pipelines, and data modeling.
- Strong expertise in C# for developing scalable data integration solutions, ETL automation, and API development within Azure and AWS ecosystems.
- Proficient in .NET Core, Visual Studio, and C# libraries for building high-performance data processing applications.
- Experience integrating C# applications with Azure services (Azure Data Factory, Azure SQL Database, and Azure Functions).
- Hands-on experience in multi-threading, parallel processing, and performance optimization in C# to handle large-scale datasets efficiently.
- Expertise in real-time data processing technologies such as Apache Storm, Kafka, Spark Streaming, and Flume to enable real-time analytics and integration with diverse data sources.
- Worked extensively on developing ETL/ELT processes for data extraction, transformation, and loading using ODI.
- Experience in data warehousing using Oracle Data Integrator (ODI); configured and set up the ODI Master Repository, Work Repository, Projects, Models, Sources, Targets, Procedures, Packages, Knowledge Modules, Interfaces, Scenarios, and Contexts.

TECHNICAL SKILLS:
Database Query Languages: SQL, Teradata, NoSQL (MongoDB)
Programming Languages: Python, Java, Scala
Big Data - Hadoop Ecosystem: Hadoop, Hive, Spark, MapReduce, Pig
Big Data - Spark Technologies: PySpark, Spark SQL, Spark Streaming
ETL Tools: Talend, Informatica, SSIS
Data Visualization & BI: Tableau, Power BI, Looker, QlikView
Data Warehouse: Snowflake, Redshift, Azure Synapse
Non-Relational Databases: MongoDB, Cassandra, DynamoDB
RDBMS: Oracle, MySQL, PostgreSQL, SQL Server
Logging Tools: Logstash, Fluentd, Splunk, ELK Stack
Messaging Tools: Apache Kafka, RabbitMQ, AWS SNS/SQS
Orchestration: Apache Airflow, Kubernetes, Luigi, Docker
DevOps: Docker, Jenkins, Kubernetes, Ansible
Cloud Real-Time Processing: Amazon Kinesis, Azure Stream Analytics

PROFESSIONAL EXPERIENCE

Role: Senior Data Engineer - ETL
Client: Truist Financial, Charlotte, NC | Aug 2021 - Present
Responsibilities:
- Engaged in the full project lifecycle from requirements gathering to deployment, covering system analysis, design, development, testing, and deployment phases.
- Developed and maintained C# applications for data processing, ETL workflows, and API-based data integrations within the Azure cloud environment.
- Designed and implemented C#-based data validation and transformation modules in Azure Data Factory and Databricks.
- Debugged and optimized C# scripts in Visual Studio, ensuring seamless integration with Azure SQL Database and Azure Data Lake.
- Built and deployed SSIS packages using Visual Studio, integrating with Azure DevOps for version control and automation.
- Utilized Terraform and CI/CD pipelines (Azure DevOps) to automate C#-based data pipeline deployments.
- Designed and implemented Apache Cassandra clusters to meet specific application requirements, considering factors such as data modeling, replication strategy, consistency levels, and fault tolerance.
- Constructed end-to-end data pipelines using Azure Data Factory (ADF), enabling seamless loading of data from on-premises systems to Azure SQL Server for effective data orchestration.
- Spearheaded the development of data pipelines using Azure services such as Data Factory and Databricks notebooks, ensuring smooth data flow from legacy systems and SQL Server to the Azure data warehouse.
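Illustrative sketch only: one minimal way the on-premises SQL Server to Azure data warehouse loads described in the bullet above can be expressed as a PySpark step in a Databricks notebook. All hostnames, databases, tables, and credentials below are hypothetical placeholders, not project configuration.

```python
# Minimal PySpark sketch (Databricks): pull a table from an on-prem SQL Server
# over JDBC, apply a light transformation, and land it in a Synapse staging table.
# Server names, tables, and credentials are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("onprem_to_synapse_load").getOrCreate()

# Secrets would normally come from Azure Key Vault via a Databricks secret scope.
src_url = "jdbc:sqlserver://onprem-sql.example.com:1433;databaseName=SalesDB"
dst_url = "jdbc:sqlserver://example-synapse.sql.azuresynapse.net:1433;databaseName=EDW"

orders = (
    spark.read.format("jdbc")
    .option("url", src_url)
    .option("dbtable", "dbo.Orders")
    .option("user", "etl_user")
    .option("password", "<from-key-vault>")
    .load()
)

# Example transformation: standardize a column name and add a load timestamp.
curated = (
    orders.withColumnRenamed("OrderID", "order_id")
          .withColumn("load_ts", F.current_timestamp())
)

(
    curated.write.format("jdbc")
    .option("url", dst_url)
    .option("dbtable", "stg.Orders")
    .option("user", "etl_user")
    .option("password", "<from-key-vault>")
    .mode("append")
    .save()
)
```

For large volumes, the dedicated Azure Synapse connector with a staging location and COPY/PolyBase loads is usually preferred over plain JDBC writes; the JDBC path above is only the simplest self-contained illustration.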
- Devised intricate ETL jobs for visual data transformation using data flows in Azure Databricks and Azure SQL Database.
- Developed and maintained C# applications for data processing and integration within the Azure cloud environment, using Visual Studio as the primary IDE.
- Played a pivotal role in data movement activities, including transformations and control functions such as Copy Data, Data Flow, Get Metadata, Lookup, Stored Procedure, and Execute Pipeline.
- Executed Azure Copy activity and data flow jobs using Azure Data Factory, ensuring efficient data movement.
- Employed various Azure Data Factory activities such as Lookup, Stored Procedure, If Condition, ForEach, Set Variable, Append Variable, Get Metadata, Filter, and Wait.
- Designed, developed, and deployed SQL Server Integration Services (SSIS) packages within Visual Studio to automate ETL processes and data transformations, improving data pipeline efficiency.
- Implemented and managed Kafka-based data pipelines in Azure environments, ensuring real-time data processing and integration with Azure Data Lake and Azure SQL Database.
- Designed and managed Terraform configurations to provision and manage cloud resources across multiple environments, ensuring infrastructure consistency and reducing manual errors.
- Developed CI/CD pipelines using Git, Maven, and Azure DevOps to automate deployment and testing of data pipelines; implemented version control and rollback strategies for enhanced reliability.
- Implemented best practices for Terraform module development, enabling code reuse and standardization and simplifying infrastructure provisioning across projects.
- Led successful implementations of Oracle Fusion Analytics Warehouse solutions, aligning them closely with client requirements and business objectives.
- Designed and developed a machine learning pipeline on GCP, using Cloud AI Platform and AutoML, to predict patient readmission rates and identify high-risk patients, resulting in reduced readmission rates and improved patient care.
- Designed and implemented scalable data warehouse solutions using technologies such as SQL Server, Azure Synapse Analytics, and Snowflake, optimizing data storage and retrieval for large-scale analytical workloads.
- Worked on pipelines and mechanisms to ingest data from various sources into BigQuery.
- Developed a disaster recovery plan on GCP, using Cloud Storage and Compute Engine, to ensure business continuity in the event of a system failure or natural disaster, resulting in minimal downtime and no data loss.
- Provided expert guidance on Oracle Fusion Analytics Warehouse architecture, data modeling strategies, and industry best practices, ensuring optimal system design and performance.
- Debugged and optimized C# code within Visual Studio to improve performance and ensure seamless integration with Azure services such as Azure Data Factory, Azure SQL Database, and Azure Databricks.
- Collaborated closely with clients to thoroughly understand their business objectives, data sources, and reporting needs, fostering strong client relationships and delivering tailored solutions.
- Skilled in automating infrastructure provisioning, configuration management, and deployment pipelines using tools such as Google Cloud Deployment Manager, Terraform, and Jenkins, streamlining development workflows and increasing efficiency.
- Integrated Apache Kafka with Azure services such as Azure Event Hubs and Azure Databricks for real-time data streaming and analytics, enabling seamless data movement across multiple systems.
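Illustrative sketch only: a minimal Spark Structured Streaming pattern for the Kafka / Azure Event Hubs integration with Databricks described in the bullet above, reading from the Event Hubs Kafka-compatible endpoint and landing events in the data lake. The namespace, topic, storage paths, and connection string are hypothetical placeholders.

```python
# Minimal Structured Streaming sketch (Databricks): read JSON events from an
# Azure Event Hubs namespace via its Kafka-compatible endpoint and persist them
# to Delta in the lake. All names and connection details are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("eventhub_kafka_stream").getOrCreate()

bootstrap = "example-namespace.servicebus.windows.net:9093"
eh_conn = "<event-hubs-connection-string-from-key-vault>"
# Databricks ships a shaded Kafka client; on vanilla Spark drop the "kafkashaded." prefix.
jaas = (
    "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required "
    f'username="$ConnectionString" password="{eh_conn}";'
)

# Assumed event payload; the real schema would match the producer.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", bootstrap)
    .option("subscribe", "transactions")
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option("kafka.sasl.jaas.config", jaas)
    .option("startingOffsets", "latest")
    .load()
)

events = raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e")).select("e.*")

query = (
    events.writeStream.format("delta")
    .option("checkpointLocation", "abfss://lake@exampleadls.dfs.core.windows.net/chk/transactions")
    .outputMode("append")
    .start("abfss://lake@exampleadls.dfs.core.windows.net/bronze/transactions")
)
```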
- Designed and implemented complex data pipelines using Apache Airflow, orchestrating workflows efficiently to streamline data processing and analysis.
- Developed and maintained ETL (Extract, Transform, Load) pipelines to ingest data from various sources into the data warehouse, ensuring data consistency, quality, and integrity across the system.
- Integrated SQL/SSIS projects in Visual Studio with Azure DevOps for version control, ensuring seamless collaboration, code versioning, and traceability of changes within the ETL workflows.
- Designed and developed custom reports, dashboards, and analytics solutions using Oracle Fusion Analytics Warehouse tools, leveraging advanced features to meet specific client needs and enhance decision-making capabilities.
- Implemented performance tuning techniques in Azure Data Factory and Synapse services to enhance processing efficiency.
- Configured Logic Apps for email notifications to end users and stakeholders utilizing the web services activity.
- Led the migration of Jenkins pipelines to Azure DevOps (ADO), streamlining CI/CD processes for enhanced deployment efficiency.
- Utilized Visual Studio for debugging and troubleshooting C# applications that interacted with various Azure services, ensuring data consistency and minimizing errors in data transfer processes.
- Worked closely with DataStax and Hortonworks support to resolve Cassandra/Kafka issues and bugs, with fixes and patches developed against existing or new JIRAs opened within DSE/Hortonworks.
- Experienced in designing and implementing modular Terraform configurations, enabling code reuse, scalability, and maintainability across projects and environments.

Role: Data Engineer - ETL
Client: Amway Corp, Ada, MI | Jan 2020 - Jul 2021
Responsibilities:
- Designed and implemented pipelines, data flows, and intricate data transformations using Azure Data Factory (ADF) and PySpark with Databricks.
- Performed data ingestion into Azure Data Lake and Azure Data Warehouse (DW) for data migration, processing the data efficiently in Azure Databricks.
- Executed complex ETL Azure Data Factory pipelines utilizing mapping data flows with multiple input/output transformations.
- Developed RESTful APIs in C# for seamless data exchange between Azure Data Factory and SQL Synapse Analytics.
- Utilized C# Azure SDKs to interact with Azure Cognitive Services and Azure Functions for automated data processing.
- Integrated C#-based SQL/SSIS projects in Visual Studio with Azure DevOps, improving ETL workflow management and deployment automation.
- Collaborated with DevOps and development teams to optimize Terraform scripts for cost efficiency, resource utilization, and performance, resulting in significant cost savings and improved application performance.
- Executed data integration tasks, including ETL processes, to seamlessly populate the analytics warehouse with relevant data, ensuring data accuracy and completeness for robust analysis.
- Utilized Azure DevOps Repositories for source control of SSIS project code, maintaining a clear version history, branch management, and collaboration with team members.
- Worked with Azure SDKs and C# libraries to integrate Azure Cognitive Services, Azure Functions, and other cloud-based services into applications developed using Visual Studio.
- Optimized query performance and reduced data processing time by tuning SQL queries, indexing strategies, and optimizing ETL workflows within the data warehouse environment.
- Developed comprehensive documentation and training materials and conducted knowledge transfer sessions for clients and internal teams, facilitating effective system understanding and user proficiency.
- Knowledgeable about GCP security best practices and compliance standards; implemented identity and access management (IAM) policies, network security controls, and encryption mechanisms to safeguard sensitive data and ensure regulatory compliance.
- Configured and optimized Airflow components for scalability and performance, enabling seamless handling of large-scale data processing tasks.
- Created custom logging mechanisms in C# to track data pipeline performance and error handling.
- Optimized C#-based SQL queries and indexing strategies to improve database performance and query execution speed.
- Collaborated with cross-functional teams to integrate Airflow into existing infrastructure, enhancing operational efficiency and reducing manual intervention.
- Implemented monitoring and alerting mechanisms within Airflow to proactively identify and resolve workflow issues, minimizing downtime and ensuring SLA adherence (an illustrative DAG sketch follows this list).
- Maintained up-to-date knowledge of the latest Oracle Fusion Analytics Warehouse features, enhancements, and industry trends, applying them to continually improve solution delivery and client satisfaction.
- Worked with Azure Blob and Data Lake storage, loading data into Azure SQL Synapse Analytics (DW).
- Employed Azure Key Vault as a centralized repository for maintaining secrets, referencing these secrets in Azure Data Factory and Databricks notebooks.
- Migrated on-premises data warehouse systems to cloud platforms such as Azure Synapse Analytics and AWS Redshift, reducing infrastructure costs and improving system scalability and availability.
- Integrated Terraform with other tools and services such as version control systems (e.g., Git), CI/CD pipelines (e.g., Jenkins, GitLab CI), and configuration management tools (e.g., Ansible) to streamline the development and deployment workflow.
- Automated and scaled applications using Kubernetes best practices.
- Established a common SFTP download/upload framework using Azure Data Factory and Databricks.
- Created Databricks ETL pipelines using notebooks, Spark DataFrames, Spark SQL, and Python scripting.
- Developed and debugged RESTful APIs in C# for data integration tasks within Azure ecosystems, ensuring that APIs efficiently handled large datasets and scaled according to project needs.
- Implemented Databricks Job workflows, extracting data from SQL Server and uploading files to SFTP using PySpark and Python.
- Automated the build and release pipeline for SQL/SSIS projects in Visual Studio through Azure DevOps, reducing manual deployment effort and ensuring smooth transitions from development to production.
- Stayed current with the latest GCP technologies and best practices through self-learning, online courses, and community forums, sharing knowledge and collaborating with cross-functional teams to drive innovation.
- Experienced in real-time data processing and streaming using technologies such as Apache Kafka and Apache NiFi.
- Conducted performance analysis and tuning of Cassandra clusters, optimizing configurations, compaction strategies, and read/write operations to improve throughput, latency, and overall system performance.
- Developed Azure Logic Apps with various triggers.
- Worked with ARM templates for production deployment using Azure DevOps.
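Illustrative sketch only: a minimal Apache Airflow DAG of the kind referenced in the Airflow orchestration and monitoring/alerting bullets above, with retries and failure-email alerting. The DAG id, task logic, schedule, and alert address are hypothetical placeholders.

```python
# Minimal Airflow 2.x DAG sketch: a daily extract -> transform -> load chain with
# retries and email-on-failure alerting. Names and addresses are placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    print("extracting source data")  # placeholder for real extraction logic


def transform(**context):
    print("transforming staged data")  # placeholder for real transformation logic


def load(**context):
    print("loading curated data into the warehouse")  # placeholder for real load logic


default_args = {
    "owner": "data-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
    "email": ["data-alerts@example.com"],
    "email_on_failure": True,
}

with DAG(
    dag_id="example_daily_etl",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load
```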
- Monitored and optimized the performance of SSIS packages, leveraging Azure DevOps pipelines to roll out updates and improvements quickly, reducing data processing times.
- Integrated data from multiple heterogeneous sources (relational databases, APIs, flat files, etc.) into a centralized data warehouse, enabling seamless reporting and analytics across business units.
- Collaborated with healthcare providers and data scientists to develop custom algorithms and models for analyzing and visualizing healthcare data, using PySpark and TensorFlow on GCP, resulting in improved healthcare outcomes and reduced costs.
- Mentored and trained junior data engineers on GCP and healthcare data best practices, resulting in increased team productivity and successful completion of multiple healthcare projects.
- Built the logical and physical data model for Snowflake as per required changes.
- Defined virtual warehouse sizing for Snowflake to accommodate different types of workloads.
- Implemented major and minor upgrades for the existing Cassandra cluster.
- Involved in planning, designing, and implementing the multi-data-center DSE Cassandra cluster.
- Collaborated with architects, product owners, and the Scrum Master to evaluate the feasibility of requirements and the time required to implement them.
- Knowledgeable about Terraform best practices and optimization techniques, including state management, variable usage, and provider configuration, to ensure efficient resource utilization and cost optimization in cloud environments.
- Led the maintenance and periodic upgrading of data warehouse systems, ensuring seamless transitions with minimal downtime and maintaining high system performance.
- Experienced in integrating Oracle Fusion applications with other enterprise systems, ensuring seamless data flow and interoperability across the organization's IT landscape.
- Developed policies and procedures designed to enhance and maintain the integrity of the database environment.
- Educated programmers about database concepts and efficient access techniques and assisted them in analysis and problem resolution pertaining to the database.
- Configured performance tuning, monitoring, and backup for Cassandra read and write processes for fast I/O operations and low latency.
- Wrote unit and integration tests for C# code in Visual Studio, ensuring robust validation of data processes and preventing potential failures in Azure-based data systems.
- Proficient in monitoring the health and performance of GCP resources using tools such as Stackdriver Monitoring and Logging, identifying and resolving issues promptly to minimize downtime and optimize resource utilization.

Role: Data Engineer
Client: Dell, Round Rock, TX | Nov 2017 - Dec 2019
Responsibilities:
- Deployed infrastructure leveraging AWS services such as EC2, S3, DynamoDB, Lambda, Elastic File System, RDS, VPC, Direct Connect, Route53, CloudWatch, CloudTrail, and CloudFormation to automate operations effectively (an illustrative boto3 sketch follows this list).
- Utilized DAX queries to generate computed columns in Power BI, enhancing data analysis capabilities.
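Illustrative sketch only: a small boto3 routine of the kind that could support the AWS operational automation described in the first bullet above, landing a pipeline output in S3 and emitting a custom CloudWatch metric. The bucket, key, namespace, and metric names are hypothetical placeholders, not project configuration.

```python
# Minimal boto3 sketch: upload a pipeline output file to S3 and publish a custom
# CloudWatch metric for monitoring. All resource names are placeholders.
import boto3

s3 = boto3.client("s3")
cloudwatch = boto3.client("cloudwatch")


def publish_extract(local_path: str, bucket: str, key: str) -> None:
    # Land the extract in S3 so downstream jobs (EMR, Glue, Redshift COPY) can pick it up.
    s3.upload_file(Filename=local_path, Bucket=bucket, Key=key)

    # Emit an operational metric so a CloudWatch alarm can track pipeline activity.
    cloudwatch.put_metric_data(
        Namespace="DataPipelines/Example",
        MetricData=[{
            "MetricName": "ExtractsUploaded",
            "Value": 1,
            "Unit": "Count",
            "Dimensions": [{"Name": "Pipeline", "Value": "daily_sales_extract"}],
        }],
    )


if __name__ == "__main__":
    publish_extract("/tmp/daily_sales.csv", "example-data-bucket", "raw/daily_sales.csv")
```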
- Architected multi-tier applications on AWS CloudFormation with a focus on high availability, fault tolerance, and auto-scaling, leveraging services such as EC2, Route53, S3, RDS, DynamoDB, SNS, and SQS.
- Developed full-stack applications using Python, Django, MySQL, and Linux, and implemented a fully automated continuous integration system using Git, Gerrit, Jenkins, MySQL, and custom Python and Bash tools.
- Conducted data ingestion from various sources including RESTful APIs, databases, and CSV files, ensuring seamless data integration.
- Developed and maintained C# applications for data processing and integration within the AWS ecosystem, utilizing Visual Studio as the primary IDE to ensure efficient handling of data pipelines and services.
- Migrated data pipelines from Cloudera Hadoop clusters to AWS EMR clusters, enhancing efficiency and scalability.
- Established seamless integration between Apache Kafka and PySpark Streaming for data consumption, custom operations, and analysis.
- Handled issues and bugs, provided solutions with root cause analysis, and engaged DataStax product support when needed.
- Actively coordinated and followed up on all hardware and network issues by working with Apple Network/System engineering teams.
- Provided technical leadership for resolving complex programming tasks.
- Performed regular operational tasks, alongside other duties, to manage the health of all clusters.
- Responsible for ensuring quality deliverables within the stipulated timelines.
- Leveraged Power BI and Power Pivot for data analysis prototyping and visualization, using Power View and Power Map for report visualization.
- Built custom Kafka producers and consumers to meet project requirements effectively.
- Optimized Spark job performance through tuning techniques and caching strategies for PySpark operations.
- Contributed to migrating on-premises Hadoop systems to the AWS platform.
- Managed deployments in EKS-managed Kubernetes, setting up multi-node clusters and deploying containerized applications.
- Successfully migrated existing cron jobs to AWS Lambda, improving scalability.
- Executed Hadoop Streaming processes for processing large text datasets across various file types.
- Utilized Amazon EMR for efficient and scalable Big Data processing on EC2 and S3.
- Developed robust data ingestion and transformation services in C# using the AWS SDK for .NET within Visual Studio, integrating with services such as Amazon S3, Amazon DynamoDB, and Amazon Redshift.
- Developed and deployed Django-based RESTful microservices on AWS servers, optimizing information rendering and response times.
- Managed schema evolution and versioning for data stored in the AWS Glue Data Catalog, ensuring compatibility and consistency across data transformations and downstream analytics applications.
- Developed server-based web traffic statistical analysis tools using Flask and Pandas.
- Leveraged AWS SageMaker for end-to-end machine learning model development and deployment.

Role: Data Analyst
Client: Sutherland Global Services, India | Jun 2015 - Aug 2017
Responsibilities:
- Collaborated closely with the sourcing team to comprehend the structure of data files and the segmentation of information within them.
- Orchestrated regular data transfers from Cassandra to Hadoop, executing specific tasks to organize and import the data effectively (an illustrative PySpark sketch follows this list).
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple Java MapReduce jobs for data cleaning and preprocessing.
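Illustrative sketch only: one way a recurring Cassandra-to-Hadoop transfer like the one described above can be expressed with PySpark and the DataStax spark-cassandra-connector. This is a swapped-in technique shown purely for illustration (the original work may have used different tooling); the keyspace, table, columns, and paths are hypothetical placeholders.

```python
# Minimal PySpark sketch: copy a Cassandra table into HDFS as partitioned Parquet
# using the DataStax spark-cassandra-connector (must be on the Spark classpath).
# Host, keyspace, table, columns, and paths are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder.appName("cassandra_to_hdfs")
    .config("spark.cassandra.connection.host", "cassandra.example.internal")
    .getOrCreate()
)

events = (
    spark.read.format("org.apache.spark.sql.cassandra")
    .options(keyspace="analytics", table="user_events")
    .load()
)

# Organize the data by event date (assumes an event_ts column) so downstream
# Hive/MapReduce jobs can prune partitions.
(
    events.withColumn("event_date", F.to_date("event_ts"))
    .write.mode("overwrite")
    .partitionBy("event_date")
    .parquet("hdfs:///data/raw/user_events")
)
```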
- Established tables in Hive, populated them with data, and formulated Hive queries to trigger background processes that organized data using MapReduce.
- Managed and reviewed Hadoop log files to ensure system health and performance.
- Demonstrated a profound understanding of Hadoop's core components, including HDFS, the ApplicationMaster, NodeManager, ResourceManager, NameNode, and DataNode, and the underlying principles of MapReduce.
- Implemented a system within MapReduce to sift through data, filtering out invalid or redundant records.
- Developed efficient Hive queries to extract processed data.
- Designed Hive tables tailored to specific requirements, utilizing fixed and flexible partitions to optimize performance.

Role: Jr. ETL Consultant
Client: Avon Technologies Pvt Ltd, India | Aug 2013 - May 2015
Responsibilities:
- Designed, developed, and maintained Business Intelligence (BI) solutions using SQL, encompassing data warehouse and data mart structures.
- Collaborated closely with business analysts to gather requirements and executed the implementation of data marts.
- Developed SQL procedures, functions, triggers, and views to support both Online Transaction Processing (OLTP) and Extract, Transform, Load (ETL) applications.
- Utilized a variety of SSIS control flow tasks and data transformations, including data conversion, derived column, lookup, fuzzy lookup, conditional split, and aggregate transformations, within ETL processes.
- Implemented event handlers, package configurations, logging, and system/user-defined variables within SSIS packages.
- Integrated source control, project coordination, and task management into the BI solution.
- Generated ad-hoc reports using Report Builder and maintained Report Manager for SQL Server Reporting Services (SSRS).
- Developed optimized queries for high-performance reporting and rendered reports in HTML, XML, PDF, and Excel formats using SSRS.
- Utilized Oracle Data Integrator (ODI) to develop processes for data extraction, cleansing, transformation, integration, and loading into the data warehouse database.
- Contributed to the development, enhancement, and maintenance of both internal and external reports/applications supporting various business processes.
- Developed SSRS reports on SQL Server with a strong understanding of SQL Server Analysis Services (SSAS) and OLAP cube architecture.
- Employed SSIS for ETL jobs to extract, clean, transform, and load data into the data warehouse.