Maneesh Gorantala - Sr. Data Engineer |
[email protected] |
Location: Tampa, Florida, USA |
Relocation: Yes |
Visa: GC |
Sr. Data Engineer
Name: Maneesh G | Email: [email protected] | Phone: +1 (727) 304-2965

Professional Summary:
- Experienced Data Engineer with over 10 years in the IT industry, specializing in designing and implementing robust data solutions using Azure, AWS, and GCP technologies such as Data Lake, Data Factory, Synapse, AWS Data Pipelines, and GCP Dataproc.
- Proficient in developing automated data pipelines and workflows using Azure DevOps, Jenkins, and Ansible, ensuring efficient deployment and operations.
- Skilled in managing and administering databases with Azure SQL and Cosmos DB, and implementing security measures with Azure AD and Key Vault.
- Expert in utilizing Azure Service Bus and Event Hub for building scalable and secure event-driven architectures, enhancing system responsiveness and data distribution.
- Demonstrated expertise in AWS services including EC2, S3, Redshift, and AWS Glue, optimizing cloud solutions for data storage, processing, and analytics.
- Strong background in configuring and maintaining cloud infrastructure using Terraform and CloudFormation, and managing code repositories with Git and Bitbucket.
- Developed custom scripts and extensions using Python and JavaScript to expand Superset's functionality and integrate with other BI tools.
- Collaborated with cross-functional teams, contributing to the Agile development process and ensuring timely delivery of analytics solutions.
- Advanced proficiency in scripting with Python, Shell Scripting, and PowerShell, automating tasks and enhancing system operations.
- In-depth knowledge of big data technologies such as Spark, Hive, HBase, and Kafka, delivering powerful data processing and analytical solutions.
- Experienced on the GCP platform, adept at using BigQuery, Dataflow, and Dataproc, and managing data with GCS Buckets and Cloud SQL.
- Experienced as a Snowflake Developer with a strong understanding of Snowflake architecture, SQL, and data warehousing concepts, along with Python scripting experience.
- Capable of implementing complex data ingestion and transformation processes using Informatica 6.1 and ETL tools, ensuring data quality and consistency.
- Expert in performance tuning and query optimization using PL/SQL and database management systems such as Oracle 9i and Teradata.
- Developed and maintained analytical solutions with Business Objects and MS Excel, providing critical business insights and reporting capabilities.
- Understand indexing strategies in Neo4j for efficient querying and loading data into Neo4j from sources such as CSV files, JSON, or relational databases.
- Developed microservices using Spring Boot, enabling scalable and independent service deployment.
- Utilized Jira for project management and tracking, ensuring alignment with project goals and timely delivery of data engineering solutions.
- Hands-on experience with Azure VM creation, Azure WebApp, and Function App deployment, facilitating flexible and scalable web services.
- Experienced in configuring and optimizing Snowflake accounts, warehouses, and databases.
- Graph databases such as Neo4j support graph query languages (e.g., Cypher) that make it possible to express complex patterns and traverse relationships effectively.
- Strong analytical skills, capable of interpreting complex data sets and integrating multiple data sources using technologies like Sqoop and Cloudera.

Education: Bachelor of Business Administration, Sri Satya Sai University, Bhopal, MP, 2013.
Technical Skills:
- Azure Cloud: Azure Data Lake, Azure Data Factory, Azure Synapse, Azure Databricks, Azure SQL, Azure SQL MI, Azure VM creation, Azure Function App, Azure WebApp, Azure AD, Azure Service Bus, Cosmos DB, Log Analytics, AKS, Event Hub, Service Bus, Key Vault, App Insights, ACR
- AWS Cloud: EC2, S3, EBS, ELB, RDS, SNS, SQS, VPC, CloudFormation, CloudWatch, ELK Stack, Redshift, DynamoDB, Kinesis, AWS Lambda, AWS Data Pipelines, AWS Glue, AWS Redshift, AWS S3, CodeDeploy, CodePipeline, CodeBuild, CodeCommit, Cloudera
- GCP Cloud: BigQuery, GCS Bucket, Cloud Functions, Apache Beam, Cloud Dataflow, Cloud Shell, Dataproc, VM Instances, Cloud SQL, MySQL, Postgres, SQL Server, GCP Dataflow, GCP Dataproc, Cloud Composer, Cloud Pub/Sub, Cloud Storage Transfer Service, Cloud Spanner, Data Catalog, GCP Databricks
- Programming: Python, Scala, Shell Scripting, PowerShell, WebLogic, JBoss, WebSphere, Unix/Linux
- DevOps & CI/CD: Jenkins, Ansible, Azure DevOps, Git, Maven, Bitbucket, Terraform, Splunk, SonarQube, Spring Boot
- Database Management: Oracle 9i, Teradata V2R12, Teradata SQL Assistant, PL/SQL
- Data Processing: Spark, Hive, Pig, Spark SQL, Spark Streaming, HBase, Sqoop, Kafka
- Project Management: Jira, Quality Center
- Business Intelligence: Business Objects XIR2, MS Excel 2007, ETL Tools, Informatica 6.1

Professional Experience

Kobie Marketing, St. Petersburg, FL | May 2023 - Present
Sr. Azure Data Engineer
Responsibilities:
- Developed data integration pipelines using Azure Data Factory and Azure Data Lake to aggregate data from various sources, enhancing data availability and quality for analysis.
- Designed and implemented data storage solutions on Azure Data Lake and Cosmos DB, ensuring optimal data retrieval for business intelligence and reporting purposes.
- Utilized Azure Synapse to perform complex data transformations and aggregations, significantly speeding up data processing tasks for analytics.
- Configured and maintained Azure Databricks clusters for real-time data processing, employing Python scripts for data manipulation and analysis.
- Built and maintained backend services using Java, ensuring they were robust and capable of handling high traffic.
- Leveraged Snowflake's seamless integration with popular data engineering tools and platforms such as Apache Spark, Apache Airflow, and dbt (data build tool).
- Designed and implemented a secure event-driven architecture using Azure Event Hub, Service Bus, and Azure Function App, improving data flow and system responsiveness (see the sketch after this section).
- Orchestrated container management using Azure Kubernetes Service (AKS) and Azure Container Registry (ACR) to streamline application deployment and scaling.
- Developed and monitored data pipelines in Azure Synapse and Azure Data Factory, which facilitated timely and accurate data availability for decision-making.
- Automated data consistency checks and error logging mechanisms using Azure SQL, enhancing data integrity across distributed databases.
- Gained exposure to graph databases such as Neo4j, applying graph algorithms to analyze connected data.
- Built and deployed scalable web applications using Azure WebApp and integrated them with backend services to provide seamless user experiences.
- Configured Azure Service Bus and Event Hub for cross-application communication, enabling robust and scalable event handling mechanisms.
- Developed microservices using Spring Boot, enabling scalable and independent service deployment.
- Leveraged Git and Maven for source code management and project builds, maintaining high standards of code integrity and version control.
- Implemented continuous integration and deployment (CI/CD) pipelines using Jenkins, Ansible, and Azure DevOps, which enhanced team productivity and code quality.
- Attended Neo4j conferences and webinars to network with professionals in the field, learn about best practices, and stay updated on the latest developments in graph databases.
- Conducted data migration projects from legacy systems to Azure Data Lake and Cosmos DB, ensuring data accuracy and minimal downtime.
- Developed ETL pipelines into and out of the data warehouse using a combination of Python and Snowflake SnowSQL.
- Utilized Python for scripting automation tasks and data processing jobs, increasing operational efficiency and reducing manual intervention.
- Monitored and adjusted Azure VM performance and settings to optimize cost and resource usage across different project environments.
- Developed security protocols and best practices using Azure AD and Key Vault to enhance data protection and access management.
- Collaborated with cross-functional teams to align data engineering practices with overall business strategies, ensuring data-driven decision-making processes.

Environment: Azure Data Lake, Azure Data Factory, Azure Synapse, Azure Databricks, Jenkins, Ansible, Shell Scripting, Azure AD, Azure Service Bus, Azure SQL, Cosmos DB, Log Analytics, AKS, Neo4j, Event Hub, Snowflake, Service Bus, Key Vault, App Insights, Azure VM creation, ACR, Azure Function App, Azure WebApp, Azure SQL MI, SSH, YAML, WebLogic, Python, Azure DevOps, Git, Maven, Jira.
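Illustrative sketch (not part of the original role write-up): one way the Key Vault-backed Event Hub publishing described above could be wired up in Python, assuming the azure-identity, azure-keyvault-secrets, and azure-eventhub (v5) SDKs; the vault URL, secret name, hub name, and event payload are hypothetical placeholders.

    # Hedged example: publish JSON events to an Event Hub using a connection
    # string resolved from Key Vault. Vault URL, secret name, and hub name are
    # hypothetical; swap in real values before running.
    import json

    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient
    from azure.eventhub import EventData, EventHubProducerClient


    def publish_events(events,
                       vault_url="https://example-vault.vault.azure.net",
                       secret_name="eventhub-conn-str",
                       hub_name="orders"):
        # Pull the Event Hub connection string from Key Vault rather than config files.
        credential = DefaultAzureCredential()
        secrets = SecretClient(vault_url=vault_url, credential=credential)
        conn_str = secrets.get_secret(secret_name).value

        producer = EventHubProducerClient.from_connection_string(
            conn_str, eventhub_name=hub_name
        )
        with producer:
            batch = producer.create_batch()
            for event in events:
                batch.add(EventData(json.dumps(event)))
            producer.send_batch(batch)


    if __name__ == "__main__":
        publish_events([{"member_id": 1, "event_type": "points_earned"}])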
Blue Cross Blue Shield, Boston, MA | Feb 2021 - Apr 2022
Sr. Azure Data Engineer
Responsibilities:
- Developed Spark frameworks with Python, applying principles of functional programming to process complex structured data sets (see the sketch after this section).
- Knowledgeable about applying ML techniques such as classification, regression, and clustering on big data with PySpark, MLlib, and Scikit-Learn.
- Automated CI/CD pipelines for deploying Scala and Python code to Azure Databricks using the REST API and Azure Pipelines.
- Used RESTful APIs to integrate HubSpot with other data systems.
- Developed data pipelines, transformations, and models using Matillion ETL to migrate data to the Snowflake data warehouse; optimized query performance and Snowflake architecture.
- Managed multiple Azure tenants to organize and isolate resources across different projects and teams, creating and configuring Azure Active Directory (Azure AD) tenants to ensure secure access and management of cloud resources.
- Analyzed SQL scripts and designed solutions for implementation using PySpark.
- Designed technical data solutions and implemented data governance processes for accuracy, reliability, and compliance.
- Deployed ETL workflows with Azure Data Factory (ADF) and SSIS packages to extract, transform, and load data from SQL Server databases, Excel, and file sources into the data warehouse.
- Used Power BI automation capabilities through its APIs and the Power BI Service.
- Leveraged Neo4j's comprehensive resources, including beginner-friendly guides and advanced documentation.
- Skilled with streaming frameworks such as Apache Flink and Kafka Streams for real-time data pipelines.
- Applied ML techniques such as classification, regression, and clustering on big data with PySpark and Scikit-Learn.
- Extracted data from HubSpot into data warehouses and data lakes.
- Mastered commonly used GCP services such as BigQuery, Cloud Storage, Cloud Dataflow, Cloud Pub/Sub, Cloud Composer, and Cloud Bigtable.
- Hands-on experience implementing Python programming and PySpark within Azure Data Factory (ADF).
- Gained exposure to graph databases such as Neo4j, applying graph algorithms to analyze connected data.
- Proficient at developing logical and physical data models for big data platforms such as Hive, Spark, and Kafka.
- Knowledgeable about Scala, Airflow, JIRA, and Git flow.
- Knowledgeable about security concepts such as Kerberos, SSL, and Sentry for authenticated Hadoop access.
- Integrated Azure Databricks with Power BI and Tableau for building analytics dashboards and visualizations on processed data.
- Developed ETL pipelines into and out of the data warehouse using a combination of Python and Snowflake SnowSQL.
- Developed Spark code using Python and Spark SQL for faster testing and processing of data.
- Designed and deployed GCP and Matillion ETL for extracting, transforming, and loading large datasets into analytics databases, improving data quality and reducing ETL runtimes.
- Extensive knowledge of the Snowflake database, including database schemas and table structures.
- Experience in fact/dimensional modeling (star, snowflake), transactional modeling, and SCD (Slowly Changing Dimensions).
- Developed RDDs/DataFrames in Spark frameworks and applied transformation logic to load data from Hadoop data lakes.
- Designed and implemented Lambdas to configure DynamoDB autoscaling and developed a data access layer to access DynamoDB data efficiently.

Environment: Azure Cloud, GCP, Azure Data Factory (ADF v2), Java, PySpark, Neo4j, HubSpot, CI/CD pipelines, ETL, APIs, Snowflake, Microservices, Spring Boot, Data Warehouse, Kubernetes, Hadoop, Docker, Azure Data Lake, SQL Server, Teradata, Kafka, UNIX Shell Scripting, Databricks, Python, Data Modelling, Cosmos DB.
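Illustrative sketch of the functional-style PySpark transformation pattern referenced in the first responsibility above; the claims file layout, column names, and storage paths are hypothetical.

    # Hedged example: batch-clean and aggregate claim records with PySpark.
    # Paths and columns are invented for illustration only.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("claims-etl-sketch").getOrCreate()

    claims = (
        spark.read.option("header", True).csv("/mnt/raw/claims/*.csv")
        .withColumn("service_date", F.to_date("service_date"))
        .withColumn("claim_amount", F.col("claim_amount").cast("double"))
        .filter(F.col("claim_amount") > 0)
    )

    # Roll claims up to member/month grain before loading the warehouse layer.
    monthly = (
        claims.groupBy(
            "member_id",
            F.date_trunc("month", F.col("service_date")).alias("service_month"),
        )
        .agg(
            F.sum("claim_amount").alias("total_claims"),
            F.count("*").alias("claim_count"),
        )
    )

    monthly.write.mode("overwrite").parquet("/mnt/curated/claims_monthly")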
COX Automotive, Austin, TX | Feb 2020 - Jan 2021
AWS Data Engineer
Responsibilities:
- Designed and implemented secure and scalable data processing pipelines using AWS Data Pipelines and AWS Glue, enhancing data transformation and load processes.
- Managed and optimized AWS Redshift clusters and S3 buckets for efficient data storage and retrieval, supporting automotive analytics initiatives.
- Configured and maintained AWS EC2 instances and EBS volumes to ensure high availability and performance of data applications.
- Developed automation scripts in Python, Shell Scripting, and PowerShell to streamline deployment processes and reduce manual intervention.
- Worked with Amazon Web Services (AWS) to integrate EMR with Spark 2, S3 storage, and Snowflake.
- Expertly managed data ingestion processes, implementing cleansing and transformation procedures using AWS Lambda, AWS Glue, and Step Functions to ensure optimal data quality and integrity (see the sketch after this section).
- Integrated Superset with multiple data sources, including MySQL, PostgreSQL, and AWS Redshift, to ensure comprehensive data availability.
- Designed custom visualizations to meet specific business needs, leveraging JSON configurations and Superset's advanced charting options.
- Orchestrated and automated infrastructure provisioning using Terraform and CloudFormation, enabling seamless environment management.
- Administered database solutions on DynamoDB and conducted performance tuning to handle large-scale datasets effectively.
- Experience in fact/dimensional modeling (star, snowflake), transactional modeling, and SCD.
- Leveraged Spark, Hive, and Spark SQL for complex data processing tasks, driving analytics and decision support.
- Utilized Splunk for log management and operational intelligence, gaining insights into application performance.
- Implemented secure data transfer mechanisms using AWS S3 and Sqoop, ensuring compliance with automotive data protection regulations.
- Managed code repositories and version control using Git, Bitbucket, and CodeCommit, enhancing team collaboration and source code management.
- Supported data warehousing solutions and conducted ETL processes using AWS Glue and Redshift, optimizing data for reporting and analytics.
- Automated deployment and integration tasks using AWS CodeDeploy, CodePipeline, and CodeBuild, reducing deployment cycles and increasing productivity.
- Worked with AWS EMR to run Apache Spark and Hive applications.
- Developed and maintained AWS SNS and SQS for messaging and queuing services, enhancing communication between distributed application components.
- Implemented continuous integration and continuous deployment (CI/CD) pipelines using Maven, JBoss, and WebSphere, ensuring smooth and efficient code releases.
- Configured Spark Streaming and Kafka for handling high-throughput, real-time data feeds, supporting timely data analysis and reporting.
- Maintained and enhanced Cloudera clusters for big data processing, leveraging the platform's robust processing capabilities to handle complex data sets.
- Executed disaster recovery plans and data backups using AWS technologies, ensuring data integrity and availability across all systems.
- Provided technical support and training to team members on AWS services and big data technologies, fostering a knowledgeable and skilled team.

Environment: AWS (EC2, S3, EBS, ELB, RDS, SNS, SQS, VPC, CloudFormation, CloudWatch, ELK Stack), Bitbucket, AWS EMR, Ansible, Python, Shell Scripting, PowerShell, Git, Terraform, Redshift, Maven, Snowflake, Unix/Linux, DynamoDB, AWS Redshift, AWS S3, AWS Data Pipelines, AWS Glue, CodeDeploy, CodePipeline, CodeBuild, Splunk, Spark, Hive, Pig, Spark SQL, Spark Streaming, HBase, Sqoop, Kafka, AWS EC2, S3, Cloudera.
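Illustrative sketch of the S3-triggered cleansing step described above; the bucket names, the vin column, and the CSV layout are hypothetical, and a production version would sit alongside the Glue and Step Functions pieces mentioned in the bullets.

    # Hedged example: an S3-triggered Lambda handler that drops records with a
    # missing VIN and normalizes casing before the downstream Glue/Redshift load.
    # Bucket names and columns are invented for illustration.
    import csv
    import io

    import boto3

    s3 = boto3.client("s3")
    CLEAN_BUCKET = "example-clean-zone"  # hypothetical target bucket


    def handler(event, context):
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]

            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
            rows = [r for r in csv.DictReader(io.StringIO(body)) if r.get("vin")]
            cleaned = [{**r, "vin": r["vin"].upper()} for r in rows]
            if not cleaned:
                continue

            out = io.StringIO()
            writer = csv.DictWriter(out, fieldnames=cleaned[0].keys())
            writer.writeheader()
            writer.writerows(cleaned)
            s3.put_object(Bucket=CLEAN_BUCKET, Key=f"cleansed/{key}", Body=out.getvalue())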
FINRA, Rockville, MD | Nov 2018 - Jan 2020
GCP Data Engineer
Responsibilities:
- Engineered and deployed scalable data processing systems using GCP Dataflow and GCP Dataproc, optimizing data operations for broker-dealer systems.
- Managed and configured BigQuery for data analysis, enabling quick access to insights from large datasets and improving decision-making processes.
- Automated data transfer processes using Cloud Storage Transfer Service, ensuring efficient data synchronization across multiple storage solutions.
- Developed and maintained data pipelines using Apache Beam and Cloud Dataflow, facilitating seamless data integration and processing (see the sketch after this section).
- Operated and monitored VM instances to support high-performance computing tasks, ensuring robust backend services for financial platforms.
- Utilized Cloud SQL, MySQL, Postgres, and SQL Server for database management, providing reliable storage and quick data retrieval.
- Implemented data storage solutions using GCS buckets, optimizing data accessibility and security for FINRA's global operations.
- Developed ETL pipelines and the data warehouse using a combination of Python and Snowflake SnowSQL.
- Designed and executed ETL processes that integrate with Cloud Composer, automating workflows and enhancing data reliability and quality.
- Created and maintained data catalogs using Data Catalog, enabling effective data governance and metadata management.
- Scripted automation and maintenance tasks using Cloud Shell and gsutil, enhancing system management and operational efficiency.
- Developed custom functions with Cloud Functions to handle specific data processing needs, increasing flexibility and scalability.
- Involved in migrating objects from Teradata to Snowflake and created Snowpipe for continuous data loading.
- Leveraged GCP Databricks for collaborative data science operations, enabling advanced analytics and machine learning capabilities.
- Utilized the Python and Scala programming languages to develop robust data processing jobs, adhering to best coding practices.
- Integrated Spark, Hive, and Spark SQL for complex data transformations and batch processing, improving data handling capabilities.
- Orchestrated data backup and disaster recovery strategies using GCP technologies, ensuring data integrity and availability across services.
- Conducted performance tuning on GCP services to ensure optimal efficiency and cost-effectiveness in data operations.
- Developed security protocols and compliance checks within GCP environments, adhering to industry standards and regulations.
- Provided technical leadership and training to new team members on GCP tools and best practices, fostering a culture of knowledge-sharing.
- Participated in the strategic planning of data architectures, aligning with FINRA's objectives to enhance customer experience.
- Monitored and managed GCP environments using integrated monitoring tools, ensuring high performance and minimal downtime.
- Contributed to innovation in data management and analytics, proposing new tools and technologies that align with emerging trends.
- Supported data migration to GCP, overseeing the seamless transition of data systems from on-premises to cloud-based solutions.

Environment: GCP, BigQuery, GCS Bucket, Cloud Functions, Apache Beam, Cloud Dataflow, Cloud Shell, Cloud SQL, MySQL, PostgreSQL, SQL Server, GCP Dataflow, Data Migration, Cloud Composer, Cloud Pub/Sub, Cloud Storage Transfer Service, Cloud Spanner, Data Catalog, GCP Databricks, Python, Scala, Spark, Hive, Spark SQL.
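Illustrative sketch of the Apache Beam / Cloud Dataflow to BigQuery pattern referenced above; the project, bucket, dataset, and trade schema are hypothetical placeholders.

    # Hedged example: a minimal Beam pipeline that parses CSV trade records from
    # GCS and appends them to a BigQuery table. All resource names are invented.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions


    def parse_trade(line):
        symbol, qty, price = line.split(",")
        return {"symbol": symbol, "qty": int(qty), "price": float(price)}


    def run():
        options = PipelineOptions(
            runner="DataflowRunner",   # hypothetical; DirectRunner also works locally
            project="example-project",
            region="us-east1",
            temp_location="gs://example-temp/tmp",
        )
        with beam.Pipeline(options=options) as p:
            (
                p
                | "ReadTrades" >> beam.io.ReadFromText(
                    "gs://example-landing/trades/*.csv", skip_header_lines=1
                )
                | "ParseCsv" >> beam.Map(parse_trade)
                | "WriteToBQ" >> beam.io.WriteToBigQuery(
                    "example-project:market_data.trades",
                    schema="symbol:STRING,qty:INTEGER,price:FLOAT",
                    write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                    create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                )
            )


    if __name__ == "__main__":
        run()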
Green Mountain Power, Colchester, Vermont | Nov 2017 - Oct 2018
Data Engineer
Responsibilities:
- Utilized Azure cloud services (HDInsight, Databricks, Data Lake, Blob Storage, Data Factory, and Storage Explorer).
- Optimized Hadoop algorithms with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Analyzed Azure data stores with Databricks to gain insights using Spark cluster capabilities.
- Created ETL pipelines in Spark with Python and workflows in Airflow.
- Developed data ingestion, aggregation, data integration, and advanced analytics with Snowflake and Azure Data Factory.
- Created robust data management frameworks that support accurate and reliable data-driven decision-making.
- Developed Spark/Scala scripts to extract data from various sources and provide insights and reports as needed.
- Used Azure Data Factory to orchestrate Databricks data preparation and load the results into the SQL Data Warehouse.
- Created Spark SQL scripts to process imported data and existing RDDs, utilizing partitioning, caching, and checkpointing.
- Used JSON for efficiently storing and transmitting data within GCP services.
- Automated installation and configuration of Scala, Python, and Hadoop, as well as their dependencies, resulting in properly configured environments.
- Configured the Hadoop cluster for effective data processing and analysis.
- Extracted real-time data from Spark Streaming and Kafka, translated it to RDDs, processed it into DataFrames, and loaded it into HBase (see the sketch after this section).
- Maintained comprehensive documentation of the ER data model and database schema.
- Created Spark jobs with Scala and Spark SQL for faster testing and data processing.
- Converted unstructured data into structured data using Spark.
- Developed processes and coordinated Hadoop tasks using Airflow to automate data extraction from warehouses and weblogs.
- Successfully ingested data from MySQL, MSSQL, and MongoDB into HDFS for analysis with Spark, Hive, and Sqoop.
- Experience working with Cosmos DB (Mongo API).
- Optimized PL/SQL queries for faster stored procedure runtimes.
- Converted JSON data to Pandas DataFrames and saved it in Hive tables.
- Designed dashboards in Power BI according to team requirements.

Environment: Azure, Databricks, Data Lake, Blob Storage, Data Integration, Azure Data Factory, Data Streaming, Scala, Python, SQL, Hadoop (HDFS, YARN, MapReduce, Hive, Sqoop), Spark, Kafka, Data Modeling (ER), Zookeeper, Airflow, HBase, Oracle, MySQL, Postgres, Snowflake, Cassandra, MongoDB, Cosmos DB, Power BI.
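Illustrative sketch of the Kafka-to-Spark streaming ingestion referenced above; the broker address, topic, and meter-reading schema are hypothetical, the spark-sql-kafka connector is assumed to be on the classpath, and a parquet sink stands in for the HBase load used on the project.

    # Hedged example: read meter readings from Kafka with Structured Streaming,
    # parse the JSON payload, and land it as parquet. Names are invented.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import DoubleType, StringType, StructField, StructType

    spark = SparkSession.builder.appName("meter-stream-sketch").getOrCreate()

    schema = StructType([
        StructField("meter_id", StringType()),
        StructField("kwh", DoubleType()),
        StructField("read_time", StringType()),
    ])

    readings = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "meter_readings")
        .load()
        .select(F.from_json(F.col("value").cast("string"), schema).alias("r"))
        .select("r.*")
    )

    query = (
        readings.writeStream.format("parquet")
        .option("path", "/data/curated/meter_readings")
        .option("checkpointLocation", "/data/checkpoints/meter_readings")
        .start()
    )
    query.awaitTermination()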
Ibing Software Solutions Pvt Ltd, Hyderabad, India | Jan 2016 - Sep 2017
Data Engineer / Hadoop Engineer
Responsibilities:
- Experienced with big data components such as HDFS, MapReduce, YARN, Hive, HBase, Sqoop, Pig, and Ambari, as well as the Hortonworks distribution.
- Developed scalable distributed data solutions using the Hadoop ecosystem as a primary duty.
- Used Sqoop to import data from Oracle, MySQL, and DB2, transformed the data using Hive and MapReduce, and then loaded it into HDFS (see the sketch after this section).
- Contributed to the creation of ETL pipelines for analytics using Python and SQL, and examined use cases before loading data into HDFS.
- Worked with Azure, a comprehensive cloud platform with various data-related services and tools, including support for Hadoop.
- Created HBase tables to store information and implemented analytics-related Hive queries.
- Developed methods for cleaning data and creating UDFs with Pig scripts, HiveQL, and MapReduce.
- Modified and enhanced pre-existing Python script modules based on requirements.
- Participated in planning and executing Oozie workflows for Pig jobs.
- Used ZooKeeper for distributed coordination across clusters and Ambari for monitoring.
- Helped create Talend ETL jobs and push data to a data warehouse.
- Gathered, cleaned, and extracted data from several sources to create dashboards using Tableau and analytical tools.
- Participated in data inspection, cleansing, transformation, and modeling processes to find relevant information, draw conclusions, and aid decision-making.

Environment: HDFS, MapReduce, YARN, Hive, Python, HBase, Data Modeling (ER), Azure, ETL, Sqoop, XML, JSON, CSV, Oracle, MySQL, SQL, HiveQL, ZooKeeper, Oozie workflows, Tableau, SVN.
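Illustrative sketch of a small Python wrapper around the Sqoop CLI, similar in spirit to the import jobs described above; the JDBC URL, credentials file, table list, and HDFS paths are hypothetical.

    # Hedged example: drive "sqoop import" for a list of source tables via
    # subprocess. Connection details and paths are invented for illustration.
    import subprocess

    JDBC_URL = "jdbc:mysql://example-host:3306/sales"   # hypothetical source
    TABLES = ["orders", "customers"]


    def sqoop_import(table, target_root="/data/raw"):
        cmd = [
            "sqoop", "import",
            "--connect", JDBC_URL,
            "--username", "etl_user",
            "--password-file", "/user/etl/.sqoop_pwd",
            "--table", table,
            "--target-dir", f"{target_root}/{table}",
            "--num-mappers", "4",
        ]
        # Fail loudly so the calling Oozie workflow or cron job can retry.
        subprocess.run(cmd, check=True)


    if __name__ == "__main__":
        for t in TABLES:
            sqoop_import(t)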
Miraki Technologies, Hyderabad, India | Jul 2013 - Dec 2015
Data Engineer
Responsibilities:
- Utilized MS Excel 2007 to manipulate, analyze, and visualize large datasets, providing actionable insights that supported key business decisions.
- Developed and maintained complex PL/SQL queries and scripts to extract, transform, and load data, ensuring data accuracy and availability for reporting purposes.
- Employed Informatica 6.1 for robust ETL processes, enhancing data integration and consistency across multiple databases and systems.
- Managed and optimized databases using Oracle 9i, significantly improving data retrieval times and system performance.
- Optimized data queries and managed database performance using Azure SQL and Azure SQL Managed Instance (MI), reducing query times by up to 40%.
- Utilized Teradata SQL Assistant to write and optimize SQL queries, which helped in achieving better data manipulation and extraction processes.
- Automated repetitive data processing tasks using PL/SQL scripts, reducing manual effort and increasing process efficiency.
- Provided training and support to junior analysts on the use of Business Objects, ETL tools, and Informatica, fostering a knowledge-sharing environment and enhancing team productivity.

Environment: Quality Center, MS Excel 2007, PL/SQL, Business Objects XIR2, Azure, ETL Tools, Informatica 6.1, Oracle 9i, Teradata V2R12, Teradata SQL Assistant.