Jeevan Tirumalagiri - Senior Data Engineer | [email protected] | Location: Charlotte, North Carolina, USA | Relocation: Yes | Visa: GC
Jeevan Kumar Tirumalagiri
Sr. Data Engineer
Email: [email protected] | Ph: +1 (940) 268-3691 | LinkedIn: http://www.linkedin.com/in/jeevan-kumar-t-6980912b7

PROFESSIONAL SUMMARY
- Senior Data Engineer with 9 years of experience specializing in developing scalable data pipelines and ETL processes using AWS Glue, Apache Kafka, and Spark, enhancing data flow and storage solutions.
- Proficient in managing and analyzing large datasets using Amazon S3, SQL, NumPy, and Pandas, driving insightful business decisions.
- Demonstrated expertise in Spark 1.6/2.0 and PySpark for complex data processing, contributing to significant improvements in data analysis and processing speed.
- Versatile in handling data lake architectures and AWS cloud services, ensuring optimized data storage and accessibility for diverse applications.
- Expert in Python programming, automating data workflows and integrating machine learning models for predictive analytics and data insights.
- Extensive experience with the Cloudera stack, managing HBase, Hive, Impala, and Pig for robust big data ecosystem support.
- Proficient in streamlining data flows with NiFi and Spark Streaming, enabling real-time data processing and analytics.
- Skilled in implementing ELK/Splunk for log management and data visualization, enhancing operational intelligence and data-driven strategies.
- Advanced knowledge of RESTful APIs, JSON, XML, and SOAP UI for efficient data integration and web services development.
- Deep understanding of database management using MySQL, Cassandra, and MongoDB, ensuring data integrity and performance.
- Experienced in cloud-based data warehousing and analytics using GCP BigQuery, Azure Data Lake, and Snowflake, providing scalable and cost-effective solutions.
- Proficient in Jenkins, Docker, and Kubernetes for CI/CD pipelines, containerization, and orchestration, enhancing deployment efficiency and scalability.
- Strong background in data visualization and reporting tools such as Tableau, Power BI, and MicroStrategy, translating complex data into actionable insights.
- Experienced with GCP Dataproc, Dataflow, and Azure Data Factory for cloud-native data processing and integration services.
- Expert in NoSQL database technologies, with extensive experience designing, implementing, and managing scalable, high-performance MongoDB databases.
- Proficient in developing robust data models, performing data migration, and optimizing database performance in MongoDB and other NoSQL platforms.
- Skilled in integrating NoSQL databases with various data processing and analytics tools, enhancing data-driven decision-making and operational efficiency.
- Expertise in data security and compliance, utilizing Azure Storage, Cloud Spanner, and Cloud SQL for secure data handling and transactions.
- Proficient in system-level programming with Golang and C, specializing in high-performance, scalable backend services and system components for real-time data processing and analytics.
- Experienced in using distributed computing frameworks such as Spark and Flink to build and optimize large-scale data processing pipelines, enabling advanced analytics and machine learning applications.
- Adept in container orchestration with Kubernetes, managing the deployment, scaling, and operational workflows of cloud-native applications to enhance system reliability and deployment efficiency.
- Skilled in Apache Airflow and Oozie for workflow scheduling, automating and managing data pipelines for improved efficiency and reliability.
- Advanced user of Terraform and Ansible for infrastructure as code and configuration management, streamlining cloud infrastructure provisioning and maintenance.
- Adept at leveraging Salesforce for CRM data integration and analytics, enhancing customer engagement and business processes through data-driven insights.
- Strong analytical and problem-solving skills, with a proven track record of improving database systems to meet the dynamic needs of businesses.

Technical Skills:
Cloud Platforms & Services: AWS Glue, AWS Cloud, Amazon S3, GCP, GCS, Azure, Azure Data Lake, Azure Storage, Cloud Composer, Cloud Pub/Sub, Cloud Storage Transfer Service, Cloud Spanner, Cloud SQL, AWS Neptune, Azure Cosmos DB
Data Processing & Analytics: Apache Kafka, Spark 1.6/2.0, Spark Streaming, Flink, PySpark, SPARQL, ETL pipelines, data lakes, NiFi, GCP Dataproc, GCP Dataflow, BigQuery, Azure Data Factory
Database Management: SQL Database, MongoDB, MySQL, Cassandra, Snowflake, SQL, SnowSQL
Big Data Technologies: Hadoop, Hive, Cloudera Stack, HBase, Impala, Pig, ELK/Splunk, Athena, Redshift
DevOps & CI/CD: Jenkins, Docker, Ansible, Terraform, Maven, Git, Kubernetes
Programming Languages: Python, Scala, C, Golang, R, Shell scripting
Data Visualization: Tableau, Power BI, MicroStrategy, QuickSight, MS Office
APIs & Web Services: RESTful API, JSON, JAXB, XML, WSDL, SOAP UI
Machine Learning & Statistics: Cross-validation
Software & Tools: JMeter, ElasticSearch, Logstash, Kibana, Spring, Hibernate, Apache Airflow, Oozie, WebSphere, Splunk, Tomcat, Linux, Red Hat, Salesforce, Databricks, GCP Databricks, Azure Databricks

PROFESSIONAL EXPERIENCE

Sr. Data Engineer
Ascena Retail Group, Pataskala, Ohio | November 2022 to Present
Responsibilities:
- Developed and maintained ETL processes using AWS Glue and Apache Kafka to ensure efficient data flow and storage across various platforms, including Amazon S3 and data lake architectures.
- Engineered and optimized SQL queries and Spark scripts to perform complex data analysis, enhancing data retrieval efficiency and supporting data-driven decision-making.
- Leveraged AWS cloud services to deploy and manage scalable data infrastructure, improving system reliability and performance.
- Designed and implemented robust data pipelines using PySpark and Spark 1.6/2.0, facilitating the processing of large datasets with high velocity and variety.
- Utilized Python and NumPy for data manipulation and analysis, enabling the extraction of meaningful insights from structured and unstructured data.
- Configured and managed the Cloudera stack, including HBase, Hive, Impala, and Pig, to support big data ecosystems and analytics applications.
- Automated data workflows using Apache Airflow and Oozie, ensuring efficient and error-free data processing cycles.
- Developed RESTful API services using JAX-RS, Spring, and Hibernate to facilitate seamless data integration and exchange between systems.
- Implemented data indexing and search solutions using ElasticSearch, Logstash, and Kibana (ELK), enhancing data visibility and accessibility.
- Managed HDFS systems, ensuring data integrity, scalability, and accessibility in distributed computing environments.
- Administered MySQL and Cassandra databases, optimizing data storage, retrieval, and management processes.
- Employed Spark Streaming for real-time data processing, enabling instant data analysis and insights.
- Designed and maintained data lakes, centralizing raw data storage and providing a scalable data management solution.
- Utilized Tableau, MicroStrategy, and QuickSight for data visualization, presenting complex data in an easily understandable format for business stakeholders.
- Integrated SnowSQL and Snowflake technologies to enhance data warehousing capabilities, supporting scalable and efficient data storage solutions.
- Used Golang to develop high-performance microservices for real-time data processing and analytics within the company's data platform, enhancing system efficiency and throughput.
- Orchestrated data pipeline automation and monitoring using Jenkins, ensuring continuous integration and deployment (CI/CD) of data-driven applications.
- Implemented Athena and Redshift for efficient data querying and analysis in cloud environments, supporting scalable analytics solutions.
- Developed and enforced data quality frameworks using Spark and ETL processes, ensuring data accuracy and reliability.
- Configured NiFi flows for efficient data routing, transformation, and system integration, enhancing operational efficiency.
- Developed and optimized data pipelines for integrating external data sources into graph-based systems, leveraging tools such as Apache Kafka and Spark for real-time and batch processing.
- Designed and implemented ontology-based data models to structure data within graph databases, enhancing semantic querying and data retrieval efficiency.
- Wrote advanced SPARQL queries for extracting insights and facilitating complex analytical tasks within graph databases.
- Collaborated with cross-functional teams to translate business requirements into scalable graph-based solutions, improving data connectivity and insights.
- Employed Scala for application development and data processing, leveraging functional programming paradigms for efficient data handling.
- Managed XML, JSON, JAXB, and WSDL for data interchange and service-oriented architecture (SOA) implementations, facilitating system interoperability.
- Utilized ELK/Splunk for data logging and analysis, enhancing system monitoring and operational intelligence.
- Optimized HBase and Impala configurations for high-performance data querying, supporting real-time analytics and decision-making.
- Leveraged Apache Kafka for building scalable, fault-tolerant messaging systems, enabling efficient data streaming and processing.
Environment: AWS Glue, Apache Kafka, Amazon S3, SQL, Spark 1.6/2.0, Spark Streaming, PySpark, AWS Cloud, ETL, NumPy, data lake, Python, Cloudera Stack, HBase, Hive, Impala, Pig, NiFi, SnowSQL, Snowflake, ElasticSearch, Logstash, Kibana, Golang, SPARQL, JAX-RS, Spring, Hibernate, Apache Airflow, Oozie, RESTful API, JSON, JAXB, XML, WSDL, MySQL, Cassandra, HDFS, ELK/Splunk, Athena, Redshift, Tableau, Scala, Jenkins, MicroStrategy, QuickSight.

Sr. Data Engineer
Truist Bank, Charlotte, NC | December 2021 to October 2022
Responsibilities:
- Developed scalable data pipelines in GCP Dataflow and GCP Dataproc for real-time and batch data processing, ensuring timely and accurate financial reporting.
- Utilized PySpark and Hadoop for processing large datasets, improving the efficiency of data analysis tasks.
- Engineered and maintained SQL Database and MongoDB systems for optimized data storage and retrieval, supporting various banking operations.
- Developed scalable data pipelines to support real-time processing of digital wallet transactions, leveraging GCP Dataflow and Apache Beam for efficient batch and stream data processing.
- Designed conceptual, logical, and physical data models to accurately represent digital wallet transactions, customer interactions, and financial data, facilitating seamless integration with existing banking systems.
- Implemented SAS analytics for advanced financial modeling, contributing to strategic decision-making processes.
- Configured and managed Teradata systems, enhancing data warehousing capabilities and supporting complex query execution.
- Implemented CI/CD pipelines for the digital wallet's backend systems, using tools such as Jenkins and Spinnaker to automate testing and deployment, improving development efficiency and product reliability.
- Leveraged GCP's BigQuery for fast, economical, and scalable analytics, enabling effective data-driven insights.
- Utilized Hive and Sqoop for data aggregation and transformation, facilitating seamless data integration across platforms.
- Developed Python scripts for data manipulation and analysis, automating routine data processing tasks.
- Managed the Snowflake cloud data warehouse, optimizing data storage and computation for financial analytics.
- Created dynamic visualizations using Power BI, providing actionable insights into financial trends and patterns.
- Orchestrated data ingestion and processing workflows using Cloud Composer, ensuring smooth and efficient data pipeline operations.
- Implemented Cloud Pub/Sub for event-driven data integration, enhancing data availability and accessibility.
- Utilized Cloud Storage Transfer Service for efficient data migration between different cloud storage services.
- Configured Cloud Spanner and Cloud SQL for highly available and scalable database services, supporting critical banking applications.
- Employed Data Catalog for metadata management, improving data discoverability and governance.
- Developed and maintained GCP Databricks environments for collaborative data science and engineering projects.
- Engineered financial data models using GCS and BigQuery, facilitating advanced data analysis and reporting.
- Automated data cleansing and quality checks using GCP Dataprep, ensuring high data integrity and reliability.
- Managed data security and compliance within GCP and Cloud SQL environments, adhering to financial industry regulations.
- Utilized Dataflow for stream and batch data processing, optimizing financial data analysis and insights.
- Implemented GCS for secure and scalable cloud storage solutions, ensuring data availability and disaster recovery.
- Developed data integration solutions using Sqoop and Cloud Dataflow, streamlining data exchange between disparate data sources.
- Optimized BigQuery and Snowflake performance for financial data analytics, reducing query execution time and costs.
- Leveraged Kafka and MQ for building scalable, fault-tolerant messaging and streaming platforms, facilitating efficient data ingestion and real-time analytics.
- Leveraged Cloud Spanner for globally distributed database management, ensuring consistency and reliability across financial operations.
- Employed C programming for system-level applications, optimizing data processing routines and contributing to the development of low-latency, high-throughput data systems.
- Automated financial report generation using Power BI and GCP Data Studio, enhancing reporting efficiency and accuracy.
Environment: GCP, PySpark, SAS, Hive, Sqoop, Teradata, GCP Dataproc, BigQuery, Hadoop, Kafka, MQ, GCS, Python, C, Snowflake, Power BI, Dataflow, SQL Database, MongoDB, Databricks, GCP Dataprep, GCP Dataflow, Cloud Composer, Cloud Pub/Sub, Cloud Storage Transfer Service, Cloud Spanner, Cloud SQL, Data Catalog, GCP Databricks.

Data Engineer
AbbVie, Vernon Hills, IL | July 2018 to November 2021
Responsibilities:
- Engineered data integration pipelines using Azure Data Factory, streamlining data flow from research datasets to Azure Data Lake and supporting immunology research projects.
- Managed and optimized Azure Storage solutions for secure and scalable storage of large-scale genomic data, enhancing data accessibility for oncology research.
- Developed and maintained robust data processing frameworks using Azure Databricks, facilitating advanced analytics on clinical trial data.
- Implemented JMeter and Kafka for real-time data ingestion and processing, improving data quality and speed for gastroenterology research outcomes.
- Automated deployment and configuration processes using Ansible and Jenkins, ensuring reliable and efficient application updates within research environments.
- Containerized research applications and data processing tools using Docker, enhancing portability and scalability across computing environments.
- Managed build and deployment pipelines using Maven and Git, streamlining code integration and version control for data engineering projects.
- Administered Linux and Red Hat servers hosting data-intensive applications, ensuring high availability and performance for data analysis tools.
- Wrote and optimized Python scripts for data manipulation and analysis, extracting insights from complex biomedical data.
- Developed shell scripting routines for automating data processing tasks, reducing manual effort and increasing efficiency in data management.
- Configured and maintained MySQL databases, supporting structured data storage for research findings and clinical data.
- Implemented ElasticSearch for fast and scalable search capabilities across vast repositories of research documents and data.
- Utilized Golang for developing high-performance data processing tools, enhancing data throughput for large-scale datasets.
- Managed WebSphere and Tomcat servers, ensuring robust hosting environments for web-based research data applications.
- Integrated Splunk for log management and analysis, monitoring data processing pipelines and ensuring system health.
- Automated testing and validation of web services using SOAP UI, ensuring data integrity and reliability in research data exchange.
- Orchestrated containerized environments using Kubernetes, facilitating scalable and manageable deployment of data applications.
- Employed Terraform for infrastructure as code (IaC) management, automating cloud infrastructure provisioning and ensuring reproducibility.
- Developed PowerShell scripts for automation and configuration tasks, enhancing operational efficiency in cloud and on-premises environments.
- Ensured data pipeline integrity and security through continuous integration and delivery (CI/CD) practices using Jenkins and Git.
- Optimized data query performance and analysis using Azure Data Lake and Azure Databricks, supporting fast-paced research and development.
- Implemented secure data exchange and APIs with Azure Data Factory, facilitating seamless data integration across research platforms.
- Orchestrated and managed containerized applications using Kubernetes, enhancing deployment processes, scaling, and system resilience across cloud environments.
- Automated environment setup and application deployment using Docker and Kubernetes, reducing setup times for data processing environments.
- Utilized Ansible for configuration management, ensuring consistent environments across development, testing, and production.
- Integrated Splunk for real-time monitoring and analytics of data operations, enhancing visibility and insights into data processing performance.
Environment: Azure Data Factory, Azure Data Lake, Azure Storage, Azure Databricks, JMeter, Kafka, Ansible, Jenkins, Docker, Maven, Linux, Red Hat, Git, Kubernetes, Python, shell scripting, MySQL, ElasticSearch, Golang, WebSphere, Splunk, Tomcat, SOAP UI, Terraform, PowerShell.

Data Analyst
Netenrich Technologies Pvt. Ltd. | December 2016 to March 2018
Responsibilities:
- Developed Spark applications within Databricks to extract, transform, and aggregate data from various file formats using Spark SQL, enabling in-depth analysis of customer usage, consumption trends, and behavior.
- Demonstrated proficiency in dimensional modeling, encompassing Snowflake schema, Star schema, transactional modeling, and Slowly Changing Dimensions (SCD), contributing to robust model construction.
- Engaged actively in model development by identifying, collecting, exploring, and cleansing data, ensuring its quality and relevance for modeling purposes.
- Conducted thorough data cleaning and scaling operations to bolster data quality and prepare it for further analysis.
- Developed statistical models for diagnostic, predictive, and prescriptive solutions, operating in both distributed and standalone environments.
- Applied Python libraries including NumPy, Scikit-learn, and Matplotlib for data analysis, visualization, interpretation, and reporting of key insights.
- Designed and implemented NoSQL database schemas using MongoDB, optimizing for performance, scalability, and reliability.
- Managed MongoDB clusters, ensuring high availability, efficient indexing, and optimal shard configuration for distributed data processing.
- Led the migration of legacy systems to MongoDB, ensuring seamless data transfer, integrity, and consistency across different storage systems.
- Integrated MongoDB with various data sources and applications using ETL processes, facilitating real-time data synchronization and analytics.
- Leveraged leading text mining, data mining, and analytical tools, alongside open-source software, to conduct comprehensive research.
- Developed and maintained complex data models in NoSQL environments, addressing the need for high-speed transactions and large-scale data storage.
- Utilized MongoDB's aggregation framework for data analysis and reporting, optimizing queries for faster response times and reduced server load.
- Optimized ETL procedures and implemented appropriate transformations to enhance data migration performance, aligning with project requirements.
- Employed cross-validation, log loss, ROC curves, and AUC for feature selection and model evaluation, ensuring rigorous assessment of model effectiveness.
- Generated dummy variables for specific datasets to facilitate regression analysis and improve model accuracy.
- Showcased strong data visualization skills using tools such as Matplotlib and the Seaborn package.
- Utilized Tableau to craft visually engaging data visualizations, dashboards, and comprehensive reports, effectively communicating findings to both the team and stakeholders.
Environment: NumPy, Pandas, Tableau, MongoDB, ETL, cross-validation, Python.

Data Analyst
PalTech, Hyderabad, India | August 2014 to November 2016
Responsibilities:
- Facilitated communication between IT technical teams and end users, acting as a liaison to understand and convey specific needs and requirements effectively.
- Utilized advanced data analysis techniques to predict variations aligned with market demands, contributing to informed decision-making.
- Developed deep product knowledge, enabling accurate estimation of product costs for clients.
- Interpreted and analyzed results using various techniques and tools, ensuring comprehensive understanding of data outcomes.
- Played a pivotal role in supporting the data warehouse by aligning and revising reporting requirements.
- Conducted test runs, implemented the latest software updates, and contributed to strategic decision-making processes.
- Monitored daily activities and performance using Salesforce reports and analysis, ensuring operational efficiency.
- Expanded understanding of ETL tools, pipelining, and data warehousing to enhance overall data management capabilities.
- Automated ETL transformations and executed complex SQL queries, resulting in a 40% improvement in report generation, data preparation, and predictive analytics for business growth.
- Proactively troubleshot database report maintenance issues, ensuring smooth and uninterrupted data operations.
- Prepared detailed and comprehensive reports using Tableau, facilitating easy comprehension of project status and outcomes.
- Created presentations and dashboards using Tableau, MS Excel, and other MS tools to effectively meet client requirements.
Environment: R, SQL scripting, Salesforce, Tableau, ETL pipelines, data warehouse, MS Office.