
Naveen Kumar - Sr. Data Engineer and Big Data Engineer
[email protected]
Location: Boston, Massachusetts, USA
Relocation: yes
Visa: H1B
Around 9 years of professional experience as a software developer in designing, developing, deploying, and supporting large-scale distributed systems.
Around 6 years of extensive experience as a Data Engineer and Big Data Developer specializing in the Big Data ecosystem: data ingestion, modeling, analysis, integration, and processing.
Extensive experience in providing solutions for Big Data using Hadoop, Spark, HDFS, MapReduce, YARN, Kafka, Pig, Hive, Sqoop, HBase, Oozie, Zookeeper, Cloudera Manager, and Hortonworks.
Strong experience working with Amazon cloud services like EMR, Redshift, DynamoDB, Lambda, Athena, Glue, S3, API Gateway, RDS, CloudWatch for efficient processing of Big Data.
Hands-on experience building PySpark, Spark Java, and Scala applications for batch and stream processing involving transformations, actions, and Spark SQL queries on RDDs, DataFrames, and Datasets (a minimal PySpark sketch follows this summary).
Strong experience writing, troubleshooting, and optimizing Spark scripts using Python and Scala.
Utilized AWS Lambda functions for event-driven data processing and integrated Spark with Amazon S3 for efficient data storage and retrieval.
Conducted performance tuning and optimization of Spark jobs to enhance data processing speed and reduce resource utilization, resulting in improved system efficiency.
Experienced in using Kafka as a distributed publisher-subscriber messaging system.
Strong knowledge of performance tuning of Hive queries and troubleshooting issues related to joins and memory exceptions in Hive.
Exceptionally good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive.
Experience in importing and exporting data between HDFS and Relational Databases using Sqoop.
Experience in real time analytics with Spark Streaming, Kafka and implementation of batch processing using Hadoop, Map Reduce, Pig and Hive.
Experienced in building highly scalable Big-data solutions using NoSQL column-oriented databases like Cassandra, MongoDB and HBase by integrating them with Hadoop Cluster.
Extensive work on ETL processes consisting of data transformation, data sourcing, mapping, conversion and loading data from heterogeneous systems like flat files, Excel, Oracle, Teradata, MSSQL Server.
Experience building production ETL pipelines using Informatica PowerCenter, SSIS, SSAS, and SSRS.
Proficient at writing MapReduce jobs and UDFs to gather, analyze, transform, and deliver data per business requirements, and at optimizing existing algorithms for best results.
Experience working with data warehousing concepts like star schema, snowflake schema, data marts, and the Kimball methodology used in relational and multidimensional data modeling.
Strong experience leveraging different file formats like Avro, ORC, Parquet, JSON and Flat files.
Sound knowledge on Normalization and Denormalization techniques on OLAP and OLTP systems.
Good experience with version control tools: Git, GitHub, and Bitbucket.
Collaborated with bioinformatics scientists to design and deploy scalable storage solutions, leveraging cloud-based platforms like AWS S3, ensuring secure and compliant data storage.
Established and maintained a centralized NF-Core platform for streamlined bioinformatics workflows, ensuring consistency and reproducibility in genomic data analyses.
Proficient in leveraging AWS Databricks for big data processing, analytics, and machine learning tasks in a cloud-native environment.
Experience with Jira, Confluence, and Rally for project management, and with Oozie and Airflow for job scheduling.
Strong scripting skills in Python, Scala, and UNIX shell.
Involved in writing Python and Java APIs for AWS Lambda functions to manage AWS services.
Experience in design, development and testing of Distributed Client/Server and Database applications using Java, Spring, Hibernate, Struts, JSP, JDBC, REST services on Apache Tomcat Servers.
Hands-on working experience with RESTful APIs, API lifecycle management, and consuming RESTful services.
Good working experience in Agile/Scrum methodologies, communicating through scrum calls on project analysis and development.
Worked with Google Cloud Platform (GCP) services such as Compute Engine, Cloud Functions, Cloud DNS, Cloud Storage, and Cloud Deployment Manager; familiar with the SaaS, PaaS, and IaaS concepts of cloud computing and their implementation on GCP.
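
A minimal PySpark batch-transformation sketch of the kind summarized above. The bucket paths, column names, and aggregation logic are hypothetical placeholders, not taken from any specific project:

    # Minimal PySpark batch job: read, transform, aggregate, write.
    # All paths and column names below are illustrative assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("batch-transform-example").getOrCreate()

    # Read raw records (e.g. Parquet files landed in an S3 data lake path).
    orders = spark.read.parquet("s3a://example-bucket/raw/orders/")

    # DataFrame transformations: filter, derive a column, aggregate.
    daily_revenue = (
        orders
        .filter(F.col("status") == "COMPLETED")
        .withColumn("order_date", F.to_date("order_ts"))
        .groupBy("order_date", "region")
        .agg(F.sum("amount").alias("revenue"),
             F.count(F.lit(1)).alias("order_count"))
    )

    # The same logic can also be expressed through Spark SQL on a temp view.
    orders.createOrReplaceTempView("orders")
    spark.sql("SELECT region, COUNT(*) AS cnt FROM orders GROUP BY region").show()

    # Write results partitioned by date for downstream consumption.
    (daily_revenue.write
     .mode("overwrite")
     .partitionBy("order_date")
     .parquet("s3a://example-bucket/curated/daily_revenue/"))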

Technical Skills:
Programming Languages: Python, Scala, SQL, Java, C/C++, Shell Scripting
Web Technologies: HTML, CSS, XML, AJAX, JSP, Servlets, JavaScript
Big Data Stack: Hadoop, Spark, MapReduce, Hive, Pig, Yarn, Sqoop, Flume, Oozie, Kafka, Impala, Storm
Cloud Platforms: Amazon Web Services (AWS), Google Cloud Platform (GCP)
Relational databases: Oracle, MySQL, SQL Server, DB2, PostgreSQL, Teradata, Snowflake
NoSQL databases: MongoDB, Cassandra, HBase
Version Control Systems: Git, SVN, Bitbucket, GitHub
IDEs: PyCharm, IntelliJ IDEA, Jupyter Notebooks, Google Colab, Eclipse
Operating Systems: Unix, Linux, Windows


Professional experience:

Health Plans Inc., Westborough, MA Dec 2021 to Present
Big Data Engineer
Responsibilities:

Worked on building data pipelines (ELT/ETL scripts), extracting data from different sources (DB2, AWS S3 files), and transforming and loading it into the Snowflake data warehouse.
Designed and developed ETL (Extract, Transform, Load) workflows within IICS, ensuring data quality and accuracy.
Developed custom Java applications to process and transform data within Hadoop using MapReduce and Spark.
Worked on adding a REST API layer to the ML models built using Python and Flask, and deploying the models in the AWS Elastic Beanstalk environment using Docker containers.
Designed and implemented scalable pipelines for processing next-generation sequencing (NGS) data, optimizing alignment algorithms and storage structures to handle terabytes of genomic data efficiently.
Developed and maintained a centralized data repository using technologies like MongoDB and Apache Cassandra, enabling quick access to diverse datasets for research teams.
Developed custom ETL processes to extract, transform, and load data from various sources into Palantir's data infrastructure, optimizing for performance and efficiency.
Successfully led the implementation and migration of on-premises data warehousing systems to Snowflake cloud data platform, resulting in improved scalability, performance, and cost efficiency.
Utilized Snowflake's features such as clustering, partitioning, and materialized views to enhance data retrieval and minimize processing time.
Designed and implemented ETL pipelines on Databricks using PySpark and Scala, processing terabytes of data daily from various sources, improving data quality, and loading transformed data into data warehouses.
Integrated Databricks with other AWS services, such as S3, Redshift, Glue, and IAM, to build end-to-end data solutions and ensure secure and seamless data access and management.
Automated the scheduling of Glue jobs using AWS Step Functions, providing real-time insights to the analytics team.
Designed and orchestrated automated data workflows using Apache Airflow or other workflow management tools, ensuring data pipelines are reliable and well-managed.
Developed Kinesis Data Streams for ingesting and processing financial transaction data, achieving sub-second data availability for analytics.
Developed custom Jenkins jobs/pipelines that contained Bash shell scripts utilizing the AWS CLI to automate infrastructure provisioning.
Developed a user-eligibility library using Python to accommodate the partner filters and exclude these users from receiving the credit products.
Built data pipelines to aggregate user clickstream session data using Spark Streaming, reading clickstream data from Kinesis streams, storing the aggregated results in S3, and eventually loading them into the Snowflake warehouse (a streaming sketch follows the Environment line below).
Developed AWS Lambda serverless scripts to handle ad-hoc requests.
Performed cost optimization that reduced infrastructure costs.
Implemented Informatica as the primary business intelligence (BI) tool for big data analytics.
Leveraged Informatica Intelligent Cloud Services (IICS) to build and maintain data pipelines for processing large-scale data in a big data environment.
Engineered and maintained data warehouses, leveraging technologies such as Apache Spark and Apache Hadoop for distributed computing and data processing.
Worked on building data pipelines using PySpark on AWS EMR, processing data files present in S3 and loading them into Snowflake (a Snowflake load sketch follows the Environment line below).
Architected and implemented real-time data streaming solutions using Apache Spark on the AWS cloud platform.
Designed and deployed scalable Spark clusters for processing large volumes of data in real-time, ensuring optimal performance and low-latency data processing.
Integrated Spark Streaming with AWS services, such as Amazon Kinesis and AWS Glue, to build end-to-end data pipelines.
Implemented a Kafka-based data ingestion solution for real-time transfer from Mainframe to EERA.
Designed and maintained Kafka connectors and producers for efficient data ingestion with reduced latency, optimizing Kafka cluster performance for scalability.
Developed custom Kafka Streams applications for data processing and transformation, improving data quality and integrity before loading into EERA. Additionally, implemented MongoDB as a NoSQL database solution for efficient storage and retrieval of structured and unstructured data.
Implemented fault-tolerant and scalable data processing workflows, enhancing the organization's ability to derive actionable insights from streaming data sources.
Spearheaded the integration of ENOVIA PLM into the company's data ecosystem, facilitating seamless collaboration between product lifecycle management and data engineering teams.
Engineered data pipelines to extract, transform, and load (ETL) data from ENOVIA PLM, ensuring the availability of up-to-date product data for analytics and reporting purposes.
Implemented data quality checks and monitoring processes to ensure the integrity and consistency of ENOVIA PLM data.
Implemented and maintained robust version control workflows using Bitbucket, GitHub, and Git to manage the codebase of our data pipelines and analytics projects.
Established automated CI/CD pipelines using Bitbucket Pipelines and GitHub Actions, which accelerated the deployment of data pipelines.
Engineered and maintained robust data pipelines for large-scale datasets, utilizing Palantir's proprietary data integration and analytics platform.
Designed and implemented a Delta Lake platform using Databricks Runtime 8.5 for efficient data processing and analytics. Developed data ingestion pipelines using Spark SQL and Scala to ingest structured and semi-structured data from various sources (e.g., APIs, databases) into Delta Lake. Optimized Delta Lake tables for performance by partitioning data and utilizing Delta Lake features such as Z-ordering and Bloom filters.
Implemented end-to-end data integration solutions involving SAP ERP and SAP HANA, enabling seamless flow of business-critical data across the organization.
Engineered data pipelines to extract, transform, and load (ETL) data from SAP systems, optimizing data structures.
Collaborated with SAP functional consultants to design and implement data models in SAP HANA, supporting advanced analytics and real-time reporting initiatives.
Spearheaded the Teradata migration project, successfully transitioning the company's data warehousing solutions from legacy systems to Teradata, resulting in improvement in query performance and reduction in data storage costs.
Other activities included supporting and keeping the data pipelines active; working with product managers, analysts, and data scientists and addressing their requests; unit testing; load testing; and SQL optimization on the DB2 server.
Utilized Power BI and Tableau to create executive-level and operational dashboards, enabling real-time monitoring of key business metrics.
Conducted regular monitoring and tuning of Tableau Server on AWS, addressing resource bottlenecks and optimizing infrastructure to meet growing business demands.
Created SpotIQ dashboards and visualizations for key business metrics.
Environment: Groovy, Python, Flask, NumPy, Pandas, DB2, Cassandra, AWS EMR, Spark, AWS Kinesis, AWS Redshift, AWS EC2, AWS S3, AWS Elastic Beanstalk, AWS Lambda, AWS Data Pipeline, AWS CloudWatch, Docker, graph database, CouchDB, shell scripts, Looker.
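
A minimal sketch of the clickstream aggregation pipeline referenced above (Kinesis to S3), assuming the Databricks Kinesis source for Structured Streaming; the stream name, JSON schema, window size, and paths are hypothetical placeholders:

    # Clickstream aggregation sketch: Kinesis -> windowed counts -> S3 (Parquet).
    # Assumes the Databricks "kinesis" Structured Streaming source is available.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    spark = SparkSession.builder.appName("clickstream-agg").getOrCreate()

    click_schema = StructType([
        StructField("user_id", StringType()),
        StructField("page", StringType()),
        StructField("event_ts", TimestampType()),
    ])

    raw = (spark.readStream
           .format("kinesis")                          # Databricks Kinesis connector
           .option("streamName", "clickstream-events") # hypothetical stream name
           .option("region", "us-east-1")
           .load())

    # Kinesis records arrive as binary payloads in the `data` column.
    clicks = (raw
              .select(F.from_json(F.col("data").cast("string"), click_schema).alias("c"))
              .select("c.*"))

    # Aggregate clicks into 5-minute windows per user, with a lateness watermark.
    agg = (clicks
           .withWatermark("event_ts", "10 minutes")
           .groupBy(F.window("event_ts", "5 minutes"), "user_id")
           .count())

    # Land the aggregates in S3; Snowflake loads them downstream (see the next sketch).
    query = (agg.writeStream
             .outputMode("append")
             .format("parquet")
             .option("path", "s3a://example-bucket/clickstream/aggregates/")
             .option("checkpointLocation", "s3a://example-bucket/checkpoints/clickstream/")
             .trigger(processingTime="1 minute")
             .start())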
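
A minimal sketch of loading the staged S3 files into Snowflake with the snowflake-connector-python library, as referenced in the PySpark/EMR bullet above; the account, stage, and table names are hypothetical, and credentials would come from a secrets manager in practice:

    # Load staged S3 Parquet files into Snowflake via an external stage.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="example_account",      # hypothetical account identifier
        user="ETL_USER",
        password="***",                 # placeholder; use a vault/secrets manager
        warehouse="LOAD_WH",
        database="ANALYTICS",
        schema="STAGING",
    )

    try:
        cur = conn.cursor()
        # COPY from an external stage pointing at the S3 landing path.
        cur.execute("""
            COPY INTO STAGING.CLICKSTREAM_AGG
            FROM @CLICKSTREAM_STAGE/aggregates/
            FILE_FORMAT = (TYPE = PARQUET)
            MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
        """)
        print(cur.fetchall())           # per-file load results
    finally:
        conn.close()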


PepsiCo, Peeramcheru, Telangana, India May 2019 to Jun 2021
Big Data Engineer
Responsibilities:


As a Data Engineer, responsible for building scalable distributed data solutions using Hadoop.
Involved in Agile Development process (Scrum and Sprint planning).
Handled Hadoop cluster installations in a Windows environment.
Migrated data warehouses to Snowflake Data warehouse.
Developed robust and scalable ETL processes to extract, transform, and load data from diverse sources into Snowflake, contributing to real-time and batch data integration.
Experience building and deploying cloud infrastructure using Terraform.
Designed and developed ETL (Extract, Transform, Load) workflows using Informatica PowerCenter to ingest, process, and analyze large volumes of data.
Developed custom ETL (Extract, Transform, Load) processes to ingest and preprocess data from various sources, ensuring compatibility with the graph database schema.
Developed robust ETL pipelines using Databricks notebooks and Spark jobs, transforming raw data into valuable insights and loading it into data warehouses.
Proficiently managed virtual warehouses and scaled them based on varying workloads to achieve maximum efficiency and performance.
Defined virtual warehouse sizing in Snowflake for different types of workloads.
Demonstrated knowledge of AWS, Azure, Google Cloud Platform, and other cloud providers.
Integrated Snowflake with cloud storage services like AWS S3, Azure Data Lake, and data lake architecture.
Orchestrated complex data workflows using Databricks notebooks, Apache Airflow, and AWS Step Functions, ensuring timely and reliable data ingestion and transformation.
Collaborated with cross-functional teams, leveraging version control tools to manage Databricks notebooks effectively and promote a collaborative development environment.
Ability to design, develop, and implement Terraform scripts for infrastructure automation.
Proven understanding of the principles of Infrastructure as Code (IaC).
Worked with Google Cloud Platform (GCP) services such as Compute Engine, Cloud Functions, Cloud DNS, Cloud Storage, and Cloud Deployment Manager; applied the SaaS, PaaS, and IaaS concepts of cloud computing on GCP.
Involved in migrating an Oracle SQL ETL to run on Google Cloud Platform using Cloud Dataproc and BigQuery, with Cloud Pub/Sub triggering the Apache Airflow jobs.
Utilized AWS Lambda functions for event-driven data processing and integrated Spark with Amazon S3 for efficient data storage and retrieval.
Conducted performance tuning and optimization of Spark jobs to enhance data processing speed and reduce resource utilization, resulting in improved system efficiency.
Extracted data from data lakes, EDW to relational databases for analyzing and getting more meaningful insights using SQL Queries in DB2 DB and PySpark.
Developed PySpark scripts to merge static and dynamic files and cleanse the data; created PySpark procedures, functions, and packages to load data.
Designed, developed, and maintained data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional source systems.
Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
Wrote Sqoop Scripts for importing and exporting data from RDBMS (DB2) to HDFS.
Designed and implemented MongoDB-based data storage solutions for big data projects, successfully managing large-scale NoSQL databases to store and retrieve diverse data types efficiently.
Designed and implemented data ingestion processes, utilizing MongoDB's aggregation framework to preprocess and cleanse data before storage.
Designed the migration of a legacy data storage system to CouchDB, overseeing the data modeling and schema design to align with best practices and project requirements.
Conducted comprehensive data modeling workshops to identify and define relationships within complex datasets, ensuring a solid foundation for graph database implementation.
Set up a data lake in Google Cloud using Google Cloud Storage, BigQuery, and Bigtable.
Developed scripts in BigQuery and connected it to reporting tools.
Designed workflows using Airflow to automate the services developed for change data capture (an Airflow sketch follows the Environment line below).
Carried out data transformation and cleansing using SQL queries and PySpark.
Used Kafka and Spark Streaming to ingest real-time or near-real-time data into HDFS.
Worked on downloading BigQuery data into Spark DataFrames for advanced ETL capabilities.
Worked on PySpark APIs for data transformations.
Built reports for monitoring data loads into GCP and to drive reliability at the site level.
Spearheaded the integration of Mainframe data with the EERA, leveraging Kafka to achieve near real-time data synchronization.
Participated in daily stand-ups, bi-weekly scrums, and PI planning.
Developed and maintained custom reports and dashboards to track key performance indicators and identify trends.
Designed and maintained Teradata data models, optimizing database structures for performance and scalability, and ensuring data integrity and security through the implementation of access controls.
Spearheaded the adoption of Git and GitLab as version control tools for managing ETL processes and data transformation scripts in a large-scale big data environment.
Used Informatica Power BI Visualizer to create interactive dashboards that allow users to drill down into the data and explore different dimensions.
Environment: Hadoop 3.3, GCP, BigQuery, Bigtable, Spark 3.0, PySpark, Sqoop 1.4.7, ETL, HDFS, Snowflake DW, DB2, MapReduce, Kafka 2.8, Informatica, and Agile process.
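
A minimal sketch of an Airflow DAG for the change-data-capture loads referenced above, using operators from the apache-airflow-providers-google package; the bucket, dataset, table, and schedule are hypothetical placeholders:

    # Hourly CDC-style load: extract changed rows to GCS, then load them into BigQuery.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator


    def extract_changed_rows(**context):
        # Placeholder: pull changed rows from the source system and stage them
        # in GCS; the real pipeline would call the CDC extraction service here.
        pass


    with DAG(
        dag_id="cdc_to_bigquery_example",
        start_date=datetime(2021, 1, 1),
        schedule_interval="@hourly",
        catchup=False,
        default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    ) as dag:

        extract = PythonOperator(
            task_id="extract_changed_rows",
            python_callable=extract_changed_rows,
        )

        load = GCSToBigQueryOperator(
            task_id="load_into_bigquery",
            bucket="example-cdc-staging",
            source_objects=["cdc/{{ ds }}/*.json"],
            destination_project_dataset_table="example_project.analytics.orders_cdc",
            source_format="NEWLINE_DELIMITED_JSON",
            write_disposition="WRITE_APPEND",
        )

        extract >> load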


PepsiCo, Peeramcheru, Telangana, India Aug 2017 to May 2019
ETL Developer
Responsibilities:

Extensive experience working with the AWS cloud platform (EC2, S3, EMR, Redshift, Lambda, and Glue).
Working knowledge of Spark RDDs, the DataFrame, Dataset, and Data Source APIs, Spark SQL, and Spark Streaming.
Developed Spark Applications by using Python and Implemented Apache Spark data processing Project to handle data from various RDBMS and Streaming sources.
Worked with Spark for improving performance and optimization of the existing algorithms in Hadoop.
Used SparkContext, Spark SQL, Spark MLlib, DataFrames, pair RDDs, and Spark on YARN.
Used Spark Streaming APIs to perform on-the-fly transformations and actions for building a common learner data model, which gets data from Kafka in real time and persists it to Cassandra.
Developed a Kafka consumer API in Python for consuming data from Kafka topics (a consumer sketch follows the Environment line below).
Consumed Extensible Markup Language (XML) messages using Kafka and processed the XML file using Spark Streaming to capture User Interface (UI) updates.
Developed a preprocessing job using Spark DataFrames to flatten JSON documents into flat files.
Loaded DStream data into Spark RDDs and performed in-memory computation to generate output responses.
Experienced in writing live Real-time Processing and core jobs using Spark Streaming with Kafka as a Data pipeline system.
Migrated an existing on-premises application to AWS. Used AWS services like EC2 and S3 for data sets processing and storage.
Experienced in Maintaining the Hadoop cluster on AWS EMR.
Loaded data into S3 buckets using AWS Glue and PySpark. Involved in filtering data stored in S3 buckets using Elasticsearch and loaded data into Hive external tables.
Configured Snowpipe to pull data from S3 buckets into Snowflake tables.
Stored incoming data in the Snowflake staging area.
Created numerous ODI interfaces and loaded into Snowflake DB.
Worked on Amazon Redshift for shifting all Data warehouses into one Data warehouse.
Good understanding of Cassandra architecture, replication strategy, gossip, snitches etc.
Designed column families in Cassandra; ingested data from RDBMS, performed data transformations, and exported the transformed data to Cassandra per the business requirements.
Used the Spark-Cassandra Connector to load data to and from Cassandra.
Worked from scratch on Kafka configuration, including managers and brokers.
Experienced in creating data models for clients' transactional logs; analyzed the data from Cassandra tables for quick searching, sorting, and grouping using the Cassandra Query Language (CQL).
Tested cluster performance using the cassandra-stress tool to measure and improve read/write throughput.
Used HiveQL to analyze partitioned and bucketed data and executed Hive queries on Parquet tables stored in Hive to perform data analysis meeting the business specification logic.
Used Apache Kafka to aggregate web log data from multiple servers and make them available in downstream systems for Data analysis and engineering type of roles.
Worked in Implementing Kafka Security and boosting its performance.
Experience using Avro, Parquet, RCFile, and JSON file formats; developed UDFs in Hive.
Developed custom UDFs in Python and used them for sorting and preparing the data.
Worked on custom loaders and storage classes in Pig to handle several data formats like JSON, XML, and CSV, and generated bags for processing using Pig.
Developed Sqoop and Kafka Jobs to load data from RDBMS, External Systems into HDFS and HIVE.
Developed Oozie coordinators to schedule Hive scripts to create Data pipelines.
Wrote several MapReduce jobs using PySpark and NumPy, and used Jenkins for continuous integration.
Set up and worked with Kerberos authentication principals to establish secure network communication on the cluster, and tested HDFS, Hive, Pig, and MapReduce access for new users.
Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
Environment: Spark, Spark Streaming, Spark SQL, AWS EMR, MapR, HDFS, Hive, Pig, Apache Kafka, Sqoop, Python, PySpark, Java, shell scripting, Linux, MySQL, Oracle Enterprise DB, SOLR, Jenkins, Eclipse, Oracle, Informatica, Git, Oozie, Tableau, SOAP, Cassandra, and Agile methodologies.
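
A minimal sketch of a Python Kafka consumer of the kind referenced above, using the kafka-python library; the topic name, brokers, and group id are hypothetical placeholders:

    # Consume JSON messages from a Kafka topic and hand them to downstream processing.
    import json

    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "transactions",                              # hypothetical topic name
        bootstrap_servers=["broker1:9092", "broker2:9092"],
        group_id="etl-consumers",
        auto_offset_reset="earliest",                # start from the oldest message on first run
        enable_auto_commit=True,
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    )

    for message in consumer:
        record = message.value
        # Downstream handling (validation, enrichment, write to HDFS/Hive) goes here.
        print(message.topic, message.partition, message.offset, record.get("id"))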

ValueLabs, Hyderabad, Telangana, India Sep 2016 to Aug 2017
ETL Developer
Responsibilities:

Participated in requirement grooming meetings, which involved understanding functional requirements from a business perspective and providing estimates to convert those requirements into software solutions (design, develop, and deliver the code to IT/UAT/PROD; validate and manage data pipelines from multiple applications) in a fast-paced Agile development methodology using sprints with the JIRA management tool.
Responsible for checking data in DynamoDB tables and verifying that EC2 instances are up and running for all environments (DEV, QA, CERT, and PROD) in AWS.
Analyzed existing data flows and created high-level/low-level technical design documents for business stakeholders, confirming that the technical design aligns with business requirements.
Created and deployed Spark jobs in different environments and loaded data into the NoSQL database Cassandra as well as Hive and HDFS; secured the data by implementing encryption-based authentication/authorization (a Cassandra write sketch follows the Environment line below).
Implemented AWS solutions using EC2, S3, RDS, EBS, Elastic Load Balancer, Auto Scaling groups, optimized volumes, and EC2 instances, and created monitors, alarms, and notifications for EC2 hosts using CloudWatch.
Developed code using Apache Spark and Scala, IntelliJ, NoSQL databases (Cassandra), Jenkins, Docker pipelines, GitHub, Kubernetes, the HDFS file system, Hive, Kafka for real-time streaming data, and Kibana for monitoring logs; responsible for deployments to DEV, QA, PRE-PROD (CERT), and PROD using AWS.
Scheduled Informatica Jobs through Autosys scheduling tool.
Created quick filters and customized calculations with SOQL for SFDC queries; used Data Loader for ad hoc data loads into Salesforce.
Extensively worked on Informatica PowerCenter mappings, mapping parameters, workflows, variables, and session parameters.
Responsible for facilitating load data pipelines and benchmarking the developed product with the set performance standards.
Used Debugger within the Mapping Designer to test the data flow between source and target and to troubleshoot the invalid mappings.
Worked on SQL tools like TOAD and SQL Developer to run SQL Queries and validate the data.
Studied the existing system and conducted reviews to provide a unified review of jobs.
Involved in Onsite & Offshore coordination to ensure the deliverables.
Involved in testing the database using complex SQL scripts and handling performance issues effectively.
Environment: Apache Spark 2.4.5, Scala 2.11, Cassandra, HDFS, Hive, GitHub, Jenkins, Kafka, SQL Server 2008, Salesforce Cloud, Visio, TOAD, PuTTY, Autosys Scheduler, UNIX, AWS, WinSCP, Salesforce Data Loader, SFDC Developer Console.
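
A minimal sketch of loading a Spark DataFrame into Cassandra with the DataStax spark-cassandra-connector, as referenced above; the keyspace, table, host, and input path are hypothetical, and the connector jar is assumed to be on the classpath (e.g. via --packages):

    # Read landed JSON from HDFS and append it to a Cassandra table.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("load-to-cassandra")
             .config("spark.cassandra.connection.host", "cassandra-host")
             .getOrCreate())

    events = spark.read.json("hdfs:///data/landing/events/")

    (events.write
     .format("org.apache.spark.sql.cassandra")   # DataStax connector data source
     .mode("append")
     .options(keyspace="analytics", table="events")
     .save())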

British Telecom, Hyderabad, India Dec 2014 to Sep 2016
Database Developer
Responsibilities:

Imported data in various formats like JSON, SequenceFile, text, CSV, Avro, and Parquet into the HDFS cluster, compressed for optimization.
Worked on ingesting data from RDBMS sources like - Oracle, SQL Server and Teradata into HDFS using Sqoop.
Loaded all datasets into Hive and Cassandra from source CSV files using Spark (a CSV-to-Hive sketch follows the Environment line below).
Created an environment to access the loaded data via Spark SQL, through JDBC/ODBC (via the Spark Thrift Server).
Developed real-time data ingestion and analysis using Kafka and Spark Streaming.
Configured Hive and wrote Hive UDFs and UDAFs; also created static and dynamic partitions with bucketing as required.
Worked on writing Scala programs using Spark on Yarn for analyzing data.
Managing and scheduling Jobs on a Hadoop cluster using Oozie.
Created Hive External tables and loaded the data into tables and query data using HQL.
Written Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying on the log data.
Developed Oozie workflow for scheduling and orchestrating the ETL process and worked on Oozie workflow engine for job scheduling.
Managed and reviewed the Hadoop log files using Shell scripts.
Migrated ETL jobs to Pig scripts to do transformations, even joins and some pre-aggregations before storing the data onto HDFS.
Used Hive join queries to join multiple tables of a source system and load them into Elasticsearch tables.
Performed real-time streaming transformations on the data using Kafka and Kafka Streams.
Built a NiFi dataflow to consume data from Kafka, transform the data, place it in HDFS, and expose a port to run a Spark Streaming job.
Developed Spark Streaming jobs in Scala to consume data from Kafka topics, transform the data, and insert it into HBase.
Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
Experience in managing and reviewing huge Hadoop log files.
Collected log data from web servers and integrated it into HDFS using Flume.
Expertise in designing and creating various analytical reports and Automated Dashboards to help users to identify critical KPIs and facilitate strategic planning in the organization.
Involved in Cluster maintenance, Cluster Monitoring and Troubleshooting.
Worked with Avro Data Serialization system to work with JSON data formats.
Used Amazon Web Services (AWS) S3 to store large amounts of data in identical/similar repositories.
Environment: Spark, Spark SQL, Spark Streaming, Scala, Kafka, Hadoop, HDFS, Hive, Oozie, Pig, NiFi, Sqoop, AWS (EC2, S3, EMR), shell scripting, HBase, Jenkins, Tableau, Oracle, MySQL, and Teradata.
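
A minimal sketch of loading source CSV files into a Hive table with Spark, as referenced above; the input path, database, and table names are hypothetical placeholders:

    # Read source CSVs and write them as a managed Hive table via the metastore.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("csv-to-hive")
             .enableHiveSupport()          # use the Hive metastore for saveAsTable
             .getOrCreate())

    df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("hdfs:///data/source/customers/*.csv"))

    # Overwrite keeps the table in sync with the latest full load (Parquet by default).
    df.write.mode("overwrite").saveAsTable("staging.customers")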