Akshay - GCP Data Engineer |
[email protected] |
Location: Boston, Massachusetts, USA |
Relocation: yes |
Visa: H1B |
PROFESSIONAL SUMMARY:
o Around 10.5 years of IT experience across a variety of industries working on Big Data using the Cloudera and Hortonworks distributions. Hadoop working environment includes Hadoop, Spark, MapReduce, Kafka, Hive, Ambari, Sqoop, HBase, and Impala.
o Demonstrated broad knowledge of GCP data engineering concepts, with expertise in key GCP services including Dataflow, GKE (Google Kubernetes Engine), IAM (service accounts, roles), Cloud Composer, BigQuery, Cloud Functions, Pub/Sub, and Workload Identity.
o Implemented OLAP multi-dimensional cube functionality using Azure SQL Data Warehouse.
o Hands-on experience in installing, configuring, and using Hadoop ecosystem components such as Hadoop, MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, and Flume.
o Worked with Cloudera and Hortonworks distributions.
o Worked on distributed frameworks such as Apache Spark and Presto on Amazon EMR and Redshift, and interacted with data in other AWS data stores such as Amazon S3 and Amazon DynamoDB.
o Experience in writing code in PySpark and Python to manipulate data for data loads, extracts, statistical analysis, modeling, and data munging.
o Experience in working with product teams to create various store-level metrics and supporting data pipelines written in GCP's big data stack.
o Hands-on experience with GCP services such as BigQuery, GCS buckets, and Cloud Functions.
o Keen on learning the newer technology stack that Google Cloud Platform (GCP) adds.
o Able to work in both GCP and Azure clouds in parallel and coherently.
o Experience in developing end-to-end ETL pipelines using Snowflake, Alteryx, and Apache NiFi for both relational and non-relational databases (SQL and NoSQL).
o Strong experience in writing scripts using the Python, PySpark, and Spark APIs for analyzing data.
o Substantial experience in Spark integration.
o Maintained PySpark and Hive code by fixing bugs and providing the enhancements required by business users.
o Developed Scala scripts using both DataFrames/SQL and RDDs/MapReduce in Spark for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
o Experience in working with NoSQL databases like HBase and Cassandra.
o Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
o Performed ETL from multiple sources such as Kafka, NiFi, Teradata, and DB2 using Hadoop and Spark.
o Implemented data models to create both SQL and MongoDB databases.
o Experience in creating ETL mappings using Informatica to move data from multiple sources such as flat files and Oracle into a common target area such as a data warehouse.
o Integrated Oozie with Hue and scheduled workflows for multiple Hive, Pig, and Spark jobs.
o Hands-on experience in using other Amazon Web Services such as Elastic MapReduce (EMR), Auto Scaling, Redshift, DynamoDB, and Route 53.
o Hands-on experience with different ETL tools to get data into shape so it can be connected to Tableau through Tableau Data Extracts.
o Experience in importing and exporting data between HDFS and RDBMS with Sqoop and migrating it according to client requirements.
o Expertise in Hadoop: HDFS, Hive, Spark, Sqoop, Oozie, HBase, YARN, NameNode, and DataNode.
Keywords: database, information technology |