
Jessy - Hadoop Admin
[email protected]
Location: Remote, USA
Relocation: yes
Visa: H1B
Deepthi Tushara
Hadoop Engineer

Ph: 813-524-0103
Email : [email protected]

Professional Summary:
Overall 8 years of experience in IT, including experience in Hadoop environments as an administrator and developer.
Hands-on experience deploying and managing multi-node development, testing, and production Hadoop clusters with different Hadoop components (Hive, Pig, Sqoop, Oozie, Flume, Spark, HCatalog, ZooKeeper, HBase) using Cloudera Manager and Hortonworks Ambari.
Installation, configuration, and maintenance of Hadoop clusters, including cluster monitoring and troubleshooting.
Assisted in the design, development, and architecture of the Hadoop ecosystem.
Hands-on experience configuring Hadoop clusters in professional environments on VMware, Amazon Web Services (AWS) EC2 instances, and IBM Bluemix.
Worked on optimizing volumes and AWS EC2 instances and created multiple VPC instances. Set up NameNode High Availability for a major production cluster and designed automatic failover using ZooKeeper and quorum journal nodes.
In-depth knowledge and good understanding of the Hadoop daemons: NameNode, DataNode, Secondary NameNode, ResourceManager, and NodeManager.
Experience commissioning, decommissioning, balancing, and managing nodes, and tuning servers for optimal cluster performance.
Knowledge of multiple distributions/platforms (Apache, Cloudera, Hortonworks). Experience dealing with structured, semi-structured, and unstructured data in the Hadoop ecosystem. Also handled importing from various data sources, performed transformations using Hive and Pig, and loaded data into HBase.
Knowledge of programming in MapReduce, Hive, and Pig.
Knowledge of NoSQL databases such as HBase and Cassandra.
Exposure to setting up data import and export tools such as Sqoop to move data between RDBMS and HDFS.
Advanced knowledge of the Fair and Capacity schedulers and of configuring schedulers in a cluster.
Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
Experience in benchmarking and in performing backup and disaster recovery of NameNode metadata and sensitive data residing on the cluster (see the sketch after this summary).
Supported technical team for automation, installation and configuration tasks.
Analyzed clients' existing Hadoop infrastructure, identified performance bottlenecks, and provided performance tuning accordingly.
Strong experience in Linux/UNIX administration, with expertise in Red Hat Enterprise Linux. Strong experience in system administration, installation, upgrades, patching, migration, configuration, troubleshooting, security, backup, disaster recovery, performance monitoring, and fine-tuning on Linux (RHEL) systems.
Experience in providing security for Hadoop Cluster with Kerberos.
Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.
Experience in HDFS data storage and support for running MapReduce jobs.
Working knowledge of Oozie, a workflow scheduler used to manage jobs that run on Pig, Hive, and Sqoop.
Excellent interpersonal and communication skills, technically competent and result-oriented with problem solving and leadership skills.
Research-oriented, motivated, proactive self-starter with strong technical and analytical skills.
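As a flavor of the routine administration summarized above, the following is a minimal sketch (in shell, with placeholder host names and paths) of decommissioning a DataNode, rebalancing, and backing up NameNode metadata; the exclude-file location depends on what dfs.hosts.exclude points to in a given cluster.

    # Minimal sketch: decommission a DataNode, rebalance, and back up NameNode metadata.
    # Host names and paths are placeholders, not values from any specific cluster.

    # 1. Add the node to the exclude file referenced by dfs.hosts.exclude,
    #    then ask the NameNode to re-read its host lists.
    echo "datanode07.example.com" >> /etc/hadoop/conf/dfs.exclude
    hdfs dfsadmin -refreshNodes

    # 2. Watch decommissioning progress until the node reports "Decommissioned".
    hdfs dfsadmin -report | grep -A 2 "datanode07.example.com"

    # 3. Rebalance the cluster (threshold = allowed % spread in disk usage across DataNodes).
    hdfs balancer -threshold 10

    # 4. Pull a copy of the latest fsimage from the active NameNode for disaster recovery.
    mkdir -p /backups/namenode/$(date +%Y%m%d)
    hdfs dfsadmin -fetchImage /backups/namenode/$(date +%Y%m%d)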
Technical Skills:
Operating Systems: Red Hat Enterprise Linux 5.x/6.x, CentOS 6.x/7.x, Microsoft Windows 2000/2003/2008
Big Data Domains: Hortonworks HDP 2.6, HDF 3.1.1, HDFS, YARN, Hive, HBase, Oozie, ZooKeeper, Kafka, Scala, Spark
Scheduler: Control-M
Languages: Python, SQL
RDBMS: MySQL
Scripting: Shell Scripting, Python
Monitoring Tools: Hortonworks Ambari, Cloudera Manager
Cloud Platforms: AWS
Project Tools: MS Excel, MS Project
DevOps Tools: Docker, Kubernetes, Jenkins, GitLab
Methodologies: Agile, SDLC
Work Experience
Verizon Inc., Tampa, FL (USA). Aug 2020 - Till date
Senior Hadoop Engineer
Description:
The main objective of the project is to monitor a large set of Hadoop clusters, keep the retail site running with no downtime, manage traffic from site usage, plan capacity, and minimize the load on the different clusters accordingly. Working on cluster upgrades, maintenance, and monitoring end to end, and automating installation using cloud infrastructure.
Responsibilities:
Installing and configuring the IBM BigInsights Hadoop cluster for the development and QA environments.
Preparing shell scripts to create the local and HDFS file systems.
Creating encryption zones to enable Transparent Data Encryption (TDE) in the production environment (see the sketch after this list).
Creating users and groups using user management tool (FreeIPA).
Enabling Kerberos in the cluster, generating keytabs for the process users, and adding them to the users' bash profiles.
Setting the umask for users both locally on RHEL and in HDFS.
Creating Hive databases/schemas and tables, and configuring storage-based authorization and impersonation for Hive.
Setting up the Hive truststore for Beeline connectivity.
Synchronizing the Hive tables with Big SQL using HCatalog and querying the tables using Data Server Manager (DSM).
Setting proper ACLs on both the local and HDFS file systems to prevent access by unwanted users and groups.
Creating Capacity Scheduler YARN queues and sharing a percentage of resources between the queues.
Validating the final production cluster setup in the IBM Bluemix cloud environment. Automating the data-fetching accelerators from multiple data sources/servers using Oozie workflows.
Involved in the build and deployment of applications in the production cluster.
Writing Oozie workflows for the jobs, scheduling them at defined frequencies, and automating the entire process.
Checking logs to diagnose failed jobs and clearing old logs.
Monitoring the jobs in production and troubleshooting the failed jobs and configuring e-mail notification for the failed jobs using SendGrid.
Performance tuning of the cluster.
Migrating the setup from the IBM BigInsights 4.1 cluster to the 4.2 cluster and replicating the cluster settings. Listing the pre-defined alerts in the cluster and setting up e-mail notifications for the high-priority alerts.
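A minimal sketch of the kind of commands behind the TDE, keytab, and ACL items in this list; the key name, principal, IPA server, and paths are illustrative placeholders rather than actual production values.

    # Create an encryption key in the Hadoop KMS and turn a directory into an encryption zone (TDE).
    hadoop key create prod_zone_key -size 256
    hdfs dfs -mkdir -p /data/secure
    hdfs crypto -createZone -keyName prod_zone_key -path /data/secure
    hdfs crypto -listZones

    # Fetch a Kerberos keytab for a process user from FreeIPA and verify it with kinit.
    ipa-getkeytab -s ipa.example.com -p svc_etl@EXAMPLE.COM -k /etc/security/keytabs/svc_etl.keytab
    kinit -kt /etc/security/keytabs/svc_etl.keytab svc_etl@EXAMPLE.COM

    # Restrict the zone with HDFS ACLs so only the service user and its group can get in.
    hdfs dfs -setfacl -m user:svc_etl:rwx,group:etl_grp:r-x /data/secure
    hdfs dfs -getfacl /data/secure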
Environment: RHEL, IBM BigInsights (4.2), MapReduce, HIVE, PIG, Oozie, HCatalog, BigSQL, DataServerManager, Kerberos, KNOX, SQOOP.
Verizon, Irving, TX (USA). May 2019 - July 2020
Hadoop Administrator
Description:
Data Movement Framework is a tool developed in-house at Verizon to ingest data from multiple source systems into the Hadoop Data Platform and egress data from the Hadoop Data Platform to external data systems after performing the required data cleansing and validation. This framework is built using Apache Hadoop ecosystem tools and technologies and provides an easy-to-use UI to configure and submit workflows that seamlessly move data between Hadoop and other data platforms.
Responsibilities:
Hands-on experience with installation, configuration, maintenance, monitoring, performance tuning, and troubleshooting of Hadoop clusters in different environments such as development, test, and production.
Hands-on experience with the Hortonworks upgrade from HDP 2.6.x to HDP 3.0.
Good experience with cluster audit findings and tuning configuration parameters. Deployed high availability on the Hadoop cluster using quorum journal nodes.
Implemented automatic failover with ZooKeeper and the ZooKeeper Failover Controller.
Implemented the Capacity Scheduler to share cluster resources among the MapReduce jobs submitted by users.
Good experience with Hadoop ecosystem components such as Hive, HBase, Sqoop, and Oozie. Demonstrated an understanding of the concepts, best practices, and functions needed to implement a Big Data solution in a corporate environment.
Helped design scalable Big Data clusters and solutions.
Commissioning and Decommissioning Nodes from time to time.
Loaded the data from Teradata to HDFS using Teradata Hadoop connectors.
Imported data using Sqoop to load data from MySQL into HDFS on a regular basis (see the sketch after this list). Used Spark Streaming to consume topics from the distributed messaging source Kafka and periodically push batches of data to Spark for real-time processing.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala. Monitored and controlled local file system disk space usage and log files, cleaning log files with automated scripts.
As a Hadoop admin, monitored cluster health status on a daily basis, tuned system performance-related configuration parameters, and backed up configuration XML files.
Implemented Rack Awareness for data locality optimization.
Worked with Hadoop developers and designers to troubleshoot MapReduce job failures and issues and to help the developers.
Implemented NameNode metadata backup using NFS.
Worked with network and Linux system engineers/admins to define optimal network configurations, server hardware, and operating systems.
Production support responsibilities include cluster maintenance.
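A minimal sketch of the scheduled MySQL-to-HDFS Sqoop import referenced in this list; the connection string, credentials, table, and paths are placeholder examples.

    # Nightly Sqoop import of a MySQL table into a dated HDFS directory.
    # Connection details, table names, and paths are placeholders.
    sqoop import \
      --connect jdbc:mysql://mysql01.example.com:3306/retail \
      --username etl_user \
      --password-file /user/etl/.mysql.password \
      --table orders \
      --target-dir /data/raw/orders/$(date +%Y%m%d) \
      --num-mappers 4 \
      --fields-terminated-by '\t'

    # Example cron entry for a wrapper script around the import, run nightly at 1 AM:
    # 0 1 * * * /opt/scripts/sqoop_orders_import.sh >> /var/log/etl/sqoop_orders.log 2>&1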
Environment: RHEL, CDH 5.4, HDFS, HUE, Oozie, HIVE, Sqoop, Zookeeper, Spark, kafka, Unix scripts, YARN, Capacity Scheduler, Kerberos, Oracle, MySQL, Ganglia, Nagios.
Hashmap Inc., Roswell, GA (USA). June 2018 - April 2019
Role: Big Data Administrator.
Description:
HashMap has a singular focus on Big Data. With deep roots in the Hadoop community, strong partnerships, and a passion for unlocking the power of data assets, HashMap has the unique ability to architect, design, and administer any type of Hadoop-based framework.
Responsibilities:
Worked on developing architecture document and proper guidelines.
Installed and configured a Hadoop cluster (HDP 2.6.0) using Ambari 2.7. Set up ACL/SSL security for different users and assigned users to multiple topics. Created new user accounts and assigned pools for application usage.
Performed cluster maintenance using Ambari, adding nodes and decommissioning dead nodes. Familiar with importing and exporting data from the MySQL RDBMS using Sqoop. Managed the configuration of the cluster to meet the needs of the analysis, whether I/O-bound or CPU-bound.
Worked on setting up high availability for a major production cluster.
Monitored CPU utilization and worked to maintain and improve it.
Performance-tuned and managed growth of the OS, disk usage, and network traffic. Responsible for building scalable distributed data solutions using Hadoop.
Involved in loading data from LINUX file system to HDFS.
Perform architecture design, data modelling, and implementation of Big Data platform and analytic applications for the consumer products.
Worked on installing the cluster, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
Optimized and tuned the application.
Created user guides and training overviews for supporting teams. Imported data from external tables into Hive using the LOAD command.
Working experience with monitoring the cluster, identifying risks, and establishing good practices to be followed in a shared environment.
Monitored the lab environment (OpenStack machines), including installation and upgrades of CentOS. Configured and upgraded the OpenStack machines.
Setting up of multi-node cluster, monitoring, troubleshooting.
Automated scripts using the Chef tool.
Installed Docker and set up a Kubernetes cluster.
Deployed the ELK application onto the Kubernetes cluster using Docker (see the sketch after this list).
Automated builds with Jenkins pipelines using Dockerfiles, with GitLab to manage code. Good understanding of cluster configurations and resource management using YARN.
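A minimal sketch of the Docker-plus-Kubernetes ELK deployment mentioned in this list; the registry, image tag, namespace, and manifest file names are hypothetical.

    # Build and push an Elasticsearch image (registry and tag are placeholders).
    docker build -t registry.example.com/elk/elasticsearch:7.9.3 -f Dockerfile.elasticsearch .
    docker push registry.example.com/elk/elasticsearch:7.9.3

    # Create a namespace and apply the (hypothetical) ELK manifests.
    kubectl create namespace elk
    kubectl apply -n elk -f elasticsearch-statefulset.yaml
    kubectl apply -n elk -f logstash-deployment.yaml
    kubectl apply -n elk -f kibana-deployment.yaml

    # Watch the rollout.
    kubectl get pods -n elk -w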
Environment: RHEL OS 6.6/6.8, CentOS 7, HDP (2.5.0, 2.6.0, 2.6.4), HDF (3.0.0,3.1.1), Azure HDI, HDFS, MapReduce, Tez, YARN, Pig, Hive, HBase, Sqoop, Oozie, Zookeeper, Ambari, Kafka, Spark, Storm, NIFI, Kerberos, LDAP/AD, SSL, Ranger, MySql
University: Central Michigan University, Mt. Pleasant, Michigan. Jan 2016 - Jan 2018
Graduate Assistant - Big Data (Spark, Scala)/Linux
Responsibilities:
Hands-on experience developing an end-to-end batch processing pipeline using Spark DataFrames and Scala.
Migrated billions of historic records into Azure Cosmos DB by developing applications using Spark.
Populated the REST API data into the required fields, sent the batches of data to Kafka, and stored the results in HBase.
Worked on HBase in creating HBase tables to load large sets of semi structured data coming from various sources.
Created data science workflows using Packer, Terraform, and Zeppelin which greatly increased productivity.
Experienced in importing/exporting data into Hive from relational databases and Teradata using Sqoop.
Developed a Spark application for generating UUIDs for the chillers, registering them, and setting up permissions through REST API services using Spark.
Hands-on experience creating Docker images and running them on a Kubernetes cluster. Loaded data into Spark RDDs and performed in-memory data computation to generate the output response.
Performed different types of transformations and actions on the RDD to meet the business requirements.
Developed and configured Kafka to pipeline server log data into Spark Streaming for real-time processing (see the sketch after this list).
Developed microservices using Spring MVC, Spring Boot, and Spring Cloud. Used a microservices architecture with Spring Boot-based services interacting through REST.
Used microservices designed with individual databases and projects with no dependencies.
Experience with Elasticsearch in Hadoop, which bridges that gap, letting us leverage the best of Hadoop's big data analytics together with real-time analytics.
Enabled speedy reviews and first-mover advantages by using the Apache Oozie workflow engine for managing and scheduling Hadoop jobs.
Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables
Familiar with data architecture including data ingestion pipeline design, Hadoop information architecture, data modeling and data mining, machine learning and advanced data processing
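A minimal sketch of how a Spark Streaming job like the Kafka log pipeline above might be submitted on YARN; the class name, jar, broker, and topic are hypothetical, and the spark-sql-kafka package coordinate assumes a Spark 2.4 / Scala 2.11 build.

    # Submit a (hypothetical) Spark Streaming application that reads server logs from Kafka.
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --class com.example.streaming.ServerLogIngest \
      --num-executors 4 \
      --executor-memory 4g \
      --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0 \
      server-log-ingest_2.11-0.1.0.jar \
      kafka01.example.com:9092 server-logs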
Environment: Scala, Spark SQL, Azure, Apache Zeppelin, Spark Streaming, Kafka, Oozie, Rest API, Redis Cache, Maven, Jenkins, Kubernetes
Wipro Technologies, Bangalore, India. Oct 2015 - May 2018
Role: Hadoop Developer
Responsibilities:
Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs in Java for data cleansing and pre-processing.
Involved in loading data from UNIX file system to HDFS.
Installed and configured Hive and wrote Hive UDFs.
Imported and exported data into HDFS and Hive using Sqoop.
Used Cassandra CQL and Java APIs to retrieve data from Cassandra tables.
Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files. Worked hands on with ETL process.
Handled importing of data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
Extracted the data from Teradata into HDFS using Sqoop.
Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior. Exported the analyzed patterns back into Teradata using Sqoop (see the sketch after this list).
Continuously monitored and managed the Hadoop cluster through Cloudera Manager. Installed the Oozie workflow engine to run multiple Hive jobs.
Developed Hive queries to process the data and generate data cubes for visualization.
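A minimal sketch of the Hive-to-Teradata round trip described in this list; the query, JDBC URL, credentials, table, and export directory are placeholders, and the Teradata JDBC driver/connector is assumed to be on the Sqoop classpath.

    # Run the Hive aggregation whose output will be pushed back to Teradata.
    hive -e "INSERT OVERWRITE DIRECTORY '/data/export/user_patterns'
             SELECT user_id, page, COUNT(*) AS visits
             FROM web_logs
             GROUP BY user_id, page;"

    # Export the result directory to a Teradata table with Sqoop
    # (Hive's default field delimiter for INSERT OVERWRITE DIRECTORY is \001).
    sqoop export \
      --connect jdbc:teradata://td01.example.com/DATABASE=analytics \
      --username etl_user \
      --password-file /user/etl/.td.password \
      --table USER_PATTERNS \
      --export-dir /data/export/user_patterns \
      --input-fields-terminated-by '\001'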
Environment: Hadoop, MapReduce, HDFS, UNIX, Hive, Sqoop, Cassandra, ETL, Pig Script, Cloudera, Oozie.