MADHU CHILUKURI
SENIOR DATA ENGINEER
Email: [email protected]
Phone: +1 (731) 900-6998

PROFESSIONAL SUMMARY:

Around 10 years of IT experience in software design, development, implementation, and support of business applications.
Experience with Big Data Hadoop and Hadoop ecosystem components such as MapReduce, Sqoop, Flume, Kafka, Pig, Hive,
Spark, Storm, HBase, Airflow, Oozie, and ZooKeeper.
Strong experience in migrating other databases to Snowflake.
Worked extensively on installing and configuring the Hadoop ecosystem architecture, including components such as Hive, Sqoop, HBase, ZooKeeper, and Flume.
Good knowledge of writing Spark applications in Python (PySpark).
Experience with data extraction, transformation, and loading using Hive, Sqoop, and HBase.
Experienced in working with the Spark ecosystem using Scala and Hive queries on data formats such as text files and Parquet.
Strong experience in Ab Initio consulting with ETL, data mapping, transformation, and loading from source to target
databases in complex, high-volume environments.
Effectively managed and prioritized work, collaborated with cross-functional teams, and delivered high-quality solutions
on time by implementing agile methodologies such as Scrum and Kanban.
Extensively worked on Spark using Scala on the cluster for computational analytics; installed Spark on top of Hadoop and
performed advanced analytics using Spark with Hive and SQL/Oracle.
Hands-on experience installing, configuring, and using Apache Hadoop ecosystem components such as Hadoop Distributed
File System (HDFS), MapReduce, Pig, Hive, HBase, Apache Crunch, ZooKeeper, Sqoop, Hue, and Scala, as well as AWS
Glue, Lambda functions, Step Functions, CloudWatch, SNS, DynamoDB, and SQS.
Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2, which provide fast and efficient
processing for Teradata big data analytics.
Expertise in containerization technologies, including Docker and Kubernetes, and in deploying and managing containerized
data applications on Google Kubernetes Engine (GKE) to ensure scalability and dependability.
Led data integration and migration initiatives from on-premises or other cloud platforms to Google Cloud Platform,
ensuring minimal business disruption and a seamless transition.
Well versed in Ab Initio parallelism techniques; implemented Ab Initio graphs using component parallelism, pipeline
parallelism, data parallelism, and Multi File System (MFS) techniques.
Expertise in Ab Initio GDE components such as transform components, database components, sort, and partition/de-partition
for creating, executing, testing, and maintaining graphs, along with experience using the Ab Initio Co>Operating System for
application tuning and debugging strategies.
Excellent hands-on experience with Vertex upgrades.
Adequate knowledge of SQL and experience using SQL Server 2012; work experience on e-commerce web applications.
Strong expertise in the creation and performance tuning of business reports using tools like OBIEE, BI Publisher,
Microsoft Power BI, Tableau and SQL Server tools like SQL Server Analysis Services (SSAS), SQL Server Reporting
Services (SSRS).
Implemented monitoring and logging solutions on GCP using Stackdriver and other tools to detect anomalies, optimize
resource utilization, and monitor performance cost-effectively.
Excellent hands-on experience with end-to-end testing on Vertex.
Working experience using EME air commands and Ab Initio Data Profiler/Data Quality.
Good experience with Vertex data updates and returns updates.
Exposure to AI and deep learning platforms such as TensorFlow, Keras, AWS ML, and Azure ML Studio.
Strong experience in using SQL for advanced analytical functions and writing complex PL/SQL packages, stored
procedures, functions, cursors, triggers, views, and materialized views.
Expertise in big data distributed systems such as Hadoop (Azure, Hortonworks, Cloudera), MongoDB, and NoSQL databases.
Created interactive UI designs in OBIEE Answers and Dashboards, including experience in user interface design using
CSS, HTML, and JavaScript.
Hands-on experience developing GCP BigQuery projects with Airflow as the scheduler.
Possesses excellent knowledge of databases, data warehouses, and business intelligence concepts. Expertise in designing
and developing end-to-end ETL frameworks from source systems to target OLAP systems using ETL tools such as
Talend, DataStage, and SSIS.
Experience in both Logical and Physical data modeling for enterprise data warehouse and applications, ensuring
alignment with business requirements and performance optimization.
Expertise in architecting and designing Enterprise Data Warehouse solutions, ensuring scalability and efficiency for large
scale applications.
Experience implementing and supporting group-level data security and UI security using web groups in OBIEE WebLogic.
Experience using GCP Dataproc clusters with Hadoop tools such as PySpark and Hive.
Experience developing Spark applications using Spark SQL in Databricks for data extraction and transformation (a brief
PySpark sketch follows this summary).
Well-versed with Design and Architecture principles to implement Big Data Systems.
Skilled in data migration from relational databases to the Hadoop platform using Sqoop.
Experienced in migrating ETL logic using Pig Latin scripts, transformations, and join operations.
Good understanding of MPP databases such as HP Vertica and Impala.
Proficient in developing ETL processes using Dimensional Modeling for both ROLAP and MOLAP environments, along
with designing ETL technical architecture and managing BI load dependencies.
Hands-on experience configuring and working with Flume to load data from multiple sources directly into HDFS.
Expertise in relational databases such as Oracle, MySQL, and SQL Server.
Strong analytical and problem-solving skills; highly motivated, good team player with very good communication and
interpersonal skills.
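A minimal PySpark sketch of the Spark SQL / Parquet / Hive processing pattern referenced in this summary; the paths, database, and column names are illustrative placeholders, not taken from an actual project.

    # Minimal sketch: read Parquet, run a Spark SQL aggregation, save as a metastore table.
    # Paths, database, and column names below are hypothetical.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("parquet-hive-sketch")
             .enableHiveSupport()
             .getOrCreate())

    orders = spark.read.parquet("/data/raw/orders")        # hypothetical Parquet location
    orders.createOrReplaceTempView("orders_stg")

    daily_totals = spark.sql("""
        SELECT order_date, SUM(amount) AS total_amount
        FROM orders_stg
        GROUP BY order_date
    """)

    # Persist the aggregate as a managed table (Parquet by default);
    # assumes an 'analytics' database already exists in the metastore.
    daily_totals.write.mode("overwrite").saveAsTable("analytics.daily_order_totals")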

TECHNICAL SKILLS:

Languages: R, SQL, Python, Shell scripting, Java, Scala, C++
Databases: Oracle 11g, SQL Server, MS Access, MySQL, MongoDB, Cassandra, PL/SQL, T-SQL
Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Impala, Kafka, Spark MLlib, PySpark, Sqoop, Avro
BI and Visualization: Tableau, Power BI, OBIEE, SSAS, SSRS, Informatica
Version Control: Git, SVN, GitLab, Bitbucket
Data Engineer / Big Data Tools / Cloud / Visualization / Other Tools: Databricks, Hadoop Distributed File System (HDFS), Erwin, dimensional data modeling, Hive, Pig, Sqoop, MapReduce, Flume, YARN, Hortonworks, Cloudera, Mahout, MLlib, Oozie, ZooKeeper, AWS, AWS Lambda, AWS Glue, DataStage 9.x/7.5/7.1, Redshift, Athena, Azure Databricks, Azure Data Explorer, Azure HDInsight, GCP, BigQuery, Pub/Sub, Salesforce, Google Cloud Shell, Linux, PuTTY, Bash shell, Unix, Tableau, Power BI, SAS, Matplotlib, Seaborn, Bokeh


PROFESSIONAL EXPERIENCE:

Nationwide, Columbus, OH April 2022 to Present
Sr. Data Engineer

Responsibilities:
Designed and deployed Hadoop clusters and various big data analytic tools, including PySpark, Hive, Snowflake, and
Airflow as the scheduler.
Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of
Apache Spark, written in Python.
Leveraged Azure Synapse's unified analytics platform to design and execute end-to-end data analytics solutions for big
data analytics, data integration, and data warehousing.
Deployed Azure Synapse workspaces and provisioned resources such as SQL pools, Apache Spark pools, and data
integration runtimes to support diverse data processing workloads.
Day-to-day responsibilities include developing ETL pipelines into and out of the data warehouse and developing major
regulatory and financial reports using advanced SQL queries in Snowflake.
Worked within an ETL methodology supporting data analysis, extraction, transformation, and loading in a corporate-wide
ETL solution using Ab Initio.
Updated and installed Vertex Returns and O Series and worked on Vertex 9.0 end-to-end testing.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python.
Worked with Spark to create structured data from the pool of unstructured data received.
Documented the requirements including the available code which should be implemented using Spark, Hive, and HDFS.
Monitored and maintained the best possible performance, availability, and scalability for GCP database services,
including Cloud SQL, Bigtable, Firestore, and Spanner.
Implemented a one-time data migration of multi-state-level data from SQL Server to Snowflake using Python and
SnowSQL.
Created and executed end-to-end machine learning processes by automating model creation, training, and deployment
with Amazon SageMaker Pipelines.
Used SageMaker Pipelines to define dependencies between processing steps and guarantee consistent execution while
orchestrating sophisticated machine learning pipelines.
Trained a random forest algorithm on customer web activity data from media applications to identify potential customers.
Worked on Google TensorFlow and the Keras API, building convolutional neural networks for classification problems.
Worked with the ETL team on loading data from the staging area into the data warehouse and provided all business rules
for loading data into the database.
Staged API and Kafka data (in JSON format) into the Snowflake database by flattening it for different functional services.
Developed new pipeline components using SageMaker Processing and AWS Lambda to integrate custom data
preprocessing, feature engineering, and model evaluation logic, extending the capabilities of SageMaker Pipelines.
Collaborated with cross-functional teams of developers, architects, and operations to build and deploy GCP solutions
aligned with corporate goals.
Selected and produced data into CSV files, stored them in AWS S3 using AWS EC2, and then organized and stored the
data in AWS Redshift.
Developed machine learning models using the Google TensorFlow Keras API (convolutional neural networks) for
classification problems; fine-tuned model performance by adjusting the epochs, batch size, and Adam optimizer settings.
Developed and wrote SQL and stored procedures in Teradata; loaded data into Snowflake and wrote SnowSQL scripts.
Presented the results to business users, authorities, and other development and engineering teams through data
visualization tools like Tableau, Microsoft Power BI, and Oracle BI Publisher.
Created graphical reports, tabular reports, scatter plots, geographical maps, dashboards, and parameters on Tableau and
Microsoft Power BI.
Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames,
and saved the data in Parquet format in HDFS.
Experienced in transferring streaming data and data from different sources into HDFS and NoSQL databases.
Created ETL Mapping with Talend Integration Suite to pull data from Source, apply transformations, and load data into
the target database.
Created Databricks notebooks using SQL and Python and automated notebooks using jobs.
Created Spark clusters and configured high-concurrency clusters using Azure Databricks to speed up the preparation of
high-quality data.
Used PySpark and pandas to calculate the moving average and RSI score of stocks and generated them in the data
warehouse (see the sketch at the end of this list).
Worked on the design and development of Informatica mappings and workflows to load data into the staging area, data
warehouse, and data marts in SQL Server and Oracle.
Created the Materialized View to extract information from multiple data sources in the OBIEE administration.
Encoded and decoded JSON objects using PySpark to create and modify DataFrames in Apache Spark.
Created reports for business clients utilizing data visualization tools like OBIEE, Oracle Business Intelligence Publisher,
and Tableau.
Developed multiple Kafka Producers and Consumers from scratch as per the software requirement specifications.
Created on-demand tables on S3 files using Lambda Functions and AWS Glue using Python and PySpark.
Optimized existing algorithms in Hadoop using Spark Context, Spark-SQL, Data Frames, and Pair RDDs.
Developed pipeline for POC to compare performance/efficiency while running pipeline using the AWS EMR Spark
cluster.
Extensive experience in building ETL jobs using Jupyter notebooks with Apache Spark.
Ran analytics on power plant data using the PySpark API with Jupyter notebooks on an on-premises cluster for certain
transformation needs.
Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster data processing.
Real-time streaming of the data using Spark with Kafka.
Designed and developed data loading strategies, and transformation for business to analyze the datasets.
Experienced in writing Spark Applications in Scala and Python (Pyspark).
Implemented design patterns in Python for the application.
Performed ETL testing activities such as running the jobs, extracting data from the database with the necessary queries,
transforming it, and uploading it to the data warehouse servers.
Develop quality code adhering to Python coding Standards and best practices.
Used Spark and Spark SQL to read Parquet data and create the tables in Hive using Python.
Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
Collected data using Spark Streaming in near real time and performed the necessary transformations and aggregations to
build the data model, persisting the data in HDFS.
Involved in performance tuning of Spark jobs using caching and taking full advantage of the cluster environment.
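A hedged sketch of the moving-average and RSI computation mentioned in this list, using pandas; the column names and the 14-period window are assumptions for illustration, not details of the actual pipeline.

    # Sketch: simple moving average and a standard 14-period RSI on a price series.
    # 'trade_date' and 'close' are hypothetical column names.
    import pandas as pd

    def add_indicators(prices: pd.DataFrame, window: int = 14) -> pd.DataFrame:
        prices = prices.sort_values("trade_date").copy()
        prices["moving_avg"] = prices["close"].rolling(window).mean()

        delta = prices["close"].diff()
        avg_gain = delta.clip(lower=0).rolling(window).mean()
        avg_loss = (-delta.clip(upper=0)).rolling(window).mean()
        prices["rsi"] = 100 - 100 / (1 + avg_gain / avg_loss)
        return prices

In a Spark job the same logic can be applied per symbol with a pandas UDF or window functions before writing the results to the warehouse.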

Environment: AWS, Hadoop, Hive, Java, Kafka, PySpark, YARN, Vertex 6.0, DynamoDB, shell scripting, Python, Spark, Scala,
Maven, Microsoft Power BI, OBIEE, MySQL, Airflow

Molina Healthcare, Bothell, WA Sep 2019 to Mar 2022
Sr. Data Engineer

Responsibilities:
Experience in big data analytics and design in the Hadoop ecosystem using MapReduce programming, Spark, Hive,
Pig, Sqoop, HBase, Oozie, Impala, and Kafka.
Performed Hive tuning techniques such as partitioning, bucketing, and memory optimization.
Worked on different file formats such as Parquet, ORC, JSON, and text files.
Coordinated and automated data transformation and transfer activities across hybrid and multi-cloud systems using Azure
Synapse Data Integration.
Worked on SnowSQL and Snowpipe.
Converted Talend Joblets to support Snowflake functionality.
Created Snowpipe pipelines for continuous data loading.
Monitor and tune ETL processes for performance improvements; identify, research, and resolve data warehouse load
issues.
Involved in full life cycle Business Intelligence implementations and understanding of all aspects of an implementation
project using OBIEE.
Designed and implemented end-to-end data pipelines using Terraform and AWS services including S3, Redshift, and Glue.
Proficient in managing GCP cloud infrastructure, including the creation and administration of IAM (Identity and Access
Management) policies, virtual machines, storage containers, and networks.
Enforced data privacy and regulatory compliance by implementing stringent data governance and security measures on
GCP, including encryption, access controls, and compliance policies.
Utilized BigQuery to develop and optimize data analytics solutions on GCP, facilitating timely, cost-effective analysis of
large datasets for business intelligence and decision-making.
Created productivity reports, Ad-hoc reports, and interactive dashboards, filters, prompts for business end-users on
OBIEE Answers and BI Publisher.
Designed interactive dashboards in OBIEE and BI Publisher using drill-down, guided navigation, prompts, filters, and
variables.
Integrated SageMaker Pipelines with CI/CD pipelines using AWS CodePipeline and AWS CodeBuild to automate the
deployment of machine learning models into production.
Integrated SageMaker Model Monitor into machine learning pipelines to track model quality and drift in real time; when
performance deviations are found, alarms are raised and retraining workflows are triggered.
Automated the provisioning and maintenance of GCP resources using technologies such as Terraform and Deployment
Manager to implement Infrastructure as Code (IaC).
Applied re-platforming, re-architecting, and lift-and-shift tactics to optimize workloads for cloud environments.
Employed GitLab CI/CD, Cloud Build, Jenkins, and other tools to create and manage CI/CD pipelines on GCP.
Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially done using
Python (PySpark).
Worked on data collection and transformation mappings and the design of the data warehouse data model.
Used Spark SQL to load data and created schema RDDs on top of it that load into Hive tables, and handled structured
data using Spark SQL.
Automated infrastructure provisioning and configuration using Terraform, reducing deployment time and improving
overall system reliability.
Worked on analyzing the Hadoop cluster using different big data analytic tools including Flume, Pig, Hive, HBase, Oozie,
ZooKeeper, Sqoop, Spark, and Kafka.
Defined virtual warehouse sizing in Snowflake for different types of workloads.
Ensured clear and thorough documentation for team members and stakeholders by employing Markdown and AWS
CloudFormation templates to define pipeline setups, dependencies, and execution routines.
Developed projects with GCP BigQuery and Airflow as the scheduler (a brief DAG sketch follows this list).
Created a Hadoop cluster in Dataproc and developed ETL pipelines using Spark (Scala) and Hive.
Loaded data from Kafka every 15 minutes on an incremental basis into the BigQuery raw layer using Google Dataproc,
GCS buckets, Hive, Spark, Scala, Python, gsutil, and shell scripts.
Used REST APIs with Python to ingest data from the dashboard system into BigQuery.
As a big data developer, implemented solutions for ingesting data from various sources and processing data at rest using
big data technologies such as Hadoop, MapReduce frameworks, MongoDB, Hive, Oozie, Flume, Sqoop, Talend, etc.
Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext,
Spark SQL, DataFrames, pair RDDs, YARN, and PySpark.
Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs
such as Java MapReduce, Hive, Pig, and Sqoop.
Followed Databricks platform best practices for securing network access to cloud applications.
Hands-on experience with Git Bash commands such as git pull to pull code from the source and develop against
requirements, git add to stage files, git commit after the code build, and git push to the pre-prod environment for code
review; later used screwdriver.yaml, which builds the code and generates artifacts that are released into production.
Performed data validation with record-wise counts between the source and destination.
Involved in the data support team in a role covering bug fixes, schedule changes, memory tuning, schema changes, and
loading of historic data.
Worked on implementing checkpoints such as Hive count checks, Sqoop record checks, done-file creation checks, done-file
checks, and touch-file lookups.
Worked with Agile methodologies (Scrum and Kanban).
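An illustrative Airflow DAG for the BigQuery-on-a-schedule pattern referenced in this list; the DAG id, dataset, and SQL are hypothetical, and the operator comes from the Airflow Google provider package.

    # Sketch: a daily BigQuery rollup scheduled by Airflow (Airflow 2.x, Google provider).
    # Dataset, table, and query below are placeholders.
    from datetime import datetime
    from airflow import DAG
    from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

    with DAG(
        dag_id="daily_claims_rollup",              # hypothetical DAG name
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        build_rollup = BigQueryInsertJobOperator(
            task_id="build_daily_rollup",
            configuration={
                "query": {
                    "query": """
                        CREATE OR REPLACE TABLE analytics.daily_claims AS
                        SELECT claim_date, COUNT(*) AS claim_count
                        FROM raw.claims
                        GROUP BY claim_date
                    """,
                    "useLegacySql": False,
                }
            },
        )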

Environment: Hadoop, MapReduce, HDFS, Hive, Cassandra, Sqoop, Oozie, SQL, Kafka, Spark, Scala, Java, GitHub, Teradata,
Big Data Integration, Impala, GCP, Dataproc, Airflow, BigQuery

Global Atlantic Financial Group, Indianapolis, IN May 2017 to Aug 2019
Data Engineer

Responsibilities:
Worked extensively on Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN,
Spark, and MapReduce programming.
Converted the existing relational database model to the Hadoop ecosystem.
Worked with Linux systems and RDBMS database regularly to ingest data using Sqoop.
Strong experience working with Elastic MapReduce (EMR) and setting up environments on Amazon EC2 instances.
Performed ETL testing activities such as running the jobs, extracting data from the database with the necessary queries,
transforming it, and uploading it to the data warehouse servers.
Documented architecture designs, data models, and deployment configurations within Azure Synapse workspaces,
ensuring clear and comprehensive documentation for stakeholders and team members.
Ability to spin up different AWS instances, including EC2-Classic and EC2-VPC, using CloudFormation templates.
Collected data from AWS S3 buckets using Spark Streaming in near real time and performed the necessary transformations
and aggregations to build the data model and persist the data in HDFS.
Managed and reviewed Hadoop and HBase log files.
Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive.
Designed and implemented HIVE queries and functions for evaluation, filtering, loading, and storing of data.
Analyzed table data and implemented compression techniques such as Teradata multi-value compression.
Involved in the ETL process from design, development, testing, and migration to production environments.
Involved in writing the ETL test scripts and guided the testing team in executing the test scripts.
Involved in performance tuning of the ETL process by addressing various performance issues at the extraction and
transformation stages.
Guide the development team working on PySpark as an ETL platform.
Writing Hadoop Map Reduce jobs to run on Amazon EMR clusters and creating workflows for running jobs.
Generated analytics reporting on probe data by writing EMR (Elastic MapReduce) jobs to run on an Amazon VPC cluster
and using Amazon Data Pipeline for automation.
Good understanding of Teradata MPP architecture features such as partitioning and primary indexes.
Good knowledge of Teradata Unity, Teradata Data Mover, OS/PDE kernel internals, and backup and recovery.
Created HBase tables to store variable data formats of data coming from different portfolios.
Optimized PySpark jobs to run on a Kubernetes cluster for faster data processing.
Created partitions and buckets based on state for further processing using bucket-based Hive joins (see the sketch at the
end of this list).
Involved in transforming data from mainframe tables to HDFS and HBase tables using Sqoop.
Created Hive tables and worked on them using HiveQL.
Created and truncated HBase tables in Hue and took backups of submitter IDs.
Developed data pipeline using Kafka to store data in HDFS.
Used Spark API over Hadoop YARN as an execution engine for data analytics using Hive.
Continuous monitoring and managing of the Hadoop cluster through Cloudera Manager.
Involved in the review of functional and non-functional requirements.
Developed ETL Process using HIVE and HBASE.
Prepared the Technical Specification document for the ETL job development.
Responsible for managing data coming from different sources.
Loaded CDRs from relational databases using Sqoop and from other sources into the Hadoop cluster using Flume.
Installed and configured Apache Hadoop, Hive, and Pig environment.
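A hedged sketch of the state-based partitioning and bucketing pattern mentioned in this list, shown with Spark's bucketBy (an analogue of, though not identical to, native Hive bucketing); the paths, table, and column names are placeholders.

    # Sketch: write a state-partitioned, bucketed table so joins on the bucket key
    # can avoid a full shuffle. Table, path, and column names are hypothetical.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("partition-bucket-sketch")
             .enableHiveSupport()
             .getOrCreate())

    policies = spark.read.parquet("/data/staging/policies")   # hypothetical staging data

    (policies.write
        .mode("overwrite")
        .partitionBy("state")                 # one partition per state
        .bucketBy(32, "policy_id")            # co-locate rows by the join key
        .sortBy("policy_id")
        .saveAsTable("warehouse.policies_bucketed"))

Joining this table to another table bucketed the same way on policy_id lets the engine use bucket-aware joins instead of shuffling both sides.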
Environment: Hadoop, HDFS, Pig, Hive, Java, Flume, Sqoop, Oozie, Python, Shell scripting, SQL, Talend, Spark, HBase,
Elasticsearch, Linux (Ubuntu), Kafka


Grapesoft Solutions, Hyderabad, India Nov 2015 to Feb 2017
Data Engineer

Responsibilities:
Worked extensively on Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN,
Spark, and MapReduce programming.
Converted the existing relational database model to the Hadoop ecosystem.
Worked on the development of data ingestion pipelines using the Talend ETL tool and Bash scripting with big data
technologies including, but not limited to, Hive, Impala, Spark, and Kafka.
Experience in developing scalable & secure data pipelines for large datasets.
Gathered requirements for ingestion of new data sources including life cycle, data quality check, transformations, and
metadata enrichment.
Collecting and aggregating large amounts of log data and staging data in HDFS for further analysis.
Monitor the Daily, Weekly, and Monthly jobs and provide support in case of failures/issues.
Delivered data engineering services such as data exploration, ad-hoc ingestion, and subject-matter expertise to data
scientists using big data technologies.
Built machine learning models to showcase big data capabilities using PySpark and MLlib (a brief sketch follows this list).
Knowledge of implementing JIL scripts to automate jobs in the production cluster.
Troubleshot users' analysis bugs (JIRA and IRIS tickets).
Worked with the SCRUM team in delivering agreed user stories on time for every Sprint.
Worked on analyzing and resolving production job failures in several scenarios.
Implemented UNIX scripts to define the use case workflow to process the data files and automate the jobs.
Utilized Terraform to automate infrastructure provisioning and setup for data processing pipelines, ensuring consistency
and scalability across cloud environments and platforms.
Used Terraform to manage infrastructure as code, simplifying the deployment and upkeep of data engineering
infrastructure and allowing for version control, collaboration, and repeatability while cutting down on operational overhead.
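A brief PySpark MLlib pipeline of the kind referenced in this list; the input path, feature columns, and label are assumptions for illustration only.

    # Sketch: assemble features and fit a logistic regression with Spark MLlib.
    # Input path and column names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

    events = spark.read.parquet("/data/features/events")   # hypothetical feature table
    assembler = VectorAssembler(
        inputCols=["clicks", "session_length", "pages_viewed"],
        outputCol="features",
    )
    lr = LogisticRegression(featuresCol="features", labelCol="converted")

    model = Pipeline(stages=[assembler, lr]).fit(events)
    model.transform(events).select("converted", "prediction", "probability").show(5)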

Environment: Spark, Redshift, Python, Java, HDFS, Hive, Pig, Scala, Kafka, Shell scripting, Linux, Jenkins, Eclipse, Git, Oozie,
Talend.

Avon Technologies Pvt Ltd, Hyderabad, India June 2014 - Oct 2015
Data Engineer / ETL Developer

Responsibilities:
Responsibilities included gathering business requirements, developing the strategy for data cleansing and data migration,
writing functional and technical specifications, creating source-to-target mappings, designing data profiling and data
validation jobs in Informatica, and creating ETL jobs in Informatica.
Worked on Hadoop cluster which ranged from 4-8 nodes during pre-production stage and it was sometimes extended
up to 24 nodes during production.
Experience in building and architecting multiple Data pipelines, end to end ETL and ELT process for Data ingestion
and transformation in GCP.
Strong understanding of AWS components such as EC2 and S3.
Developed logistic regression models in Python to predict subscription response rates based on customer variables such as
past transactions, responses to prior mailings, promotions, demographics, interests, and hobbies (see the sketch at the end
of this list).
Hands on experience with big data tools like Hadoop, Spark, Hive.
Built APIs that will allow customer service representatives to access the data and answer queries.
Designed changes to transform current Hadoop jobs to HBase.
Handled fixing of defects efficiently and worked with the QA and BA team for clarifications.
Responsible for Cluster maintenance, Monitoring, commissioning and decommissioning Data nodes,
Troubleshooting, Manage and review data backups, Manage & review log files.
Worked closely with the ETL SQL Server Integration Services Developers to explain the Data Transformation.
The new Business Data Warehouse (BDW) improved query/report performance, reduced the time needed to develop
reports and established self-service reporting model in Cognos for business users.
Implemented Bucketing and Partitioning using hive to assist the users with data analysis.
Used Oozie scripts for deployment of the application and Perforce as the secure versioning software.
Implemented Partitioning, Dynamic Partitions, Buckets in HIVE.
Develop database management systems for easy access, storage, and retrieval of data.
Perform DB activities such as indexing, performance tuning, and back up and restore.
Expertise in writing Hadoop jobs for analyzing data using HiveQL (queries), Pig Latin (data flow language), and
custom MapReduce programs in Java.
Experience implementing machine learning back-end pipelines with pandas and NumPy.
Designed and developed data mapping procedures and the ETL data extraction, data analysis, and loading process for
integrating data using R programming.
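A hedged sketch of a subscription-response model like the one described in this list, using pandas and scikit-learn; the file name and customer variables are placeholders, not details of the actual dataset.

    # Sketch: train and evaluate a logistic regression on hypothetical customer variables.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    customers = pd.read_csv("customers.csv")     # hypothetical extract
    features = customers[["past_transactions", "prior_mail_responses", "promo_count", "tenure_months"]]
    target = customers["responded"]

    X_train, X_test, y_train, y_test = train_test_split(
        features, target, test_size=0.2, random_state=42
    )

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("Test AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))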

Environment: YARN, HDFS, MapReduce, Hive, Oozie, HiveQL, Netezza, Informatica, HBase, Pig, MySQL, NoSQL, Spark,
Sqoop, Pentaho
