
Suresh - Big Data Architect
[email protected]
Location: San Diego, California, USA
Relocation: Any
Visa: Green Card


Summary:


Over 12 years of experience across the software development lifecycle: analysis, design, development, testing, deployment, and maintenance.
Working as a Big Data Architect for the last 4 years, with a strong background in the big data stack: Spark, Scala, Hadoop, Storm, batch processing, HDFS, MapReduce, Kafka, Hive, Cassandra, Python, Sqoop, and Pig.
Hands-on experience with Apache Spark and its components (Spark core and Spark SQL)
Experienced in converting HiveQL queries into Spark transformations using Spark RDDs and Scala (a brief sketch follows this summary)
Hands on experience in in-memory data processing with Apache Spark
Developed Spark scripts using Scala shell commands as per requirements
Experience in designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle
Broad understanding and experience of real-time analytics and batch processing using Apache Spark.
Hands-on experience in AWS (Amazon Web Services), Cassandra, Python, and cloud computing.
Experience with agile development methodologies like Scrum and Test-Driven Development, Continuous Integration
Ability to translate business requirements into system design
Experience in importing and exporting data between HDFS and RDBMS/non-RDBMS systems using Sqoop
Analyzed large data sets by writing Pig scripts and Hive queries.
Hands-on experience in writing Pig Latin scripts and Pig commands
Experience with front end technologies like HTML, CSS and JavaScript
Experienced in using tools like Eclipse, NetBeans, GIT, Tortoise SVN and TOAD.
Experience in database development using SQL and PL/SQL and experience working on databases like Oracle 9i/11g, MySQL and SQL Server.
Effective team player and excellent communication skills with insight to determine priorities, schedule work and meet critical timelines.
Certified in FINRA (Financial Industry Regulatory Authority, Inc.)
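The sketch below illustrates the HiveQL-to-Spark conversion noted above: one aggregation expressed both through Spark SQL and as equivalent RDD transformations in Scala. It is a minimal, hypothetical example; the members table and its columns are assumptions rather than artifacts of any project listed here.

import org.apache.spark.sql.SparkSession

// Minimal sketch: the HiveQL aggregation
//   SELECT state, COUNT(*) FROM members GROUP BY state
// expressed as Spark SQL and as RDD transformations. Names are hypothetical.
object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // 1) Direct Spark SQL over the Hive metastore table.
    val bySql = spark.sql("SELECT state, COUNT(*) AS cnt FROM members GROUP BY state")

    // 2) The same aggregation as a pair RDD with reduceByKey.
    val byRdd = spark.table("members").rdd
      .map(row => (row.getAs[String]("state"), 1L))
      .reduceByKey(_ + _)

    bySql.show(10)
    byRdd.take(10).foreach(println)
    spark.stop()
  }
}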

TECHNICAL SKILLS:

Big Data: Apache Spark, Scala, MapReduce, HDFS, HBase, Hive, Pig, Sqoop, PostgreSQL
Databases: Oracle 9i/11g, MySQL, SQL Server 2000/2005
Hadoop distributions: Cloudera, Hortonworks, AWS
DWH (Reporting): OBIEE 10.1.3.2.0/11g
DWH (ETL): Informatica PowerCenter 9.6.x
Languages: SQL, PL/SQL, Python, Java
UI: HTML, CSS, JavaScript
Defect Tracking Tools: Quality Center, JIRA
Tools: SQL tools, TOAD
Version Control: Tortoise SVN, GitHub
Operating Systems: Windows[...], Linux/Unix

PROFESSIONAL EXPERIENCE:

Client Name: Anthem Inc, Virginia Beach, VA.
Duration: Apr 2018 to present
Role: Senior Big Data Architect
Project Name: PDR (Pharmacy Data Resiliency)
PDR is the solution for creating a consolidated data repository of source systems, such as GBD Facets and many others, holding member demographic and eligibility data. This data is served through eligibility service APIs exposed via Agadia/PA HUB so the pharmacy business can build prior authorizations. When the primary SOA API/APIGEE is unable to respond to PA HUB requests within 5 seconds during SOA downtime, the SOA API switch redirects requests to PDR, which returns membership data from the local repository.

Responsibilities:

Implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies such as Hadoop, Map Reduce Frameworks, HBase, Hive.
Performed data analysis, feature selection, and feature extraction using Apache Spark machine learning and streaming libraries in Python.
Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
Experience in AWS, implementing solutions using services such as EC2, S3, RDS, Redshift, and VPC.
Extensive development experience in IDEs such as Eclipse, NetBeans, and IntelliJ.
Worked as a Hadoop consultant on MapReduce, Pig, Hive, and Sqoop.
Worked with Apache Hadoop ecosystem components such as HDFS, Hive, Sqoop, Pig, and MapReduce.
Good exposure to GitHub and Jenkins.
Exposed to the Agile environment and familiar with tools like JIRA, Confluence.
Provided recommendations to machine learning groups about customer roadmap.
Sound knowledge of Agile methodology (Scrum) and Rational tools.
Led architecture and design of data processing, warehousing, and analytics initiatives.
Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
Used Apache NiFi for ingestion of data from IBM MQ (message queues).
Identified query duplication, complexity, and dependencies to minimize migration effort. Technology stack: Oracle, Cloudera, Hortonworks HDP cluster, Attunity Visibility, Cloudera Navigator Optimizer, AWS Cloud, and DynamoDB.
Built a POC using Spark for data transformation of larger data sets.
Worked on setting up and configuring AWS EMR clusters and used Amazon IAM to grant fine-grained access to AWS resources.
Worked on SequenceFiles, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement (see the sketch after this list).

Enabled and configured Hadoop services such as HDFS, YARN, Hive, Ranger, HBase, Kafka, Sqoop, Zeppelin Notebook, and Spark/Spark2.
Worked on Spark, Scala, Python, Storm, and Impala.
Extensive experience in Spark Streaming (version 1.5.2) through the core Spark API, using Scala and Java to transform raw data from several data sources into baseline data.
Created dashboards in Tableau and in Elasticsearch with Kibana.
Hands-on expertise in running Spark and Spark SQL.
Experienced in analyzing and optimizing RDDs by controlling partitions for the given data.
Worked on the MapR Hadoop platform to implement big data solutions using Hive, MapReduce, shell scripting, and Java technologies.
Used Struts (MVC) to implement business model logic.
Evaluated deep learning algorithms for text summarization using Python, Keras, TensorFlow, and Theano on the Cloudera Hadoop system.
Experienced in querying data using Spark SQL on top of Spark engine.
Experience in managing and monitoring Hadoop clusters using Cloudera Manager.
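The sketch below illustrates the Hive performance techniques called out above (partitioning, bucketing, and a map-side/broadcast join), expressed through Spark SQL in Scala. It is a minimal sketch under assumed names: the pdr database, the member_eligibility and plan_dim tables, and their columns are hypothetical.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

// Minimal sketch: a partitioned, bucketed ORC table plus a broadcast
// (map-side) join. Database, table, and column names are hypothetical.
object HivePerformanceSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-performance-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Partition by load_date and bucket by member_id so queries can prune
    // partitions and joins on member_id shuffle less data.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS pdr.member_eligibility (
        |  member_id STRING,
        |  plan_code STRING,
        |  eligible  BOOLEAN)
        |PARTITIONED BY (load_date STRING)
        |CLUSTERED BY (member_id) INTO 32 BUCKETS
        |STORED AS ORC""".stripMargin)

    // Broadcast the small lookup table so the join happens map-side,
    // avoiding a shuffle of the large fact table.
    val facts = spark.table("pdr.member_eligibility")
    val plans = spark.table("pdr.plan_dim") // small dimension table (hypothetical)
    val joined = facts.join(broadcast(plans), Seq("plan_code"))

    joined.where("load_date = '2018-04-01'").show(10)
    spark.stop()
  }
}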

Environment: Big Data, JDBC, NoSQL, Spark, YARN, Hive, Pig, Scala, AWS EMR, Python, Hadoop, Redshift.


Client Name: CoreLogic Inc, Irvine CA
Duration: Apr 2017 to Mar 2018
Role: Senior Big Data Architect
Project Name: IDAP (Integrated Data and Analytics Platform)

CoreLogic is an Irvine, CA-based corporation providing financial, property and consumer information, analytics and business intelligence. The company analyzes information assets and data to provide clients with analytics and customized data services.

The current project is IDAP, the Integrated Data and Analytics Platform. The IDAP application is responsible for creating and managing an organizational inventory of all related industry groups and key stakeholders at CoreLogic who participate in the creation and inclusion of the CoreLogic Unique Property ID into market ecosystems, with the aim of spurring rapid adoption of the CoreLogic ID across the property ecosystem.

Responsibilities:
Analyzed large amounts of data sets to determine optimal ways to aggregate and report on it.
Hands-on experience in Spark, Cassandra, Scala, Python, and Spark Streaming, creating RDDs and applying transformations and actions.
Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
Handled importing of data from various data sources, performed transformations using Hive and Spark, and loaded data into HDFS.
Developed Spark code and Spark SQL for faster testing and processing of data.
Snapshotted the cleansed data to the analytics cluster for business reporting.
Hands on experience on AWS platform with S3 & EMR.
Experience working with different data formats such as flat files, ORC, Avro, and JSON (see the sketch after this list).
Automated business reports on the data lake using Bash scripts in Unix and delivered them to business owners.
Provided design recommendations and thought leadership to sponsors and stakeholders, improving review processes, resolving technical problems, and suggesting solutions.
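The sketch below illustrates the multi-format handling mentioned above: reading flat files (CSV), ORC, and JSON with Spark and landing the curated result as a partitioned Hive table. The paths, table names, and the clip_id join key are assumptions, and Avro would additionally require the external spark-avro package.

import org.apache.spark.sql.SparkSession

// Minimal sketch: ingest several source formats and persist a curated,
// partitioned Hive table. All paths and names are hypothetical.
object MultiFormatIngestSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("idap-ingest-sketch")
      .enableHiveSupport()
      .getOrCreate()

    val propertiesJson = spark.read.json("hdfs:///raw/idap/properties_json/")
    val parcelsOrc     = spark.read.orc("hdfs:///raw/idap/parcels_orc/")
    val ownersCsv      = spark.read
      .option("header", "true")
      .csv("hdfs:///raw/idap/owners_flat/")

    // Join the sources on a shared key and write a partitioned Hive table.
    val curated = propertiesJson
      .join(parcelsOrc, "clip_id")
      .join(ownersCsv, "clip_id")

    curated.write
      .mode("overwrite")
      .partitionBy("state")
      .saveAsTable("idap.curated_properties")

    spark.stop()
  }
}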

Environment: Apache Spark, Scala, Spark Core, Spark SQL, Python, Hadoop, MapReduce, HDFS, Hive, Pig, MongoDB, Sqoop, Oozie, MySQL, Java (JDK 1.7), AWS


Client Name: Riverbed Technology, Sunnyvale CA
Duration: Jun 2016 to Mar 2017
Role: Senior Big Data Architect
Project Name: OMS (Order Management System)
The aim of this project is to develop an application used for order entry and processing. Orders can be received from businesses, consumers, or a mix of both, depending on the products. Offers and pricing are handled via catalogs and websites.
The customer receives a reference number when placing an order. Before ordering, the customer must register their details; once the customer's genuineness is verified, a permanent registration ID is issued.
Responsibilities:
Built patterns according to business requirements to help find violations in the market and generate alerts, using big data technology (Hive, Tez, Spark, Scala) on AWS
Worked as a Scrum Master, facilitating team productivity and monitoring project progress by applying Agile Scrum and Kanban on a JIRA board to ensure the quality of deliverables
Optimized long-running patterns by writing shell scripts and using optimization settings in Hive (e.g., cut a 20-hour daily pattern to a 7-hour run by resolving data skew in a TB-scale table; the approach was adopted company-wide and saved around 50,000 USD per year). A skew-handling sketch follows this list.
Migrated on-prem RDBMS (Oracle, Greenplum) code into HiveQL and Spark SQL running on AWS EMR
Participated in a machine learning project, including decision tree modeling and feature engineering
Responsible for the ETL and data warehouse process to transfer and register data into AWS S3
Developed Hive UDF functions in Java and modified framework code in Python
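The sketch below shows one common way to resolve the kind of data skew described above: salting a hot join key so the heavy rows spread across more tasks. It is written as Spark SQL in Scala and is only illustrative; the actual fix on the project was done with Hive settings and shell scripts, and the oms tables and columns here are hypothetical.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Minimal sketch of skew mitigation by salting a hot join key.
// Table and column names are hypothetical.
object SkewSaltingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("skew-salting-sketch")
      .enableHiveSupport()
      .getOrCreate()

    val saltBuckets = 16

    // Large, skewed fact table: a few customer_ids dominate the row count,
    // so add a random salt column to spread them across partitions.
    val orders = spark.table("oms.orders")
      .withColumn("salt", (rand() * saltBuckets).cast("int"))

    // Replicate each dimension row once per salt value so every salted
    // fact row still finds its match.
    val customers = spark.table("oms.customers")
      .withColumn("salt", explode(array((0 until saltBuckets).map(lit): _*)))

    val joined = orders.join(customers, Seq("customer_id", "salt"))
    joined.groupBy("region").count().show()

    spark.stop()
  }
}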

Environment: Apache Spark, Scala, Spark Core, Spark Streaming, Spark SQL, Python, Hadoop, MapReduce, HDFS, Hive, Pig, MongoDB, Sqoop, Oozie, MySQL, Java (JDK 1.7), AWS



Client Name: Verizon, Dallas
Duration: Dec 2015 to May 2016
Role: Senior Apache Spark Consultant
Project Name: Performance Management Information (PMI)
This project is about capturing customer details and processing that data in large volumes. Once the information is gathered, strategies are run on the collected data and the results are loaded into Hive tables. From there, the data is loaded into RDBMS tables using Sqoop.

Responsibilities:
Gather business requirements for the project by coordinating with Business users and data warehousing (front-end) team members.
Involved in product data ingestion into HDFS using Spark
Created partitioned tables and bucketed data in Hive to improve performance
Used Amazon Web Services (AWS): EC2 for compute and S3 as the storage mechanism.
Loaded data into MongoDB using hive-mongo connector JARs for report generation.
Developed Spark scripts using Scala shell commands as per requirements.
Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
Optimized existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
Performed advanced procedures such as text analytics and processing, using the in-memory computing capabilities of Spark with Scala.
Worked on migrating MapReduce programs into Spark transformations using Spark and Scala (see the sketch after this list).
Handled importing data from various data sources, including Oracle, into HDFS and vice versa using Sqoop.
Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
Migrated various Hive UDFs and queries into Spark SQL for faster processing.
Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
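The sketch below illustrates the MapReduce-to-Spark migration mentioned above: a mapper that emits (key, 1) and a reducer that sums become two RDD transformations, with the result saved to Hive ahead of the Sqoop export step. The input path, delimiter, and table names are hypothetical.

import org.apache.spark.sql.SparkSession

// Minimal sketch: a MapReduce-style count rewritten as Spark RDD
// transformations. Paths and names are hypothetical.
object MapReduceToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("pmi-mr-to-spark-sketch")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Map phase equivalent: parse each record and emit (customerId, 1).
    val counts = spark.sparkContext
      .textFile("hdfs:///pmi/raw/customer_events/")
      .map(_.split('|'))
      .filter(_.length > 1)
      .map(fields => (fields(0), 1L))
      .reduceByKey(_ + _) // reduce phase equivalent

    // Persist the aggregated result to Hive for the downstream Sqoop export.
    counts.toDF("customer_id", "event_count")
      .write.mode("overwrite")
      .saveAsTable("pmi.customer_event_counts")

    spark.stop()
  }
}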
Environment: Apache Spark, Scala, Spark Core, Spark Streaming, Spark SQL, Hadoop, MapReduce, HDFS, Hive, Pig, MongoDB, Sqoop, Oozie, Python, MySQL, Java (JDK 1.7), AWS


Client Name: Standard Chartered Bank, Chennai
Duration: Mar 2012 to Jan 2015
Role: Big Data Developer
Project Name: AML (Anti Money Laundering) cards
AML Cards is a compliance project handling all credit card transactions (both retail and consumer). The main goal is to detect fraudulent transactions and generate alerts on them, over roughly 400 GB of data per month for the USA and Canada alone. The project is divided into two parts:
Segmentation (12-month historical data is provided to analysts).
Transaction Monitoring (alerts are generated on 12 months of data plus recurring feeds; this is a rule-based alert generation model).


Responsibilities:

Led the AML Cards North America development and DQ team to successfully implement the compliance project.
Involved in the project from the POC stage and worked from data staging through DataMart population and reporting, in an onsite-offshore environment.
Fully responsible for creating data models for storing and processing data and for generating and reporting alerts; this model is being implemented as the standard across all regions as a global solution.
Involved in discussions with, and guided, other regional teams on the SCB big data platform and the AML Cards data model and strategy.
Responsible for technical design and review of data dictionaries (Business requirement).
Responsible for providing technical solutions and work arounds.
Migrated the needed data from the data warehouse and product processors into HDFS using Sqoop and imported various formats of flat files into HDFS.
Involved in discussion with source systems for issues related to DQ in data.
Implemented partitioning, dynamic partitions, buckets, and custom UDFs in Hive (see the UDF sketch after this list).
Used Hive for data processing and batch data filtering.
Supported and monitored MapReduce programs running on the cluster.
Monitored logs and responded accordingly to any warning or failure conditions.
Responsible for preserving code and design integrity using SVN and SharePoint.
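The sketch below illustrates the custom Hive UDF and dynamic partitioning work mentioned above. The UDF is written in Scala here (the project UDFs may have been Java), and the class, JAR, table, and column names are all hypothetical; the HiveQL in the comment shows roughly how such a UDF would be registered and used with dynamic partitions.

package com.example.aml // hypothetical package

import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Minimal sketch of a custom Hive UDF: mask a card number so only the
// last four digits remain. Registered and used from Hive roughly as:
//   ADD JAR aml-udfs.jar;
//   CREATE TEMPORARY FUNCTION mask_card AS 'com.example.aml.MaskCardNumber';
//   SET hive.exec.dynamic.partition=true;
//   SET hive.exec.dynamic.partition.mode=nonstrict;
//   INSERT OVERWRITE TABLE aml.txn_monitored PARTITION (txn_month)
//   SELECT mask_card(card_no), amount, txn_month FROM aml.txn_raw;
class MaskCardNumber extends UDF {
  def evaluate(card: Text): Text = {
    if (card == null) return null
    val digits = card.toString.replaceAll("[^0-9]", "")
    if (digits.length <= 4) card
    else new Text(("*" * (digits.length - 4)) + digits.takeRight(4))
  }
}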

Environment: Apache Hadoop, HDFS, Hive, MapReduce, Pig, HBase, ZooKeeper, Oozie, MongoDB, Python, Java, Sqoop

Client: The Hackett Group - Hyderabad
Duration: Jun 2006 to Feb 2012
Role: Oracle Database Developer
Project Name: Data Quality Analysis (DQ Analysis)
The personal credit agency has to perform a chain of actions before launching its new application into the market. It has to study detailed data, and we support this by managing the data at the back end and helping the agency submit the organized data to the FDA.
The goal was to maintain and enhance the program that extracts and transforms collected data into a standard library and format consistent with the CDISC Study Data Tabulation Model (SDTM). In this project, conversion of source data is performed using a set of Oracle scripts created with SQL and PL/SQL programming.

Responsibilities:
Designed, developed, and maintained an internal interface application allowing one application to share data with another.
Analyzed 90% of all changes and modifications to the interface application.
Coordinated development work efforts that spanned multiple applications and developers.
Developed and maintained data models for internal and external interfaces.
Worked with other Bureaus in the Department of State to implement data sharing interfaces.
Attended Configuration Management Process Working Group and Configuration Control Board meetings.
Performed DDL (CREATE, ALTER, DROP, TRUNCATE and RENAME), DML (INSERT, UPDATE, DELETE and SELECT) and DCL (GRANT and REVOKE) operations where permitted.
Designed and developed database applications.
Designed the database structure for an application.
Estimated storage requirements for an application.
Specified modifications of the database structures for an application.
Kept the database administrator informed of required changes.
Tuned the application during development.
Established an application's security requirements during development.
Created Functions, Procedures and Packages as part of the development.
Assisted the Configuration Management group to design new procedures and processes.
Led the Interfaces Team with responsibility to maintain and support both internal and external interfaces.
Responsible for following all processes and procedures in place for the entire Software Development Life Cycle.
Wrote documents in support of the SDLC phases. Documents include requirements and analysis reports, design documents, and technical documentation.
Created MS Project schedules for large work efforts.

Environment: Oracle 9i, Informatica 7.1.x, Control-M, TOAD, Linux/Unix


EDUCATION:

Master of Computer Applications (MCA) from University of Madras -2005, India.