DHANUNJAYA DOKKU
ETL Developer, Data Engineer, Teradata, Oracle, Hadoop, Spark, Python, Reltio, Collibra, Linux, Unix
Mob: +1 (619) 597-5870 [email protected]

Professional Summary:

10+ years of IT experience in analysis, design, development, and maintenance of various business applications.
Worked closely with Business Analysts to understand requirements and develop better code.
Strong experience in Ab Initio Architecture, GDE, Co>Operating System.
Involved in all phases of DW&BI projects involving Requirements Gathering, Analysis, Design, Development, Testing, Implementation and Support.
Worked on various Ab Initio components such as Reformat, Filter by Expression (FBE), Sort, Rollup, Join, Dedup Sorted, Normalize, Lookup, Input/Output Table, Input/Output File, and Partition components.
Working knowledge of air commands for check-in/check-out and other EME-related operations, as well as m_ commands. Used Ab Initio as an ETL tool to pull data from source systems and cleanse, transform, and load it into databases.
Involved in meetings with Business System Analysts and Business users to understand the functionality.
Created graphs using components like input/output table, input/output file, lookup, reformat, sort, dedup sorted, partition by key, partition by round-robin, broadcast, replicate, join, gather, etc.
Working experience with Plans (Conduct>It), continuous flows, XML components, and Excel components in Ab Initio.
Implemented Spark Streaming data pipelines to ingest customer behavioral patterns from various sources in the insurance domain.
Utilized Apache Spark with Python to develop and execute big data analytics, identifying areas of improvement in the existing business by unearthing insights from large volumes of data.
Interpreted and solved business problems using data analysis, data mining, and optimization tools.
Implemented a Spark Streaming job to process petabytes of JSON data (a minimal sketch of this pattern appears after this summary).
Performed big data processing using Hadoop, MapReduce, Spark (Scala and PySpark), Hive, Impala, and HDFS.
Good knowledge of Hadoop cluster architecture and cluster monitoring.
Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
Collaborated with cross-functional teams in support of business case development and identified modeling methods to provide business solutions. Determined the appropriate statistical and analytical methodologies to solve business problems within specific areas of expertise.
Experience in designing, developing, executing, and maintaining data extraction, transformation, and loading for multiple corporate operational data store, data warehouse, and data mart systems.
Used Dataiku as an ETL tool, along with PySpark, to pull data from various source systems and cleanse, transform, and load it into databases for business analytics and reporting.
Enriched data from various sources like BC, CPC, SFMC, ADOBE, etc., using Dataiku.
Worked with highly complex SQL queries & views based on the business requirements (BTEQ, SQL Scripts).
Worked with Hybrid joins for fulfilling the business requirements.
Ingested reference data from Collibra into Teradata tables using the Collibra REST APIs.
Worked with the MDM (Master Data Management) tool Reltio to ingest member data from various sources using REST APIs.
Proficient in SQL Server and T-SQL (DDL and DML), constructing tables and applying normalization/denormalization techniques on database tables.
Experience in creating and updating clustered and non-clustered indexes to maintain SQL Server performance.
Created and managed table indexes; experienced in query tuning, performance tuning, deadlock management, and query optimization.
Experience in managing index fragmentation to achieve better query performance.
Implemented migration of the project from a Unix to a Linux environment.
Successfully implemented migration of the project from an Oracle to a Teradata database.
Designed a data warehouse using Hive; created and managed Hive tables in Hadoop.
Hands-on in ETL process design and documentation, including HLD, LLD, and data mapping documents.
Used PDLs to compute values for local parameters.
Strong SQL programming skills and some stored procedure development experience.
Well versed in various Ab Initio parallelism techniques and implemented a number of Ab Initio graphs using data parallelism.
Working experience with Salesforce Marketing Cloud and Salesforce email data, and knowledge of Salesforce components.
Developed generic Ab Initio graphs to unload and load data from different kinds of sources and targets.
Hands-on experience developing data warehouses/data marts using Ab Initio Co>Op, GDE, the component library, Oracle, and UNIX, mainly for the banking/financial/credit card industries.
Experience in Data Analysis, Data Validation, Data modeling, Data Cleansing, Data Verification and identifying data mismatch.
Experience integrating various data sources with multiple relational databases like Oracle, SQL Server, and MS Access; worked on integrating data from flat files.
Worked as Scrum Master for a 15-person team, conducting daily stand-up meetings and assigning work.
Working experience with SSIS: modified existing daily and monthly packages and migrated them to multiple regions using Jenkins automated pipelines as part of the SDLC.
Experience in defining, creating, documenting, verifying, and executing test cases; working with the development team to resolve production issues; creating basic test plans; and performing functional, integration, and performance testing.
Experience implementing all phases of the data warehouse life cycle, involving design, analysis, development, testing, and extensive coding standards in DWH.
Extensive knowledge of various Performance Tuning Techniques on Sources and Targets.
Highly effective across the System Development Life Cycle (SDLC), from analysis and design through development, testing, and implementation, in a diverse range of software applications.
Team player and self-starter, capable of working independently and motivating a team of professionals.
Working experience with teams spread across many geographies and time zones.
Quick learner with the ability to meet deadlines and work under pressure.
Knowledge of System Analysis and Design and Object-Oriented Analysis and Design.
Experience in reviewing and monitoring project progress, and in planning and managing dependencies, risks, and resources.
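
Illustrative sketch (referenced in the Spark Streaming bullet above): a minimal PySpark Structured Streaming job of the kind described for ingesting JSON data. The Kafka broker, topic name, schema fields, and output paths below are hypothetical placeholders, not details taken from any project listed here.

from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("json-stream-ingest").getOrCreate()

# Assumed event schema; real feeds would define their own fields.
event_schema = StructType([
    StructField("member_id", StringType()),
    StructField("channel", StringType()),
    StructField("event_ts", TimestampType()),
])

# Read a stream of JSON messages from Kafka (broker and topic are placeholders).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "touchpoint-events")
       .load())

# Parse the JSON payload into typed columns.
events = (raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
             .select("e.*"))

# Land the parsed events as Parquet with a checkpoint for fault tolerance.
query = (events.writeStream
         .format("parquet")
         .option("path", "/data/landing/touchpoints")
         .option("checkpointLocation", "/data/checkpoints/touchpoints")
         .outputMode("append")
         .start())
query.awaitTermination()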

Technical Skills:

Databases: Oracle 11g/10g/9i, SQL Server, Teradata, DB2, Hadoop, Hive
OS: Unix, Linux, MS Windows XP/2003/2000
Languages: Spark, Scala, Python, C, Shell Scripting, SQL
ETL Tools: Ab Initio (GDE 3.1, 3.5, 4.0.2.3 & Co>Op 3.5), SSIS, DataStage
Version Control Tools: EME
Scheduling Tools: Control-M, Ab Initio Console, Autosys, Tivoli


Professional Experience:

Data Engineer/ETL/Big Data Developer, BCBSNC, Durham, NC 10/18 – Present

Project Description:
The Touchpoint Insights Platform (TIP) will enrich the big data platform with new consumer touchpoints and core data, develop and enable key insights for use by activating layers, and turn off the old TPH foundation framework.
The purpose of this project is to enable the business to execute OLAP (online analytical processing) reports. The project is responsible for collecting streaming information from various sources in a complex, unstructured fashion and putting it into one shape (a structured format); the shape of the data depends on the sources.

Roles and Responsibilities:

Understood the specifications for the data warehouse ETL process and interacted with designers and end users on informational requirements.
Improved performance of Ab Initio graphs using techniques such as in-memory lookups, joins, and rollups.
Consolidated code developed by all team members and fixed issues to meet delivery schedules.
Used the Unix environment to run wrapper scripts for Ab Initio jobs through the backend.
Addressed code-related issues raised during all phases of the project.
Used Ab Initio as an ETL tool to pull data from source systems and cleanse, transform, and load it into databases.
Involved in meetings with Business System Analysts and Business users to understand the functionality.
Created graphs using components like input/output table, input/output file, lookup, reformat, sort, dedup sorted, partition by key, partition by round-robin, broadcast, replicate, join, gather, etc.
Read data from Hadoop (Avro format) and built extracts from it per business requirements.
Implemented Spark Streaming data pipelines to ingest customer behavioral patterns from various sources in the insurance domain.
Involved in loading data from the UNIX file system to Hadoop HDFS; imported and exported data into HDFS and Hive using Sqoop.
Loaded data from different sources like Oracle, DB2, and Teradata into HDFS using a PySpark ingestion framework and loaded it into partitioned Hive tables (see the sketch after this list).
Developed Hive queries for analysis across different banners.
Developed bash scripts to bring log files from the FTP server and process them for loading into Hive tables.
Implemented Spark Scala data pipelines to ingest customer behavioral patterns from various sources in the insurance domain.
Performed big data processing using Hadoop, MapReduce, Spark (Scala and PySpark), Hive, Impala, and HDFS.
Good knowledge of Hadoop cluster architecture and cluster monitoring.
Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
Used Ab Initio as an ETL tool, as well as PySpark/Python, to pull data from various source systems and cleanse, transform, and load it into databases.
Adapted data integration processes to accommodate changes in source systems and meet new business requirements.
As an agile team member, responsible for the quality of the code written by the team, ensuring proper design reviews, code reviews, unit testing, and integration testing.
Developed complex graphs using components like input/output table, input/output file, lookup, split, normalize, reformat, sort, dedup sorted, partition by key, partition by round-robin, replicate, join, and gather.
Involved in various EME data store operations like creating sandbox, code check-in, code checkout, creating project parameters according to the environment settings for the application.
Implemented a Spark Scala job to process petabytes of JSON data.
Developed generic graphs for data-profiling, data-validations, data cleansing and process specific requirements.
Involved in writing processes to continuously capture data from different servers across the country/state.
Developed highly optimized stored procedures, functions, and database views to implement the business logic, and created clustered and non-clustered indexes.
Created and managed table indexes; SQL query tuning, performance tuning, and deadlock management.
Developed system-level and application-level configuration scripts using shell scripting; also developed shell scripts for batch processing and start/end scripts for invoking the Ab Initio graphs.
Worked in the UNIX environment using shell scripts for FTP and SFTP processes.
Wrote several Shell scripts, to remove old files and move raw logs to the archives.
Processed and transformed delta feeds of customer data arriving daily.
Involved in the creation of complex SQLs per business requirements.
Participated in the code review process of various applications and provided inputs where necessary to improve the efficiency of the graphs.
Created mockup data for the development environment to test the business functionality and involved in Unit testing and Integration testing.
Created a scalable program for transferring data from the hybrid cloud structure to the cloud (HDFS cluster to S3).
Worked on multiple AWS services such as S3, EC2, SQS, CloudWatch, S3 triggers, and Lambda.
Used CloudWatch Logs to move application logs to S3 and created alarms based on exceptions raised by applications (a minimal sketch of this pattern follows the Environment line below).
Worked on several production issues caused by code, data, or server resources and resolved them immediately.
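
Illustrative sketch (referenced in the PySpark ingestion bullet above): a minimal example of reading a source table over JDBC and appending it into a partitioned Hive table. The connection URL, credentials, table names, and partition column are hypothetical placeholders, not actual project details.

from pyspark.sql import SparkSession
from pyspark.sql.functions import current_date

spark = (SparkSession.builder
         .appName("rdbms-to-hive-ingest")
         .enableHiveSupport()
         .getOrCreate())

# Read the source table over JDBC (URL, credentials, and table are placeholders).
src = (spark.read.format("jdbc")
       .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
       .option("dbtable", "SRC_SCHEMA.MEMBER_TOUCHPOINTS")
       .option("user", "etl_user")
       .option("password", "********")
       .option("fetchsize", "10000")
       .load())

# Stamp a load date and append into a partitioned Hive table.
(src.withColumn("load_date", current_date())
    .write
    .mode("append")
    .partitionBy("load_date")
    .saveAsTable("edw.member_touchpoints"))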
Environment: Ab Initio (GDE 3.5 & 4.0, Co>Operating System 3.3.3.4), Hadoop, Hive, SQL, AWS, Control-M, JSON, Avro, UNIX shell scripts.
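
Illustrative sketch (referenced in the CloudWatch bullet above): a minimal boto3 example of pushing an extract to S3 and registering a CloudWatch alarm on an application exception metric. The bucket, key, namespace, metric, and alarm names are hypothetical placeholders.

import boto3

# Upload a daily extract to S3 (file path, bucket, and key are placeholders).
s3 = boto3.client("s3")
s3.upload_file("/data/extracts/touchpoints_extract.avro",
               "tip-analytics-bucket",
               "landing/touchpoints_extract.avro")

# Alarm when the application reports any exceptions to a custom metric
# (namespace, metric, and alarm names are placeholders).
cloudwatch = boto3.client("cloudwatch")
cloudwatch.put_metric_alarm(
    AlarmName="tip-app-exception-alarm",
    Namespace="TIP/Application",
    MetricName="ExceptionCount",
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
)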





Ab Initio Developer, SunTrust 03/17 – 09/18

Project Description:
Medco recently purchased Ab Initio and is in the process of converting Client Benefit Management system (CBM) loading from manual to automated processing; the project involves setting up the Ab Initio environment and developing applications for loading the CBM.

Roles & Responsibilities:
Developed graphs based on data requirements using various AB Initio Components such as Rollup, Reformat, Join, Scan, Normalize, Gather, Broadcast, Merge etc., making use of statements/variables in the components for creating complex data transformations.
Created low-level design documents based on high-level design documents.
Used lookups with the Reformat component to fetch matched records based on the downstream process.
Worked on optimizing the performance of Ab initio graphs by using various performance tuning techniques like parallelism and skew reduction.
Worked in the UNIX environment using Shell Scripts for FTP, SFTP processes.
Worked with partition components like partition by key and partition by expression; made efficient use of the multifile system, which falls under data parallelism.
Worked on enhancement of existing Ab Initio applications and made graphs generic by incorporating parameters into the graphs and adopted the best practices to enhance performance for the graphs.
Assigned phases and set up checkpoints in complex Ab Initio graphs with large numbers of components to protect against failure, avoid deadlock, and enable easy recovery of failed graphs.
Experienced in interacting with the EME data store using several air commands.
Developed generic graphs using Ab Initio, e.g., generating surrogate keys, loading data into target tables, and unloading data from source tables.
Worked extensively on the ETL process through Plans and used PDL to make entire graphs generic.
Good knowledge of Teradata database utilities: FastLoad, MultiLoad, and TPump.
Working experience with SSIS: modified existing daily and monthly packages and migrated them to multiple regions using Jenkins automated pipelines as part of the SDLC.
Participated in Unit Testing of the developed components.
Interacting with Business users to understand each Interface in detail.
Responsible for gathering business requirement from users and technical analysis.
Environment: Ab Initio (GDE 3.2.2, Co>Operating System 3.2), EME (Enterprise Meta Environment), UNIX.

Ab Initio Developer, ADP 05/11 – 04/15
Responsibilities:
Designed and developed Ab Initio graphs based on the business requirement using Reformat, Filter By Expression, Rollup, Scan and other transform components.
Developed graphs to do complex calculation by using normalize and de-normalize components and to unload required data from database.
Created generic graphs and .pset files for reusability.
Modified and Migrated SSIS packages to multiple regions.
Worked with the DataStage tool to create stages to process data based on requirements.
Involved in all phases of graph development.
Responsible for unit testing and creating test documents.
Responsible for creating the deployment plans.
Monitored jobs in UAT.
Debugged and resolved code issues.
Assisted the production team during deployments.
Environment: Ab Initio GDE 3.1, Co>Operating System 3.1, UNIX, SQL, Teradata, Oracle 10g.

Education
Master's in Computer Science from Rivier University, Nashua, NH.
Bachelor of Technology in Electrical and Electronics, Jawaharlal Nehru Technological University.