Jai | Data Engineer - Sr. Data Engineer
[email protected]
Location: Sunnyvale, California, USA
Relocation: Remote
Visa: H1B
Jai
Mail: [email protected] | Ph: 732-746-0745
________________________________________________________________________
Big Data Hadoop | Cloud | Snowflake | Azure | Python | Talend

PROFESSIONAL SUMMARY:
- Senior Data Engineer with 16 years of overall IT experience and an earned reputation for meeting strict deadlines and delivering mission-critical solutions on time and within budget.
- Self-starter, well motivated to rapidly acquire new skills on the job, with strong technical report writing, presentation and communication skills.
- Applies subject matter knowledge to solve common and complex business issues within established guidelines and recommends appropriate alternatives.
- Works on problems and projects of diverse complexity and scope; exercises independent judgment within generally defined policies and practices to identify and select a solution.
- Able to handle unique situations and act as an expert, providing direction and guidance on process improvements and establishing policies.
- Experience programming with Python, SQL, T-SQL, HQL and PL/SQL.
- Experience implementing real-time and batch data pipelines using AWS services: Lambda, S3, DynamoDB, Kinesis, Redshift and EMR.
- Experience with AWS services such as S3, EC2, Glue, AWS Lambda, Athena, AWS Step Functions and Redshift.
- Able to use Sqoop to migrate data between RDBMS, NoSQL databases and HDFS.
- Experience in Extraction, Transformation and Loading (ETL) of data from various sources into data warehouses, as well as data processing (collecting, aggregating and moving data from multiple sources) using Snowflake, Talend and Hadoop.
- Hands-on experience with Hadoop architecture and components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and MapReduce programming.
- Used DataStage 11.5, 8.5 and 8.1 to extract data from various sources for the ECRRM Data Warehouse and Dynamic Pricing projects.
- Used IBM DataStage Designer and Information Analyzer to develop parallel jobs to extract, cleanse, transform, integrate and load data into the data warehouse.
- Developed jobs in DataStage 8.1 using stages such as Transformer, Aggregator, Lookup, Join, Merge, Remove Duplicates, Sort, Row Generator, Sequential File and Data Set.
- Extensive experience as an Oracle Developer with PL/SQL procedures, functions, packages, triggers, shell scripting and unit testing; involved in data extraction, transformation and loading on Oracle using SQL*Loader and Informatica.
- Expertise with Oracle 10g/9i/8i, Developer 2000, PL/SQL, Oracle Forms 10g/6i, Reports 10g/6i and SQL*Loader.
- Experience in performance tuning and SQL optimization (hints, Explain Plan, Autotrace and Tkprof), restructuring tables, denormalizing tables and building materialized views (a brief sketch follows this summary).
- Adept at writing UNIX shell scripts and scheduling jobs with DBMS_JOB; good understanding of ANSI-standard C.
- Good experience in data modeling, relational database concepts and entity relationship diagrams (ERD).
- Extensively used ETL methodology for data profiling, data migration, extraction, transformation and loading using Talend, and designed data conversions from a wide variety of source systems including Vertica, Oracle, MySQL, SQL Server, Hive and non-relational sources such as flat/Excel files and XML files.
- Good hands-on knowledge of Tableau Desktop and Tableau Server; proficient in dashboarding, the different chart types in Tableau, drilling down into data and the various Levels of Detail in a report.
- Configured interactive reports with filters and parameters in Tableau.
- Expert in using 100+ Talend components, including processing, orchestration, error handling and error logging components.
- Expertise in creating complex Talend jobs and fine-tuning existing jobs to enhance performance.
- Involved in the design of dimensional models such as star schema and snowflake schema.
- Involved in project planning and scheduling, system design, functional specifications, design specifications, impact analysis, coding, system test plans, testing, code reviews, coordinating user testing and training, project demonstrations and implementation.
- Excellent understanding of Hadoop architecture and components such as HDFS, YARN, Job Tracker, Task Tracker, Name Node, Data Node and the MapReduce programming paradigm.
- Hands-on experience installing, configuring and using Hadoop ecosystem components such as MapReduce, HDFS, Oozie, Hive, Sqoop, Pig, Zookeeper, Flume, Spark, HBase and Cassandra.
- Experience with the Hadoop 2.0 (MRv2) YARN architecture.
- Expertise in writing Hadoop jobs for analyzing data using MapReduce, Hive and Pig.
- Experience importing and exporting data with Sqoop between HDFS and relational database systems (RDBMS).
- Experienced in writing custom UDFs and UDAFs to extend Hive and Pig core functionality.
- Experienced with job workflow scheduling and monitoring tools such as Oozie and Zookeeper.
- Experience implementing various Hadoop file formats and compression techniques: SequenceFile, Parquet, ORC, Avro, GZip and Snappy.
- Experienced in using NoSQL databases such as HBase.
- Experience reviewing Hadoop log files.
- Experience in data management and implementation of Big Data applications using the Hadoop framework.
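As a brief, hedged illustration of the SQL tuning and materialized-view work summarized above (the table and column names here are hypothetical and not tied to any project below):

    -- Hypothetical: check the optimizer plan for a reporting aggregate,
    -- then precompute it as a materialized view with query rewrite enabled.
    EXPLAIN PLAN FOR
    SELECT region, SUM(order_total) AS total_sales
    FROM   orders
    GROUP  BY region;

    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

    CREATE MATERIALIZED VIEW mv_sales_by_region
        BUILD IMMEDIATE
        REFRESH COMPLETE ON DEMAND
        ENABLE QUERY REWRITE
    AS
    SELECT region, SUM(order_total) AS total_sales
    FROM   orders
    GROUP  BY region;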
Skill Set:
Programming Languages: SQL, PL/SQL, UNIX shell scripting, Python, DataStage 8.7/8.5/8.0
Cloud: AWS (S3, EMR, Glue, Athena, Redshift), GCP (GCS), Talend, Azure Data Factory, Azure Databricks
Hadoop Distributions: Apache, Cloudera
Hadoop Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, Zookeeper, Flume
Relational Data Stores: Oracle, MySQL, Teradata
Operating Systems: Windows, UNIX

EDUCATION: Bachelor's in Electronics and Communication Engineering, India.

PROFESSIONAL EXPERIENCE:

County of Ventura (Criminal Justice Department), Ventura, CA
Sr. Data Engineer
- Analyze, design and build modern data solutions using Azure PaaS services to support data visualization.
- Understand the current production state of applications and determine the impact of new implementations on existing business processes.
- Extract, transform and load data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL and Sybase.
- Ingest data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and process the data in Azure Databricks (see the sketch after this role).
- Implemented proofs of concept for data replication from on-premises to cloud and from cloud to on-premises.
- Created pipelines in ADF using linked services, datasets and pipelines to extract, transform and load data between sources such as Sybase, SQL Server, Blob Storage, Azure SQL Data Warehouse and the write-back tool, and back.
- Responsible for estimating cluster size and for monitoring and troubleshooting the Databricks cluster.
- Responsible for creating tunnels to connect the Azure cloud with the AWS cloud.
- Experienced in performance tuning of applications: setting the right batch interval time, the correct level of parallelism and memory tuning.
- Hands-on experience developing SQL scripts for automation purposes.
- Created build and release pipelines for multiple projects (modules) in the production environment using Visual Studio Team Services (VSTS).
Environment: Sybase, SQL Server, Azure Data Factory, Azure Databricks, Azure DevOps, AWS
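A minimal Spark SQL sketch of the kind of Databricks-side transformation described in this role, assuming ADF has already landed Parquet files in the data lake; the storage account, container, schema and column names are all hypothetical:

    -- Hypothetical: expose ADF-landed Parquet files as a temporary view,
    -- then persist a cleaned Delta table for downstream reporting.
    CREATE TEMPORARY VIEW staged_cases
    USING PARQUET
    OPTIONS (path 'abfss://[email protected]/cases/');

    CREATE TABLE IF NOT EXISTS curated.cases_clean
    USING DELTA
    AS
    SELECT case_id,
           UPPER(TRIM(case_status))          AS case_status,
           TO_DATE(filed_date, 'yyyy-MM-dd') AS filed_date
    FROM   staged_cases
    WHERE  case_id IS NOT NULL;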
DIRECTV INC., Los Angeles    Sep 21 - Dec 23
Sr. Data Engineer (AWS, Snowflake)
- Bulk loaded data from the external stage (AWS S3) and the internal stage into Snowflake using the COPY command (see the sketch after this role).
- Loaded data into Snowflake tables from the internal stage using SnowSQL.
- Used the COPY, LIST, PUT and GET commands for validating internal stage files.
- Used import and export between the internal stage (Snowflake) and the external stage (AWS S3).
- Wrote complex SnowSQL scripts in the Snowflake cloud data warehouse for business analysis and reporting.
- Used Snowpipe for continuous data ingestion from S3 buckets.
- Created clone objects to take advantage of zero-copy cloning.
- Performed data validations through INFORMATION_SCHEMA.
- Experience with AWS cloud services: EC2, S3, EMR, RDS, Athena and Glue.
- Cloned production data for code modifications and testing.
- Performed troubleshooting, analysis and resolution of critical issues.
- Developed stored procedures and views in Snowflake and used them in Talend to load dimensions and facts.
- Very good knowledge of RDBMS topics and the ability to write complex SQL and PL/SQL.
- Experience building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in AWS.
- Used AWS API Gateway and AWS Lambda to retrieve AWS cluster inventory using the AWS Python API.
- Worked on JSON schemas to define tables and column mappings from AWS S3 data to AWS Redshift, and used AWS Data Pipeline to configure data loads from S3 to Redshift.
- Performed data engineering functions (data extraction, transformation, loading and integration) in support of enterprise data infrastructures: data warehouses, operational data stores and master data management.
- Responsible for data services and data movement infrastructure.
- Good experience with ETL concepts, building ETL solutions and data modeling.
- Architected several DAGs (Directed Acyclic Graphs) for automating ETL pipelines.
- Hands-on experience architecting ETL transformation layers and writing Spark jobs to do the processing.
- Created partitions, bucketing and indexing for optimization as part of Hive data modeling.
- Created data frames in Spark SQL from data in HDFS, performed transformations, analyzed the data and stored the results back in HDFS.
- Primarily involved in data migration using SQL, SQL Azure, Azure Storage, Azure Data Factory, SSIS and PowerShell.
- Architected and implemented medium to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
- Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
Environment: Snowflake, AWS, S3, Glue, Redshift, Hive, Spark SQL, PL/SQL
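A minimal sketch of the Snowflake bulk-load and Snowpipe pattern described in this role; the stage, table and pipe names are illustrative, and the storage integration or credentials that a real external stage needs are omitted:

    -- Hypothetical external stage over the S3 landing bucket
    -- (storage integration / credentials omitted for brevity).
    CREATE OR REPLACE STAGE raw_s3_stage
      URL = 's3://example-landing-bucket/events/'
      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"');

    -- One-time bulk load from the external stage with the COPY command.
    COPY INTO analytics.events
      FROM @raw_s3_stage
      ON_ERROR = 'CONTINUE';

    -- Snowpipe for continuous ingestion as new files arrive in the bucket.
    CREATE OR REPLACE PIPE analytics.events_pipe AUTO_INGEST = TRUE AS
      COPY INTO analytics.events
      FROM @raw_s3_stage;

    -- Validate what is sitting in the stage (the LIST command mentioned above).
    LIST @raw_s3_stage;

In practice the stage would reference a named storage integration rather than a bare URL, and Snowpipe would be wired to S3 event notifications.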
Southern California Edison    Jun 18 - Sep 21
Data Engineer (Talend, Hadoop, SAP, Snowflake, GCP)
Southern California Edison (SCE), the largest subsidiary of Edison International, is the primary electricity supply company for much of Southern California, USA.
- Actively worked with the design team to understand customer requirements and proposed effective Hadoop solutions.
- Ingested structured data onto the data lake using Sqoop jobs, scheduled with Oozie workflows, from RDBMS data sources for incremental loads.
- Ingested streaming (time series) data into the data lake using Flume.
- Implemented custom Flume interceptors to filter data and defined channel selectors to multiplex the data into different sinks.
- Designed and implemented a cloud-based data processing pipeline using GCP services such as Google Cloud Storage and BigQuery, resulting in a more scalable and cost-effective solution.
- Worked closely with data analysts and stakeholders to understand their requirements and provide scalable and efficient solutions using GCP services.
- Developed MapReduce programs in Java that run on the Hadoop cluster.
- Implemented a filtering mechanism using MapReduce programs to remove unnecessary records before moving data into Hive tables.
- Implemented optimized map-side joins to perform large dataset joins.
- Designed and implemented custom Writables, custom input formats and custom partitioners.
- Involved in creating Hive internal/external tables, loading them with data and troubleshooting Hive jobs.
- Used the Hive ORC format for better columnar storage, compression and processing (see the sketch after this role).
- Experienced in configuring workflows using Oozie.
- Involved in deploying multi-module applications into TAC.
- Used Redwood to schedule ETL jobs on a daily, weekly, monthly and yearly basis.
- Worked with 100+ Talend components, including orchestration, processing, database and file components.
- Developed standards for the ETL framework to make it easy to reuse similar logic across the board.
- Scheduled and automated ETL processes with the scheduling tool in TAC.
- Rigorously involved in data cleansing and data validation to identify corrupted data and send it back to the client.
- Migrated Talend jobs/joblets from the development environment to test and production.
Environment: Hadoop, MapReduce, HDFS, HBase, Hive, Hive UDFs, Pig, Java, SQL, Cloudera Manager, Sqoop, Flume, Oozie, shell scripts, SAP HANA, GCP, Tableau
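A small HiveQL sketch of the ORC-backed, partitioned tables used on this project; the database, table, columns and HDFS paths are hypothetical:

    -- Hypothetical external table over data landed in the lake, stored as ORC
    -- with Snappy compression and partitioned by load date.
    CREATE EXTERNAL TABLE IF NOT EXISTS meter_db.readings (
        meter_id  STRING,
        read_ts   TIMESTAMP,
        kwh       DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    STORED AS ORC
    LOCATION '/data/lake/meter_readings'
    TBLPROPERTIES ('orc.compress' = 'SNAPPY');

    -- Register a newly ingested partition after the scheduled Sqoop/Oozie run.
    ALTER TABLE meter_db.readings
    ADD IF NOT EXISTS PARTITION (load_date = '2021-06-01')
    LOCATION '/data/lake/meter_readings/load_date=2021-06-01';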
AT&T (formerly DIRECTV INC.), Los Angeles    Feb 15 - May 18
Oracle Developer
DOTCOM is the application for online self-care for DIRECTV. IT OPS handles incident and problem management for DOTCOM, along with ingesting all user behavior data from the website and engineering logs into the Hadoop cluster and running reports per business requirements.
- Drove root-cause analysis of problems and shepherded their resolution through the DIRECTV software development lifecycle, providing design solutions.
- Provided operations support, with a focus on incident, problem and change management in support of the ITIL Service Management framework.
- Experienced in working in an agile environment and in onsite/offshore coordination.
- Wrote complex SQL statements using joins, subqueries, inline views and Oracle analytical functions (see the sketch after this role).
- Involved in data analysis and in reducing data discrepancies between the source and target schemas.
- Responsible for writing database triggers to validate business logic using PL/SQL.
- Coded reports and programs to support billing functions in SQL, Oracle Reports and PL/SQL.
- Conducted code walkthroughs and internal/peer reviews.
- Assisted in developing test scenarios, test scripts and test data to support unit and system integration testing.
- Involved in comprehensive testing of the system to verify that it satisfies the functional specifications.
- Wrote technical specification requirements and provided production support.
- Developed Sqoop scripts to import and export data from relational sources and handled incremental loading of customer and transaction data by date.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
- Involved in loading data from the UNIX file system to HDFS.
- Used Pig and Hive as ETL tools to perform transformations, event joins, filtering and some pre-aggregations.
Environment: Hadoop, MapReduce, Hive, Pig, Sqoop, Java (JDK 1.6), UNIX shell scripting, Oracle 11g/12c, SQL*Plus, PL/SQL, TOAD, SQL Navigator, Erwin, Windows NT, UNIX
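An illustrative (not project-specific) example of the analytical-function style of query used in this role; the table and columns are hypothetical:

    -- Hypothetical: most recent billing event per customer, using ROW_NUMBER()
    -- in an inline view, in the style of the reporting queries described above.
    SELECT customer_id, event_type, event_date
    FROM  (SELECT customer_id,
                  event_type,
                  event_date,
                  ROW_NUMBER() OVER (PARTITION BY customer_id
                                     ORDER BY event_date DESC) AS rn
           FROM   billing_events)
    WHERE rn = 1;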
Bank of America    Jan 13 - Feb 15
ETL/DataStage Developer
- Worked with DataStage Designer, Manager, Administrator and Director.
- Worked with business analysts and DBAs on requirement gathering, analysis, testing, metrics and project coordination.
- Involved in data extraction from different data sources such as Oracle and flat files.
- Involved in creating and maintaining sequencer and batch jobs, and in creating the ETL job flow design.
- Used ETL to load data into the Oracle warehouse.
- Created standard/reusable jobs in DataStage using active and passive stages such as Sort, Lookup, Filter, Join, Transformer, Aggregator, Change Capture, Sequential File and Data Set.
- Used the Remove Duplicates stage to remove duplicates from the data.
- Used Designer and Director to schedule and monitor jobs and to collect performance statistics.
- Created local and shared containers to facilitate ease of use and reuse of jobs.
- Implemented the underlying logic for slowly changing dimensions.
- Executed pre- and post-session commands on the source and target databases using shell scripting.
Environment: IBM InfoSphere DataStage 8.0.1, Teradata, Oracle 10g, Autosys, UNIX, Erwin, TOAD, XML files, MS Access

Cap Gemini (India) / IKEA    Dec 11 - Jan 13
Oracle Developer
Computer support for the organization and management of warehouses has become imperative for timely, effective processing of logistics requirements within a company. GEMINI is one such warehouse system used by IKEA; the GEMINI application serves the distribution centers across the USA and Canada.
- Worked with business systems analysts and database administrators to understand and implement functional and non-functional requirements.
- Involved in the development of ERD/dimensional diagrams for business requirements using Erwin.
- Developed new hire, seasonal hire, and wages and tax information reports, and modified existing SQL reports.
- Extensively worked on database objects (tables, views, indexes, etc.) and stored procedures (PL/SQL).
- Created stored procedures for batch processing and moved large data volumes using bulk inserts and FORALL methods (see the sketch after this role).
- Fine-tuned procedures for maximum efficiency across schemas and databases using Oracle hints, explain plans, Tkprof and trace sessions for cost-based optimization (CBO).
- Working knowledge of external tables and Oracle utilities.
- Responsible for developing scripts to migrate and convert data from SQL Server 2005 to Oracle.
- Knowledge of ADO.NET connections for accessing data from Oracle and SQL Server 2005.
- Involved in the design and development of the user interface using Oracle Forms, integrated with menus, tabbed canvases and reports; these applications were deployed on Oracle 9iAS.
- Designed and implemented data management systems using ETL frameworks, databases and data warehousing technologies, resulting in improved data quality, consistency and accessibility.
- Used the Informatica Debugger to identify bugs in existing mappings by analyzing data flow and evaluating transformations.
- Worked effectively in the Informatica environment and used deployment groups to migrate objects.
- Created user-friendly Oracle Forms for immediate data lookup and retrieval.
- Developed many predefined and parameterized Discoverer workbooks for financial reporting needs.
- Prepared test cases for the GUI screens and tested the application.
- Involved in designing and developing several web pages using Visual Studio, HTML, CSS, XSLT/XML and JavaScript.
Environment: Oracle 10g/9i, PL/SQL, Oracle Forms 6i/Reports 6i, TOAD, Windows NT, UNIX shell scripts, SQL Server 2005, Informatica PowerCenter, HTML, CSS, XSLT/XML, JavaScript, Erwin, SCM
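A condensed PL/SQL sketch of the bulk-processing pattern (BULK COLLECT with a LIMIT plus FORALL) referenced in the GEMINI role above; the staging and target tables are hypothetical and the target is assumed to match the cursor's column order:

    -- Hypothetical staging-to-target batch move using BULK COLLECT / FORALL.
    DECLARE
        CURSOR c_stage IS
            SELECT item_id, warehouse_id, qty_on_hand
            FROM   stg_inventory;
        TYPE t_stage_tab IS TABLE OF c_stage%ROWTYPE;
        l_rows t_stage_tab;
    BEGIN
        OPEN c_stage;
        LOOP
            FETCH c_stage BULK COLLECT INTO l_rows LIMIT 1000;
            EXIT WHEN l_rows.COUNT = 0;

            -- Single bulk-bind insert per fetched batch.
            FORALL i IN 1 .. l_rows.COUNT
                INSERT INTO inventory_fact VALUES l_rows(i);

            COMMIT;
        END LOOP;
        CLOSE c_stage;
    END;
    /

The LIMIT keeps memory bounded on large staging tables while still cutting context switches between PL/SQL and SQL.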
Cap Gemini (India)    Jun 10 - Nov 11
Oracle Developer
The SIM application is part of Nissan/Renault's internal applications and is used to view business statistics for all of the major automotive companies in the world.
- Worked with cross-functional teams on impact analysis and converted business requirements into technical design documents.
- Developed stored procedures, packages and functions in PL/SQL to implement day-to-day modified business rules in the current system.
- Involved in software requirement specification, technical design, development, testing, implementation and training with web development techniques.
- Processed new accounts for customers and assisted existing customers with account maintenance.
- Worked with Developer/2000 Reports Designer to develop reports for validating data in system tables.
- Adhered to reporting procedures to provide information to the corporate office from all regions.
- Collected report requirements from cross-functional teams, provided decision-support reports, performed analysis of the reports and presented them to managerial staff.
- Implemented triggers to enhance functionality, supplement validation and default transaction processing.
- Involved in developing ETL mappings using Informatica.
- Performed database normalization and denormalization tasks.
- Analyzed previous interfaces in production, debugged issues and modified them for better performance.
Environment: SQL*Plus, PL/SQL, TOAD, SQL Navigator, Oracle 9i, Oracle Forms/Reports 6i, Erwin, Windows NT, UNIX, Informatica PowerCenter 7.1

Dow Jones Inc., South Brunswick, NJ    Nov 08 - Jun 10
Oracle Developer
Dow Jones is a leading publishing company with online products such as The Wall Street Journal and Barron's Online. ICS Continuity implements various business requests from the marketing and business users for the online WSJ and Barron's products and also provides production support.
- Extensively worked with SQL*Loader to load data from external systems and developed PL/SQL programs to move the data from temporary tables into base tables.
- Involved in the complete Software Development Life Cycle (SDLC).
- Wrote complex SQL statements using joins, subqueries, inline views and Oracle analytical functions.
- Involved in data analysis and in reducing data discrepancies between the source and target schemas.
- Responsible for writing database triggers to validate business logic using PL/SQL.
- Coded reports and programs to support billing functions in SQL, Oracle Reports and PL/SQL.
- Developed many inventory control reports, such as stock status and customer comparison reports.
- Developed various forms for the client's application using Developer/2000.
- Conducted code walkthroughs and internal/peer reviews.
- Assisted in developing test scenarios, test scripts and test data to support unit and system integration testing.
- Involved in comprehensive testing of the system to verify that it satisfies the functional specifications.
- Wrote technical specification requirements and provided production support.
Environment: Oracle 9i, PL/SQL, Forms 6i, Reports 6i, SQL Navigator, VB, HTML, Windows/UNIX

Verizon Business Inc., Dallas, TX    Sep 06 - Nov 08
Oracle Developer and Data Analyst
Verizon delivers advanced IP, data, voice and wireless solutions to large businesses and government institutions. The Essbase reporting system was built to support financial analysis and strategic decision-making; reports generated from this system include custom-built management reports and global performance reports.
- Worked on multiple projects and provided the interface between the Oracle database and Hyperion System 9 tools, including Essbase cubes, financial reporting, HR reporting and master data management.
- Designed, developed, debugged, tested and supported mission-critical PL/SQL software solutions used throughout the enterprise in customer service, warehousing, call center, EDI, order entry, inventory, work order processing, purchasing and reporting.
- Created tables, views, synonyms, indexes, sequences and database links, as well as custom packages tailored to business requirements.
- Oracle developer skills using Developer 6i (Reports/Forms), SQL, PL/SQL, SQL*Plus, SQL*Loader and TOAD against Oracle databases at release 9i or above.
- Created function-based, bitmapped, domain and B-tree indexes to improve performance.
- Data loading and data migration: used SQL*Loader to load data from flat files into staging tables and developed PL/SQL programs and packages to load the data from the staging tables into base tables.
- Supported current ETL processes and shared responsibility for monitoring the data warehouse loading process.
Environment: Oracle 8i/9i, PL/SQL (packages, procedures, functions, triggers), TOAD, DB2, SQL Server, Windows 2000

Xsilica Software Solutions Pvt. Ltd, India    Jan 06 - Aug 06
Associate Programmer
Project: Order Processing System / Inventory Control and Management System
The Financial Accounting System was developed to record all basic business transactions on a daily basis (online) and to generate daily/weekly MIS reports: cash flow, sales, purchases, stock position, debtors, budget vs. actual, monthly trading and P&L reports, balance sheet and debtors age analysis. The Inventory Control and Management System was developed to record stock inward and outward transactions.
- Prepared technical specifications.
- Developed PL/SQL and SQL programs using Oracle 8i as the back end and Developer 6i as the front end.
- Performed SQL and performance tuning.
- Used SQL*Loader to load data from CSV files into Oracle tables.
- PL/SQL bulk programming.
- Used SQL to create cross-tab reports (see the sketch after this role).
- Developed forms and reports per requirements.
- Developed triggers, procedures and functions for common use by the development team.
- Conducted user training.
- A good understanding of Java, JSP and web services.
Environment: Oracle 9i, Oracle Forms & Reports, Windows NT
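A simple illustration of the SQL cross-tab reporting technique mentioned in this role, using conditional aggregation; the table and columns are hypothetical:

    -- Hypothetical cross-tab: stock movement per item pivoted by transaction
    -- type, using conditional aggregation.
    SELECT item_code,
           SUM(CASE WHEN txn_type = 'INWARD'  THEN qty ELSE 0 END)    AS qty_inward,
           SUM(CASE WHEN txn_type = 'OUTWARD' THEN qty ELSE 0 END)    AS qty_outward,
           SUM(CASE WHEN txn_type = 'INWARD'  THEN qty ELSE -qty END) AS net_movement
    FROM   stock_transactions
    GROUP  BY item_code
    ORDER  BY item_code;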