VIJAY - AWS Cloud Data Architect
[email protected]
Location: St. Louis, Missouri, USA
Relocation: MO
Visa: GC
Accomplished data engineering leader with 15+ years of experience in data management, data engineering, migration, integration, and warehousing solutions, including on the AWS platform.
Experienced across multiple industries: Banking, Life and Annuities (ACORD), Logistics, Property and Casualty insurance (Guidewire), Telecommunications (Oracle SCM E-Business Suite), Retail, and Business Credit Bureau. Delivered several corporate data initiatives successfully by working with Business, the Enterprise PMO office, and IT management.
Proven expertise in architecture design and data modeling for 1) enterprise data lakes, 2) Enterprise Data Warehouses (EDW), 3) Operational Data Stores (ODS), 4) data marts using multidimensional and dimensional modeling (star and snowflake schema) techniques, 5) NoSQL modeling, 6) reference data models, and 7) Master Data Management.
Architected and implemented data solutions for first-generation data engineering and analytics using data integration tools (Informatica Suite, SSIS), sources (Oracle, SQL Server, DB2, Mainframes, etc.), and data warehouses (Oracle, SQL Server, etc.).
Architected and implemented data solutions for second-generation data engineering and analytics using data integration tools (Informatica Big Data Suite, Talend), data sources (Cassandra, MongoDB, RDBMS, Salesforce, etc.), the Hadoop ecosystem (HDFS, MapReduce, YARN, Hive, HBase, Pig, Sqoop, Flume, Kafka, Apache Spark, Zookeeper, Oozie), and MPP data warehouse systems (Teradata, Netezza, etc.).
Architected and implemented data solutions for third-generation data engineering and analytics using ingestion (DMS, Glue, Lambda, Python, Informatica IICS, etc.), streaming (Spark, Kinesis, Kafka, NiFi, SNS/SQS), storage and compute (S3, EMR, PySpark), and cloud data warehouses (AWS Redshift and Snowflake).
Architected and implemented Master Data Management (MDM) with real-time updates, from user-driven changes to ad hoc exceptions, using Informatica Data Quality (IDQ) and Informatica Analyst: address standardization routines, data cleansing, data matching, data conversion, exception handling, and the reporting and monitoring capabilities of IDQ.
Hands-on experience improving the performance of real-time jobs and batch cycles with hundreds of jobs by applying tuning techniques at the database level (SQL), at the Spark cluster level (data processing), and at the Informatica code level.
Built CI/CD pipelines using GitLab and Terraform to deploy AWS services as infrastructure as code, as well as the data pipelines built on those services.
Developed several frameworks to automate and reduce development, testing, and deployment effort across multiple projects.
Led development teams and mentored them to build high-quality, high-performing data applications; helped them partner and communicate effectively with business teams and other IT teams to produce direct business outcomes.
Strong believer in open-source technologies and managed services; always keep up to date with the latest trends in the industry.

Technical Skills:
AWS Services: SFTP, S3, EMR, DMS, Glue, Lambda, PySpark, Step Functions, SNS/SQS, EventBridge, Lake Formation, Redshift, Aurora, DynamoDB, etc.
Databases & Data Warehouses: Oracle, Microsoft SQL Server, DB2 UDB, Salesforce, Mainframes, Teradata, Netezza, Snowflake, Redshift, Cassandra, MongoDB, etc.
ETL Tools: Informatica PowerCenter, Informatica Power Exchange, DT Studio, Informatica Web Services, IDQ, Informatica Cloud Services, NiFi, Matillion
Reporting Tools: Business Objects, MicroStrategy Desktop and Web, OBIEE, Tableau, Power BI
Data Modeling: Erwin, ER Studio, Visio
Industry Frameworks: ACORD, Unified Data Model for Healthcare, HIPAA
Scheduling Tools: Autosys, Control-M, Tidal, Airflow, etc.

PROFESSIONAL EXPERIENCE:

Silicon Valley Bank (SVB), Jun 21 - Current
Cloud Data Architect
SVB has a corporate initiative to build a modern data platform on the cloud for all data needs across the enterprise. The platform, called One Data Platform, consists of a data lake, a source-aligned view layer, and a data warehouse.

Initiative 1 - Build data lake on AWS S3
- Worked with infrastructure and networking teams to build out the AWS infrastructure.
- Performed several POCs to define design patterns for ingesting historical and ongoing data from legacy file-based systems, relational database systems, and APIs into S3 storage.
- Built a configurable, metadata-driven framework to reduce development time and effort for ingestion jobs into the data lake.
- Built a framework using PySpark to automate testing between source data and ingested data (see the reconciliation sketch at the end of this section).
- Used Glue to build data pipelines that ingest data from APIs, databases, and files.
- Built a framework using Step Functions, Lambdas, and Glue to mask PII fields and build a data catalog on the ingested data.
- Implemented Lake Formation policies on top of the ingested data to control access.
- Trained and led development teams to bring them up to speed on the design patterns and to build data pipelines for ingesting data using AWS services.
- Implemented CI/CD pipelines using GitLab and Terraform to deploy AWS services and data pipelines.
- Built several Lambdas to perform ad hoc admin and housekeeping activities on the platform.

Initiative 2 - Build data warehouse on AWS Redshift
- Designed and implemented enterprise data models for the data warehouse for Banking, Credit Cards, Risk, Deposits, etc.
- Built a curation framework using Step Functions, Lambdas, and Glue jobs to curate the data and load it into Redshift; enhanced the framework to also perform data movement and data quality checks within the data pipeline.
- Built several data pipelines using this framework to load dimension and fact tables.
- Implemented Redshift data sharing to make curated data available in consumption accounts.
- Implemented optimization techniques to improve Redshift performance.
- In progress: enhancing the testing automation framework to also work with curated data.

Initiative 3 - Build conformed storage layer for operational needs
- Architected the design of a source-aligned view layer that provides historical and current snapshots of data from the data lake.
- Developed Glue jobs to cleanse, transform, integrate, and load the raw data layer into the table-format layer.
- Built a metadata-driven framework to build the pipelines that load this layer.
- Implemented security and access controls and policies so the right personas have the right access to this layer.
- Evaluated lakehouse architecture to use one platform for the data lake and the conformed storage area.

Environment: AWS Glue, PySpark, S3, EventBridge, Step Functions, Redshift, Aurora, DynamoDB, Cassandra, Oracle, SQL Server, XML, MongoDB, Mainframe, DMS, Lambda, Python, JIRA, Tableau
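The reconciliation sketch referenced in Initiative 1, shown here as a minimal PySpark example: it compares row counts and per-row hashes between a source extract and its ingested copy. The paths, table, and key column names are hypothetical placeholders; the actual framework was metadata-driven and covered more checks than this.

```python
# Minimal sketch of a source-vs-ingested reconciliation check in PySpark.
# Paths, key columns, and schemas are hypothetical placeholders.
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("ingestion-reconciliation").getOrCreate()

def reconcile(source_df: DataFrame, target_df: DataFrame, key_cols: list) -> dict:
    """Compare row counts and per-row hashes between a source extract and its ingested copy."""
    results = {
        "source_count": source_df.count(),
        "target_count": target_df.count(),
    }

    def with_row_hash(df: DataFrame) -> DataFrame:
        # Hash all non-key columns so one join surfaces rows whose values drifted.
        non_keys = [c for c in df.columns if c not in key_cols]
        hashed_cols = [F.coalesce(F.col(c).cast("string"), F.lit("")) for c in non_keys]
        return df.withColumn("row_hash", F.sha2(F.concat_ws("||", *hashed_cols), 256))

    s = with_row_hash(source_df).select(*key_cols, "row_hash")
    t = with_row_hash(target_df).select(*key_cols, F.col("row_hash").alias("row_hash_t"))

    mismatched = s.join(t, on=key_cols, how="inner").filter(F.col("row_hash") != F.col("row_hash_t"))
    results["mismatched_rows"] = mismatched.count()
    results["missing_in_target"] = s.join(t, on=key_cols, how="left_anti").count()
    return results

# Hypothetical usage: compare a JDBC extract with the copy landed in the data lake.
# source = spark.read.jdbc(url, "dbo.accounts", properties=props)
# target = spark.read.parquet("s3://datalake-raw/accounts/")
# print(reconcile(source, target, key_cols=["account_id"]))
```

Hashing the non-key columns keeps the comparison to a single join, which scales better on wide tables than comparing column by column.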
Globe Life, McKinney, TX, May 17 - Jun 21
Data Architect
Torchmark Corporation is a life insurance company. Torchmark acquired several subsidiaries over the years, and management wanted to modernize the entire data platform: integrate the operational data belonging to different domains for each of these companies in real time into a single data store, an Enterprise Data Lake, and an Enterprise Data Warehouse, and provide data to functional teams within each subsidiary for their operational and analytical needs. With this wealth of data in place, management wanted not only to perform advanced analytics to learn customer demographic and behavioral patterns and thus identify better leads, but also to provide better service to customers by reducing operational costs and offering better products.

Project 1 - Build Operational Data Store
- Interacted with business and technical SMEs to identify and understand system requirements and conducted impact and feasibility analysis.
- Designed the data extraction strategy from different operational systems into ODS staging and then into the ODS using Informatica PowerCenter.
- Worked with source system teams to build the ODS/TRIAGE (staging) model for different subject areas based on the various data feeds from different operational systems.
- Designed the logical and physical models for various subject areas (Claims, Policies, Parties, Agents, etc.) in the ODS to meet the data requirements.
- Developed high-level design documents covering data extraction, data loading, data transformation, and error handling techniques; also wrote source-to-target mapping specifications.
- Fine-tuned SQL Server workloads for high-usage real-time API calls.
- Built an Informatica automation framework to generate ETLs dynamically that load data into ODS/TRIAGE; this framework cut development time by 50%.
- Created mappings using transformations such as Source Qualifier, Aggregator, Expression, Dynamic Lookup, Router, Filter, Rank, Sequence Generator, Update Strategy, and User Defined Transformation to source data from on-prem databases and load it into Redshift.
- Used IICS Data Integration to load mainframe files into ODS staging.
- Was part of building a Client 360 view using Informatica Data Quality services fed from many policy admin systems; standardized client addresses using Address Doctor.
- Performed extensive debugging and performance tuning of mappings, sessions, and workflows, including partitioning, memory tuning, and cache management.
- Maintained warehouse metadata, naming standards, and warehouse standards for future application development.
- Developed shell scripts for profiling and tracing mainframe data sets back to the actual source.
- Wrote and debugged SQL scripts and stored procedures for archival purposes.
- Performed unit and system testing to verify that data extracted from different source systems loaded into the targets accurately, according to user requirements.
- Worked with the scheduling team to set up the Autosys schedule for jobs that load historical data for financial regulatory purposes.
- Migrated repository objects and scripts from DEV to PRE-PROD environments.
- Troubleshot and resolved migration and production issues; provided on-call support.

Project 2 - POC to migrate the ODS (SQL Server) to AWS RDS PostgreSQL
- Designed and implemented AWS DMS to perform a one-time history load and ongoing CDC replication from the on-prem database into AWS RDS PostgreSQL (see the sketch at the end of this section).
- Converted the physical schema and database functions/procedures to be compatible with AWS RDS PostgreSQL.
- Enabled AWS DMS data validation to verify that data was migrated accurately.
- Published, educated, and trained ODS ETL teams on the end-to-end migration strategy and implementation to load data from the on-prem ODS to the AWS RDS platform.

Environment: Informatica IICS, Informatica Power Center 10.1.1, Power Exchange 10.1.1, Informatica Data Quality, AWS Glue, PySpark, Step Functions, Snowflake, Oracle, SQL Server, XML, MongoDB, Mainframe, DMS, Lambda, Python, Unix, Linux, flat files, Toad, SQL Workbench, SQL Server Studio, PL/SQL, JIRA, Tableau
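A minimal sketch of how the DMS full-load-plus-CDC task from Project 2 could be provisioned with boto3, with data validation enabled as described above. The ARNs, region, and schema name are hypothetical placeholders; the actual endpoints, task settings, and networking were managed separately.

```python
# Minimal sketch of creating the DMS full-load + CDC replication task via boto3.
# All ARNs and the table-mapping rule below are hypothetical placeholders.
import json
import boto3

dms = boto3.client("dms", region_name="us-east-1")

table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-ods-schema",
        "object-locator": {"schema-name": "ods", "table-name": "%"},
        "rule-action": "include",
    }]
}

task_settings = {
    # Turn on DMS data validation so migrated rows are compared against the source.
    "ValidationSettings": {"EnableValidation": True, "ThreadCount": 5},
    "Logging": {"EnableLogging": True},
}

response = dms.create_replication_task(
    ReplicationTaskIdentifier="ods-sqlserver-to-rds-postgres",
    SourceEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:SOURCE",
    TargetEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:TARGET",
    ReplicationInstanceArn="arn:aws:dms:us-east-1:123456789012:rep:INSTANCE",
    MigrationType="full-load-and-cdc",   # one-time history load plus ongoing CDC
    TableMappings=json.dumps(table_mappings),
    ReplicationTaskSettings=json.dumps(task_settings),
)
print(response["ReplicationTask"]["Status"])
```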
XPO Logistics, Portland, OR, Jun 16 - May 17
ETL Lead
XPO Logistics does business through multiple subsidiaries (3PL, LTL, Last Mile, etc.) in multiple countries. XPO wanted to bring together the data from each of these divisions and build a corporate data warehouse to serve the reporting needs of corporate power users from various departments and top-level executives.
- Worked with IT teams from various departments to understand the data architecture, data relationships, and anomalies.
- Worked with business teams to establish rules to cleanse, unify, and integrate data from these systems by profiling data and identifying patterns with various kinds of fuzzy logic.
- Worked with management to roadmap deliverables by subject area from different departments and mapped benefits vs. risks based on the maturity and capability of the systems.
- Built logical and physical data models for 1) an operational data store that stores data from these different systems, 2) the corporate data warehouse (integration layer), and 3) subsequent data marts (analytical layer).
- Architected the ETL extraction strategy, audits, and error handling mechanism.
- Ingested data into Hadoop (Hive/HDFS) from different data sources.
- Created Hive external tables to stage data, then moved the data from staging to the main tables (see the sketch at the end of this section).
- Utilized Oozie workflows to run Pig and Hive jobs.
- Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive.
- Involved in migrating data from existing RDBMS (Oracle and SQL Server) to Hadoop using Sqoop for processing.
- Used Flume to collect, aggregate, and store web log data from sources such as web servers, mobile, and network devices, and pushed it to HDFS.
- Used Hive to analyze data ingested into HBase via Hive-HBase integration and computed various metrics for dashboard reporting.
- Involved in developing the MapReduce framework, writing queries, and scheduling MapReduce jobs.
- Developed code to import and export data into HDFS and Hive using Sqoop.
- Built reference data by profiling and identifying data patterns; identified and eliminated duplicates in datasets through the IDQ Edit Distance, Jaro Distance, and Mixed Field Matcher components. This enabled a single view of customers and helped control asset transportation costs by preventing longer routes.
- Designed ETL mappings to consume and produce data from/to SOAP and REST APIs using Web Services and HTTP transformations to communicate with services that talk to devices on trucks.
- Worked with users to plot the data, build visuals, and apply them to pilot programs for specific operational activities.
- Developed ETL programs to build aggregate tables and perform data retention tasks for the summary tables.
- Extensively used the HTTP transformation and Web Service Consumer transformation to send and receive data from external vendors.
- Spent significant time fine-tuning various EDW jobs at multiple levels: some needed tuning at the Informatica level and others at the database level (not only SQL tuning, but table reorgs, partitioning, compression settings, etc.).
- Supported users through user acceptance testing by setting up data and then logging and fixing bugs.
- Helped the reporting team map the attributes and metrics in reports to physical fields in the data warehouse.
- Was part of a team that performed a POC to evaluate Informatica Cloud Services for loading data into the Redshift data warehouse using the data synchronization and mapping designer services.
- Worked with the business not only to implement their enhancements to existing production data but also to help them identify what they can get out of the warehouse.
- Managed, coordinated, and reviewed development work from the development team to validate the implementation of ETL designs and patterns.
Environment: Pig, Sqoop, Kafka, Apache Cassandra, Oozie, Impala, Cloudera, AWS, AWS EMR, Redshift, Flume, Apache Hadoop, HDFS, Hive, MapReduce, Zookeeper, MySQL, MongoDB, PL/SQL, Python, Informatica Power Center 9.6.1, IDQ 9.6.1, Informatica Cloud, Oracle 11g/12c, Unix, Linux, SQL Server 2008, Salesforce, flat files, Web Services, XML, Toad, SQL Server Studio, JIRA, SOAP, REST, Tableau, GoldenGate, DB2, Mainframe, Power Exchange 9.6.1
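A minimal sketch of the Hive external-table staging pattern used above (stage raw files in an external table, then promote into a managed, partitioned main table), expressed through PySpark's SQL interface. The database names, columns, and HDFS paths are hypothetical placeholders for the actual subject areas.

```python
# Minimal sketch: stage raw files in a Hive external table, then promote into a
# managed, partitioned main table. Names, columns, and paths are hypothetical.
from datetime import date
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-staging-load")
    .enableHiveSupport()   # requires a configured Hive metastore
    .getOrCreate()
)

# External table points at the raw landing directory; dropping it never deletes data.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS staging.shipments_raw (
        shipment_id   STRING,
        origin        STRING,
        destination   STRING,
        weight_lbs    DOUBLE,
        event_ts      STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION 'hdfs:///data/landing/shipments/'
""")

# Main table is partitioned by load date so downstream jobs can prune partitions.
spark.sql("""
    CREATE TABLE IF NOT EXISTS edw.shipments (
        shipment_id   STRING,
        origin        STRING,
        destination   STRING,
        weight_lbs    DOUBLE,
        event_ts      TIMESTAMP
    )
    PARTITIONED BY (load_dt STRING)
    STORED AS PARQUET
""")

# Promote staged rows into today's partition, casting the timestamp on the way in.
load_dt = date.today().isoformat()
spark.sql(f"""
    INSERT OVERWRITE TABLE edw.shipments PARTITION (load_dt = '{load_dt}')
    SELECT shipment_id, origin, destination, weight_lbs, CAST(event_ts AS TIMESTAMP)
    FROM staging.shipments_raw
""")
```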
AT&T, Alpharetta, GA, Feb 14 - Jun 16
Sr. ETL Consultant
AT&T implemented the Oracle ERP system for its mobility division, creating the need for a new data warehouse to report mainly on Supply Chain Management data and a few surrounding systems.
- Worked with business and IT stakeholders to create and plan the roadmap for business analytics.
- Interacted with business customers and architects to gather and understand the data requirements and identify data sources.
- Worked with program management to plan releases for projects and enhancements by providing estimates and risks.
- Worked extensively with subject matter experts and the design team to develop data models that meet the data requirements.
- Worked heavily on Informatica mappings to pull data into the EDW from sources such as Oracle, Salesforce, flat files, and SQL Server.
- Wrote reusable code (mapplets and transformations) for other team members to reuse where applicable.
- Wrote unit and integration test plans for modules such as the Sales, Transportation, Orders, and Shipment subject areas.
- Supported users through user acceptance testing by setting up data and then logging and fixing bugs.
- Helped the reporting team map the attributes and metrics in reports to physical fields in the data warehouse.
- Performed admin responsibilities for the Informatica technology stack: upgrades, patches, code migrations, user maintenance, and proactive notifications for system resource utilization and availability.
- Performed the responsibilities of BI operations lead, managing and supporting various subject areas, daily batch jobs, and real-time jobs; also responsible for data consulting and ETL questions from users in other departments.
- Worked with the business not only to implement their enhancements to existing production data but also to help them identify what they can get out of the warehouse.
- Assigned tasks to the offshore team; managed, coordinated, and reviewed development work from offshore.
Environment: Informatica Power Center 9.0.1/9.5.1, Oracle 11g, Visio, DAC, OBIEE, Unix, Linux, SQL Server 2008, flat files, Toad, SQL Server Studio, PL/SQL

American Family Insurance, Madison, WI, Sep 10 - Feb 14
Sr. ETL Consultant
American Family Insurance (AmFam) is one of the leaders in providing personal, property, and casualty insurance policies. American Family was executing a major corporate initiative, the Advance Program, which replaced a number of operational systems, including the policy administration and billing systems. Project Advance was a multi-year reinvention strategy designed to transform the company to be more forward-looking and customer-focused in response to changes in the industry and customers' expectations. Advance encompassed new personal lines products, life insurance and annuity products, more sophisticated pricing for personal and commercial lines, and new service options and systems across the company.
Responsibilities:
- Interacted with business users to identify and understand system requirements and conducted impact and feasibility analysis.
- Worked with the integration architect to develop the data extraction strategy from different operational systems into ODS/TRIAGE (staging) and then into the EDW and data marts (analytical data layer).
- Worked with source system teams to build the ODS/TRIAGE (staging) model for different subject areas based on the various XML data feeds from JMS that talk to Guidewire.
- Worked with the data architect and data modeler to design the logical and physical models for various subject areas (Claims, Policies, Parties, Agents, etc.) in the EDW (CSA) to meet the data requirements.
- Developed high-level design documents covering data extraction, data loading, data transformation, and error handling techniques; also wrote source-to-target mapping specifications.
- Sourced data from multiple operational platforms (DB2, Oracle, and MQ Series) on a near-real-time basis.
- Created mappings using transformations such as Source Qualifier, Aggregator, Expression, Dynamic Lookup, Router, Filter, Rank, Sequence Generator, Update Strategy, and User Defined Transformation.
- Created mapplets and reusable transformations and used them in different mappings.
- Used the B2B Data Studio transformation to parse XMLs and integrated it with PowerCenter to consume data from Guidewire (see the XML flattening sketch at the end of this section).
- Performed extensive debugging and performance tuning of mappings, sessions, and workflows, including partitioning, memory tuning, and cache management.
- Developed shell scripts for profiling and tracing mainframe data sets back to the actual source.
- Wrote and debugged SQL scripts and stored procedures for archival purposes.
- Performed unit and system testing to verify that data extracted from different source systems loaded into the targets accurately, according to user requirements.
- Worked with the scheduling team to set up the Autosys schedule for jobs that load historical data for financial regulatory purposes.
- Migrated repository objects and scripts from DEV to PRE-PROD environments.
- Troubleshot and resolved migration and production issues; provided on-call support.
Environment: Informatica Power Center 8.5/9.0.1, Oracle 11g, DB2, Informatica Power Exchange 8.5/9.0.1, DT Studio, JMS, MQ Series, Informatica Web Services Hub, ER Studio, WinSQL, SQL Developer, Guidewire, Autosys
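A minimal sketch of the kind of XML flattening referenced above, in plain Python for illustration only. The element names and structure are hypothetical; in the actual project this parsing was done with the B2B Data Studio transformation inside PowerCenter, not Python.

```python
# Minimal sketch: flatten a policy XML message into staging rows.
# The element names and structure are hypothetical placeholders.
import xml.etree.ElementTree as ET

SAMPLE = """
<PolicyMessage>
  <Policy policyNumber="P-1001" effectiveDate="2013-06-01">
    <Insured firstName="Jane" lastName="Doe"/>
    <Coverage code="BI" limit="100000"/>
    <Coverage code="PD" limit="50000"/>
  </Policy>
</PolicyMessage>
"""

def flatten_policy(xml_text: str) -> list[dict]:
    """Return one staging row per Coverage element, carrying the policy keys down."""
    root = ET.fromstring(xml_text)
    rows = []
    for policy in root.findall("Policy"):
        insured = policy.find("Insured")
        for coverage in policy.findall("Coverage"):
            rows.append({
                "policy_number": policy.get("policyNumber"),
                "effective_date": policy.get("effectiveDate"),
                "insured_name": f'{insured.get("firstName")} {insured.get("lastName")}',
                "coverage_code": coverage.get("code"),
                "coverage_limit": float(coverage.get("limit")),
            })
    return rows

for row in flatten_policy(SAMPLE):
    print(row)
```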
Dean Health Care, Madison, WI, May 09 - Sep 10
Sr. ETL Developer
Project 1 - ARMOR EDW
The ARMOR EDW project aimed to build a new Enterprise Data Warehouse to replace the existing legacy data warehouse. A new warehouse was needed because all of the operational data from the old systems was being converted to a new system called Metavance. The data from Metavance and the legacy operational systems needed to be integrated even before the old data was converted and loaded into the warehouse, so that downstream applications could use the data seamlessly.
Project 2 - HEDIS
This project aimed to rewrite/convert the existing HEDIS packages into Informatica mappings. There were also new requirements for many of the extracts sent to the relevant government agency for rating purposes.
Responsibilities:
- Interacted with business personnel to analyze the business requirements and translate them into technical requirements.
- Prepared technical specifications for the development of Informatica (ETL) mappings to load data into various target tables.
- Designed and developed ETL logic implementing CDC by tracking changes in the critical fields required by the user (see the sketch at the end of this section).
- Developed standard and reusable mappings and mapplets using transformations such as Expression, Aggregator, Joiner, Source Qualifier, Router, Lookup (connected/unconnected), and Filter.
- Created and deployed a mapping as a web service provider using PowerCenter and Web Services Hub.
- Made extensive use of persistent cache to reduce session processing time.
- Identified performance issues in existing sources, targets, and mappings by analyzing the data flow and evaluating transformations, and tuned accordingly.
- Modified shell/Perl scripts to rename and back up the extracts.
- Used Workflow Manager to create, validate, test, and run sequential and concurrent sessions, schedule them to run at specified times, and read data from different sources and write it to target databases.
- Wrote PL/SQL stored procedures to archive two years' worth of data on a rolling basis.
- Implemented a screen-door process for cleaning flat files per the business requirements.
- Prepared ETL mapping documents for every mapping and a data migration document for smooth transfer of the project from development to testing and then to production.
- Performed unit testing and user acceptance testing to verify that data extracted from different source systems loaded into the targets accurately, according to user requirements.
- Maintained an issue log of issues found during the UAT phase for future reference.
- Built new universes per user requirements by identifying the required tables from the data mart and defining the universe connections.
- Converted Desktop Intelligence reports to Web Intelligence using the Report Conversion Tool.
- Created reports such as reports by period, demographic reports, and comparative reports.
- Prepared and used test data/cases to verify the accuracy and completeness of the ETL process.
- Led the team to recover from a data warehouse corruption; developed the schedule and plan to recover in a short time.
- Actively involved in production support and transferred knowledge to other team members.
- Coordinated between different teams across the organization to resolve release-related issues.
Environment: Informatica Power Center 8.1, Erwin, Autosys, flat files, Oracle, AIX K-shell scripts, ITSM, Quality Center
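A minimal sketch of the critical-field CDC idea from the responsibilities above: an incoming row is treated as changed only when the hash of its tracked columns differs from the previously stored hash. The column names and in-memory lookup are hypothetical; the actual logic was implemented in Informatica mappings against the warehouse tables.

```python
# Minimal sketch of critical-field CDC: a row is "changed" only when the hash of
# its tracked columns differs from the stored hash. Names are hypothetical.
import hashlib

CRITICAL_FIELDS = ["member_status", "plan_code", "pcp_id"]  # hypothetical tracked columns

def critical_hash(record: dict) -> str:
    """Hash only the fields whose changes the business cares about."""
    payload = "||".join(str(record.get(f, "")) for f in CRITICAL_FIELDS)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

def detect_changes(incoming: list[dict], current: dict[str, str]) -> dict[str, list[dict]]:
    """Split the incoming batch into inserts, updates, and unchanged rows.

    `current` maps the natural key to the previously stored critical-field hash.
    """
    out = {"insert": [], "update": [], "unchanged": []}
    for rec in incoming:
        key, new_hash = rec["member_id"], critical_hash(rec)
        if key not in current:
            out["insert"].append(rec)
        elif current[key] != new_hash:
            out["update"].append(rec)
        else:
            out["unchanged"].append(rec)
    return out

# Hypothetical usage
existing = {"M100": critical_hash({"member_status": "ACTIVE", "plan_code": "HMO1", "pcp_id": "D9"})}
batch = [
    {"member_id": "M100", "member_status": "TERMED", "plan_code": "HMO1", "pcp_id": "D9"},
    {"member_id": "M200", "member_status": "ACTIVE", "plan_code": "PPO2", "pcp_id": "D4"},
]
print({k: len(v) for k, v in detect_changes(batch, existing).items()})
```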
KOHLS CORPORATION, Menomonee Falls, WI, Jan 08 - May 09
ETL Developer
Kohl's is one of the nation's largest retailers. The project aimed to build a data mart giving merchants the ability to report on order planning and weeks-of-supply data and to merge it with the enterprise data warehouse. The project's name was "Service Level, Quantum and Projection Analysis Reporting". This data mart is the biggest data mart Kohl's has.
Responsibilities:
- Analyzed the system for the required functionality and was involved in preparing the functional specification document.
- Performed extensive data analysis with subject matter experts, identified source data, and implemented the data cleansing strategy.
- Prepared technical specifications for the development of Informatica (ETL) mappings to load data into various target tables and defined ETL standards.
- Worked with various Informatica client tools such as Source Analyzer, Warehouse Designer, Mapping Designer, Mapplet Designer, Transformation Developer, Repository Manager, and Workflow Manager.
- Created mappings using transformations such as Source Qualifier, Joiner, Aggregator, Expression, Filter, Router, Lookup, Update Strategy, and Sequence Generator.
- Developed mapplets and worklets for reusability.
- Implemented a weekly error tracking and correction process using Informatica.
- Modified BTEQ scripts to load data from the Teradata staging area to the Teradata data mart.
- Created scripts using Teradata utilities (FastLoad, MultiLoad, FastExport).
- Created secondary and join indexes for efficient access to data.
- Created Maestro schedules/jobs to automate the ETL load process based on the requirements.
- Performed performance tuning and optimization of Informatica mappings and sessions.
- Performed unit testing and user acceptance testing to verify that data extracted from different source systems loaded into the targets accurately, according to user requirements.
- Prepared test data/cases to verify the accuracy and completeness of the ETL process.
Environment: Informatica Power Center 8.6.1, PowerExchange 7.1, IBM MQ Series, XML, XSD, Oracle 9i, DB2, SQL Server, delimited flat files, HP-UX, Erwin 7, Documentum, Teradata SQL Assistant, FastLoad, FastExport, MultiLoad, BTEQ

Infosys Technologies Limited, India, Jun 06 - Dec 07
ETL Developer
Infosys is a software consulting and implementation company. Project: Dun & Bradstreet EDW
Responsibilities:
- Understood the data requirements and wrote the source-to-target specifications.
- Performed extensive data analysis with subject matter experts, identified source data, and implemented the data cleansing strategy.
- Sourced data from DB2 and SQL Server and loaded it into the Oracle EDW.
- Created complex and robust mappings and mapplets using Mapping Designer, generating different mappings for different loads.
- Fine-tuned existing Informatica mappings for performance optimization.
- Provided technical support for a team through technical and quality reviews of program code and PL/SQL blocks, optimizing them and maintaining standards and guidelines.
- Developed database triggers to enforce complicated business logic and integrity constraints and to enhance data security at the database level.
- Promoted new code to production through unit and system testing.
Environment: Power Center 7.1, Oracle, JIRA, Unix, DB2, SQL Server, Power Exchange, Mercury Quality Center