Sudheer Lingaiah Kopparthi - ETL Data Engineer / DataStage Developer
[email protected]
Phone: +1 640-529-4141
Location: Marietta, Georgia, USA
Relocation: Yes
Visa: H1B
Summary:
13+ years of experience in Information Technology as a Data Engineer, with expertise in Database Development, ETL Development, Report Development and Big Data Technologies.
Design, development, documentation and implementation of Data Warehousing / Client-Server solutions in industries such as Banking, Manufacturing, Telecommunications and Insurance.
Worked extensively in the analysis, design and development of various applications using DataStage, Talend, Oracle, DB2, Hadoop and VB on Windows NT/2000/XP and UNIX environments.
Expertise in developing DataStage/Talend jobs, user creation, permissions, role assignment and DataStage health-check activities, and in addressing production issues such as performance tuning, enhancements, migrations and maintenance of different environments.
Strong analytical, presentation, problem-solving, interpersonal and communication skills, with the flexibility to learn emerging technologies.
Team player with the ability to communicate effectively throughout the SDLC and with stakeholders at various levels.
Technical Expertise:
Hands-on experience with Hadoop ecosystem components such as HDFS, YARN, Hive, Oozie, Sqoop, Flume, ZooKeeper, HBase, Spark, Spark Streaming, Spark SQL and Kafka.
Experienced in complex SQL queries, including subqueries, correlated subqueries, inline views and aggregate functions.
Handled performance tuning of SQL queries using indexes, hints, partitioning and table analysis.
Experienced in joins, constraints and SQL functions (analytical, conversion, date, etc.).
Expertise in PySpark on AWS (EMR, S3) to create HDFS files with Structured Streaming, along with Apache NiFi workflows in a NoSQL environment.
Designed and developed ETL processes in AWS Glue to migrate claims data from external sources such as S3 and ORC/Parquet/Text files into AWS Redshift.
Worked on AWS Data Pipeline to configure data loads from S3 to Redshift.
Developed PySpark code for AWS Glue jobs and for EMR; generated scripts in AWS Glue to transfer data and used AWS Glue to run ETL jobs and aggregations on PySpark code.
Created, monitored and restored Azure SQL databases, and performed migration of Microsoft SQL Server to Azure SQL Database.
Experienced in building data warehouses on the Azure platform using Azure Databricks and Azure Data Factory.
Developed Python AWS serverless Lambda functions with concurrency and multi-threading to speed up processing and execute callables asynchronously (see the sketch after this list).
Extensive experience in migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks and Azure SQL Data Warehouse, as well as managing and granting database access and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
Achieved business process automation via applications developed using Git, Jenkins, MySQL and custom tools written in Python and Bash.
Extensive experience in setting up CI/CD pipelines using tools such as Jenkins, Bitbucket, GitHub and Maven.
Experience in designing, architecting and implementing scalable cloud-based web applications on AWS.
Experience in migrating on-premises databases to Azure Data Lake.
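The concurrent Lambda pattern referenced above can be illustrated with a short, hedged sketch. It is not the actual production code: the event shape and the process_record helper are hypothetical, and only the Python standard library is assumed.

# Minimal sketch: an AWS Lambda handler that fans out I/O-bound work
# across a thread pool and returns once all callables have completed.
# The "records" payload and process_record() are placeholders.
import json
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_record(record):
    # Placeholder for per-record work (e.g., an API call or S3 read).
    return {"id": record.get("id"), "status": "processed"}

def lambda_handler(event, context):
    records = event.get("records", [])
    results = []
    # Submit all records concurrently; the handler itself stays synchronous,
    # so we wait for every future before returning.
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(process_record, r) for r in records]
        for future in as_completed(futures):
            results.append(future.result())
    return {"statusCode": 200, "body": json.dumps(results)}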
Certifications:
IBM Certified Solution Developer - InfoSphere DataStage v8.5
Certified Banking Acumen, Bank of America - Sep 2016
Certified Financial Research and Report Writing, Bank of America - Nov 2017
Certified Teradata 14 Basics Professional, Pearson VUE - May 2018
Certified Cloudera Admin, CDP Private Cloud Base, Cloudera - 2021
Technical Skills:
Programming Languages: SQL, UNIX shell scripting, Python, Scala
Cloud Environments: Amazon Web Services (AWS), Azure (ADF & Databricks)
Big Data: Spark, Hive (LLAP, Beeline), HDFS, Sqoop, HBase, Oozie
Hadoop Distributions: Cloudera, Hortonworks
Dashboards: Ambari, Elasticsearch, Kibana
DBMS: Oracle, SQL Server, MySQL, DB2
Operating Systems: Windows, Linux, Solaris, CentOS, OS X
Scheduling Tools: AutoSys and Control-M
RDBMS Tools: TOAD 8.5/8.0, SQL*Loader (Oracle), SQL*Plus
ETL Tools: DataStage, Talend
Testing/Defect Tracking: HP Quality Center
Professional Experience:
Client: Iquadra Information Services LLC., Marietta, GA  Nov 22 - Till Date
Role: Data Engineer
Project: Enterprise Data Warehouse (EDW)
Unum offers group and individual life insurance coverage, provided through the employer, that can act as a safety net and help secure employees' finances when the unexpected happens. When an employee is unable to work due to sickness or injury, Disability Insurance can replace a portion of their income. The EDW team is responsible for part of the process involving extracts from various source and client systems to the EDW (Enterprise Data Warehouse). Once in the EDW environment, these extract files are transformed, enriched and loaded into the EDW.
Responsibilities:
Design and develop ETL processes in AWS Glue to migrate claims data from external sources like S3 and ORC/Parquet/Text files into AWS Redshift (an illustrative sketch follows this project entry).
Designed and built scalable distributed data solutions using AWS and planned the migration of the existing on-premises Cloudera Hadoop distribution to AWS based on business requirements.
Worked closely with Leadership and Data Governance to define and refine the Data Lake platform to achieve business objectives.
Developed a Spark framework to load data from AWS S3 to Redshift for data warehousing.
Developed Python/SQL scripts to automate the data sampling process.
Ensured data integrity by checking for completeness, duplication, accuracy and consistency.
Wrote scripts and an indexing strategy for a migration to Confidential Redshift from SQL and Teradata databases.
Created ETL flows to integrate on-premises data into cloud AWS S3 buckets.
Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
Translate detailed business requirements into optimal database and ETL solutions.
Create ETL jobs to load data into AWS S3 buckets after performing cleansing.
Create a flow to load data from Amazon S3 to Redshift using Glue.
Diagnose complex problems, including performance issues, and drive them to resolution.
Write reusable, testable and efficient code in Python.
Configure Data Governance based on business requirements and specifications.
Assist the Cloud Architect with the overall design of the data migration solution.
Connect to and extract (meta)data from multiple source systems comprising databases like Oracle, SQL Server and Hadoop into the data governance platform.
Develop DataStage Parallel Extender jobs using different stages like Aggregator, Join, Merge, Lookup, Source Dataset, External Filter, Row Generator, Column Generator, Change Capture, Copy, Funnel, Sort and Peek.
Work with other developers, designers, testers and architects to verify that the application meets business requirements with converted data.
Follow strong secure-coding practices to ensure the conversion process is free of the most common coding vulnerabilities.
Participate in design and code reviews.
Environment: DataStage v11.7, Python, Spark, Spark SQL, AWS (EC2, S3, Redshift, Lambda, Glue, Athena), PuTTY, SQL Server, Rally, ServiceNow, Tableau.
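A minimal sketch of the kind of AWS Glue PySpark job described in this project appears below. The bucket, Glue connection, database, table and field names are hypothetical placeholders, and the cleansing is deliberately simplified; it illustrates the S3-to-Redshift pattern rather than the actual job.

# Hedged sketch of a Glue job: read claims extracts from S3, apply light
# cleansing, and load the result into Redshift via a Glue connection.
import sys
from awsglue.transforms import Filter
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Read Parquet claims extracts from S3 (bucket and path are placeholders).
claims = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-claims-bucket/incoming/"]},
    format="parquet",
)

# Light cleansing: keep only rows with a claim identifier (field name is hypothetical).
cleansed = Filter.apply(frame=claims, f=lambda r: r["claim_id"] is not None)

# Load into Redshift through a Glue catalog connection; the connection name,
# target table and temp directory are placeholders.
glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=cleansed,
    catalog_connection="redshift-connection",
    connection_options={"dbtable": "staging.claims", "database": "edw"},
    redshift_tmp_dir="s3://example-claims-bucket/tmp/",
)

job.commit()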
Client: I Crest SDN BHD, Kuala Lumpur, Malaysia  Dec 20 - Nov 22
Role: Lead Data Engineer
Project: Adobe Experience Platform (AEP)
Digi is one of the largest telecommunications service providers in Malaysia. Digi wants to implement the next-generation BI stack on cloud in each of the participating BUs, along with continued support for front-line integration. The shift towards a long-term strategy to address challenges in the digital business has driven Telenor Group to look for cloud-based analytics technologies, consolidating and centralizing customer data sets as a foundation and as the single view or single source of truth. The scope is to transform the way users work with data through interactive business analytics and dashboards.
Responsibilities:
Demonstrated the ability to manage a project through its complete lifecycle: initiation, planning and design, execution, monitoring and controlling, and closing.
Used Azure Data Lake Storage (ADLS) as the source and determined the data warehouse solutions using external tables on Azure SQL Data Warehouse (Azure DW).
Primarily involved in the data migration process using SQL, SQL Azure, Azure Storage and Azure Data Factory for Azure subscribers and customers.
Worked on Azure SQL Database transactional applications together with front-end developers.
Created Databricks notebooks to streamline and curate the data for various business use cases; also mounted Blob Storage on Databricks.
Created pipelines in ADF using linked services to extract, transform and load data from multiple sources like Azure SQL, Blob Storage and Azure SQL Data Warehouse.
Performed ETL using Azure Databricks.
Migrated an on-premises Oracle ETL process to Azure Synapse Analytics.
Decoded the raw data and loaded it into JSON before sending the batch streaming file over the Kafka producer, and received the JSON response in a Kafka consumer in Python (see the sketch after this project entry).
Worked extensively on the core and Spark SQL modules of Spark.
Experienced in loading data into HDFS using Sqoop as well as saving data in Hive.
Worked on HDP Hadoop cluster configuration and ecosystem components like Oozie, HBase and Hive.
Used Spark SQL and PySpark to develop Spark applications for data extraction, transformation and loading from various file formats, analyzing and transforming the data into a user-friendly form.
Environment: Python, Azure Data Factory, Blob Storage, Scala, Spark v2.0.2, Hive, Denodo, Nagios, Bitbucket.
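A minimal sketch of the JSON-over-Kafka producer/consumer flow mentioned above, assuming the kafka-python client; the broker address, topic name and payload are hypothetical, and the real pipeline's decoding and batching logic is omitted.

# Hedged sketch: publish decoded records as JSON and read them back in Python.
import json
from kafka import KafkaProducer, KafkaConsumer

BROKER = "localhost:9092"   # placeholder broker address
TOPIC = "batch-stream"      # placeholder topic name

# Producer side: serialize each decoded record to JSON before publishing.
producer = KafkaProducer(
    bootstrap_servers=BROKER,
    value_serializer=lambda record: json.dumps(record).encode("utf-8"),
)
producer.send(TOPIC, {"batch_id": 1, "status": "decoded"})
producer.flush()

# Consumer side: deserialize the JSON payload back into Python dicts.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKER,
    auto_offset_reset="earliest",
    value_deserializer=lambda payload: json.loads(payload.decode("utf-8")),
)
for message in consumer:
    print(message.value)    # e.g. {'batch_id': 1, 'status': 'decoded'}
    break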
Client: Capgemini Services, Kuala Lumpur, Malaysia  Mar 20 - Dec 20
Role: ETL Developer
Project: Parallel Data Warehouse (PDW)
The PDW server infrastructure of DataStage servers and the EBBS A-level DB server is obsolete. The lack of vendor support and of hardware replacement in the event of hardware failure is now a key operational risk impacting finance/risk and retail analytics business users as well as regulatory submissions. The primary purpose of the project is to migrate 15 projects from DataStage 8.1 to DataStage 11.7.
Responsibilities:
Responsible for deployment of Talend jobs, optimization and automation.
Understood business rules and transformed data accordingly.
Created job designs to migrate the data using joins, tMap, tLogRow, tDBInput, tDBOutput, expression filters and rejection handling in Talend Studio for integration.
Worked on the tS3 and tAzureStorage components, AWS storage and the Azure Blob mechanism.
Designed technical documents and prepared unit test cases.
Kept track of every migration through daily meetings to deliver the data successfully.
Good knowledge of S3 buckets and Microsoft Azure Storage Explorer.
Environment: DataStage 8.1, 8.5 & 11.7, Talend 7.2, AWS, Jira, Remedy, Git.
Client: Soft Reflexes SDN BHD, Kuala Lumpur, Malaysia  Mar 19 - Feb 20
Role: ETL Developer
Project: Application Lifecycle Management (ALM)
The ALM project was undertaken to implement an asset & liability management system that maps raw data into the analysis engine for balance sheet and risk management purposes. Data fields therefore need to be extracted from source systems to populate the current profile and simulate risk exposures. The ALM system requires product and transactional information as data inputs in order to produce liquidity risk and scenario testing for better balance sheet and liquidity management. This project provides an all-in-one central repository on a converged data platform that helps the bank integrate and analyze a wide variety of online and offline customer data (AM-Insurance, AM-Medlife, Am-Islamic, etc.), including reduction rate calculation, advance amount and payment collection. 46 existing report layouts have to be modified based on the new changes, and 5 new reports are to be extracted as CSV files. The data will be stored in the Hadoop file system and processed using Hive and MapReduce jobs. Ingestion or acquisition of data will be done through Sqoop, followed by data transformation, aggregation, filtering and grouping.
Responsibilities:
Developed Sqoop jobs with incremental load from heterogeneous RDBMS sources (Netezza & SQL Server) into HDFS using native DB connectors (see the sketch after this project entry).
Designed a Hive repository with external tables, internal tables, buckets, partitions, ACID properties, UDFs and ORC compression for incremental loads of parsed data for analytical and operational dashboards.
Created Hive external tables for the data in HDFS and moved data from the archive layer to the business layer with Hive transformations.
Performed import and export of data into HDFS and Hive using Sqoop.
Worked with IBM InfoSphere Information Server Manager to move, deploy and control DataStage assets from one system to another.
Worked with Production Support and Operations teams to resolve production issues in a timely and efficient manner.
Consolidated data from Hive into Elasticsearch for visualization through Kibana.
Environment: DataStage v11.5, Hadoop (Hive, Sqoop), AutoSys, Super PuTTY, Kibana.
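As a rough illustration of the incremental RDBMS-to-HDFS/Hive pattern used in this project, the sketch below uses PySpark's JDBC reader in place of Sqoop (the project itself used Sqoop jobs). The JDBC URL, credentials, table, watermark column, paths and schema are all hypothetical placeholders.

# Hedged sketch: pull only new rows from a source RDBMS, land them as
# partitioned ORC in the archive layer, and expose them via an external
# Hive table for the business layer.
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder.appName("incremental-load")
         .enableHiveSupport().getOrCreate())

last_value = "2020-01-01 00:00:00"  # placeholder watermark from the previous run

# Incremental extract: only rows newer than the watermark.
increment = (spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://source-host:1433;databaseName=alm")
    .option("dbtable",
            f"(SELECT * FROM dbo.transactions WHERE updated_at > '{last_value}') t")
    .option("user", "etl_user")
    .option("password", "***")
    .load())

# Land the increment in the archive layer as ORC, partitioned by load date.
(increment.withColumn("load_date", F.current_date().cast("string"))
    .write.mode("append").partitionBy("load_date")
    .orc("hdfs:///data/archive/transactions/"))

# Expose the archive layer through a partitioned external Hive table
# and register any newly written partitions.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS business.transactions (
        txn_id STRING, amount DOUBLE, updated_at TIMESTAMP)
    PARTITIONED BY (load_date STRING)
    STORED AS ORC
    LOCATION 'hdfs:///data/archive/transactions/'
""")
spark.sql("MSCK REPAIR TABLE business.transactions")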
Client: Bank of America, Charlotte, NC  Mar 19 - Feb 20
Role: ETL Developer
Project: Bank of America CARD Information System (BACARDI)
The Bank of America CARD Information System (BACARDI) is a card services data mart hosting card and transaction information for Consumer, Small Business, Large Commercial, Merchant and Government cards. The application users are across the USA and EMEA regions. Bacardi receives and extracts data from various SORs within the cards space and from various consumer data sources, and imports this data into tables. This information is then utilized to develop models, assign scores, perform analytics and generate leads. Bacardi data is mainly used to provide financial, performance and management reports. Bacardi currently supplies data to approximately 20+ applications via 200+ extracts, and a large number of users run queries and access data on Bacardi (1500+ concurrent users). This project provides an all-in-one central repository (Gold, Silver & Bronze layers) on a converged data platform that helps the bank integrate and analyze a wide variety of card data received through many systems, including e-commerce transactions, clickstream data, e-mail, point-of-sale (POS) systems and call center records. Analysts can analyze this data to generate insights about individual consumer behaviors and preferences and offer personalized recommendations such as flexible offers and loyalty and reward programs.
Responsibilities:
ETL IBM DataStage developer with emphasis on Hadoop data processing: coded and tested complex IBM DataStage jobs handling data from multiple sources such as SQL databases and Hadoop.
Provided technical expertise on ETL design, data warehousing and data integration.
Collaborated with Business Intelligence and business operations teams to understand data sources and implement new requirements.
Created several job sequences for the maintenance of DataStage jobs using Wait For File and Job Activity stages.
Used the File Connector stage to pull and load data from the HDFS file system.
Responsible for the ongoing operational stability of the ETL processes, ensuring they are properly monitored and audited to provide data integrity and timeliness of delivery.
Worked with onshore business partners for requirement gathering and estimation.
Analyzed and resolved problem tickets created by Production Batch teams through break-fix.
Provided KT to L1/L2 teams for new installs and enhancements in production.
Set up internal and external file transmissions through different gateways and tools to facilitate data movement between upstream and downstream systems.
Participated in weekly team meetings and status update calls with onshore teams, posting the latest updates on assigned tasks and progress.
Followed the change management process: creating CRQs, obtaining approval and implementing.
Environment: DataStage v11.7/9.1, DB2 LUW 11.1, AIX v7.2, Hadoop (Hive, Sqoop), AutoSys.
Client: HCL Technologies, Singapore  Nov 13 - Sep 15
Role: ETL DataStage Developer
Project: Basel CDW
To comply with the Basel international banking accord, Standard Chartered Bank has developed two global warehouses: the Wholesale Banking Data Warehouse and the Retail Banking Data Warehouse. Basel is a wholesale banking data warehouse and deals with credit data. It has two components, SORSA and QDW. The SORSA data warehouse is built using the IBM BDW framework and is an E-R model; QDW (Queryable Data Warehouse), which stores the credit risk data alone, is a star schema model. CDW has 65+ different systems based on the different transactions performed in the bank. The system receives customer, limits, collateral and outstanding balance information from the TP systems operational in around 65 countries. The data is loaded into the warehouse on a daily, weekly and monthly basis. First the data is extracted into the staging area; this data is then validated at the transformation level and loaded into T-tables. In the loading part, the data from the T-tables is loaded into different target tables, which are present in a separate database.
The data from BDW goes into QDW, where all the dimension and fact tables are present.
Responsibilities:
Provided feedback on business requirements documentation to ensure resolution of any apparent ambiguity or contradictions.
Provided impact analysis for design modifications and obtained sign-off from the Architecture Team.
Conducted documentation and code reviews for team members to ensure standards compliance.
Updated system data and prepared conversion requirements as necessary for new implementations and production rollouts.
Participated in project status review meetings with the Development Manager.
Responsible for the overall ETL architecture design and implementation of the platform, including data validation, data matching and merging processes.
Responsible for explaining the ETL design flow to the end client.
Responsible for designing and developing ETL jobs in DataStage v8.7.
Assisted in the preparation of release notes documents for different environments.
Analyzed and documented the client's business requirements and processes.
Improved the performance of ETL load jobs by optimizing the ETL code.
Environment: DataStage v8.7, DB2 LUW 11.1, AIX v7.1, Teradata, Control-M, PuTTY.
Client: Future Focus Infotech, Mumbai, India  Aug 11 - Oct 13
Role: ETL DataStage Developer
Project: GSP Holden (General Motors)
Global Strategic Pricing is the ongoing development and implementation of service part price levels for GM dealers and distributors that optimizes contribution margin and supports revenue growth (short and long term) while simultaneously driving positive ownership experiences for GM vehicle owners. The purpose of this project is to build a Global Service Parts database, a common repository for all Parts Engineering data, Global Warranty data, Parts Pricing data and Global Currency data. Applications like GSP-Holden will utilize this Service Parts ODS as a data source. Currently, GIF stores and processes part attribute data from manufacturing units in the Europe, Asia Pacific and North America regions as the Global Part Master ODS. ERAPA, ERAPA-I and ETS are the source systems for the engineering data. The Global Part Master ODS in GIF is the source for GSP Dubai and GSP US/CA. Currency information comes from the RMM_CONVERSION_RATE table in the WWPMARS schema in GIF. The US/Canada instance stores warranty information from GWM.
Responsibilities:
Participated in design discussions and system design, and coordinated with the EDW team to define the requirements and develop the implementation plan and deployment schedule.
Extensively used IBM InfoSphere DataStage to develop processes to extract, transform, integrate and load data from various sources into the Data Warehouse.
Used DataStage v8.5 to load the data to the staging area and eventually to the DWH.
Worked extensively on building CDC jobs to capture new inserts, updates and deletes.
Created unit test cases for the complete flow and validated the data to ensure accuracy.
Used DataStage Director to monitor, debug and run DataStage jobs.
Developed DataStage jobs using various source, target and processing stages such as Sequential File, Dataset, Aggregator, Transformer, CDC, Lookup, Join, Merge, Funnel, Shared Containers, ODBC Connector, DB2 Connector, Job Activity, Execute Command Activity, User Variable Activity, Routine Activity, Loop Activities, Notification Activity, etc.
Expertise in creating reusable components like parameter sets, multiple-instance jobs, Shared Containers and Local Containers.
Improved the performance of ETL load jobs by optimizing the ETL code.
Worked with the Information Analyzer 8.5 tool for profiling the data.
Created technical requirement and mapping documents, with the help of the business analyst, from sources to operational staging targets using a star schema, and implemented Slowly Changing Dimension logic.
Worked on job sequencers and implemented restartability with the help of checkpoints to ensure the correct process flow and dependencies.
Provided 24/7 support during the various test phases and production support during the release phases.
Prepared unit test cases and unit tested the jobs.
Worked across different modules to meet the project deliveries.
Uploaded test results and screenshots in HP ALM (testing tool).
Environment: DataStage v8.5, Oracle, Control-M, TOAD, Unix.
Education:
Master of Computer Applications, Periyar University, India: May 2007 - June 2010
Bachelor of Computer Science, Sri Venkateshwara University, India: Apr 2004 - Apr 2007