Kedar Kumar - Lead Data Engineer |
[email protected] |
Location: Phoenix, Arizona, USA |
Relocation: Remote only |
Visa: H1B |
Kedar Kumar Nanda
CCA 175 Spark and Hadoop Developer Certified
AWS Certified Cloud Practitioner
OCA 1Z0-051 Oracle Database 11g: SQL/PL-SQL Certified
Data Mining Certified from Arizona State University (ASU)

PROFILE
Over 14 years of professional experience in data engineering functions: database design, modelling and development, data warehousing, data analytics, data models, and reporting.
Currently working for PRA Health Science, Inc. as IT Architect (Sr Data Engineer).
Good knowledge of USA healthcare data.
Working experience with SQL and MPP systems (SQL, Spark SQL, AWS Redshift, SnowSQL, Hive, etc.).
Working experience with RDBMS databases: Oracle PL/SQL.
Working experience with Python and PySpark.
Proficiency in advanced SQL and performance tuning.
Experience with the Cassandra NoSQL database in conjunction with Spark to build complex pipelines.
Scheduling tools: Autosys and Control-M.
Working experience with data warehousing systems using Snowflake, AWS Redshift and Oracle PL/SQL.
Building and deploying ETL and data pipelines using Spark, Python, Apache Nifi and Snowflake in conjunction with AWS services such as SQS and SNS.
Using PySpark to process and transform distributed data (see the sketch following this profile section).
Expertise with relational databases and experience with schema design and dimensional data modeling.
Experience using business intelligence reporting tools (Alteryx).
Good knowledge of machine learning and statistics algorithms: Linear Regression, Logistic Regression, KNN, Decision Tree, Random Forest, PCA, basic recommendation systems, K-Means Clustering.
Experience with scientific computing, analysis and visualization packages such as Numpy, Pandas, Matplotlib, Seaborn, Plotly, Cufflinks, and Choropleth Maps.
Extensive domain knowledge of United States healthcare data.
Good knowledge of AWS web app 3-tier architecture and AWS services: EC2, ELB, ASG, RDS, ElastiCache, S3, CLI, Elastic Beanstalk, CI/CD, CloudFormation, CloudWatch, X-Ray, CloudTrail, SQS, SNS, Kinesis, AWS Lambda, DynamoDB, etc.
Good understanding of OOP concepts and data structures.
Good understanding of big data technology concepts.
Well versed in agile practices for software project development.
Strong analytical, interpersonal communication and problem-solving skills.
Talent Management - I lead a team of 6 Database Developers, Analysts and Senior Analysts.
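The profile above mentions processing and transforming distributed data with PySpark and Spark SQL. The sketch below is a minimal illustration of that kind of work, assuming hypothetical S3 paths and column names (service_date, prescriber_id); it is not taken from any specific project deliverable.

```python
# Minimal PySpark sketch, assuming hypothetical paths and columns:
# read raw delimited claim files, standardize the service date, and
# aggregate weekly claim counts per prescriber before writing Parquet.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("claims_weekly_rollup").getOrCreate()

# Hypothetical input location and schema.
raw = spark.read.option("header", True).csv("s3://example-bucket/raw/claims/")

weekly = (
    raw.withColumn("service_date", F.to_date("service_date", "yyyy-MM-dd"))
       .withColumn("week_start", F.date_trunc("week", F.col("service_date")))
       .groupBy("prescriber_id", "week_start")
       .agg(F.count("*").alias("claim_count"))
)

# Hypothetical curated output location.
weekly.write.mode("overwrite").parquet("s3://example-bucket/curated/weekly_claims/")
```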
PROFESSIONAL EXPERIENCE
PRA Health Science, Inc. (Jan 05, 2021 - Present)
Designation: Sr Data Engineer (Architect)
Primary Skills Used: Oracle SQL, PL/SQL, Oracle 12C, Integrated Data Verse (IDV), SHS/PRA Graphical User Interface tool, Apache Hadoop, HDFS, TOAD, Toad Data Modeler, Spark, PySpark, Oracle Warehouse Builder (OWB), SQL Developer, Apache Hive, Sqoop, Autosys, Amazon Web Services (AWS), Microsoft Azure, Snowflake, PuTTY, Unix Shell Scripting, SQL*Loader, Python, PRO*C/C++, PHAST, Prescriber Patient tool (PPV), Summus Ticketing Tool, Market Definition Tool, Work Order System (WOS) tool.
Relational databases/tools: Oracle 12c, Toad, SQL Developer
Programming Languages: Python, Oracle PL/SQL, Unix Shell Scripting, PRO*C/C++
Scripting Language: Unix Shell Scripting
SQL and MPP Systems: SQL, Spark SQL & DataFrames, Hive, Pig
ETL Tool: Apache Nifi
Data Warehousing Systems: Snowflake, Oracle PL/SQL, Toad Data Modeler, Oracle Warehouse Builder (OWB), SQL*Loader
Big Data Technologies: Cloudera Spark, Hadoop, Apache Hive, HDFS, Sqoop
Data Visualization Libraries: Numpy, Pandas, Matplotlib, Seaborn, Plotly, Cufflinks, Choropleth Maps
Cloud Services: AWS S3, CLI, IAM, EC2, ELB, RDS, Redshift, ElastiCache, Elastic Beanstalk, Microsoft Azure
Reporting Tool: Alteryx
Automation/Job Scheduling Tools: Autosys and Control-M
File Transfers: AWS S3 CLI, Microsoft Azure, FTP/SFTP, FileZilla, WinSCP
Operating Systems: Unix, Linux, Windows
Job Duties and Responsibilities:
Collaborate with the operations team, business analysts, production support, design analysts, and functional teams to understand, analyze and gather requirements.
Build robust and scalable data warehouses and data integration ETL pipelines.
Use Snowflake to make several layers of the data warehouse available for reporting, data science and analytics, and build and deliver high-quality data architecture to support business analysis and customer reporting needs.
Design and develop operational data storage (ODS) solutions and dimensional data mart model storage solutions.
Design and develop data integration ETL pipelines and streaming pipelines using advanced technologies.
Design and develop highly scalable and extensible EDW, ODSPRD, staging and PR1 applications and self-service platforms by writing advanced and complex SQL queries, query optimization and join strategies.
Design and develop Oracle fine-grained access control: row/column level security models with masking and encryption methods.
Design and develop different data models such as ER models, conceptual/logical/physical models and relational models to efficiently build and optimize data warehouses.
Generate QC reports to identify system bottlenecks, data anomalies, trend differences and frequency counts, and determine whether they are within the specifications of the final delivery for high-revenue-generating and critical warehouses.
Develop automation frameworks to expedite QC reports, trending reports and ad hoc reporting requirements to validate the data, building and deploying large-scale, complex data processing pipelines using advanced technologies.
Create descriptive and trend reports, and automate these processes to run at weekly, monthly or custom intervals.
Develop comprehensive data integration solutions.
Design and build databases and data warehouses based on a reporting data model to support the reporting layer.
Build and deliver complex data architectures to support business analysis and customer reporting requirements.
Design, build and deploy large-scale, complex data processing pipelines.
Perform pre-processing, loading and reconciliation of disparate data sets (structured, semi-structured and unstructured).
Design and develop staging and data warehousing applications to support the problems by writing advanced and complex SQL queries, query optimization and join strategies.
Stage vendor files in S3 buckets in different file formats and load them into Snowflake (see the sketch at the end of this section).
Stage intermediate data from or to Snowflake in S3 buckets.
Build data pipes with cloud storage notification services such as SQS and SNS.
Design and develop highly scalable and extensible staging and data warehouse applications and self-service platforms that enable collection, storage, modelling and analysis of massive data sets from structured, semi-structured and unstructured sources.
Develop and maintain semantic layer(s) for multiple types of access.
Design and develop both ad hoc and static reporting solutions.
Design and administer data quality mechanisms.
Deliver robust test cases, plans and strategies.
Build imputation machine learning models, APLD, GUI and PTD processes.
Build patient swapping processes with Spark and Python, and design self-service applications and platforms for ingesting, profiling and transforming data.
Re-architect legacy mainframe-driven code and FTP file transfer processes into more efficient database-centric queries, designs and approaches using technologies such as Oracle PL/SQL, Apache Hive, AWS S3 buckets and Microsoft Azure Blobs.
Develop custom imputation processes to impute warehouse healthcare data using operational data storage solutions, the data warehousing platform and different technologies.
Plan, develop, design, test, implement and support custom proprietary warehouse applications in various software languages, platforms and environments.
Handle all phases of the software development life cycle: facilitation, collaboration, knowledge transfer and process improvement.
Develop and maintain complex code units, perform quality checks and debugging for warehouse applications as per requirements, and automate any repeated manual process.
Architect data migration strategies, data modelling and performance tuning activities, and participate in all phases of the SDLC (Software Development Life Cycle).
Troubleshoot issues related to existing data warehouses to meet client deliverable SLAs.
Troubleshoot and fix production automated job failures: investigate the data, provide a solution, fix the issue and deploy the updated code into production; develop efficient Hive scripts with joins on data sets using various techniques, load data from different data sets, and decide which file format is efficient for a task.
Define integrity constraints such as Primary Key, Foreign Key, Unique Key, Not Null and Check constraints, and perform indexing.
Implement big data solutions, cloud solutions and distributed architectures using AWS, Azure services and Spark.
Create solution design and application design documents along with work breakdown structure (WBS)/Line of Effort (LOE) for the design and development of applications; create detailed user manuals/instruction documents covering the scope and functionality of applications for both IT and business users.
Use bulk collections for better performance and easy retrieval of data by reducing context switching between the SQL and PL/SQL engines; create PL/SQL scripts to extract data from the operational database into flat text files using the UTL_FILE package; and create records, tables and collections (nested tables and arrays) to improve query performance.
Use Pragma Autonomous Transaction to avoid the mutating-table problem in database triggers; implement column family schemas of Hive within HDFS.
Perform data migration using SQL*Loader and Import/Export utilities; architect and develop Oracle schemas and Hive tables; develop efficient Hive scripts with joins on data sets using various techniques.
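The duties above describe staging vendor files in S3 and loading them into Snowflake, with SQS/SNS notifications driving the pipelines. Below is a minimal sketch of the basic stage-and-load pattern using boto3 and the Snowflake Python connector; the bucket, stage, table and credential values are hypothetical placeholders, and a production pipeline would typically rely on an auto-ingest stage or Snowpipe rather than an inline password.

```python
# Hypothetical stage-and-load sketch: push a vendor extract to S3, then
# COPY it into a Snowflake staging table through an external stage.
import boto3
import snowflake.connector

# Stage the file in S3 (bucket and key are placeholders).
boto3.client("s3").upload_file(
    "vendor_extract.csv", "example-bucket", "inbound/vendor_extract.csv"
)

# Load the staged file into Snowflake (connection values are placeholders).
conn = snowflake.connector.connect(
    account="example_account", user="etl_user", password="***",
    warehouse="ETL_WH", database="EDW", schema="STAGING",
)
try:
    conn.cursor().execute(
        """
        COPY INTO STAGING.VENDOR_EXTRACT
        FROM @EDW.STAGING.S3_INBOUND_STAGE/inbound/vendor_extract.csv
        FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
        """
    )
finally:
    conn.close()
```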
Harman (A Samsung Company) (July 08, 2013 - Jan 04, 2021)
Designation: Sr Data Engineer (Architect)
Duration: 7 years 7 months
Primary Skills Used: Oracle SQL, PL/SQL, Oracle 12C, Apache Hadoop, HDFS, TOAD, Toad Data Modeler, Spark, Oracle Warehouse Builder, SQL Developer, Apache Hive, Sqoop, Autosys, Amazon Web Services (AWS), Microsoft Azure, Snowflake, PuTTY, Unix Shell Scripting, SQL*Loader, Python, PRO*C/C++, RPM, Summus Ticketing Tool, PHAST, Integrated Data Verse (IDV), Prescriber Patient tool, Graphical User Interface tool, Market Definition Tool, SHS Graphical User Interface tool.
SQL and MPP Systems: SQL, Spark SQL & DataFrames, Hive, SnowSQL
RDBMS databases/tools: Oracle 11g/12c, Toad, SQL Developer
NoSQL Database: Cassandra (CQL) in conjunction with Spark
Programming Languages: Python, Oracle PL/SQL, Shell Scripting
Data Warehousing Systems: Snowflake, Oracle PL/SQL, Toad Data Modeler, Oracle Warehouse Builder (OWB), SQL*Loader
Scripting Language: Unix Shell Scripting
Data Processing: transformation and pipelines using PySpark and Python
Big Data Technologies: Cloudera Spark, Hadoop, Apache Hive, HDFS, Sqoop
Data Analysis and Visualization: Python libraries (Numpy, Pandas, Matplotlib, Seaborn, Plotly, Cufflinks, Choropleth Maps)
AWS Services: IAM, EC2, ELB, Route 53, RDS, ElastiCache, S3, CLI, Elastic Beanstalk, etc.
ETL and Reporting Tool: Alteryx
Workflow Management/Job Scheduling Tools: Autosys and Control-M
File Transfers: AWS S3 CLI, Microsoft Azure, FTP/SFTP, FileZilla, WinSCP
Project Description: Design, model and develop large-scale data warehousing solutions; design and implement complex ETL pipelines and other business intelligence solutions to support the rapidly growing and dynamic business demand for healthcare data; and provide powerful analytics, high-value data, insights and innovative performance solutions to top United States pharmaceutical companies. Develop operational data storage (ODS) solutions, multi-dimensional data mart model storage solutions, data imputation models, machine learning models, data intelligence modules and patient transactional data sets. Develop complex end-client modules with millions of patient records, practitioners and other dimensions to perform predictive analytics that impute missing data and estimate the likelihood of a future outcome based on patterns in the historical data. Build predictive analytics solutions for different competitive segments, such as early detection of diseases, prediction of drug initiation, and disease progression to 2nd-line therapy in CLL, to improve market productivity and effectiveness through sales targeting.
Job Duties and Responsibilities:
Lead multiple database technical teams (Data Warehousing Development, Target and Compensation (T&C), Brand Analytics (BA), Dynamic Claim Analyzer (DCA)).
Perform peer code reviews and test case reviews, and champion best practices.
Perform data engineering functions such as data warehousing, analytics, data models, reporting, and database design, modelling and development.
Work with the team to assign tasks and ensure timely completion of required tasks.
Pivotal in building the offshore team through knowledge transition.
Act as subject matter expert (SME) for several key existing warehouse applications, on both functional and technical aspects, providing assistance to support teams as needed.
Collaborate with business analysts, design analysts and other technical/functional teams to understand and gather requirements.
Design and build robust and scalable data integration warehouses and ETL pipelines using advanced technologies.
Design operational data storage and multi-dimensional model storage solutions.
Use Snowflake to make several layers of the data warehouse available for reporting, data science and analytics.
Build and deliver high-quality data architecture to support business analysis and customer reporting needs.
Design and develop conceptual/logical/physical data models.
Design ER models, relational models, object-oriented models, etc. for data warehouses.
Design multidimensional schemas such as star, snowflake and galaxy schemas.
Develop QC reports to identify data anomalies and trend differences for high-revenue-generating client warehouses.
Develop automation frameworks to expedite QC reports, trending reports and ad hoc reporting requirements to validate the data.
Build and deploy large-scale, complex data processing warehouse pipelines using advanced technologies.
Perform capacity planning of disk space, memory (RAM) and CPU for data warehouses.
Design and develop highly scalable and extensible staging and data warehouse applications and self-service platforms that enable collection, storage, modelling and analysis of massive data sets from structured, semi-structured and unstructured sources.
Design and develop the team's staging and data warehousing applications to support the requirements.
Plan, develop, design, test, implement and support custom proprietary warehouse applications in various software languages, platforms and environments.
Stage vendor files in S3 buckets in different file formats and process them as per the requirement.
Stage intermediate data from or to Snowflake in S3 buckets.
Design self-service applications and platforms for ingesting, profiling and transforming data.
Re-architect legacy mainframe-driven code and FTP file transfer processes into more efficient database-centric queries, designs and approaches using technologies such as Oracle PL/SQL, Apache Hive, AWS S3 buckets and Microsoft Azure Blobs.
Develop custom imputation processes to impute warehouse healthcare Retail (Rx), Mail Order (MO), Standard Non-Retail (SNR), Diagnostic (Dx), Procedure (Px) and Surgical (Sx) data using operational data storage solutions, the data warehousing platform and technologies such as Oracle, Python, Spark, Hive and SQL.
Design and develop staging and data warehousing applications to support the problems by writing advanced and complex SQL queries, query optimization and join strategies.
Develop and maintain complex code units, and perform quality checks and debugging for warehouse applications as per requirements.
Automate repeated manual processes and ad hoc deliverables into Oracle/Hadoop HiveQL/Spark/AWS S3/Azure workflows using the Autosys scheduler and Unix shell scripting.
Architect data migration strategies, data modelling and performance tuning activities.
Perform requirement analysis, design, development, unit testing and production deployment of business applications, participating in all phases of the SDLC (Software Development Life Cycle) with timely delivery against aggressive deadlines.
Troubleshoot and fix production automated job failures: investigate the data, provide a solution, fix the issue and deploy the updated code into production.
Develop efficient Hive scripts with joins on data sets using various techniques (see the sketch at the end of this section).
Load data from different data sets and decide which file format is efficient for a task.
Create solution design and application design documents along with work breakdown structure (WBS)/Line of Effort (LOE) for the design and development of applications.
Update all gathered requirements in the Work Order System (WOS) tool.
Use bulk collections for better performance and easy retrieval of data by reducing context switching between the SQL and PL/SQL engines; create PL/SQL scripts to extract data from the operational database into flat text files using the UTL_FILE package; and create records, tables and collections (nested tables and arrays) to improve query performance.
Use Pragma Autonomous Transaction to avoid the mutating-table problem in database triggers; implement column family schemas of Hive within HDFS.
Perform data migration using SQL*Loader and Import/Export utilities; architect and develop Oracle schemas and Hive tables.
Customize, configure, integrate and implement the proprietary tool RPM as per client requirements.
Integrate with PRA Health Science (Symphony Health Solutions) proprietary tools PHAST, Integrated Data Verse (IDV), Prescriber Patient Tool (PPV), Graphical User Interface Tool and Market Definition Tool to develop healthcare solutions and analytical reports.
Develop data warehousing healthcare solutions (DWHS), operational data storage (ODS) and dimensional data mart model storage solutions using emerging technologies.
Develop complex logic and deliver the data within the agreed time frame to meet SLAs.
Install, configure, develop and maintain the enterprise Spark and Hadoop HiveQL environment and the Oracle relational database.
Develop Oracle schemas and databases for warehouses, and collaborate with the DBA (Database Administrator) to assign the required amount of space to each schema.
Develop Oracle/Hadoop HiveQL job flows.
Create unit test cases for databases, warehouse applications and reports for projects/modules/tools.
Build and install all required relational databases, Hadoop ecosystems, Spark applications, software and tools in lower environments (Dev & SIT) and configure them accordingly.
Create technical and functional process documents to record the changes made as part of Change Records (CRs) and Work Orders (WRs).
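The duties above mention developing Hive scripts with joins across large data sets. Below is a minimal Spark SQL sketch of that kind of join against Hive tables; the database, table and column names (warehouse.rx_claims, warehouse.prescriber_dim, etc.) are hypothetical and only illustrate the pattern.

```python
# Hypothetical Hive-style join sketch: enrich a prescription fact table
# with prescriber dimension attributes using Spark SQL on Hive tables.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("rx_prescriber_join")
    .enableHiveSupport()  # read/write tables registered in the Hive metastore
    .getOrCreate()
)

enriched = spark.sql("""
    SELECT rx.claim_id,
           rx.fill_date,
           rx.product_id,
           hcp.prescriber_id,
           hcp.specialty
    FROM   warehouse.rx_claims      rx
    JOIN   warehouse.prescriber_dim hcp
           ON rx.prescriber_id = hcp.prescriber_id
    WHERE  rx.fill_date >= '2020-01-01'
""")

# Persist the joined result back as a managed Hive table.
enriched.write.mode("overwrite").saveAsTable("warehouse.rx_claims_enriched")
```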
Sonata Software (July 02, 2012 - July 05, 2013)
Designation: Senior Systems Analyst
Client: Sony
Primary Skills Used: Oracle PL/SQL, SQL, Unix Shell Scripting, Toad, Toad Data Modeler, Python, PRO*C/C++
Job Duties and Responsibilities:
Understand and collect requirements from business users or business analysts and translate them into SQL queries and PL/SQL code.
Design and develop various Sony supply chain data warehouses and storage solutions using Oracle, SQL, Unix shell scripting, Python and PRO*C.
Work in SQL and PL/SQL programming, developing complex code units, PL/SQL packages, procedures, functions, triggers, views and exception handling for retrieving, manipulating, checking and migrating complex data sets in Oracle.
Partitioned large tables using the range partitioning technique.
Worked extensively with Ref Cursors, External Tables and Collections.
Involved in all phases of the SDLC (Software Development Life Cycle), from analysis, design, development and testing to implementation and maintenance, with timely delivery against aggressive deadlines.
Involved in data flow diagrams, data dictionaries, database normalization techniques, entity-relationship modelling, and logical/physical data models using various design techniques.
Work on SQL performance tuning using cost-based optimization (CBO).
Good knowledge of key Oracle performance-related features such as the Query Optimizer, execution plans and indexes.
Perform performance tuning for Oracle RDBMS using explain plans and hints.
Troubleshoot and fix production automated job failures: investigate the data, provide a solution, fix the issue and deploy the updated code into production.
Ensure daily, weekly and monthly jobs run smoothly in Autosys.
Federal Bank Limited (Apr 19, 2010 - June 27, 2012)
Designation: Assistant Manager
Job Duties and Responsibilities:
Provide banking solutions and support using the Finacle banking product.
Collaborate with different team members and managers to understand and gather requirements and provide customer solutions.
Support and develop various banking customer data modules, accounts and storage solutions using the Finacle product and other proprietary tools, platforms and databases.
Handle transactions and various customer activities.
Resolve open tickets related to Finacle data applications.
Handle all phases and solutions of Finacle banking modules.
Develop both ad hoc and static reporting solutions.
Administer data quality mechanisms.
Tata Consultancy Services (July 12, 2007 - Oct 15, 2009)
Designation: Asst. Systems Engineer
Primary Skills Used: Oracle PL/SQL, Toad, Oracle Warehouse Builder (OWB), I-DEAS
Job Duties and Responsibilities:
Collaborate with business analysts and the team leader to understand and gather requirements.
B & W data modelling, design and development of Nissan data warehouses and storage solutions using different tools, platforms and databases.
Deliver robust test cases, plans and strategies.
Handle all phases of the software development life cycle: facilitation, collaboration, knowledge transfer and process improvement.
Develop comprehensive data integration solutions.
Troubleshoot issues related to existing data warehouses to meet client deliverable SLAs.
EDUCATION
Qualification - Institute - University/Board
Bachelor of Technology - CET, Bhubaneswar, India - BPUT
XII - Khallikote College School, Bbsr, India - CBSE
X - BKH School, Bbsr, India - CBSE
TOOLS & TECHNOLOGY EXPERTISE
Databases: Oracle Database 10g/11g/12c, Toad
SQL and MPP: SQL, Spark SQL & DataFrames, Hive, SnowSQL, Redshift
Programming Languages: Python, Oracle PL/SQL, Unix Shell Scripting, PRO*C/C++
Data Warehousing Systems: Snowflake, Redshift, PL/SQL, Toad Data Modeler, Oracle Warehouse Builder (OWB), SQL Developer, SQL*Loader
ETL Tool: Apache Nifi
Scripting Language: Unix Shell Scripting
Big Data Technologies: Apache Hadoop HDFS, Hive, Sqoop, Pig, Spark, PySpark
Reporting/Other Tools: Alteryx
File Transfer Tools: AWS, Microsoft Azure, FTP/SFTP, FileZilla, WinSCP
AWS Services: S3, CLI, Redshift, EMR, Glue, Athena, Lambda, RDS, Kinesis
Visualization Libraries: Numpy, Pandas, Matplotlib, Seaborn, Plotly, Cufflinks, Choropleth Maps
IT Processes: Agile, Software Development Life Cycle (SDLC)
Automation Tools: Autosys, Control-M, Airflow
OS: Unix, Linux, Windows
CERTIFICATION
CCA Spark and Hadoop Developer (CCA175)
AWS Certified Cloud Practitioner
OCA (Oracle Certified Associate) in SQL from ORACLE CORPORATION
OCA (Oracle Certified Associate) in PL/SQL from ORACLE CORPORATION
Data Mining Certified from ARIZONA STATE UNIVERSITY, AZ, USA
Operating Systems from ARIZONA STATE UNIVERSITY, AZ, USA
Intro to HIPAA for Business Associates certified
Earned Badge for Oracle Cloud Infrastructure (OCI Explorer)