Vivek Reddy - Data Engineer - Snowflake |
[email protected] |
Location: Chicago, Illinois, USA |
Relocation: No |
Visa: H1B |
Name: Vivek Reddy
Email ID: [email protected] Contact Number: +1 469-459-6394 __________________________________________________________________________________________ Professional Summary: Highly dedicated professional with over 11+ years of IT industry hands-on expertise implementing Data warehouses in various industries like Retail, Health, and Financial Services. Have proven track record of working as Data Engineer on Azure, Amazon cloud services and product development. Profound experience in performing Data Ingestion, Data Processing (Transformations, enrichment, and aggregations). Strong Knowledge of the architecture of Distributed Systems and Parallel Processing. Extensive experience in ETL tools like Teradata Utilities, Informatica, Oracle. Experience working on creating and running Docker images with multiple micro services. Creating pipelines using Azure Data Factory with Snowflake and Azure Synapse involves orchestrating data movement and transformation tasks between these data platforms. Developed methodologies for cloud migration, implemented best practices and helped to develop backup and recovery techniques for applications and databases on virtualization platforms. Expertise in Creating, Debugging, Scheduling and Monitoring jobs using Airflow and Control-M. Experienced in using most common Operators in Airflow - Python Operator, Bash Operator, Google Cloud Storage Download Operator, Google Cloud Storage Object Sensor. Experience in designing and creating RDBMS Tables, Views, User Created Data Types, Indexes, Stored Procedures, Cursors, Triggers and Transactions. Worked on Jenkins Pipelines to build Docker containers and exposure in deploying the same to Kubernetes engineer. Understanding of snowflake design patterns and capabilities like Snow pipe, Shares, Caching, Query Pruning. Processed the files present in Amazon S3 Bucket to Snowflake tables by creating external stages. Developed methodologies for cloud migration, implemented best practices and helped to develop backup and recovery techniques for applications and databases on virtualization platforms. Worked on loading of structured and semi-structured data in snowflake by creating external and internal stages. Working experience with Snowflake, Database Management Creating Databases, Schemas, Roles, Tables, Views. Good understanding of Permanent, Transient and Temporary tables in snowflake and storage costs associated with Time travel and Fail-safe mechanisms. Experience using Agile development processes (e.g., developing and estimating user stories, sprint planning, sprint retrospectives, etc.) Worked on Design phases of the projects and experience in leading Database and Application teams. Efficient in designing, developing Conceptual, Logical and Physical Data model. Worked on extracting data from multiple source systems and handled millions of data files to transform the valid business rules and loaded the data into the final core tables. Strong in RDBMS concepts and well experience in creating database objects like Tables, Views, Sequences, triggers, Collections etc taking the Performance and Reusability into consideration. Extensively worked on BTEQs, Teradata utilities like Multi-Load, Fast Load, Fast export and Teradata Parallel Transporter. Knowledge on Amazon EC2, Amazon S3, AWS Glue, AWS Lambda, AWS Redshift, Amazon RDS, VPC, IAM, Amazon Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SES, SQS, Lambda, EMR and other services of the AWS family. 
Technical Summary:
Programming & Scripting Languages: Python, SQL, Shell script, C, C++
Databases: Oracle, MySQL, SQL Server, DynamoDB, Teradata, DB2, Snowflake
BI and Visualization: Tableau, Power BI, Cognos
IDEs: Jupyter Notebook, PyCharm, Eclipse, GitHub
Cloud Based Tools:
  AWS: EMR, Glue, Athena, DynamoDB, Redshift, RDS, Data Pipeline, S3, IAM, CloudFormation, EC2
  Microsoft Azure: Data Lake, Data Factory, SQL Data Warehouse, Data Lake Analytics, Databricks, and other Azure services
Operating Systems: macOS, Windows, UNIX, Linux
ETL Tools: Informatica, Ab Initio
Scheduling Tools: Airflow, Control-M, Autosys

Education:
Bachelor of Technology in Computer Science, Jawaharlal Nehru Technological University

Certifications:
Teradata Vantage Associate and Developer
SnowPro Core
MTA: Introduction to Programming Using Python
Microsoft Azure Data Fundamentals

Professional Experience:

Client: PepsiCo Inc.
Duration: Feb 2024 - Till Date
Location: Plano, TX
Role: Senior Data Engineer
Responsibilities:
Capturing the business requirements and preparing the technical design documents.
Designing, implementing, and maintaining data pipelines to extract, transform, and load (ETL) data into Snowflake.
Working with various data sources, such as databases, cloud storage, and streaming platforms, to ingest data into Snowflake.
Optimizing data pipelines for efficiency, scalability, and reliability.
Monitoring and analyzing the performance of data pipelines and database queries, and identifying and resolving performance bottlenecks in Snowflake.
Loading structured and semi-structured data such as JSON files into Snowflake by creating external and internal stages (see the sketch after this section).
Developing and maintaining the schema for JSON data within Snowflake, including table structures and semi-structured data handling.
Parsing JSON data in Snowflake using a combination of built-in functions and SQL queries.
Monitoring Snowflake virtual warehouse credit usage, calculating query load, and checking for excessive credit usage with load monitoring charts.
Using Azure Data Factory to build pipelines that define the sequence of data movement and transformation activities; pipelines consist of activities that perform specific tasks, such as copying data, transforming data, or executing SQL queries.
Using the Copy Data activity in Azure Data Factory to move data between Snowflake and Azure Synapse, configuring the source and sink datasets to specify the source and destination of the transfer.
Performing data transformations in Azure Data Lake using the Databricks service.
Scheduling pipelines to run at specified intervals using time-based or event-based triggers in Azure Data Factory.
Monitoring the execution of pipelines and activities in the Azure Data Factory monitoring interface: tracking pipeline runs, monitoring data movement, and troubleshooting errors.
Tuning Snowflake query performance through an understanding of the underlying tables' micro-partitions, query pruning, and clustering, and automating the process of identifying tables to be clustered.
Used COPY, LIST, and PUT commands to process files by creating internal stages and loading those files into Snowflake.
Performed query performance tuning on the Snowflake database and credit usage analysis.
Loaded raw files from Amazon S3 buckets into Snowflake by creating Snowpipes.
Created stored procedures to load data into Snowflake tables based on the business logic.
Created data pipelines for ingestion, aggregation, and load events of consumer response data into Snowflake tables that serve as the feed for Tableau dashboards.
Scheduled Airflow DAGs to run multiple Snowflake jobs, which run independently based on time and data availability.
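Several of the responsibilities above (external stages over S3, COPY INTO, and parsing JSON with built-in functions) follow a common Snowflake pattern. The sketch below shows one way it might look using the Snowflake Python connector; the account, credentials, storage integration, stage, table, and column names are hypothetical placeholders rather than project details.

# Hypothetical sketch: load JSON files from an S3 external stage into Snowflake
# and query them with PARSE_JSON-style operators. All object names are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",      # placeholder credentials
    user="etl_user",
    password="***",
    warehouse="ETL_WH",
    database="RAW_DB",
    schema="LANDING",
)
cur = conn.cursor()

# External stage over the S3 landing bucket (assumes a storage integration exists).
cur.execute("""
    CREATE STAGE IF NOT EXISTS consumer_response_stage
      URL = 's3://example-landing-bucket/consumer_response/'
      STORAGE_INTEGRATION = s3_int
      FILE_FORMAT = (TYPE = JSON)
""")

# Raw table with a single VARIANT column for the semi-structured documents.
cur.execute("CREATE TABLE IF NOT EXISTS consumer_response_raw (v VARIANT)")

# Bulk load the staged JSON files.
cur.execute("COPY INTO consumer_response_raw FROM @consumer_response_stage")

# Parse the nested JSON with built-in semi-structured functions.
cur.execute("""
    SELECT v:customer_id::STRING, f.value:score::NUMBER
    FROM consumer_response_raw,
         LATERAL FLATTEN(input => v:responses) f
""")
print(cur.fetchmany(5))
cur.close()
conn.close()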
Client: Apple Inc.
Duration: April 2022 - Dec 2023
Location: Hyderabad, India
Role: Senior Data Engineer
Responsibilities:
Captured the business requirements and prepared the HLD, LLD, and technical design documents.
Involved in designing the schemas, analyzing the data, and designing the entities and relationships required for reporting.
Created stored procedures to load data into semantic tables in Snowflake for both on-demand and scheduled reports.
Created Python scripts to export data from the semantic tables in Snowflake into files and upload them to Amazon S3 buckets (a sketch follows this section).
Created Python scripts to download the files from Amazon S3 buckets and upload the data files to the Geneva server as delimited or Excel-formatted reports.
Involved in Teradata-to-Snowflake data migration projects, performing the data analysis and automating scripts for the data conversions of all objects.
Developed a level of competency while migrating Teradata DDLs and code to Snowflake.
Loaded structured and semi-structured data into Snowflake by creating external and internal stages.
Actively involved in development and testing phases such as unit testing, system testing, regression testing, and UAT support to deliver quality, defect-free code.
Used Azure Data Factory to build pipelines that define the sequence of data movement and transformation activities; pipelines consist of activities that perform specific tasks, such as copying data, transforming data, or executing SQL queries.
Performed data transformations in Azure Data Lake using the Databricks service.
Involved in code deployment from lower to higher environments and in performance tuning activities.
Performed report validations to check whether the reports were generated as per the business requirements.
Created and closely monitored Airflow jobs for scheduling purposes.
Involved in anticipating customer needs and coordinating with the architects and the team to meet deliverables on time.
Developed Python scripts to extract files from Snowflake tables and publish those files to SAP.
Processed ad-hoc requests to load files into Snowflake tables by creating user stages.
Prepared the support documents, provided the transition to the support team, and supported the project during the hyper-care phase.
Worked with engineers, developers, and QA on the development of current and future applications related to the content management line of business.
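The export-and-upload scripts described above would typically pair the Snowflake Python connector with boto3. The following is a minimal sketch under that assumption; the table, bucket, file names, and credentials are illustrative only.

# Hypothetical sketch: export a semantic table from Snowflake to a delimited
# file and upload it to S3. Names and credentials are placeholders.
import csv

import boto3
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="report_user", password="***",
    warehouse="REPORT_WH", database="SEMANTIC_DB", schema="REPORTS",
)
cur = conn.cursor()
cur.execute("SELECT * FROM daily_sales_summary")        # placeholder table
header = [col[0] for col in cur.description]

with open("daily_sales_summary.csv", "w", newline="") as fh:
    writer = csv.writer(fh, delimiter="|")               # pipe-delimited report
    writer.writerow(header)
    writer.writerows(cur.fetchall())

cur.close()
conn.close()

# Push the generated report file to the S3 bucket used as the exchange point.
s3 = boto3.client("s3")
s3.upload_file(
    "daily_sales_summary.csv",
    "example-report-bucket",                             # placeholder bucket
    "reports/daily_sales_summary.csv",
)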
Client: PepsiCo Inc.
Duration: July 2014 - Mar 2022
Location: Hyderabad, India
Role: Data Engineer/BI Developer
Responsibilities:
Prepared technical design documents based on the WITs and was involved in requirement gathering.
Extensively used ETL to load data from multiple source systems into Teradata.
Used Informatica PowerCenter to create mappings, sessions, and workflows for populating the dimension and fact tables simultaneously from different source systems.
Tuned the performance of Informatica sessions for large data files by increasing block size, data cache size, sequence buffer length, and the target-based commit interval.
Responsible for development activities on Teradata BTEQ scripts and UNIX scripts.
Handled millions of data files, applying the valid business rules and loading them successfully through to the core tables.
Ran SQL scripts as BTEQ jobs on UNIX (a sketch follows this section).
Fetched files from various source systems and loaded them into Teradata using utilities such as TPT and BTEQ.
Created UNIX shell scripts for file validation, file creation, error handling, and data loads.
Involved in client interaction sessions and project status meetings.
Prepared unit test cases based on the KPI combinations.
Supported the client IT team during UAT.
Good knowledge of migrating/deploying code to higher environments using StarTeam/GSM requests and of scheduling and monitoring Control-M jobs.
Good knowledge of L1/L2 and L3 support activities such as monitoring jobs, finding the RCA for job failures, providing solutions, and handling incident management requests, including creating/accepting GSM tickets and redirecting them to other teams when needed.
As a member of the LASAG (Layered Architecture Sustain and Governance) team, guided new EDW developers internally and at the client end and helped them resolve issues in the best possible way.
Supported the L1/L2 teams in recovering failed jobs, including during outages, working with the team to get to the root of the problem, resolve it, and ensure data issues are fixed when those teams cannot handle them.
Alongside regular LASAG activities, was involved in development work and prioritized tasks accordingly to meet deadlines.
Developed a level of competency that helps to anticipate issues and solve them in advance.
Provided support to the project during the hyper-care phase.
Created stored procedures, functions, and various database objects, and implemented performance tuning techniques.
Troubleshot database issues related to performance, queries, and stored procedures.
Created and maintained Teradata tables, views, macros, triggers, and stored procedures.
Created and maintained tables and views in Snowflake.
Onboarded applications into Snowflake by working with existing stakeholders whose data is in the Teradata database.
Automated processes by developing SQL or shell scripts to implement conversions from one environment to another.
Created data pipelines for ingestion, aggregation, and load events of consumer response data into Snowflake tables that serve as the feed for Tableau dashboards.
Scheduled Airflow DAGs to run multiple Snowflake jobs, which run independently based on time and data availability.
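Running SQL as BTEQ scripts from UNIX, as described above, is often done by piping a generated script into the bteq client. The sketch below assumes the Teradata bteq utility is installed and on the PATH; the logon string and table names are placeholders, not project details.

# Hypothetical sketch: run a BTEQ script from Python by piping it to the
# Teradata bteq client. Logon details and object names are placeholders.
import subprocess

bteq_script = """
.LOGON tdprod/etl_user,placeholder_password;

INSERT INTO core_db.sales_fact
SELECT *
FROM   stage_db.sales_stage
WHERE  load_status = 'VALID';

.IF ERRORCODE <> 0 THEN .QUIT 8;
.LOGOFF;
.QUIT 0;
"""

result = subprocess.run(
    ["bteq"],                 # assumes the Teradata bteq client is installed
    input=bteq_script,
    text=True,
    capture_output=True,
)
print(result.stdout)
if result.returncode != 0:
    raise RuntimeError(f"BTEQ failed with return code {result.returncode}")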
Client: Anthem Inc.
Duration: Oct 2013 - June 2014
Location: Hyderabad, India
Role: ETL Developer
Responsibilities:
Prepared the technical specifications of the ETL process flow.
Followed agile methodologies, worked on the user stories assigned to me, and guided team members in the best possible way.
Handled millions of data files, applying the valid business rules and loading them successfully through to the core tables.
Ran SQL scripts as BTEQ jobs on UNIX.
Fetched files from various source systems (WGS, CR Facets, NASCO) and loaded them into Teradata using Informatica and BTEQ.
Created UNIX shell scripts for file validation, file creation, error handling, and data loads.
Held daily connects with the team and updated the project status to the customer.
Responsible for development activities, developing MultiLoad, FastLoad, FastExport, BTEQ, and UNIX shell scripts for this project.
Involved in code migration from Dev to SIT and SIT to Prod, and in performance tuning.
Prepared unit test cases with multiple test case scenarios.
Involved in the impact analysis for all code migrations.
Developed and monitored Control-M jobs.

Client: American Express
Duration: Aug 2012 - Oct 2013
Location: Hyderabad, India
Role: ETL Developer
Responsibilities:
Followed agile methodologies, worked on the user stories assigned to me, and guided team members in the best possible way.
Involved in stored procedure conversion, history data migration of tables, and stored procedure testing.
Analyzed and understood the Ab Initio code and migrated it to Teradata.
Gathered database object details from Sybase and set the plan for migrating to Teradata.
Analyzed the feeds and checked the data types to create the DDLs in Teradata.
Compared the data in Ab Initio to Teradata and applied transformations when required (a reconciliation sketch follows this section).
Reviewed the logic used in Ab Initio and converted it without affecting the logic in Teradata.
Worked with Informatica to generate flat files from mainframe sources.
Prepared unit test cases with multiple test case scenarios.
Involved in performance tuning for both ETL and Teradata queries.
Designed and documented operational problems following standards and procedures, using JIRA as the reporting tool.
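The Ab Initio-to-Teradata data comparisons described above usually reduce to reconciling row counts and per-column aggregates between source and target. The sketch below is a driver-agnostic illustration of that idea; the column names and sample rows are made up, and in practice the two result sets would come from the legacy feed and the migrated Teradata table.

# Hypothetical, driver-agnostic reconciliation sketch for a migration check:
# compare row counts and simple per-column checksums between source and target.
from typing import Dict, List, Sequence, Tuple


def summarize(rows: Sequence[Tuple], columns: List[str]) -> Dict[str, int]:
    """Build a tiny profile of a result set: row count plus a per-column
    checksum based on the string form of each value."""
    profile: Dict[str, int] = {"__row_count__": len(rows)}
    for idx, name in enumerate(columns):
        profile[name] = sum(hash(str(row[idx])) & 0xFFFFFFFF for row in rows)
    return profile


def reconcile(source_rows, target_rows, columns):
    """Return the columns (or the row count marker) that do not match."""
    src, tgt = summarize(source_rows, columns), summarize(target_rows, columns)
    return [key for key in src if src[key] != tgt[key]]


# Usage sketch: literals stand in for the source and target query results.
columns = ["claim_id", "amount"]
source_rows = [(1, 100.0), (2, 250.5)]
target_rows = [(1, 100.0), (2, 250.5)]
print(reconcile(source_rows, target_rows, columns) or "match")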