Teja Sai - Snowflake Data Engineer
[email protected]
Location: Goldsboro, North Carolina, USA
Relocation: Any
Visa: H1B |
Tejasai Ch
Azure/AWS Snowflake Data Engineer | Charlotte, NC
Mail: [email protected] | Mobile: (980) 819-0257 | LinkedIn: www.linkedin.com/in/tejasai-ch

PROFESSIONAL SUMMARY:
- Over 10 years of experience in the IT industry, specializing in AWS and Azure cloud platforms, Big Data technologies, the Hadoop ecosystem, data warehousing, and SQL-related technologies across diverse industry sectors.
- Implemented Snowflake as a SaaS solution, facilitating the transfer of DB2 data using SnowSQL and data movement servers.
- Developed AWS pipelines for seamless integration with Snowflake, ensuring efficient data flow.
- Implemented data ingestion using Snowpipe, automating the continuous loading of streaming data into Snowflake.
- Collaborated with DBAs to optimize performance for PostgreSQL, MongoDB, and Snowflake databases.
- Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
- Implemented batch and real-time data pipelines using various AWS services such as S3, Lambda, EMR, EC2, Redshift, Glue, Athena, SNS, SQS, Kinesis, and Step Functions.
- Conducted ETL migration services using AWS Lambda functions and contributed to Agile Scrum methodologies.
- Extensive knowledge of Big Data technologies including Hadoop, Hive, MapReduce, Spark, Spark Core, Spark SQL, and the DataFrame/Dataset/RDD APIs.
- Designed and implemented scalable dimensional data models for various projects, ensuring efficiency and performance.
- Designed and upheld robust data warehouses in AWS and Azure cloud environments, tailored for complex query performance and business intelligence needs.
- Proficient in developing data pipelines using Hive and Sqoop, extracting weblog data, storing it in HDFS, and utilizing HiveQL for data analytics.
- Expertise in converting Hive/SQL queries into Spark transformations using Java and leveraging ETL development with Kafka and Sqoop.
- Proficient in using Apache Hadoop and MapReduce programs for efficient analysis of large data sets.
- Proficient in utilizing DBT (Data Build Tool) for modeling data and constructing data transformation pipelines.
- Proficient in Spark Streaming and Apache Kafka for real-time data ingestion and processing.
- Developed Spark applications using PySpark and Spark SQL in Databricks, enabling data extraction, transformation, and aggregation from various file formats for in-depth customer usage pattern analysis.
- Experience with CI/CD pipelines using Jenkins, Bitbucket, GitHub, etc.
- Extensive experience in developing applications for data processing tasks using Teradata, Oracle, SQL Server, and MySQL databases.
- Worked on data warehousing and ETL tools such as Tableau, Power BI, Informatica, and Talend.
- Acquainted with Agile and Waterfall methodologies.
- Responsible for handling several client-facing meetings with strong communication skills.
- Excellent communication and interpersonal skills; able to collaborate effectively in cross-functional team environments.
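The Snowflake loading work summarized above typically comes down to staging files and running COPY INTO. The sketch below shows one minimal way to do that from Python with the snowflake-connector-python package; the account, warehouse, stage, and table names are placeholders for illustration, not details of any engagement.

```python
# Minimal sketch: load staged S3 files into Snowflake and ingest JSON into a
# VARIANT column. Connection parameters, stages, and tables are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",      # placeholder account identifier
    user="etl_user",
    password="********",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="RAW",
)

try:
    cur = conn.cursor()
    # Bulk-load CSV files already landed in an external S3 stage.
    cur.execute("""
        COPY INTO RAW.CUSTOMER_EVENTS
        FROM @RAW.S3_EVENTS_STAGE
        FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
        ON_ERROR = 'CONTINUE'
    """)
    # Schema-on-read: load semi-structured JSON into a table assumed to have
    # a single VARIANT column.
    cur.execute("""
        COPY INTO RAW.CLAIMS_JSON
        FROM @RAW.S3_JSON_STAGE
        FILE_FORMAT = (TYPE = JSON)
    """)
finally:
    conn.close()
```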
TECHNICAL SKILLS:
Languages: Python, R, SQL, Scala, Spark, HTML, CSS, PL/SQL, T-SQL
Hadoop Ecosystem: Hadoop, MapReduce, Spark, HDFS, Sqoop, YARN, Oozie, Hive, Impala, Apache Flume, Apache Storm, Apache Airflow, HBase, OLAP, OLTP
Data Warehousing: Snowflake, Amazon Redshift, Azure Synapse Analytics
Data Visualization: AWS QuickSight, Power BI, Tableau, Informatica, Microsoft Excel, PowerPoint
ETL Tools: Tableau, Power BI, Informatica, Talend, SSIS, DBT
Databases: MySQL, SQL Server, Oracle, Teradata, PostgreSQL, MongoDB
Python Libraries: NumPy, Pandas, Matplotlib, Seaborn
Cloud Platforms: AWS, Azure
AWS Cloud: S3, EC2, Glue, Redshift, Elastic MapReduce, Athena, Data Pipeline, Lambda, Kinesis, SNS, SQS, CloudWatch, CloudFormation
Azure Cloud: Data Factory, Databricks, Data Lake Storage, Synapse Analytics, Functions, Stream Analytics, HDInsight
Data Analysis: Web Scraping, Data Visualization, Statistical Analysis, Data Mining, Data Warehousing, Data Migration, Database Management
IDEs/Data Formats: VS Code, Eclipse, IntelliJ, PyCharm, Jupyter, Databricks, JSON, Parquet, Avro, XML, CSV
Operating Systems: Windows, Unix, Linux
Methodologies: Agile, Waterfall

EDUCATION:
Master of Science in Computer and Information Systems, Wilmington University (Aug 2017 - Dec 2018)
Bachelor's in Electronics and Communication Engineering, R.V.R & J.C College of Engineering, India (Jun 2009 - Jun 2013)

WORK EXPERIENCE
Client: Anthem | March 2022 - Present
Role: AWS Snowflake Developer
Responsibilities:
- Utilized Snowflake as a Software as a Service (SaaS) solution and transferred DB2 data to Snowflake using SnowSQL and data movement servers.
- Created pipelines using AWS infrastructure to integrate with Snowflake.
- Responsible for loading data into S3 buckets from the internal server and the Snowflake data warehouse.
- Utilized SnowSQL for data loading tasks and building analytical warehouses on the Snowflake platform.
- Implemented the installation and configuration of a multi-node cluster on Amazon Web Services (AWS) EC2 for cloud-based projects.
- Launched Amazon EC2 cloud instances using Amazon Web Services (Linux/Ubuntu) and configured the launched instances for specific applications.
- Demonstrated expertise in data modeling, including development and updating of data models.
- Involved in designing the LDM and PDM models in data modeling tools such as Erwin.
- Developed the logical and physical data models that capture current/future-state data elements and data flows using ER/Studio.
- Collaborated closely with cross-functional teams to integrate Snowpipe with AWS S3, creating a seamless and automated flow of data from cloud-based sources to Snowflake.
- Worked with the team on decisions about how to migrate data from on-premises to the cloud and which tools to use for ETL or ELT in the cloud.
- Implemented DBT (Data Build Tool) as part of the data transformation pipeline, enhancing efficiency and reliability in modeling data and constructing analytical warehouses on the Snowflake platform.
- Used DBT to test the data (schema tests, referential integrity tests, custom tests) and ensure data quality.
- Worked on a project involving the ingestion of JSON files into Snowflake, developing strategies to handle dynamic data structures and ensure schema-on-read flexibility.
- Implemented AWS Lambda functions to run scripts in response to events in an Amazon DynamoDB table or S3 bucket, or to HTTP requests through Amazon API Gateway (a sketch follows this section).
- Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
- Led the implementation of a real-time data ingestion solution using Snowpipe, automating the continuous loading of streaming data into the Snowflake data warehouse.
- Worked on Snowflake schema, data modeling and elements, source-to-target mappings, the interface matrix, and design elements.
- Performed data quality issue analysis using SnowSQL by building analytical warehouses on Snowflake.
- Utilized Amazon EMR for processing Big Data across a Hadoop cluster of virtual servers on Amazon EC2 and Amazon S3.
- Utilized Linux and Unix environments for automating routine data tasks, contributing to a more streamlined and efficient data engineering workflow.
- Designed and implemented action filters, parameters, and calculated sets for dashboards and worksheets in Tableau.
Environment: Snowflake, Python, SQL, Linux, Unix, DB2, Terraform, AWS services, Git, DBT, Tableau.
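As referenced in the Lambda bullet above, a common shape for an S3-event-driven Lambda is sketched below in Python with boto3. The queue URL, bucket layout, and downstream loader are assumptions for illustration, not details of the Anthem project.

```python
# Minimal sketch of an S3-event-triggered Lambda handler (placeholder names).
# It reads object keys from the S3 event payload and forwards them to an SQS
# queue so a downstream loader can pick the files up for Snowflake ingestion.
import json
import os

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = os.environ.get("LOAD_QUEUE_URL", "")  # assumed environment variable

def handler(event, context):
    """Entry point configured on the Lambda function."""
    forwarded = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps({"bucket": bucket, "key": key}),
        )
        forwarded.append(key)
    return {"forwarded": forwarded}
```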
Client: Comcast, Remote | June 2020 - Feb 2022
Role: Azure Big Data Engineer
Responsibilities:
- Implemented various stages of data flow in the Hadoop ecosystem, including ingestion, processing, and consumption.
- Executed PySpark and Spark SQL transformations in Azure Databricks for intricate business rule implementations.
- Worked on multiple Azure platforms such as Azure Data Factory, Azure Data Lake, Azure SQL Database, Azure SQL Data Warehouse (Synapse Analytics), and Azure HDInsight.
- Implemented a one-time data migration of multi-state-level data from SQL Server to Snowflake using Python and SnowSQL.
- Day-to-day responsibilities included developing ETL pipelines in and out of the data warehouse and developing major regulatory and financial reports using advanced SQL queries in Snowflake.
- Built and maintained the environment on Azure IaaS and PaaS.
- Designed and developed Hive and HBase data structures and Oozie workflows for job scheduling and batch processing.
- Wrote multiple MapReduce programs for data extraction, transformation, and aggregation from more than 20 sources with multiple file formats, including XML, JSON, CSV, and other compressed file formats.
- Handled the import of data from various sources, transformed it using Hive and MapReduce, loaded it into HDFS, and used Sqoop to retrieve data from MySQL and store it in HDFS.
- Provided cluster coordination services through ZooKeeper.
- Enhanced Hive and Pig core functionality by crafting custom UDFs, UDTFs, and UDAFs.
- Built ETL pipelines using Azure Synapse Analytics, Azure Data Factory, and distributed computing such as Databricks Spark, with Data Lake Analytics for extracting, transforming, and loading data from source systems to Azure Data Storage services.
- Built streaming ETL pipelines using Spark Streaming to extract data from diverse sources, perform real-time transformations, and load it into a data warehouse such as Azure Synapse Analytics (a sketch follows this section).
- Worked on running queries over CSV files kept in an external data store and Parquet files kept in a data lake.
- Loaded and transformed data into HDFS from structured data, Oracle, and SQL Server using Talend Big Data Studio.
- Utilized various Talend components (e.g., tMap, tFileList, tJoin, tHashInput, tHashOutput, tJava, tOracleInput, tOracleOutput, tSendMail) in the development process.
- Wrote Spark applications in Scala to interact with the MySQL database using the Spark SQL context and accessed Hive tables using the Hive context.
- Analyzed the data by performing Hive queries (HiveQL).
- Worked on the Analytics Infrastructure team to develop a stream filtering system on top of Apache Kafka and Storm.
- Worked on a POC for Spark and Scala parallel processing.
- Streamed data in real time using Spark with Kafka.
- Scripted the creation, truncation, dropping, and alteration of HBase tables after executing MapReduce jobs for later analytics.
Environment: Sqoop, Hive, Azure, JSON, XML, Kafka, Python, Terraform, MapReduce, Oracle, Snowflake, Spark, Scala, Azure Databricks, DAX, Azure Data Lake, Talend Administrator Console.
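A minimal sketch of the streaming ETL pattern described in this section, using PySpark Structured Streaming to read from Kafka, parse JSON events, and land Parquet files for a downstream warehouse load. The broker addresses, topic, event schema, and sink paths are illustrative assumptions.

```python
# Streaming ETL sketch: Kafka -> parse JSON -> Parquet sink (placeholder names).
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("streaming-etl-sketch").getOrCreate()

# Assumed event schema for the incoming JSON messages.
event_schema = StructType([
    StructField("customer_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder brokers
    .option("subscribe", "customer-events")              # placeholder topic
    .load()
)

events = (
    raw.selectExpr("CAST(value AS STRING) AS json")
    .select(from_json(col("json"), event_schema).alias("e"))
    .select("e.*")
)

# Write micro-batches as Parquet so a warehouse loader can pick them up.
query = (
    events.writeStream.format("parquet")
    .option("path", "/mnt/datalake/events")               # placeholder sink path
    .option("checkpointLocation", "/mnt/datalake/_chk/events")
    .start()
)
```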
Client: Banner Health | Jan 2018 - May 2020
Role: Big Data Engineer
Responsibilities:
- Implemented the design and execution of a robust data lake architecture on AWS, leveraging Glue, EMR, and S3; achieved a seamless migration, ensuring scalability and optimal data storage capabilities.
- Led end-to-end development of intricate ETL pipelines in Python and Spark, handling vast datasets from diverse sources; elevated data quality and accuracy through comprehensive cleansing processes.
- Applied extensive Hadoop expertise, including HDFS and MapReduce, optimizing data processing speed and resource utilization for efficient large-scale data management.
- Installed and configured Apache Hadoop clusters for application development and Hadoop tools.
- Pioneered the integration of Kafka for real-time data streaming, revolutionizing the ingestion of high-volume data and enabling real-time analytics functionality.
- Implemented automated deployment and scaling using AWS services (EC2, Lambda, Auto Scaling), enhancing system reliability and resource efficiency.
- Arranged workflow scheduling with YARN and Oozie for optimal task execution.
- Developed ETL processes using AWS Glue to migrate data from external sources (such as S3 ORC/Parquet/text files) into AWS Redshift.
- Designed and developed ETL processes using Informatica to load data from a wide range of sources such as Oracle and the AWS cloud.
- Implemented data ingestion with cleansing and transformations using AWS Lambda, AWS Glue, and Step Functions.
- Collaborated with database administrators to optimize performance and implement best practices for PostgreSQL, MongoDB, and Snowflake databases, ensuring seamless data accessibility.
- Created HBase tables to load large sets of semi-structured data coming from various sources.
- Responsible for loading customer data and event logs from Kafka into HBase using a REST API.
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames, Scala, and Python.
- Worked on ETL migration services by developing and deploying AWS Lambda functions, generating a serverless data pipeline that writes to the Glue Catalog and can be queried from Athena.
- Programmed in Hive, Spark SQL, Java, and Python to streamline incoming data, build data pipelines for useful insights, and coordinate pipelines.
- Involved in the development of the Hadoop system and improved multi-node Hadoop cluster performance.
- Experienced in developing web services with Python and in processing large datasets with Spark using Scala and PySpark.
- Experienced in using and tuning relational databases (e.g., Microsoft SQL Server, Oracle, MySQL) and columnar data warehouses (e.g., Amazon Redshift, Microsoft SQL Data Warehouse).
- Conducted code reviews and implemented version control using Git, fostering a collaborative and efficient development workflow.
- Collaborated cross-functionally to troubleshoot and resolve issues related to Sqoop, Flume, HBase, and other big data technologies, demonstrating expertise in problem-solving and system maintenance.
Environment: AWS Glue, Python, Hadoop (HDFS, MapReduce), Kafka, Scala, AWS services (EC2, Lambda, EMR, Auto Scaling), YARN, IAM, PostgreSQL, Spark, Impala, Informatica, MongoDB, Snowflake, HBase, Oozie, Hue, Sqoop, Flume, Oracle, NiFi, Git

Client: Zensar, Hyderabad, India | June 2013 - May 2017
Role: Sr. System Engineer
Responsibilities:
- Collaborated with business users, clients, and end users to collect and analyze business requirements, subsequently translating them into a comprehensive technical specification document.
- Conducted in-depth analysis of extensive datasets to determine the most effective approach for aggregation and reporting.
- Constructed both simple and complex MapReduce jobs utilizing Hive and Pig for streamlined data processing.
- Developed Java-based MapReduce programs to convert XML, JSON, and other formats to CSV, implementing analytics along the way.
- Designed ETL data flows using SSIS, creating mappings/workflows to extract data from SQL Server, and performed data migration and transformation from Access/Excel sheets using SQL Server SSIS.
- Enhanced the efficiency of MapReduce jobs through the application of compression techniques and performance tuning methods.
- Managed the import of data from diverse sources, executed transformations using Hive and MapReduce, loaded data into HDFS, and utilized Sqoop to extract data from MySQL into HDFS.
- Developed and optimized Spark applications for large-scale data processing, leveraging distributed computing power.
- Exported analyzed data to relational databases through Sqoop for visualization and report generation for the BI team.
- Crafted MapReduce drivers, mappers, and reducers in Java, persisting binary data through SequenceFile and Avro data files.
- Designed custom components, such as WritableComparables, to handle complex data types effectively.
- Contributed to the development of Hive and Pig UDFs for performing aggregations on customer data.
- Wrote mappers and reducers using the Hadoop Streaming API (a sketch follows this section) and employed ZooKeeper to facilitate coordination among distributed processes through a shared hierarchical namespace.
- Effectively managed Hadoop jobs using the Oozie workflow scheduler system, overseeing processes for MapReduce, Hive, and Sqoop.
Environment: Apache Hadoop, Sqoop, Flume, Oozie, Java, Spark, Scala, EMR, Python, PySpark, Hive, YARN, JDBC, Pig, ZooKeeper, SSIS, XML, JSON, MySQL
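As noted in the Zensar section, mappers and reducers written against the Hadoop Streaming API are plain scripts that read stdin and write stdout. The sketch below shows a Python version that counts events per customer; the tab-delimited input layout, the field positions, and the script name are assumptions for illustration.

```python
#!/usr/bin/env python
# Minimal Hadoop Streaming sketch: one script run as both mapper and reducer,
# e.g. "events_count.py map" and "events_count.py reduce". Input is assumed to
# be tab-delimited lines whose first field is a customer id.
import sys

def mapper():
    # Emit "<customer_id>\t1" for every input record.
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if fields and fields[0]:
            print(f"{fields[0]}\t1")

def reducer():
    # Streaming delivers mapper output sorted by key, so counts can be
    # accumulated until the key changes.
    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current_key:
            if current_key is not None:
                print(f"{current_key}\t{count}")
            current_key, count = key, 0
        count += int(value)
    if current_key is not None:
        print(f"{current_key}\t{count}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```

A script like this would typically be submitted with the Hadoop Streaming JAR, shipping the file with -files and passing it as both the -mapper and -reducer commands.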