Yashwanth - Data Engineer |
[email protected] |
Location: Lake Dallas, Texas, USA |
Relocation: yes |
Visa: H1B |
CONTACT:
[email protected] | +1 (469)-988-5899

Data Engineer

Professional Summary:
Nearly 7 years of expertise as a Data Engineer in the Big Data space, proficient in the Hadoop ecosystem alongside AWS and Azure.
Engage closely with stakeholders to analyze business processes and requirements, translating them into data warehouse designs, documenting them, and delivering them.
Demonstrated proficiency in cloud data migration, using AWS and Snowflake extensively.
Extensive familiarity with Spark Streaming, Spark SQL, and Spark components such as accumulators, broadcast variables, the various caching levels, and optimization strategies for Spark deployment.
Hands-on involvement in implementing Big Data ecosystems encompassing Hadoop MapReduce, NoSQL, Apache Spark, PySpark, Python, Scala, Hive, Impala, Sqoop, Kafka, AWS, Azure, and Oozie.
Spearheaded the definition of product requirements and the creation of high-level architectural specifications, ensuring the feasibility and functionality of existing platforms.
Conducted prototyping of components, benchmarking, and provisioning of templates for development teams to evaluate design solutions.
Proficient in optimizing data processing performance through techniques such as dynamic partitioning, bucketing, file compression, and cache management in Hive, Impala, and Spark.
Expertise extends to data formats such as JSON, Avro, Parquet, RC, and ORC, alongside compression codecs such as Snappy and Bzip2.
Successfully executed a proof of concept for an Azure implementation, with the overarching objective of migrating on-premises servers and data to the cloud.
Leveraged Azure Databricks notebooks to construct batch data pipelines tailored to diverse data types.
Proficient in the AWS environment, using Spark on AWS, Snowflake, Lambda, Amazon Redshift, DMS, EMR, RDS, EC2, and the broader AWS stack.
Substantial experience with cloud computing platforms, particularly AWS services.
Led the migration of an existing on-premises application to AWS, leveraging services like EC2 and S3 for processing and storing small datasets, and maintaining the Hadoop cluster on AWS EMR.
Proficient in data pipelines, encompassing ETL and ELT phases, converting Big Data and unstructured datasets (JSON, log data) into structured datasets for product analysts and data scientists.
Transformed legacy reports from SAS, Looker, Access, Excel, and SSRS into Power BI and Tableau.
As a Data Engineer, entrusted with data modeling, migration, design, and ETL pipeline preparation for both cloud and Exadata platforms.
Developed ETL scripts for data acquisition and transformation using Informatica and Talend.
Extensive experience with Teradata, Oracle, SQL, PL/SQL, Informatica, UNIX shell scripts, SQL*Plus, and SQL*Loader for data warehouse ETL architecture and development.
Proficiently integrate data sources such as SQL Server, DB2, PostgreSQL, Oracle, and Excel.
Solid understanding of data warehousing principles with Informatica, including significant experience creating Tasks, Workflows, Mappings, and Mapplets and scheduling Workflows and Sessions.
Skilled in applying object-oriented programming (OOP) concepts in Python.
Profound knowledge of usability engineering, user interface design, and development.
Thorough understanding of reporting tools such as Power BI, Data Studio, and Tableau.
Proficient in backend skills, adept at creating SQL objects such as tables, stored procedures, triggers, indexes, and views to facilitate data manipulation and ensure consistency.
Expertise in applying SDLC and ITIL best practices.
Experienced in team management, encompassing work planning, allocation, tracking, and execution, driven by strong relationships, results, and innovative thinking.

Professional Experience

AWS Data Engineer | Verizon, Irving, TX | June 2023 to Present
Responsibilities:
Contributed to gathering requirements, analyzing the entire system, and providing development and testing effort estimations.
Participated in designing system components such as Sqoop, Hadoop (including MapReduce and Hive processes), Spark, and FTP integration with downstream systems.
Crafted optimized Hive and Spark queries using techniques such as window functions and customized Hadoop shuffle and sort parameters.
Developed ETL processes using PySpark, leveraging both the DataFrame API and the Spark SQL API; a minimal sketch follows this role's environment list.
Utilized Spark for diverse transformations and actions, with final result data saved to HDFS and then to the target database, Snowflake.
Orchestrated the migration of an existing on-premises application to AWS, utilizing services like EC2 and S3 for processing and storing small datasets, and maintaining Hadoop clusters on AWS EMR.
Demonstrated expertise in real-time data analytics employing Spark Streaming, Kafka, and Flume.
Configured Spark Streaming to capture ongoing information from Kafka and store it in HDFS.
Designed and implemented ETL processes in AWS Glue for migrating campaign data from external sources to Amazon Redshift.
Employed various Spark transformations and actions for data cleansing.
Utilized Jira for issue tracking and Jenkins for continuous integration and deployment, enforcing data catalog and governance best practices.
Developed DataStage jobs encompassing various stages for data processing and manipulation.
Proficient in creating, debugging, scheduling, and monitoring ETL jobs with Airflow for Snowflake loading and analytical processes.
Built ETL pipelines for data ingestion, transformation, and validation on AWS, ensuring compliance with data governance.
Managed ECS clusters to ensure availability, reliability, and efficient resource allocation.
Scheduled jobs using Airflow scripts in Python, configuring tasks and dependencies within DAGs.
Leveraged PySpark for data extraction, filtering, and transformation in data pipelines.
Expertise in server monitoring using Nagios, CloudWatch, and the ELK Stack (Elasticsearch, Logstash, Kibana).
Utilized dbt (Data Build Tool), AWS Lambda, and Amazon SQS for ETL transformations.
Developed Spark applications in Databricks using Spark SQL for data extraction, transformation, and aggregation, analyzing customer usage patterns.
Leveraged AWS Glue or Apache Spark on ECS for distributed data processing tasks.
Managed Tableau Server operations, including establishing a user rights matrix to regulate permissions and roles, monitoring report usage, and setting up sites for different departments.
Responsible for estimating cluster sizes and monitoring and troubleshooting Spark Databricks clusters.
Automated data loading into the target data warehouse using Unix shell scripts.
Implemented monitoring solutions using Ansible, Terraform, Docker, and Jenkins.
Environment: Red Hat Enterprise Linux 5, HDP 2.3, Hadoop, MapReduce, HDFS, Hive 0.14, shell scripting, Sqoop 1.4.4, Python 3.2, PostgreSQL, Spark 2.4, Airflow, Snowflake.
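The PySpark ETL work above can be illustrated with a minimal sketch, assuming a JSON landing zone on S3 and a Parquet staging area for a downstream Snowflake load; the bucket, paths, and column names (event_id, event_ts, customer_id) are hypothetical placeholders, not the actual pipeline.

    # Minimal PySpark ETL sketch: DataFrame API plus Spark SQL, staging results
    # for a downstream Snowflake load. Names and paths are illustrative only.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

    # Extract: read raw JSON events from a hypothetical S3 landing path.
    raw = spark.read.json("s3://example-bucket/raw/events/")

    # Transform (DataFrame API): basic cleansing and typed columns.
    cleaned = (
        raw.dropDuplicates(["event_id"])
           .withColumn("event_date", F.to_date("event_ts"))
           .filter(F.col("event_date").isNotNull())
    )

    # Transform (Spark SQL API): aggregate usage per customer per day.
    cleaned.createOrReplaceTempView("events")
    daily_usage = spark.sql("""
        SELECT customer_id, event_date, COUNT(*) AS event_count
        FROM events
        GROUP BY customer_id, event_date
    """)

    # Load: write Parquet to a staging location that a Snowflake COPY INTO
    # (or the Spark-Snowflake connector) can pick up.
    daily_usage.write.mode("overwrite").parquet("s3://example-bucket/staged/daily_usage/")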
Data Engineer | VISA, San Francisco, California | April 2022 to June 2023
Responsibilities:
Develop, plan, and construct contemporary data solutions that facilitate data visualization by utilizing Azure PaaS services.
Determine how a new implementation will affect present business processes by understanding the current state of the application in production.
Using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL, extract, transform, and load data from source systems into Azure data storage services, including Azure Data Lake Analytics.
Ingest data into Azure services (Azure Data Lake, Azure Storage, Azure SQL, and Azure SQL Data Warehouse) and process it in Azure Databricks.
Implemented prototypes for SOAP and REST APIs, and obtained analytics data from various data feeds and sources via REST APIs.
Designed DynamoDB schemas and data models to optimize performance, scalability, and cost-effectiveness for various use cases and workloads; a hedged sketch follows this role.
Used Linked Services, Datasets, and Pipelines to create ADF pipelines that extract, transform, and load data from many sources, including write-back tools, Azure SQL Data Warehouse, Blob storage, and Azure SQL.
Created Spark applications for data extraction, transformation, and aggregation from various file formats using PySpark and Spark SQL; these applications were then analyzed to reveal insights into client usage patterns.
Designed efficient DynamoDB schemas tailored to application needs and built pipelines to ingest and transform data from various sources into DynamoDB.
In charge of monitoring and troubleshooting the Spark Databricks cluster, as well as predicting the cluster size.
Experienced in optimizing Spark application performance by adjusting memory, batch interval timing, and parallelism levels.
Created JSON scripts used in Azure Data Factory (ADF) to deploy pipelines that use the SQL Activity to process data.
Practical experience writing SQL scripts for automation; worked with Visual Studio Team Services (VSTS) to construct modules in a production environment.
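A minimal sketch of the DynamoDB schema-design work above, assuming boto3 and an illustrative single-table layout; the table name, key attributes, and GSI (UsageEvents, customer_id, event_ts, EventsByDate) are hypothetical, not the actual production model.

    # Hedged sketch of a DynamoDB table definition (boto3), using an illustrative
    # partition/sort key pair and one GSI for an alternate access pattern.
    # All names are hypothetical placeholders.
    import boto3

    dynamodb = boto3.client("dynamodb", region_name="us-east-1")

    dynamodb.create_table(
        TableName="UsageEvents",
        AttributeDefinitions=[
            {"AttributeName": "customer_id", "AttributeType": "S"},
            {"AttributeName": "event_ts", "AttributeType": "S"},
            {"AttributeName": "event_date", "AttributeType": "S"},
        ],
        KeySchema=[
            {"AttributeName": "customer_id", "KeyType": "HASH"},   # partition key
            {"AttributeName": "event_ts", "KeyType": "RANGE"},     # sort key
        ],
        GlobalSecondaryIndexes=[
            {
                "IndexName": "EventsByDate",
                "KeySchema": [
                    {"AttributeName": "event_date", "KeyType": "HASH"},
                    {"AttributeName": "event_ts", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        BillingMode="PAY_PER_REQUEST",  # on-demand capacity keeps cost tied to usage
    )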
Snowflake Data Engineer | Johnson Controls, Dallas, TX | Feb 2021 to April 2022
Responsibilities:
As the Snowflake database administrator, oversaw the design of the data model, the deployment of production releases for the database migration, and the successful implementation of the corresponding metadata into the production platform environments on the AWS Cloud (Dev, Qual, and Prod).
Managed daily integration with the DB2, SQL Server, Oracle, and AWS Cloud database administrator (DBA) teams to guarantee that database tables, columns, and associated metadata were successfully inserted into the Aurora and Snowflake environments in the DEV, QUAL, and PROD regions of the AWS Cloud.
Converted functional requirements from source-to-target data mapping documents using Informatica for ETL data translation, supporting big data projects involving massive datasets (such as Snowflake and Aurora) in AWS Cloud databases.
Designed and implemented data warehouse solutions using Google BigQuery to support analytics and reporting requirements.
Assisted project managers and developers in performing ETL solution design and development to produce reporting, dashboarding, and data analytics deliverables.
Modeled schemas and tables in BigQuery to optimize query performance and storage efficiency.
Performed logical and physical data structure designs and DDL generation to facilitate the implementation of database tables and columns in the DB2, SQL Server, AWS Cloud (Snowflake), and Oracle schema environments using the ERwin Data Modeler Model Mart Repository, version 9.6.
Implemented a one-time migration of multi-state data from SQL Server to Snowflake using Python and SnowSQL; a hedged sketch follows this role.
Served as a technical team member of the Agile Price Information Architect-Data Modeling team, responsible for creating enterprise conceptual, logical, and physical data models as well as the data dictionary supporting the three business units: Retirement Plan Services (RPS), Shared Support Platforms, and Global Investment Services (GIS).
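A minimal sketch of the one-time SQL Server to Snowflake migration above, assuming pyodbc for extraction and the Snowflake Python connector for loading; the same PUT and COPY INTO statements could instead be run from the SnowSQL CLI. Connection strings, table names, and the stage are hypothetical placeholders.

    # Hedged sketch of a one-time SQL Server -> Snowflake migration in Python.
    # All credentials, hosts, and object names are illustrative only.
    import csv
    import pyodbc
    import snowflake.connector

    # Extract from SQL Server to a local CSV file.
    src = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=src-host;DATABASE=sales;UID=user;PWD=secret"
    )
    cursor = src.cursor()
    cursor.execute("SELECT state, sale_id, amount, sale_date FROM dbo.state_sales")
    with open("state_sales.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cursor.description])  # header row
        for row in cursor:
            writer.writerow(row)

    # Load into Snowflake: PUT the file to the table stage, then COPY INTO the table.
    snow = snowflake.connector.connect(
        account="acct", user="user", password="secret",
        warehouse="LOAD_WH", database="ANALYTICS", schema="PUBLIC",
    )
    cur = snow.cursor()
    cur.execute("PUT file://state_sales.csv @%STATE_SALES OVERWRITE = TRUE")
    cur.execute("COPY INTO STATE_SALES FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)")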
ETL Engineer | Tech India Services, Hyderabad, India | March 2017 to Nov 2020
Responsibilities:
Analyzed business requirements, framed the business logic for the ETL process, and maintained the ETL process using Informatica PowerCenter 10.4.0.
Worked in an Agile methodology, participated in daily and weekly team meetings, and worked with business and technical teams to understand and profile the data.
Understood transformation techniques, especially for moving on-prem sources through Attunity.
Expert in designing parallel jobs using stages such as Join, Merge, Lookup, Remove Duplicates, Filter, Dataset, Lookup File Set, Complex Flat File, Modify, Aggregator, and XML.
Applied Update Strategy, Aggregator, Expression, and Joiner transformations and loaded the results into the data warehouse using Informatica BDM 10.2.
Designed Tableau dashboards, datasets, data sources, and worksheets.
Experienced in advanced Snowflake concepts such as setting up resource monitors and performance tuning.
Involved in creating UNIX shell scripts for Informatica workflow execution.
Extracted information from disparate on-prem sources, including but not limited to SQL, Teradata, DB2, MSSQL, and flat files, and loaded it into the destination or used it directly for profiling.
Expertise includes taking any incoming data set and applying data quality logic to it as per business needs.
Analyzed data using the Snowflake query window and designed big data quality rules.
Extensively worked on Informatica transformations such as Source Qualifier, Joiner, Filter, Router, Expression, Lookup, Aggregator, Sorter, Normalizer, Update Strategy, Sequence Generator, and Stored Procedure.
Migrated data from legacy systems (SQL Server 2000, AS/400) to Snowflake and SQL Server.
Extensively used Informatica Cloud (IICS) transformations such as Address Validator, Exception, and Parser.
Solid experience debugging and troubleshooting sessions using the Debugger and Workflow Monitor.
Used SQL scripts and AWS resources (Lambda, Step Functions, SNS, S3) to automate data migration.
Expertise in deploying Snowflake features such as data sharing, events, and lakehouse patterns.
Worked with multiple divisions throughout the organization to conform with best practices and standards.
Created connections, including relational, native, and application connections.
Involved in performance tuning of sessions, mappings, ETL procedures, and processes, and supported integration testing.
Developed ETL pipelines into and out of the data warehouse using a combination of Python and Snowflake's SnowSQL, writing SQL queries against Snowflake.
Involved in data analysis and handled ad-hoc requests by interacting with business analysts and clients, resolving issues as part of production support.
Performed debugging by checking errors in mappings using the Debugger utility of the Designer tool and made appropriate changes to generate the required results.
Liaised with cross-functional teams across the enterprise, including the OLTP development team and various data warehouse teams (onsite and offshore team members).
Implemented an error-handling strategy in ETL mappings and routed problematic records to exception tables.
Engaged in creating UNIX shell scripts to invoke workflows and used PL/SQL to create dynamic pmcmd commands and parameter files for the workflows.
Responsible for writing Autosys JILs and scheduling the Informatica workflows on the Autosys server.
Prepared the test plan and test strategy from the business requirements and functional specifications for the integrations.
Developed test cases for deployment verification, ETL data validation, and application testing.
Worked as an ETL tester responsible for requirements and ETL analysis, ETL testing, and designing the flow and logic for the data warehouse project.
Followed Waterfall and Agile development methodologies and adhered to strict quality standards in requirements gathering.
Functioned as the onsite/offshore coordinator for a team.
Experienced in writing complex SQL queries for extracting data from multiple tables.
Created custom views to improve the performance of PL/SQL procedures.
Performed testing based on change requests and defect requests.
Prepared system test results after test case execution.
Experience with Linux/Unix technologies.
Strong understanding of Spark and Hadoop internals, e.g., DataFrames, the DAG, data partitioning and distribution, NameNode limitations, and tuning.
Expertise in Unix commands and in scripting (Korn shell and Python).
Worked on Unix-based file systems; skilled in log monitoring, analysis, and providing remediation steps.
Worked with Informatica support to fix Informatica Linux server issues.
Worked on moving folders and buckets to S3 in the cloud using Python in Lambda; a hedged sketch follows.
Hands-on Python development.
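A minimal sketch of the Lambda-based S3 move mentioned above, assuming boto3 and an S3 event trigger; the bucket names, prefix, and event shape are illustrative assumptions, not the actual setup.

    # Hedged sketch of a Lambda handler that moves objects into an S3 bucket
    # with boto3. All names are hypothetical placeholders.
    import boto3

    s3 = boto3.client("s3")

    TARGET_BUCKET = "example-archive-bucket"   # hypothetical destination bucket
    TARGET_PREFIX = "migrated/"                # hypothetical destination folder


    def lambda_handler(event, context):
        """Copy each object referenced in an S3 event to the target bucket, then delete the source."""
        records = event.get("Records", [])
        for record in records:
            src_bucket = record["s3"]["bucket"]["name"]
            src_key = record["s3"]["object"]["key"]

            # Copy to the destination prefix, preserving the object name.
            s3.copy_object(
                Bucket=TARGET_BUCKET,
                Key=TARGET_PREFIX + src_key,
                CopySource={"Bucket": src_bucket, "Key": src_key},
            )

            # Remove the source object to complete the move.
            s3.delete_object(Bucket=src_bucket, Key=src_key)

        return {"moved": len(records)}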