Pradeep - Data Engineer |
[email protected] |
Location: Williamsburg, Virginia, USA |
Relocation: Yes |
Visa: H1B |
Pradeep Reddy
Mobile Number: 972-945-5529 | Email: [email protected]
Data Engineer

Professional Summary
Over 10 years of IT experience in designing, implementing, and maintaining solutions on the Big Data ecosystem.
Adept at implementing end-to-end Big Data solutions using the Hadoop framework; designed and executed big data solutions on multiple distributions such as Cloudera (CDH3 & CDH4) and Hortonworks.
Expertise in designing data-intensive applications using the Hadoop ecosystem, Big Data analytics, cloud data engineering, data warehouse, data visualization, reporting, and data quality solutions.
Expertise in Big Data processing using Hadoop and its ecosystem (MapReduce, Spark, Scala, Hive, Sqoop, Flume, Pig, HBase, Cassandra, MongoDB, Kafka) for implementation, maintenance, ETL, and Big Data analysis operations.
Experience in transforming and processing raw data for further analysis, visualization, and modeling.
Hands-on experience in designing and implementing data engineering pipelines and analyzing data using Hadoop ecosystem tools like HDFS, MapReduce, YARN, Spark, Sqoop, Hive, Kafka, Impala, Oozie, and HBase.
Hands-on experience with Amazon EC2, S3, RDS, IAM, Auto Scaling, CloudWatch, SNS, Athena, Glue, Kinesis, Lambda, EMR, Redshift, and DynamoDB.
Used Python Boto3 to configure AWS services such as EC2, S3, and Redshift.
Worked with monitors, alarms, notifications, and logs for Lambda functions and Glue jobs using CloudWatch (a minimal sketch follows this summary).
Experience implementing cloud-based Linux OS in AWS to develop scalable applications with Python.
Experience with shell scripting to automate various activities.
Experience in design and implementation plans for hosting complex application workloads on MS Azure.
Proficient with Azure Data Lake Storage (ADLS), Databricks and iPython notebooks, Databricks Delta Lake, and Amazon Web Services (AWS).
Experience in Azure API Management, security, and cloud-to-cloud integration.
Extensively worked as an Azure Data Engineer using Azure Cloud, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Azure Cosmos DB (NoSQL), Azure HDInsight, Big Data technologies (Hadoop and Apache Spark), and Databricks.
Hands-on experience in creating pipelines in Azure Data Factory V2 using activities like Move & Transform, Copy, Filter, ForEach, Get Metadata, Lookup, and Databricks.
Expertise in design and development of various web and enterprise applications using Typesafe technologies like Scala, Akka, and the Play framework.
Hands-on experience with message brokers such as Apache Kafka.
Proficient in designing and implementing transformations, jobs, and workflows using Pentaho tools.
Good knowledge of developing data ingestion processes using Flume agents and Spark Streaming for real-time and near-real-time data analysis.
Experience in developing workflows using Flume agents with multiple sources such as web server logs and REST APIs, and multiple sinks such as the HDFS sink and Kafka sink.
Knowledge of GCP: BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, gsutil, the bq command-line utility, Dataproc, and Stackdriver.
Knowledge of Google BigQuery and architecting data pipelines from on-prem to GCP.
Knowledge of GCS buckets and GCP Dataproc clusters.
Performed ETL integration with SSIS and handled FTP functionality within it.
Experience with Snowflake utilities, SnowSQL, Snowpipe, and big data modeling techniques using Python.
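The following is a minimal, illustrative sketch of the Boto3 automation and CloudWatch alarming described in the summary above. All resource names, metrics, dimensions, and thresholds are hypothetical placeholders, not details of any specific project.

```python
# Illustrative Boto3 sketch: inspect EC2 capacity and alarm on a Glue job metric.
# Names, metrics, dimensions, and thresholds below are hypothetical placeholders.
import boto3

ec2 = boto3.client("ec2")
cloudwatch = boto3.client("cloudwatch")

# List running EC2 instances (e.g., to verify capacity before a pipeline run).
reservations = ec2.describe_instances(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
)
for reservation in reservations["Reservations"]:
    for instance in reservation["Instances"]:
        print(instance["InstanceId"], instance["InstanceType"])

# Alarm when a (hypothetical) Glue job reports failed tasks.
cloudwatch.put_metric_alarm(
    AlarmName="glue-campaign-job-failures",  # placeholder alarm name
    Namespace="Glue",
    MetricName="glue.driver.aggregate.numFailedTasks",
    Dimensions=[
        {"Name": "JobName", "Value": "campaign-etl"},  # placeholder job name
        {"Name": "JobRunId", "Value": "ALL"},
        {"Name": "Type", "Value": "count"},
    ],
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
)
```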
Technical Skills:
Big Data Technologies: Hadoop, MapReduce 2 (YARN), Hive, Pig, Apache Spark, HDFS, Sqoop, Cloudera Manager, Kafka, Amazon EC2, Azure
Programming/Scripting Languages: Scala, Python, REST, Java, XML, SQL, PL/SQL, HTML, Shell Scripting, PySpark
Databases: HDFS, Oracle 9i/10g/11g, MySQL, DB2, HBase, BigQuery
Cloud Environments: Azure, AWS, GCP
Tools and Services: Jenkins, GitHub, Redmine, Jira, Confluence
Visualization: Tableau, ThoughtSpot, Plotly, Amazon QuickSight, and MS Excel
Methodologies: Agile Scrum, Waterfall, Design Patterns
Operating Systems: Windows, Unix, Linux

Professional Experience

Wells Fargo, Charlotte, NC | May 2020 to Present
Data Engineer
Responsibilities:
Worked closely with business analysts to convert business requirements into technical requirements and prepared low- and high-level documentation.
Imported required tables from RDBMS to HDFS using Sqoop and used Storm/Spark Streaming and Kafka to stream real-time data into HBase.
Worked extensively on importing metadata into Hive and migrated existing tables and applications to work on Hive and the AWS cloud.
Used AWS Redshift, S3, Spectrum, and Athena to query large amounts of data stored on S3 and create a virtual data lake without going through an ETL process.
Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift (see the sketch after this role).
Wrote Hive UDFs as per requirements and to handle different schemas and XML data.
Implemented ETL code to load data from multiple sources into HDFS using Spark.
Developed a data pipeline using Python and Hive to load data into the data lake.
Performed data analysis and data mapping for several data sources.
Loaded data into S3 buckets using AWS Glue and PySpark.
Involved in filtering data stored in S3 buckets using Elasticsearch and loaded data into Hive external tables.
Used AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.
Designed a new member and provider booking system that allows providers to book new slots, sending the member leg and provider leg directly to TP through Datalink.
Opened an SSH tunnel to Google Dataproc to access the YARN manager and monitor Spark jobs.
Analyzed various types of raw files such as JSON, CSV, and XML with Python using Pandas, NumPy, etc.
Developed Spark applications using Scala for easy Hadoop transitions; wrote Spark jobs and Spark Streaming code using Scala and Python.
Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive; developed Spark code and Spark SQL/Streaming for faster testing and processing of data.
Automated existing scripts for performance calculations using scheduling tools like Airflow.
Designed and developed the core data pipeline code, involving work in Java and Python, built on Kafka and Storm.
Good experience with partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive for optimized performance.
Performed performance tuning using partitioning and bucketing of Impala tables.
Hands-on experience fetching live stream data from DB2 into HBase tables using Spark Streaming and Apache Kafka.
Experience with job workflow scheduling and monitoring tools like Oozie and Zookeeper.
Worked on NoSQL databases including HBase and Cassandra.
Environment: MapReduce, HDFS, Hive, HBase, Python, SQL, Sqoop, Impala, Scala, Spark, Apache Kafka, AWS, Zookeeper, J2EE, Linux Red Hat, Cassandra
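As a rough illustration of the Glue-to-Redshift load pattern mentioned in this role, the sketch below reads Parquet campaign files from S3 as a DynamicFrame and writes them to Redshift through a catalog JDBC connection. The bucket, connection, database, and table names are hypothetical placeholders.

```python
# Sketch of an AWS Glue job moving Parquet campaign data from S3 into Redshift.
# Bucket, connection, database, and table names are hypothetical placeholders.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw campaign files from S3 as a DynamicFrame.
campaigns = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://campaign-raw-example/"]},
    format="parquet",
)

# Write to Redshift via a Glue catalog JDBC connection, staging through S3.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=campaigns,
    catalog_connection="redshift-conn-example",
    connection_options={"dbtable": "public.campaigns", "database": "analytics"},
    redshift_tmp_dir="s3://campaign-raw-example/tmp/",
)

job.commit()
```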
PWC, Tampa, FL | May 2019 to April 2020
Data Engineer
Responsibilities:
Understood the business needs and objectives of the system, interacted with end clients/users, and gathered requirements for the integrated system.
Developed data processing tasks using PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading into target data destinations.
Wrote Python utilities and scripts to automate tasks in AWS using Boto3 and the AWS SDK.
Automated backups using the AWS SDK (Boto3) to transfer data into S3 buckets.
Developed and deployed the outcome using Spark and Scala code on a Hadoop cluster running on GCP.
Worked in the API group running Jenkins in a Docker container with RDS and GCP slaves in Amazon AWS.
Used OpenNLP and StanfordNLP APIs for natural language processing and sentiment analysis.
Created the API with the Serverless framework in Python 3.6.
Installed and configured Splunk DB Connect and supported syslog-ng, rsyslog, and the Security Operations Centre (SOC).
Worked with different file formats such as JSON, CSV, and XML using Spark SQL.
Implemented an incremental load approach in Spark for very large data tables (see the sketch after this role).
Used RESTful APIs with JSON to extract network traffic and memory performance information.
Used Amazon Web Services (AWS) for storage and processing of data in the cloud.
Created the incremental eligibility document and developed code for the initial load process.
Performed transformations and actions using Spark to improve performance.
Loaded transformed data into Hive using saveAsTable in Spark.
Designed and developed user-defined functions (UDFs) for Hive, developed Pig UDFs to pre-process data for analysis, and worked with UDAFs for custom, data-specific processing.
Performed transformations using Hive and MapReduce; copied .log and Snappy files into HDFS from Greenplum using Kafka and extracted data into HDFS from MySQL using Sqoop.
Wrote MapReduce jobs for text mining and worked with the predictive analysis team.
Worked with Hadoop components such as HBase, Spark, YARN, Kafka, Zookeeper, Pig, Hive, Sqoop, Oozie, Impala, and Flume.
Used build tools like Maven to build projects.
Performed unit testing by creating test cases.
Used Kafka and Spark Streaming for streaming use cases.
Experience with development methodologies such as Agile and Waterfall, and code repositories such as GitHub.
Environment: Apache Spark, Scala, Eclipse, HBase, Talend, Python, Pig, Flume, PySpark, Hortonworks, Spark SQL, Hive, Teradata, Hue, Spark Core, Linux, GitHub, AWS, JSON
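A minimal sketch of the watermark-based incremental load approach referenced in this role; the table, column, and path names are hypothetical, and production code would add merge/dedup logic and error handling.

```python
# Sketch of a watermark-driven incremental load in PySpark.
# Table, column, and path names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder.appName("incremental-load-example")
    .enableHiveSupport()
    .getOrCreate()
)

# Read the last successfully loaded watermark from a (hypothetical) control table.
last_wm = spark.sql(
    "SELECT MAX(loaded_until) AS wm FROM control.load_watermarks "
    "WHERE table_name = 'orders'"
).collect()[0]["wm"]

# Pull only rows changed since the previous run from the staged source extract.
delta = (
    spark.read.parquet("s3://staging-example/orders/")
    .where(F.col("updated_at") > F.lit(last_wm))
)

# Append the delta to the target Hive table (saveAsTable, as used in this role).
delta.write.mode("append").saveAsTable("warehouse.orders")

# Report the new high-water mark for the next run.
new_wm = delta.agg(F.max("updated_at").alias("wm")).collect()[0]["wm"]
print(f"Loaded rows up to {new_wm}")
```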
Ascena Retail Group, Patskala, Ohio | Feb 2016 to April 2019
Data Engineer
Responsibilities:
Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL.
Architected and implemented medium- to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
Made enhancements to the Vertex configuration utilizing Tax Assist and Taxability Drivers, creating and editing Taxability Categories and Tax Rule overrides.
Developed stored procedures and views in Snowflake and used them in Talend for loading dimensions and facts (see the sketch after this role).
Designed Azure Data Warehouse, Azure Blob storage, and Redshift to feed BI reporting.
Designed, set up, maintained, and administered Azure SQL Database, Azure Analysis Services, Azure SQL Data Warehouse, and Azure Data Factory.
Good hands-on experience with Data Vault concepts and data models, and a well-versed understanding and implementation of data warehousing concepts/Data Vault.
Designed, reviewed, and created primary objects such as views and indexes based on logical design models, user requirements, and physical constraints.
Developed a procedure for cross-referencing Customer, Vendor, and Exemption Certificate Manager master data utilizing Microsoft Access DBMS for import into the Vertex O-Series v7 Taxability Manager System.
Evaluated Snowflake design considerations for any change in the application.
Worked with stored procedures for data set results used in Reporting Services to reduce report complexity and optimize run time.
Exported reports into various formats (PDF, Excel) and resolved formatting issues.
Defined virtual warehouse sizing in Snowflake for different types of workloads.
Designed packages to extract data from SQL DB and flat files and load it into an Oracle database.
Responsible for defining cloud network architecture using Azure virtual networks, VPN, and ExpressRoute to establish connectivity between on-premises and cloud.
Extensively used SQL queries to check storage and accuracy of data in database tables and utilized SQL for querying the SQL database.
Environment: Azure Data Factory, Spark (Python/Scala), Hive, Jenkins, Kafka, Spark Streaming, Docker containers, PostgreSQL, RabbitMQ, Celery, Flask, ELK Stack, MS Azure, Azure SQL Database, Azure Function Apps, Azure Data Lake, Blob Storage, SQL Server
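As a rough sketch of the dimension/fact loading logic described in this role (the actual work used Snowflake stored procedures and views driven from Talend), the example below runs a MERGE from a staging table into a dimension table via the Snowflake Python connector. Account, warehouse, schema, and table names are hypothetical, and credentials are assumed to come from environment variables.

```python
# Sketch: upsert a customer dimension in Snowflake from a staging table.
# Account, warehouse, schema, and table names are hypothetical placeholders;
# credentials are assumed to be supplied via environment variables.
import os

import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse="LOAD_WH",
    database="EDW",
    schema="DIM",
)

cur = conn.cursor()
try:
    # Merge staged customer rows into the dimension (Type 1 overwrite for simplicity).
    cur.execute("""
        MERGE INTO dim_customer d
        USING stg_customer s
          ON d.customer_id = s.customer_id
        WHEN MATCHED THEN UPDATE SET
          d.customer_name = s.customer_name,
          d.updated_at    = CURRENT_TIMESTAMP()
        WHEN NOT MATCHED THEN INSERT (customer_id, customer_name, updated_at)
          VALUES (s.customer_id, s.customer_name, CURRENT_TIMESTAMP())
    """)
finally:
    cur.close()
    conn.close()
```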
Cybage Software Private Limited, Hyderabad, India | July 2013 to Aug 2015
Data Engineer
Responsibilities:
Enhanced data collection procedures to include information relevant for building analytic systems.
Maintained and developed complex SQL queries, views, functions, and reports that met customer requirements on Snowflake.
Performed analysis, auditing, forecasting, programming, research, report generation, and software integration to gain an expert understanding of the current end-to-end BI platform architecture and support the deployed solution.
Developed test plans, validated and executed test cases, and created validation final and summary reports.
Worked with the ETL team to document the transformation rules for data migration from OLTP to the warehouse environment for reporting purposes.
Worked on ingestion of applications/files from a commercial VPC to OneLake.
Worked on building EC2 instances, creating IAM users and groups, and defining policies.
Worked on creating S3 buckets and assigning bucket policies per client requirements (see the sketch after this role).
Understood the use cases for data analytics and built big data solutions using open-source technologies such as Spark and Python.
Engaged with business users to gather requirements, design visualizations, and train them to use self-service BI tools.
Used various sources such as SQL Server, Excel, and Oracle to pull data into Power BI.
Designed easy-to-follow visualizations using Tableau and published dashboards on web and desktop platforms.
Created high-level and low-level design documents per business requirements and worked with the offshore team to guide them on design and development.
Continuously monitored processes that took longer than expected to execute and tuned them.
Carried out necessary research and root cause analysis to resolve production issues during weekend support.
Monitored system life cycle deliverables and activities to ensure that procedures and methodologies were followed and that appropriate, complete documentation was captured.
Environment: SQL, MS Excel, Power BI, Git, Jenkins Pipelines, Data Models, Shell Scripts, Linux, Python, Agile
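To illustrate the S3 bucket provisioning and bucket-policy work noted in this role, here is a small Boto3 sketch. The bucket name, account ID, role name, and region are invented for the example and would differ in any real engagement.

```python
# Sketch: create an S3 bucket and attach a read-only bucket policy with Boto3.
# Bucket name, account ID, role name, and region are hypothetical placeholders.
import json

import boto3

s3 = boto3.client("s3", region_name="us-east-1")
bucket = "client-reports-example"

# Create the bucket (us-east-1 requires no LocationConstraint).
s3.create_bucket(Bucket=bucket)

# Allow a specific (hypothetical) analytics role to read objects from the bucket.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowAnalyticsRead",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:role/analytics-reader"},
            "Action": ["s3:GetObject"],
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }
    ],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```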