Syed Omer - Data Engineer candidate resume
[email protected]
Location: Dallas, Texas, USA
Relocation:
Visa: H-1B
SAI TEJA N
Big Data Engineer
Mobile: +1 (845) 593-4233 | Email: [email protected]

PROFESSIONAL SUMMARY
- 8+ years of professional experience in information technology, with an expert hand in Big Data, Hadoop, Spark, Hive, Impala, Sqoop, Flume, Kafka, SQL, ETL development, report development, database development, and data modeling, and strong knowledge of several database architectures.
- Experienced in developing custom UDFs for Pig and Hive to incorporate Python/Java methods and functionality into Pig Latin and HQL (HiveQL).
- Designed pipelines to extract data from Snowflake and Druid and perform data transformations and filtering before pushing it to the data warehouse.
- Expertise in core Java and JDBC; proficient in using Java APIs for application development.
- Expertise in JavaScript, JavaScript MVC patterns, object-oriented JavaScript design patterns, and AJAX calls.
- Good experience in Tableau and Druid for data visualization and analysis of large data sets, drawing various conclusions.
- Leveraged and integrated Google Cloud Storage and BigQuery applications, which connected to Tableau for end-user web-based dashboards and reports.
- Worked on Azure Databricks and Azure Data Factory to automate the periodic loading of data into the reporting database.
- Good knowledge of Amazon Web Services (AWS) concepts such as Athena, EMR, and EC2, which provide fast and efficient processing for Teradata Big Data Analytics.
- Expertise in Big Data architectures such as Hadoop (Azure, Hortonworks, Cloudera) distributed systems, MongoDB, NoSQL, HDFS, and the parallel-processing MapReduce framework.
- Developed Spark-based applications to load streaming data with low latency, using Kafka and PySpark (a PySpark sketch follows below).
- Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Azure Databricks, Snowflake, and Azure SQL Data Warehouse (Synapse Analytics); controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
- Experienced in development of Big Data projects using Hadoop, Hive, HDP, Pig, Flume, Storm, and MapReduce open-source tools.
- Experienced in installation, configuration, support, and management of Hadoop clusters.
- Experienced in working with MapReduce programs on Apache Hadoop and with Druid for Big Data workloads.
- Experienced in development, support, and maintenance of ETL (Extract, Transform, and Load) processes using Talend Integration Suite.
- Experienced in installation, configuration, support, and monitoring of Hadoop clusters using Apache and Cloudera distributions and AWS.
- Strong hands-on experience with AWS services, including but not limited to EMR, S3, EC2, Route 53, RDS, ELB, DynamoDB, and CloudFormation.
- Hands-on experience with the Hadoop ecosystem, including Spark, Kafka, HBase, Scala, Pig, Impala, Sqoop, Oozie, Flume, Storm, and other big data technologies.
- Worked on Spark and Spark Streaming, using the core Spark API to explore Spark features and build data pipelines.
- Experienced in working with different scripting technologies such as Python and UNIX shell scripts.
- Extensive knowledge of IDE tools such as MyEclipse, RAD, IntelliJ, and NetBeans.
- Expertise in Amazon EMR, Spark, Kinesis, S3, ECS, Druid, ElastiCache, DynamoDB, and Redshift.
- Experienced in installation, configuration, support, and management of the Cloudera Hadoop platform, along with CDH4 and CDH5 clusters.
- Proficient with multiple databases, including NoSQL databases (MongoDB, Cassandra), MySQL, Oracle, and MS SQL Server.
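Illustrative sketch (not taken from the original resume): a minimal PySpark Structured Streaming job reading from Kafka and landing Parquet, matching the low-latency Kafka/PySpark item above. The broker address, topic name, and output paths are hypothetical placeholders.

```python
# Minimal PySpark Structured Streaming sketch: read from Kafka, land as Parquet.
# Broker, topic, and paths are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (SparkSession.builder
         .appName("kafka-low-latency-ingest")
         .getOrCreate())

# Subscribe to a Kafka topic (assumes the spark-sql-kafka connector is on the classpath).
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder
          .option("subscribe", "events-topic")                 # placeholder
          .option("startingOffsets", "latest")
          .load())

# Kafka delivers the payload as binary; cast it to string for downstream parsing.
parsed = events.select(col("value").cast("string").alias("payload"),
                       col("timestamp"))

# Write micro-batches to Parquet with a checkpoint for fault-tolerant file output.
query = (parsed.writeStream
         .format("parquet")
         .option("path", "/data/landing/events")               # placeholder
         .option("checkpointLocation", "/data/checkpoints/events")
         .trigger(processingTime="10 seconds")
         .start())

query.awaitTermination()
```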
- Experience in database design, entity relationships, and database analysis; programming SQL, PL/SQL stored procedures, packages, and triggers in Oracle.
- Experience working with different data sources such as flat files, XML files, and databases.
- Ability to tune Big Data solutions to improve performance and the end-user experience.
- Managed multiple tasks and worked under tight deadlines in a fast-paced environment.
- Excellent analytical and communication skills, which help in understanding business logic and building good relationships between stakeholders and team members.

Educational Details
Bachelor of Technology, Computer Science and Engineering, KL University, Vijayawada, 2014

Technical Skills Summary
Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Zookeeper, Kafka, Apache Spark, Spark Streaming, Impala, Druid, Collibra
Hadoop Distributions: Cloudera, Hortonworks, AWS
Languages: SQL, Python, Scala, Regular Expressions, PL/SQL, Pig Latin, HiveQL, Linux shell
Operating Systems: Windows, UNIX, Linux, Ubuntu, CentOS
Portals/Application Servers: WebLogic, WebSphere Application Server, JBoss
Build Automation Tools: SBT, Ant, Maven
Databases: Amazon RDS, Amazon Redshift, Oracle, SQL Server, MySQL, MS Access, Teradata, Cassandra, HBase, MongoDB
ETL Tools: Informatica PowerCenter, Talend Open Studio for Big Data
Cloud Technologies: AWS, Snowflake, Azure Data Factory, Azure Data Lakes, Azure Blob Storage, Azure Synapse Analytics, Amazon S3, EMR, Redshift, Lambda, Athena, Glue

Work Experience

FPL, Juno Beach, FL                                          Jan 2022 - Till Date
Big Data Engineer
Responsibilities:
- Worked on analyzing the Hadoop stack and different Big Data analytics tools, including Pig, Hive, HBase, and Sqoop.
- Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
- Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
- Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using the SQL activity.
- Performed job functions using Spark APIs in Scala for real-time analysis and fast querying.
- Involved in creating Spark applications in Scala using cache, map, reduceByKey, and similar functions to process data.
- Created Oozie workflows for Hadoop-based jobs, including Sqoop, Hive, and Pig.
- Created Hive external tables, loaded data into the tables, and queried the data using HQL (see the Hive sketch below).
- Used Azure DevOps version control for code versioning.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Wrote HiveQL queries, configuring the number of mappers and reducers needed for the desired output.
- Transferred data between Pig scripts and Hive using HCatalog, and transferred relational database data using Sqoop.
- Configured and maintained different topologies in the Storm cluster and deployed them on a regular basis.
- Responsible for building scalable distributed data solutions using Hadoop; installed and configured Hive, Pig, Oozie, and Sqoop on the Hadoop cluster.
- Developed simple to complex MapReduce jobs in Java and implemented equivalent logic in Hive and Pig.
- Ran performance tests using the cassandra-stress tool to measure and improve the read and write performance of the cluster.
- Configured Kafka, Storm, and Hive to receive and load real-time messages.
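Illustrative sketch (not taken from the original resume): creating and querying a Hive external table through Spark SQL, as a companion to the Hive external table item above. Database, table, column, and path names are hypothetical placeholders.

```python
# Minimal sketch: create and query a Hive external table through Spark SQL.
# Database, table, columns, and HDFS paths are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-external-table-demo")
         .enableHiveSupport()          # requires a configured Hive metastore
         .getOrCreate())

spark.sql("CREATE DATABASE IF NOT EXISTS staging")

# External table: Hive tracks only the schema; the data files stay at the given location.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS staging.customer_events (
        customer_id STRING,
        event_type  STRING,
        event_ts    TIMESTAMP
    )
    STORED AS PARQUET
    LOCATION '/data/staging/customer_events'
""")

# Query with HQL and pull a small sample back to the driver.
event_counts = spark.sql("""
    SELECT event_type, COUNT(*) AS cnt
    FROM staging.customer_events
    GROUP BY event_type
""")
event_counts.show(10)
```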
- Worked on Databricks Structured Streaming of telemetry data from Azure Event Hubs to Azure Data Lake Storage.
- Analyzed the data by running Hive queries (HiveQL) and Pig scripts (Pig Latin).
- Provided cluster coordination services through Zookeeper.
- Installed and configured Hive and wrote Hive UDFs.
- Worked on the Analytics Infrastructure team to develop a stream-filtering system on top of Apache Kafka and Storm.
- Worked on a POC on Spark and Scala parallel processing.
- Performed real-time streaming of data using Spark with Kafka.
- Worked extensively with PySpark to build Big Data flows.
- Good hands-on experience with Apache Spark in the current project.
Environment: Hadoop, Spark, HDFS, Hive, Pig, HBase, Big Data, Apache Storm, Oozie, Sqoop, Kafka, Flume, Zookeeper, MapReduce, Cassandra, Scala, Linux, NoSQL, MySQL Workbench, Java, Eclipse, SQL Server.

Knight-Swift Transportation, Phoenix, AZ                     Sep 2019 - Dec 2021
Big Data Engineer
Responsibilities:
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Responsible for Big Data initiatives and engagement, including analysis, brainstorming, POCs, and architecture.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
- Installed and configured Apache Hadoop clusters for application development and Hadoop tools.
- Installed and configured Hive, wrote Hive UDFs, and used a repository of UDFs for Pig Latin.
- Developed a data pipeline using Pig and Sqoop to ingest cargo data and customer histories into HDFS for analysis.
- Migrated the existing on-prem code to an AWS EMR cluster.
- Installed and configured Hadoop ecosystem components and Cloudera Manager using the CDH distribution.
- Coordinated with the Hortonworks support team through the support portal to sort out critical issues during upgrades.
- Worked on modeling of dialog processes, Druid, and business processes, and on coding Business Objects, Query Mapper, and JUnit files.
- Created automated pipelines in AWS CodePipeline to deploy Docker containers to AWS ECS using S3.
- Used the HBase NoSQL database for real-time read/write access to huge volumes of data in the use case.
- Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames, and loaded the data into HBase.
- Developed an AWS Lambda function to invoke a Glue job as soon as a new file arrives in the inbound S3 bucket (see the Lambda/Glue sketch below).
- Created Spark jobs to apply data cleansing/data validation rules on new source files in the inbound bucket and route rejected records to a reject-data S3 bucket.
- Created HBase tables to load large sets of semi-structured data coming from various sources.
- Responsible for loading customer data and event logs from Kafka into HBase using a REST API.
- Created tables with sort and distribution keys in AWS Redshift.
- Created shell scripts and Python scripts to automate daily tasks, including production tasks.
- Created, altered, and deleted Kafka topics when required.
- Used cloud computing on the multi-node cluster, deployed the Hadoop application on cloud S3, and used Elastic MapReduce (EMR) to run MapReduce jobs.
- Developed an analytics enablement layer using ingested data that facilitates faster reporting and dashboards.
- Created Hive external tables to stage data and then moved the data from staging to the main tables.
- Implemented the Big Data solution using Hadoop, Druid, Hive, and Informatica to pull/load the data into the HDFS system.
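Illustrative sketch (not taken from the original resume): one minimal way to have an AWS Lambda start a Glue job when a file lands in S3, matching the Lambda/Glue item above. It assumes the function is subscribed to the bucket's object-created events; the Glue job name and argument keys are hypothetical placeholders.

```python
# Minimal AWS Lambda handler sketch: start a Glue job when a new object lands in S3.
# Assumes the function is subscribed to s3:ObjectCreated events on the inbound bucket.
# The Glue job name and argument keys are hypothetical placeholders.
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    runs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        # Kick off the Glue job, passing the new file's location as job arguments.
        response = glue.start_job_run(
            JobName="inbound-file-validation",        # placeholder job name
            Arguments={
                "--source_bucket": bucket,
                "--source_key": key,
            },
        )
        runs.append(response["JobRunId"])

    return {"started_job_runs": runs}
```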
- Developed applications using Angular 6 and lambda expressions in Java to store and process data.
- Implemented the Angular 6 Router to enable navigation from one view to the next as the agent performs application tasks.
- Pulled data from the Hadoop data lake ecosystem and massaged the data with various RDD transformations.
- Used the PySpark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames, Scala, and Python.
- Developed and maintained batch data flows using HiveQL and UNIX scripting.
- Designed and developed a real-time processing application using Spark, Kafka, Scala, and Hive to perform streaming ETL and apply machine learning.
- Developed and executed data pipeline testing processes and validated business rules and policies.
- Implemented different data formatting capabilities and published to multiple Kafka topics.
- Wrote automated HBase test cases for data quality checks using HBase command-line tools.
- Involved in development of the Hadoop system and in improving multi-node Hadoop cluster performance.
- Developed and implemented Apache NiFi across various environments and wrote QA scripts in Python for tracking files.
Environment: Hadoop 3.0, MapReduce, Hive 3.0, Agile, Druid, HBase 1.2, NoSQL, AWS, EC2, Kafka, Pig 0.17, HDFS, Java 8, Hortonworks, Spark, PL/SQL, Python, Jenkins.

Fifth Third Bank, New York, NY                               Oct 2016 - Aug 2019
Big Data Engineer
Responsibilities:
- Involved in the complete SDLC of a big data project, including requirement analysis, design, coding, testing, and production.
- Extensively used Sqoop to import/export data between RDBMS and Hive tables, performed incremental imports, and created Sqoop jobs for the last saved value.
- Involved in implementing the solution for data preparation, which is responsible for data transformation as well as handling user stories.
- Developed and tested ETL pipelines for data ingestion/preparation/dispatch jobs in Azure Data Factory with BLOB storage.
- Worked on migrating existing SQL data and reporting feeds to Hadoop.
- Developed a Pig script to read CDC files and ingest them into HBase.
- Worked on HBase table setup and shell scripts to automate the ingestion process.
- Created Hive external tables on top of HBase to be used for feed generation.
- Scheduled automated runs for production ETL data pipelines in Talend Open Studio for Big Data.
- Worked on migration of an existing feed from Hive to Spark; to reduce feed latency, the existing HQL was transformed to run using Spark SQL and HiveContext (see the Spark SQL sketch below).
- Worked on log monitoring using Splunk; performed setup of Splunk forwarders and built dashboards in Splunk.
- Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and Azure Data Lake Analytics.
- Ingested data into one or more Azure services (Azure Data Lake Storage Gen2, Azure Storage, Azure SQL DW) and processed the data in Azure Databricks.
- Worked on migration of data from on-prem SQL Server to cloud databases (Azure Synapse Analytics (DW) and Azure SQL DB).
- Created dispatcher jobs using Sqoop export to dispatch the data into Teradata target tables.
- Implemented a new Pig approach for SCD Type 1 jobs using Pig Latin scripts.
- Created Hive target tables to hold the data after all the Pig ETL operations, using HQL.
- Created HQL scripts to perform data validation once transformations were done, as per the use case.
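Illustrative sketch (not taken from the original resume): the same feed logic expressed once as HQL run through Spark SQL and once as a Spark DataFrame transformation, matching the Hive-to-Spark migration item above. Database, table, column, and path names are hypothetical placeholders.

```python
# Minimal sketch: one aggregation expressed as HQL via Spark SQL and as a DataFrame
# transformation. Database, table, columns, and the output path are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hql-to-spark-feed")
         .enableHiveSupport()
         .getOrCreate())

# Original Hive feed logic, run through Spark SQL against the Hive metastore.
hql_feed = spark.sql("""
    SELECT account_id, SUM(txn_amount) AS total_amount
    FROM finance.transactions
    WHERE txn_date >= '2019-01-01'
    GROUP BY account_id
""")

# Equivalent DataFrame version of the same feed.
df_feed = (spark.table("finance.transactions")
           .where(F.col("txn_date") >= "2019-01-01")
           .groupBy("account_id")
           .agg(F.sum("txn_amount").alias("total_amount")))

# Both produce the same result; persist the feed as Parquet (placeholder path).
df_feed.write.mode("overwrite").parquet("/data/feeds/account_totals")
```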
- Implemented a compression technique to free up space in the cluster, using Snappy compression on HBase tables to reclaim space.
- Hands-on experience accessing HBase data and performing CRUD operations against it.
- Integrated a SQL layer on top of HBase to get the best read/write performance, using the salting feature.
- Wrote shell scripts to automate the process by scheduling and calling the scripts from the scheduler.
- Created Hive scripts to load the historical data and to partition the data.
- Closely collaborated with both the onsite and offshore teams.
- Closely worked with the application support team to deploy the developed jobs into production.
Environment: Hadoop, HDFS, MapReduce, Hive, Flume, Sqoop, Pig, Java (JDK 1.6), Eclipse, MySQL, Ubuntu, Zookeeper, SQL Server, Talend Open Studio for Big Data, Shell Scripting.

Genems Systems Inc, Hyderabad, India                         Jun 2014 - Sep 2016
Hadoop Developer
Responsibilities:
- Implemented solutions for ingesting data from various sources and processing data-at-rest utilizing Big Data technologies such as Hadoop, MapReduce frameworks, HBase, and Hive.
- Used Sqoop to efficiently transfer data between databases and HDFS, and used Flume to stream log data from servers.
- Developed the full SDLC of an AWS Hadoop cluster based on the client's business needs.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
- Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
- Responsible for importing log files from various sources into HDFS using Flume.
- Analyzed data using HiveQL to generate payer-by-payer reports for transmission to payers from payment summaries.
- Imported millions of rows of structured data from relational databases using Sqoop import, processed them with Spark, and stored the data in HDFS in CSV format.
- Used the DataFrame API in Scala for converting distributed collections of data organized into named columns.
- Performed data profiling and transformation on the raw data using Pig, Python, and Java.
- Developed predictive analytics using Apache Spark Scala APIs.
- Involved in big data analysis using Pig and user-defined functions (UDFs).
- Created Hive external tables, loaded data into the tables, and queried the data using HQL.
- Developed Shell, Perl, and Python scripts to automate and provide control flow to Pig scripts.
- Developed a prototype for Big Data analysis using Spark, RDDs, DataFrames, and the Hadoop ecosystem with CSV, JSON, Parquet, and HDFS files (see the file-format sketch after this section).
- Developed Hive SQL scripts for performing transformation logic and loading the data from the staging zone to the landing zone and semantic zone.
- Involved in creating Oozie workflow and coordinator jobs for Hive jobs to kick off the jobs on time for data availability.
- Worked on the Oozie scheduler to automate the pipeline workflow and orchestrate the Sqoop, Hive, and Pig jobs that extract the data in a timely manner.
Environment: Big Data, Spark, YARN, Hive, Pig, JavaScript, JSP, HTML, Ajax, Scala, Python, Hadoop, AWS, DynamoDB, Kibana, Cloudera, EMR, JDBC, Redshift, NoSQL, Sqoop, MySQL.
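Illustrative sketch (not taken from the original resume): reading CSV and JSON files from HDFS with Spark DataFrames and writing the combined result back as Parquet, as a companion to the prototype item in the Genems section above. All paths and column names are hypothetical placeholders.

```python
# Minimal PySpark sketch: read CSV and JSON from HDFS, combine, and write Parquet.
# Paths and column names are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("file-format-prototype")
         .getOrCreate())

# CSV with a header row; let Spark infer column types for the prototype.
csv_df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("hdfs:///data/raw/orders_csv/"))

# Line-delimited JSON from another raw zone.
json_df = spark.read.json("hdfs:///data/raw/orders_json/")

# Align the two sources on a shared set of columns and union them.
common_cols = ["order_id", "customer_id", "amount"]
combined = csv_df.select(common_cols).unionByName(json_df.select(common_cols))

# Persist the combined dataset as Parquet for downstream analysis.
combined.write.mode("overwrite").parquet("hdfs:///data/curated/orders_parquet/")
```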