Vinod - Data Engineer
[email protected]
Location: Overland Park, Kansas, USA
Relocation: Yes
Visa:
Vinod
AWS Data Engineer | (804)-588-9938 | [email protected]

Professional Summary:
9+ years of experience as a Data Engineer with expertise in designing data-intensive applications using the Hadoop ecosystem, data analytics, AWS and cloud data engineering, data warehouse/data mart, data visualization, reporting, and data quality solutions.
In-depth knowledge of Hadoop architecture and its components, including YARN, HDFS, NameNode, DataNode, Job Tracker, Application Master, Resource Manager, Task Tracker, and the MapReduce programming paradigm.
Extensive experience developing enterprise-level solutions on Hadoop using components such as Apache Spark, MapReduce, HDFS, Sqoop, Pig, Hive, HBase, Oozie, Flume, NiFi, Kafka, ZooKeeper, and YARN.
Programming languages: experience working in Python, Scala, Java, C++, and shell scripting on UNIX and Linux (Ubuntu).
Understanding of the MapReduce programming paradigm and the Spark execution framework.
Experienced in improving the performance of and optimizing existing Hadoop algorithms with Spark, using SparkContext, Spark SQL, the DataFrame API, Spark Streaming, MLlib, and pair RDDs; worked extensively with PySpark and Scala.
Proficient in the AWS cloud platform, including services such as EC2, S3, VPC, ELB, DynamoDB, CloudFront, CloudWatch, Route 53, Security Groups, Redshift, and CloudFormation.
Handled ingestion of data from different sources into HDFS using Sqoop and Flume, and performed transformations using Hive and MapReduce before loading the data into HDFS.
Managed Sqoop jobs with incremental loads to populate Hive external tables.
Good working experience with Apache Hadoop ecosystem components such as MapReduce, HDFS, Hive, Sqoop, Pig, Oozie, Flume, HBase, and ZooKeeper.
Extensive experience performing ETL on Spark using Spark Core and Spark SQL, and real-time data processing using Spark Streaming.
Experience in designing and creating RDBMS tables, views, user-defined data types, indexes, stored procedures, cursors, triggers, and transactions.
Expert in designing parallel jobs using stages such as Join, Merge, Lookup, Remove Duplicates, Filter, Dataset, Lookup File Set, Complex Flat File, Modify, Aggregator, and XML.
Experienced with JSON-based RESTful web services and XML/QML-based SOAP web services; worked on various applications using Python IDEs such as Sublime Text and PyCharm.
Built and productionized predictive models on large datasets using advanced statistical modelling, machine learning, and other data mining techniques.
Data lake design and implementation on AWS.
Developed intricate algorithms based on deep-dive statistical analysis and predictive data modelling that were used to deepen relationships, strengthen longevity, and personalize interactions with customers.

Technical Skills:
Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, ZooKeeper, Spark
Hadoop Distributions: Cloudera, AWS EMR, Hortonworks
AWS: EMR, Glue, S3, EC2, Lambda, Athena, Step Functions, API Gateway, SNS, Glue Catalog, Redshift, DynamoDB, CloudWatch
GCP: Dataproc, BigQuery, Dataflow, GCS, Compute Engine
Azure Components: Azure Data Factory, Azure SQL, Azure Databricks
Languages: Java, Scala, Python, SQL, Shell Scripting
Workflow: Airflow, Step Functions, Dataflow, Control-M
Web Technologies: HTML, CSS, JavaScript, JSON, XML/SOAP, REST, WSDL
NoSQL Databases: Cassandra and HBase
IDE: Databricks, Jupyter, IntelliJ, PyCharm, Eclipse
Version Control: Git, Subversion, CVS, MKS
Database: Oracle 10g, MySQL, MS SQL
Operating Systems: UNIX, Linux, macOS, and Windows variants
Tracking Tools: Rally, JIRA
CI/CD Tools: Jenkins, Chef, Confluence, Bitbucket

Professional Experience:

Sr. AWS Data Engineer, June 22 - Present
Great American Insurance
Responsibilities:
Developed Spark applications using Python and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
Responsible for building scalable distributed data solutions using Hadoop; extensive experience working with the AWS cloud platform (EC2, S3, EMR, Redshift, Lambda, and Glue).
Built NiFi dataflows to consume data from Kafka, transform the data, place it in HDFS, and expose a port for the Spark Streaming job.
Worked on migrating jobs from the NiFi development cluster to the pre-production and production clusters.
Migrated an existing on-premises application to the AWS platform using various services.
Experienced in maintaining the Hadoop cluster on AWS EMR.
Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop.
Worked on Spark RDDs, the DataFrame API, the Dataset API, the Data Source API, Spark SQL, and Spark Streaming.
Used the Spark Streaming APIs to perform transformations and actions on the fly; loaded DStream data into Spark RDDs and performed in-memory computation to generate output responses.
Loaded data into S3 buckets using AWS Glue and PySpark (see the sketch following this role); filtered data stored in S3 buckets using Elasticsearch and loaded data into Hive external tables.
Configured Snowpipe to pull data from S3 buckets into Snowflake tables.
Worked on AWS Redshift to consolidate multiple data warehouses into a single data warehouse.
Designed and implemented real-time data visualization dashboards using Tableau to communicate insights to stakeholders.
Good understanding of Cassandra architecture, replication strategy, gossip, snitches, etc.
Used HiveQL to analyze partitioned and bucketed data, and executed Hive queries on Parquet tables stored in Hive to perform data analysis meeting the business specification logic.
Used Apache Kafka to aggregate web log data from multiple servers and make it available to downstream systems for data analysis and engineering use cases.
Worked on implementing Kafka security and boosting its performance.
Developed Oozie coordinators to schedule Hive scripts and create data pipelines.
Performed on-cluster testing of HDFS, Hive, Pig, and MapReduce access for new users.
Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
Environment: Spark, Spark Streaming, Spark SQL, AWS, MapR, HDFS, Hive, Pig, Apache Kafka, Sqoop, Python, PySpark, shell scripting, Linux, MySQL, Oracle Enterprise DB, Solr, Jenkins, Eclipse, Oracle, Git, Oozie, Tableau, SOAP, Cassandra, and Agile methodologies.
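A minimal sketch of the kind of Glue PySpark job described in this role (catalog read, light cleansing, partitioned Parquet write to S3); the database, table, column, and bucket names are illustrative placeholders, not the actual project objects.

    import sys
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    # Standard Glue job bootstrap
    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    sc = SparkContext()
    glue_context = GlueContext(sc)
    spark = glue_context.spark_session
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read a source table registered in the Glue Data Catalog
    # (database and table names are placeholders)
    dyf = glue_context.create_dynamic_frame.from_catalog(
        database="claims_db", table_name="raw_policies")

    # Light cleansing with the Spark DataFrame API
    df = dyf.toDF().dropDuplicates().filter("policy_id IS NOT NULL")

    # Write partitioned Parquet to the data lake bucket (placeholder path)
    (df.write.mode("overwrite")
       .partitionBy("ingest_date")
       .parquet("s3://example-datalake-bucket/curated/policies/"))

    job.commit()

In practice a script like this would be scheduled through a Glue trigger or Step Functions, both listed in the technical skills above.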
Data Engineer, June 19 - May 22
T-Mobile, India
Responsibilities:
Designed the data flow for collapsing four legacy data warehouses into an AWS data lake.
Designed the ETL strategy for the analytics sandbox and the business intelligence facility (EDW).
Designed and developed complex data pipelines using Sqoop, Spark, and Hive to ingest, transform, and analyze customer behavior data.
Built a data visualization dashboard using D3.js to visualize streaming data from a wind turbine farm, helping engineers monitor turbine performance and identify issues in real time.
Extracted data from RDBMS sources into the Hadoop file system (HDFS) using Sqoop.
Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster data processing.
Worked in an AWS environment with technologies such as S3, EC2, EMR, Glue, CFT, and Lambda, and with databases including Oracle, SMS, DynamoDB, and MongoDB.
Developed PySpark code for AWS Glue jobs and integrated them with the AWS data lake and its services.
Worked on AWS-based data ingestion and transformations.
Designed detailed migration plans for workloads from third-party data and cloud platforms to AWS and executed them successfully.
Implemented a logging framework, the ELK stack (Elasticsearch, Logstash, Kibana), on AWS.
Transformed, manipulated, and cleansed data using Python, sourcing from different databases into S3 buckets.
Set up Spark on EMR to process large volumes of data stored in Amazon S3.
Developed and implemented Apache NiFi pipelines across various environments.
Implemented partitioning and bucketing for faster query processing in Hive Query Language.
Managed and monitored Hadoop and Spark log files for the jobs.
Developed shell scripts for migrating and deploying to production servers.
Applied different HDFS formats and structures, such as Parquet and Avro, to speed up analytics.
Fine-tuned Hadoop applications for high performance and throughput; troubleshot and debugged Hadoop ecosystem runtime issues.
Environment: Hadoop, Spark, Hive, Sqoop, Kafka, Python, Bash/shell scripting, HDFS, Unix, Scala, Git, Jenkins, AWS.

Data Engineer, Oct 17 - May 19
State Farm, India
Responsibilities:
Involved in writing Spark applications using Python to perform data cleansing, validation, transformation, and summarization activities according to requirements (see the sketch following this role).
Created various Spark applications using Scala to perform enrichment of clickstream data combined with enterprise data about the users.
Designed and developed Spark applications for data extraction, transformation, and aggregation from multiple sources.
Performance-tuned Spark applications by setting the right batch interval time, the correct level of parallelism, and appropriate memory settings.
Designed the application for integration with REST APIs, the Merchant UI, and custom Python libraries.
Built a framework for data migration and developed Spark jobs.
Translated business and data requirements into logical data models in support of enterprise data models, OLAP, OLTP, and analytical systems.
Visualized store layout with adjacency, left/right, opposite, and perpendicular mapping.
Infrastructure design, deployment, and configuration management on the GCP cloud.
Automated ETL processes to wrangle the data, posting recommendation data store by store.
Automated store data analysis by department, along with validation.
Built horizontally and vertically scalable distributed data solutions with Python multiprocessing, REST APIs, and GCP.
Designed Oozie workflows for job scheduling and batch processing.
Environment: Python, Apache Spark, Scala, PySpark, REST, Pandas, GCP, Dataproc, VM, Elasticsearch, shell scripting, Linux, Unix shell scripting, Apache Kafka.
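As an illustration of the Python-based cleansing, validation, enrichment, and summarization work described in the State Farm role above, a minimal PySpark sketch is shown below; the bucket paths, column names, and tuning values are assumptions made for the example only.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Session with illustrative parallelism and memory settings, in the spirit
    # of the performance-tuning work described above (values are examples only)
    spark = (SparkSession.builder
             .appName("clickstream-enrichment")
             .config("spark.sql.shuffle.partitions", "200")
             .config("spark.executor.memory", "4g")
             .getOrCreate())

    # Raw clickstream and enterprise user data (placeholder GCS paths)
    clicks = spark.read.json("gs://example-bucket/raw/clickstream/")
    users = spark.read.parquet("gs://example-bucket/curated/users/")

    # Cleansing and validation: drop duplicates and records without a user id
    valid_clicks = (clicks.dropDuplicates(["click_id"])
                          .filter(F.col("user_id").isNotNull()))

    # Enrichment with user attributes, then daily summarization per user
    summary = (valid_clicks.join(users, "user_id", "left")
               .groupBy("user_id", F.to_date("click_ts").alias("click_date"))
               .agg(F.count("*").alias("clicks"),
                    F.countDistinct("page_url").alias("distinct_pages")))

    summary.write.mode("overwrite").parquet("gs://example-bucket/marts/clickstream_daily/")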
Java Developer, June 15 - Sep 17
EHS InfoTech Solutions, India
Responsibilities:
Participated in capturing business requirements and translating them into detailed design documents.
Developed applications using Java and J2EE technologies.
Used open-source technologies such as Hibernate ORM and the Spring Framework to develop and maintain sophisticated service-based architectures.
Developed server-side services using Java multithreading, Struts MVC, Java, EJB, Spring, and web services (SOAP, WSDL, Axis).
Developed a DAO layer using Spring MVC and Hibernate configuration XML to manage CRUD operations (insert, update, delete).
Used AJAX and JavaScript for validations and for integrating business server-side components on the client side within the browser.
Used RESTful services to interact with the client by providing RESTful URL mappings.
Implemented the project using the Agile Scrum methodology; involved in daily stand-up meetings, sprint showcases, and sprint retrospectives.
Developed the user interface using JSP, JSP tag libraries, and JavaScript to simplify the complexities of the application.
Performed CRUD operations such as updating, inserting, and deleting data in Oracle.
Created functional test cases and delivered bug fixes.
Worked with WebLogic Application Server to set up and deploy applications; hands-on experience with J2EE frameworks such as Servlets and JSPs.
Deployed web, presentation, and business components on Apache Tomcat Application Server.
Developed PL/SQL procedures for various application scenarios.
Developed UI panels using JSF, XHTML, CSS, DOJO, and jQuery.
Environment: HTML5, JSP, Servlets, JDBC, JavaScript, JSON, Spring, SQL, Oracle 11g, Tomcat, Eclipse IDE, XML, ANT.

Hadoop/ETL Developer, April 14 - May 15
DST Systems, India
Responsibilities:
Worked closely with subject matter experts (SMEs) on the analysis and design of functional documents, and coordinated with source system owners.
Involved in the full project life cycle: design, data modeling, and requirements gathering.
Seasoned in data warehousing, data modeling, data marts, data migration, and data analysis.
Collaborated with data architects on data model management and version control.
Conducted data model reviews with project team members.
Captured technical metadata through data modeling tools.
Created data objects (DDL).
Enforced standards and best practices around data modeling efforts.
Ensured data warehouse and data mart designs efficiently support BI and end users.
Involved in design, development, testing, and implementation of ETL using Ab Initio.
Created generic graphs to read data from multiple sources in the Hadoop data lake.
Produced the required documents, including mapping documents, design documents, and use case documents.
Environment: MapR, Hadoop, HBase, HDFS, Sqoop, Flume, Ab Initio, Teradata, UNIX, Windows 7, Control-M
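The multi-source extraction in the DST Systems role was implemented with Ab Initio generic graphs; purely as an illustration, a comparable extraction expressed in PySpark (the primary language elsewhere in this resume) might look like the sketch below. The Teradata JDBC URL, credentials, table names, and data lake paths are placeholders, and the Teradata JDBC driver is assumed to be available on the Spark classpath.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("multi-source-extract").getOrCreate()

    # Reference data pulled from Teradata over JDBC
    # (URL, credentials, and table name are placeholders)
    accounts = (spark.read.format("jdbc")
                .option("driver", "com.teradata.jdbc.TeraDriver")
                .option("url", "jdbc:teradata://td-host/DATABASE=edw")
                .option("dbtable", "edw.accounts")
                .option("user", "etl_user")
                .option("password", "change_me")
                .load())

    # Event files already landed in the Hadoop data lake (placeholder path)
    events = spark.read.parquet("hdfs:///datalake/raw/events/")

    # Conform the two sources before handing off to downstream mappings
    conformed = (events.join(accounts, "account_id", "left")
                       .select("account_id", "event_type", "event_ts"))

    conformed.write.mode("overwrite").parquet("hdfs:///datalake/staged/events_conformed/")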