
Jyoti - Big Data Engineer
[email protected]
Location: La Presa, California, USA
Relocation: No; looking for remote
Visa: H1B
Ahalya
Bench Sales Recruiter
Vdrive IT Solutions Inc. | 800 E Campbell Road, Suite # 157 | Richardson, TX 75081
Office: +1 (469)-988-5899
[email protected]


Professional Summary

10+ years of diverse experience in Information Technology, including approximately 5+ years of Big Data development and design experience and 4+ years as a Quality Analyst performing various types of testing on different applications.
Hands-on experience designing and implementing complete end-to-end Hadoop infrastructure using Azure Databricks, Spark, Scala, YARN, HDFS, Hive, Sqoop, HBase, Kafka, and Apache Solr.
Experience with Continuous Integration/Deployment tools such as Maven, Jenkins, Bamboo, and UCD.
Experience with additional Apache Hadoop components such as Ambari, Hue, and Oozie, and with integrated tools such as IBM Big SQL and IBM Maestro.
Experience creating Hive internal/external tables and views using a shared metastore and writing scripts in HiveQL.
Experience designing and implementing data warehouse systems, including gathering requirements, data modeling, developing ETL, and building test cases.
Good knowledge of ETL procedures to transform data in intermediate tables according to business rules and functional requirements.
Good hands-on experience in unit testing, integration testing, and functionality testing.
Worked on job automation using shell scripting; wrote scripts in Bash and Perl.
Good experience with methodologies such as Waterfall, Agile, and Scrum.
Good knowledge of Google Cloud Services (GCS).
Experience testing desktop and web-based applications with SQL/Oracle databases; expertise in tools such as HP Quality Center, VersionOne, ALM, and JIRA.

TECHNICAL SKILLS
Hadoop ecosystem: HDFS, MapReduce, Sqoop, Hive, Flume, Spark, Zookeeper, Oozie, Ambari, Hue, Yarn, Cloudera, HBase, Solr, JanusGraph DB, Kafka, BigQuery
IDEs & Utilities: Eclipse, IntelliJ
Cloud Platforms: Azure Data Lake Storage (ADLS), Azure Databricks, Amazon Web Services (AWS), EMR, S3, EC2
CI/CD Tools: Git, Bitbucket, Jenkins, Docker, Oozie, UCD, Concord
Defect Tracking Tools: HPSM, Team Foundation Server (2012, 2015), Quality Center 9.0/10.0, MTM, JIRA, VersionOne
Programming/Scripting Languages: C/C++, Shell/Unix scripting, Scala, Java, SQL, SSMS
Operating Systems: Windows, Linux/Unix
RDBMS: Oracle DB, SQL Server, MySQL, BigQuery
Testing Methodologies: Agile, Waterfall, V-Model

PROFESSIONAL EXPERIENCE
Client: Molina Feb 2023 - Current
Role: Azure Operations
Molina is one of America's leading healthcare insurance companies, known for providing insurance products for different lines of business. Molina's technology organization provides platform and technology services to its entities and domains such as claims, members, group, dental, and other health-related entities. Working with the EIM platform, importing all kinds of claims data, processing it per business requirements using ADLS cloud technologies, and loading the processed data into the Integrated Data System for further analytics and reporting.
Responsibilities
Daily operational activities include monitoring and maintaining data applications implemented in the Enterprise Analytics Platform and handling data from various sources in the production environment.
Involved in the migration from Cloudera to Azure.
Work on data quality issues with the respective development teams to fix them.
Triage failed jobs with the highest priority and work with the respective teams to resolve issues, ensuring proper remediation and no business impact.
Work on JIRA items and ad hoc requests to ensure data load activities happen on schedule; work on incidents and resolve or assign tickets to the corresponding team.
Create and schedule new AutoSys jobs for any new process deployed in production.
Create Hive tables and import and transform large sets of structured, semi-structured, and unstructured data using Spark Scala (see the sketch after this list).
Understand the code and obtain the required knowledge transfer from the development team to provide production support for deployed applications.
Understand each domain's Spark Scala code and make sure the team gains enough knowledge of each process before it is deployed to the production environment.
Coordinate daily with the offshore team to get updates on job monitoring, issues, JIRA tickets, knowledge transfers (KTs), and process improvements.
Work with the development team on further process improvements for the respective domain jobs.
Work with the Hadoop admin team on any cluster-level issues affecting jobs.
Work with the reporting team to resolve any query execution issues.
Participate in bridge calls to resolve issues.
Work on loading data from different SQL sources into the data lake.
Maintain month-end data load activities.
Work on data loads using SQL Server Management Studio (SSMS); create new packages, batch files, and AutoSys jobs for new processes being deployed to production.
Triage failures by checking the batch file logs.
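
Illustrative sketch (an assumption, not Molina's actual code or schema): a minimal Spark Scala job of the kind described above, creating a Hive external table over raw claim files and writing a lightly transformed, partitioned copy; all database, table, column, and path names are hypothetical placeholders.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.current_date

object ClaimsLoadSketch {
  def main(args: Array[String]): Unit = {
    // SparkSession with Hive support so spark.sql can manage Hive tables
    val spark = SparkSession.builder()
      .appName("claims-load-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical external table over raw claim files landed in the data lake
    // (assumes the staging and curated databases already exist)
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS staging.claims_raw (
        |  claim_id STRING, member_id STRING, claim_amount DOUBLE, service_date STRING)
        |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        |LOCATION '/data/raw/claims'""".stripMargin)

    // Light transformation, then write to a curated Hive table partitioned by load date
    val curated = spark.table("staging.claims_raw")
      .filter("claim_amount IS NOT NULL")
      .withColumn("load_date", current_date())

    curated.write
      .mode("overwrite")
      .partitionBy("load_date")
      .saveAsTable("curated.claims")

    spark.stop()
  }
}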

Environment: Azure, Hadoop, HDFS, Spark, Scala, Kafka, Java, Cloudera, Linux, Yarn, Concord, Git, SSMS, Hue, Impala, Hive, JIRA

Client: Molina Sep 2021 - Jan 2023
Role: Onsite Bigdata Operations Lead
Molina is one of America's leading healthcare insurance companies, known for providing insurance products for different lines of business. Molina's technology organization provides platform and technology services to its entities and domains such as claims, members, group, dental, and other health-related entities. Working with the EIM platform, importing all kinds of claims data, processing it per business requirements using Hadoop architecture, and loading the processed data into the Hadoop real-time ingestion system as well as the Integrated Data System for further analytics and reporting needs.
Responsibilities
Daily operational activities include monitoring and maintaining data applications implemented in the Enterprise Analytics Platform and handling data from various sources in the production environment.
Triage failed jobs with the highest priority and work with the respective teams to resolve issues, ensuring proper remediation and no business impact.
Work on JIRA items and ad hoc requests to ensure data load activities happen on schedule.
Create and schedule new AutoSys jobs for any new process deployed in production.
Involved in creating Hive tables and importing and transforming large sets of structured, semi-structured, and unstructured data using Spark Scala.
Troubleshoot and mitigate production issues by implementing code changes, working along with the development and business teams.
Understand each domain's Spark Scala code and make sure the team gains enough knowledge of each process before it is deployed to the production environment.
Coordinate daily with the offshore team to get updates on job monitoring, issues, JIRA tickets, knowledge transfers (KTs), and process improvements.
Work with the development team on further process improvements for the respective domain jobs.
Work with the Hadoop admin team on any cluster-level issues affecting jobs.
Maintain month-end data load activities.
Work on data loads using SQL Server Management Studio (SSMS); create new packages, batch files, and AutoSys jobs for new processes being deployed to production.
Triage failures by checking the batch file logs.


Environment: Azure, Hadoop, HDFS, Spark, Scala, Kafka, Java, Cloudera, Linux, Yarn, Concord, Git, SSMS, Hue, Impala, Hive, JIRA

Monster Government Solutions Aug 2020 - Sep 2021
Role: Big Data Engineer
Adroitent is a global technology services and outsourcing company delivering high-quality software engineering services using Agile methodology and SEI-CMMI processes, with in-depth experience in enterprise architectures, social media and big data solutions, hybrid and native enterprise mobile implementations, and cross-platform integration. Worked as a Big Data Engineer for their client, Monster Government Solutions, to extract data from an Oracle RDBMS, process it by applying transformations on EMR, and load the processed data into AWS RDS MySQL. This processed data in RDS is further used by the analytics and reporting teams.

Responsibilities
Participate in day-to-day project design and architecture meetings, design finalization, and daily Agile Scrum meetings.
Analyze the data and understand the data flow of the existing ETL system.
Analyze the data in the transactional tables of the existing database and create a data lineage sheet, used further in development, for all source and target tables.
Extract data from the Oracle database to S3 storage using Spark on EMR.
Develop SQL views for different levels of transformation, which are further used for data processing.
Develop Spark Scala code to extract data from the Oracle DB, process it by applying transformations, and apply SQL views to load the report-ready data into AWS RDS MySQL (see the sketch after this list).
Create Hive external tables on top of the CSV files to maintain temporary tables used in the Spark process.
Development is in progress for the data behind 15 reports.
Develop shell scripts to perform spark-submit jobs based on different config files and parameters.
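
Illustrative sketch (an assumption, not the actual Monster Government Solutions code): a minimal Spark Scala flow for the Oracle-to-RDS pipeline described above; the JDBC URLs, environment variables, table names, and the view-style aggregation are placeholders only, and the Oracle and MySQL JDBC drivers are assumed to be on the classpath.

import org.apache.spark.sql.SparkSession

object OracleToRdsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("oracle-to-rds-sketch").getOrCreate()

    // Read a source table from Oracle over JDBC (hypothetical connection details)
    val source = spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB")
      .option("dbtable", "APP.APPLICANTS")
      .option("user", sys.env.getOrElse("ORA_USER", ""))
      .option("password", sys.env.getOrElse("ORA_PASS", ""))
      .load()

    // Apply a SQL-view-style transformation for one report
    source.createOrReplaceTempView("applicants")
    val report = spark.sql(
      """SELECT agency_id, COUNT(*) AS applicant_count
        |FROM applicants
        |GROUP BY agency_id""".stripMargin)

    // Write the report-ready data to RDS MySQL over JDBC
    report.write.format("jdbc")
      .option("url", "jdbc:mysql://rds-endpoint:3306/reports")
      .option("dbtable", "applicant_counts")
      .option("user", sys.env.getOrElse("RDS_USER", ""))
      .option("password", sys.env.getOrElse("RDS_PASS", ""))
      .mode("overwrite")
      .save()

    spark.stop()
  }
}

A job like this would typically be launched through the shell script mentioned above via spark-submit, with connection details supplied from a config file rather than hard-coded.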

Environment: Amazon Web Services, EC2, EMR, RDS, Oozie, Spark, Scala, Spark SQL, HDFS, Hive, Linux, IntelliJ, Oracle, Shell Scripting.


MetLife, Cary Oct 2019 - Aug 2020
Role: Hadoop Developer
MetLife is one of America's leading insurance companies, known for providing insurance products for different lines of business. MetLife's technology organization provides platform and technology services to MetLife entities and domains such as claims, dental, and retail. Worked in MetLife's Enterprise Online Storing System, which ingests real-time data from the insurance entities/domains and from different sources/vendors into big data real-time systems to serve the enterprise 360 call-center services. Worked with the new Disability platform, importing all kinds of claims data, processing it per business requirements using Hadoop architecture, and loading the processed data into the Hadoop real-time ingestion system as well as the Integrated Data System for further analytics and reporting needs.
Responsibilities
Experience building a data flow ingestion framework (MDFE) from external sources (DB2/IBM MQ/Informatica/SQL Server) into big data ecosystems (Hive/HBase/Solr/Titan/Janus) using big data tools such as Apache Spark, Spring Boot, and Spark SQL.
Work with the connecting partners (GSSP/SPI or API) to gather the requirements to build and support the required framework.
Created HBase/NoSQL designs that include Slowly Changing Dimensions (SCD) within the design to maintain a history of updates.
Created Hive designs that include Slowly Changing Dimensions (Type 2 and Type 4) within the design to maintain a history of updates.
Developed Sqoop scripts to extract large historical and incremental data sets from legacy applications (traditional warehouses such as Oracle, DB2, and SQL Server) into big data for the Data and Analytics business.
Created the low-level design for ingestion, transformation, and processing of the daily business datasets using Sqoop, Hive, Spark, IBM Big SQL, and IBM Maestro.
Automated and deployed the code into production using CI/CD tools such as Bitbucket, Bamboo, and UrbanCode.
Created various database objects and performed fine-tuning of SQL tables and queries.
Analyzed data from multiple systems of record (SOR), including disability/absence claims, members, treatments (ICD codes), and payments; also involved in the design and architecture discussions of the project.
Created Hive tables, loaded data, and performed analysis using Hive queries.
Created Scala scripts to read CSV files, process them, and load the corresponding JSON files to HDFS (see the sketch after this list).
Implemented partitioning, dynamic partitions, and bucketing in Hive for efficient data access.
Expert hands-on experience in data deduplication and data profiling for many production tables.
Worked with the Spark ecosystem using Scala and Hive queries on different data formats such as text, CSV, and JSON.
Developed Spark Scala programs for transforming semi-structured and unstructured data into structured targets using Databricks and Spark SQL.
Developed scripts and batch jobs scheduled using AutoSys and the IBM scheduler.
Experience working with SSMS projects, creating/updating the packages used to load data from different source systems to destination QDR servers as flat files.
Wrote Hive queries for data analysis to meet the business requirements.
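
Illustrative sketch (an assumption, not MetLife's actual script): a minimal Spark Scala version of the CSV-to-JSON conversion mentioned above, assuming headered CSV input and hypothetical HDFS paths.

import org.apache.spark.sql.SparkSession

object CsvToJsonSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("csv-to-json-sketch").getOrCreate()

    // Read headered CSV files from a landing directory, inferring column types
    val claims = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/data/landing/claims/*.csv")

    // Write the same records as JSON for downstream ingestion (hypothetical output path)
    claims.write.mode("overwrite").json("/data/processed/claims_json")

    spark.stop()
  }
}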

Environment: Scala, Spark, Janus, Titan, HBase, Solr, Spark SQL, HDFS, Hive, Linux, IntelliJ, Hue, Sqoop, Oracle, Shell Scripting, SSMS, Yarn, Ambari and Hortonworks

State Farm Insurance, TX May 2019 - October 2019
Role: Hadoop Developer
State Farm is a large insurance and financial services company that operates nationwide in the U.S. Finance Modernization efforts at State Farm focus on enterprise-priority deliveries, enabling the modernization of the policy administration systems (Life, Auto & Fire) for financial transaction data. This in turn enables the enterprise to lead with analytics and operate in a big data environment by sunsetting financial legacy systems, data, and reports. We receive policies from our Guidewire PolicyCenter; FRR and non-FRR feed data are landed in HDFS using Kafka and Flume data streaming and processed according to business requirements using Spark Scala.
Responsibilities
Explored Spark to improve performance and optimize existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN (see the sketch after this list).
Implemented test scripts to support test-driven development and continuous integration.
Involved in scheduling the Oozie workflow engine to run multiple Spark jobs.
Developed Spark data transformation programs based on the data mapping.
Experience with GitHub and the use of Git Bash for code submission to the GitHub repository.
Experienced with batch processing of data sources using Apache Spark.
Involved in production support, resolving issues as soon as possible to keep data flowing downstream on time.
Weekly meetings with technical collaborators and active participation in code review sessions with senior and junior developers.
Deployed code using UrbanCode Deploy, created a fully automated build and deployment platform, and coordinated code builds, promotions, and orchestrated deployments using Jenkins, Oozie, and Git.
Experienced with Docker, containerizing the application and all its dependencies by writing Dockerfiles and Docker Compose files. Designed, wrote, and maintained systems (Python) for administering Git. Used Jenkins and Bamboo as full-cycle continuous delivery tools involving package creation, distribution, and deployment onto Hadoop servers with embedded shell scripts.
Developed scripts for build, deployment, maintenance, and related tasks to implement a CI (Continuous Integration) system using Jenkins, Docker, Maven, Python, and Bash.
Took part in a POC on Bamboo and Docker for continuous integration and end-to-end automation of all builds and deployments.
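
Illustrative sketch (an assumption, not State Farm's code): one common form of the Spark optimization work referenced above, replacing a shuffle-heavy pair-RDD groupByKey pattern with reduceByKey and showing the equivalent DataFrame aggregation; the policy data set is made up.

import org.apache.spark.sql.SparkSession

object AggregationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("aggregation-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical (policyType, premium) pairs
    val premiums = sc.parallelize(Seq(("auto", 120.0), ("fire", 80.0), ("auto", 95.0)))

    // Pair-RDD version: reduceByKey combines values map-side before the shuffle,
    // which is cheaper than groupByKey(...).mapValues(_.sum)
    val totalsRdd = premiums.reduceByKey(_ + _)

    // DataFrame/Spark SQL version: lets the Catalyst optimizer plan the aggregation
    import spark.implicits._
    val totalsDf = premiums.toDF("policy_type", "premium")
      .groupBy("policy_type")
      .sum("premium")

    totalsRdd.collect().foreach(println)
    totalsDf.show()

    spark.stop()
  }
}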

Environment: Python, Scala, Spark, Spark SQL, HDFS, Hive, Linux, IntelliJ, Oozie, Hue, Oracle, Shell Scripting, Hortonworks, Docker, Jenkins, GitBash, Version One

MetLife, Cary Feb 2018 - Apr 2019
Role: Hadoop Developer
MetLife is one of America's leading insurance companies, known for providing insurance products for different lines of business. MetLife's technology organization provides platform and technology services to MetLife entities and domains such as claims, dental, and retail. Working in MetLife's Data Analytics team, we make use of the insurance entities'/domains' data from different sources and ingest it into big data to perform analytics, grow the customer base, and revolutionize the retail market to hold MetLife's position in a competitive market. Any electronically generated data can be ingested and stored in the Hadoop Distributed File System, then analyzed and processed to help improve product sales and the quality of customer care at the lowest possible cost by building the Enterprise Online Store, a faster environment that enables agents to provide various options to customers.

Responsibilities
Developed Spark code using Scala for processing of data.
Migrated a legacy application to a big data application using Sqoop/Hive/Spark/HBase.
Implemented proofs of concept in Spark and Scala for various source data (XML/JSON) transformations and processed data using Spark SQL.
Created Hive tables, loaded data, and performed analysis using Hive queries.
Implemented partitioning, dynamic partitions, and bucketing in Hive for efficient data access (see the sketch after this list).
Expert hands-on experience in data deduplication and data profiling for many production tables.
Worked with the Spark ecosystem using Scala and Hive queries on different data formats such as text and CSV.
Implemented Spark best practices such as partitioning, caching, and checkpointing for faster data access.
Involved in HBase schema/key design and Solr design for contextual search features for a POC.
Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
Developed scripts and batch jobs to schedule various Hadoop programs.
Wrote Hive queries for data analysis to meet the business requirements.
Explored Spark to improve performance and optimize existing Hadoop MapReduce algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
Developed solutions to pre-process large sets of structured data in different file formats (text files, Avro data files, sequence files, XML and JSON files, ORC, and Parquet).
Weekly meetings with technical collaborators and active participation in code review sessions with senior and junior developers.
Experienced with batch processing of data sources using Apache Spark.
Migrated tables from text format to ORC and other customized file formats as part of data ingestion.
Designed and documented project use cases, wrote test cases, led the offshore team, and interacted with the client.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
Responsible for managing data from multiple sources and loading and transforming large sets of structured and semi-structured data.
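
Illustrative sketch (an assumption, not MetLife's actual schema): Hive partitioning with a dynamic-partition load driven from Spark SQL, plus caching of a frequently reused subset, reflecting the practices listed above; database, table, and column names are placeholders, and bucketing is noted in a comment rather than exercised.

import org.apache.spark.sql.SparkSession

object PartitionedHiveSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partitioned-hive-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Partitioned ORC table; a bucketed variant would add a Hive-side
    // CLUSTERED BY (member_id) INTO n BUCKETS clause to the DDL.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS analytics.claims_orc (
        |  claim_id STRING, member_id STRING, claim_amount DOUBLE)
        |PARTITIONED BY (claim_date STRING)
        |STORED AS ORC""".stripMargin)

    // Dynamic partition insert from a hypothetical staging table
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT OVERWRITE TABLE analytics.claims_orc PARTITION (claim_date)
        |SELECT claim_id, member_id, claim_amount, claim_date FROM staging.claims""".stripMargin)

    // Cache a frequently reused subset so repeated actions avoid re-reading the table
    val recent = spark.table("analytics.claims_orc")
      .filter("claim_date >= '2018-01-01'")
      .cache()
    println(recent.count())

    spark.stop()
  }
}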

Environment: Scala, Spark, Spark Streaming, Spark SQL, HDFS, Hive, Pig, Linux, Eclipse, Oozie, Hue, Apache Kafka, Sqoop, Oracle, Shell Scripting, Yarn, Ambari and Hortonworks.

Emerson, Austin, TX Apr 2017 - Feb 2018
Role: Software Engineer
Worked on a Distributed Control System (DCS), a computerized control system for a process or plant in which autonomous controllers are distributed throughout the system under central operator supervision. The project was to design an internal web portal to discover and explore prospects. The internal web portal serves all the statistical analysis teams at Silicus Technologies. This portal provides functionality such as discovering a prospect, knowing people, knowing the business, and understanding digital data. Work included data ingestion, data loading into HDFS from legacy systems, and ETL processing on top of the data to expose information to the web portal by loading it back into Hadoop.
Responsibilities
Manage and review Hadoop log files on clusters.
Performed NoSQL operations, Hadoop analytics, and event stream processing.
Extracted files from DB2 through Sqoop, placed them in HDFS, and processed them.
Analyzed large data sets by running Hive queries and Pig scripts.
Created Hive tables and loaded and analyzed data using Hive queries.
Developed simple to complex MapReduce jobs using Hive and Pig.
Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
Involved in unit testing of MapReduce jobs using MRUnit.
Involved in loading data from the Linux file system to HDFS.
Experienced in running Hadoop Streaming jobs to process terabytes of XML-format data.
Responsible for exporting analyzed data to relational databases using Sqoop.
Created and maintained technical documentation for launching Hadoop clusters and executing Hive queries.
Experience with GitHub and the use of Git Bash for code submission to the GitHub repository.
Worked on HBase, a NoSQL database that persists high-volume user profile data.
Created Hive scripts to extract, transform, load (ETL), and store the data using Talend.

Technical Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Python, Publisher/Subscriber, Linux, and HBase.
TEK Systems Apr 2011 - Jan 2015
Role: Software Engineer
The LaserJet Firmware Division at HP follows test-driven development in an Agile environment with most testing automated. CruiseControl is used for continuous integration, and a virtual machine provisioning system executes more than a million automated tests per day using the latest available revision. Simulators, emulators, and engines are used for automated testing. Manual testing is performed only for the test cases not covered by continuous automated testing.
Implemented the project according to the Software Development Life Cycle (SDLC).
Implemented an exception management mechanism using Exception Handling Application Blocks to handle exceptions.
Designed and developed user interfaces using JSP, JavaScript, and HTML.
Involved in database design and in developing SQL queries and stored procedures on MySQL.
Logging was done through log4j.
Executed manual test cases using positive and negative test scenarios.
Performed device recovery when there is firmware corruption on the device.
Executed manual tests to verify various features and functionality on various HP MFP and SFP devices. Functionality includes Boot-to-Ready, Print, Copy, Digital Send, Scan to Network Shared Folder, Fax, Web Content, FIM, Security, Email, PDL, and Encryption & Decryption disk tests.
Tested SMTP (Simple Mail Transfer Protocol), LDAP (Lightweight Directory Access Protocol), FTP (File Transfer Protocol), SNMP (Simple Network Management Protocol), and SNTP (Simple Network Time Protocol).
Performed Regression testing at various phases of the project development.
Performed test runs manually and maintained logs in the Test Lab of HP Quality Center.
Wrote automated test cases in the Eclipse editor using Selenium WebDriver for the HP EWS web page; executed automated tests and debugged and fixed test cases.
Performed performance testing to check the performance of various features (Print, Scan, UI, Applications) on the printer (MFP/SFP) on first boot and over continuous usage of the printer.
Executed the performance test suite for each type of device/engine on the latest firmware revision, generated test reports, and verified and uploaded them to the internal lab portal to make the results accessible to the entire team/business.
Issues found are triaged and root-caused to identify the firmware issue and the recent change that caused it; a CR is created and assigned to the owner of the change.

Technical Environment: SDLC, JSP, JavaScript, HTML, SQL, MySQL, SMTP, LDAP, FTP.

EDUCATION
Master's in Computer Science, Texas A&M University-Commerce, Commerce, Texas (2016)
Bachelor of Technology in Electrical Engineering, Andhra University, India (2006)
