
Umakanth - Sr Data Engineer
[email protected]
Location: Lansing, Michigan, USA
Relocation: Yes
Visa: H1B
Professional Summary:
Over 13 years of experience in the design, development, and implementation of software applications and BI/DWH solutions. Experienced in data discovery, advanced analytics, and building business solutions, with knowledge of developing strategies for deploying Big Data solutions in both cloud and on-premise environments to efficiently meet Big Data processing requirements.

Built advanced analytics applications on different ecosystems: Cloudera, Hortonworks (HWX), GCP, Snowflake, and AWS.
Strong understanding of distributed systems, RDBMS, large- and small-scale non-relational data stores, MapReduce systems, database performance, data modelling, NiFi, and multi-terabyte data warehouses.
Extensively used Hadoop open-source tools like Hive, HBase, Sqoop, and Spark for ETL on Hadoop clusters.
Detail-oriented Data Analyst with a strong analytical background and a proven track record of transforming data into actionable insights.
Proficient in ETL (Extract, Transform, Load) processes, Master Data Management (MDM), Data Security, and Data Governance.
Proficient in analysing and interpreting complex data sets to identify trends, patterns, and actionable insights.
Skilled in using statistical and data visualization tools to communicate findings effectively.
Worked with several data integration and replication tools such as Attunity Replicate.
Strong knowledge of system development lifecycles and project management for BI implementations.
Extensively used RDBMS like Oracle and SQL Server for developing different applications.
Built several data lakes on top of S3 and HDFS to help clients perform advanced analysis on big data.
Worked with data science teams to provide and feed data for AI, ML, and deep learning projects.
Real-time experience with the Hadoop Distributed File System, the Hadoop framework, and parallel processing implementations (AWS EMR, Cloudera), with hands-on experience in HDFS, MapReduce, Pig/Hive, HBase, YARN, Sqoop, Spark, PySpark, RDBMS, Linux/Unix shell scripting, and Linux internals.
Experience in writing UDFs and MapReduce programs in Java for Hive and Pig.
Created Kafka data pipelines with producer and consumer applications for log stream data (see the sketch after this summary).
Experience with data visualization tools like Tableau and Looker.
Experience in creating scripts and macros using Microsoft Visual Studio to automate tasks.
Strong expertise in Master Data Management, ensuring data accuracy, consistency, and reliability across the organization.
Capable of designing and implementing MDM solutions to maintain a single source of truth for critical business data.
Knowledgeable in data security best practices, ensuring the confidentiality, integrity, and availability of sensitive information.
Proficient in implementing data security measures, including encryption, access controls, and data classification.
Well-versed in establishing and maintaining data governance frameworks to manage data assets effectively.
Skilled in defining data policies, standards, and processes to ensure data quality, compliance, and accountability.
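A minimal illustrative sketch of the kind of Kafka log-stream pipeline described above, using the kafka-python client; the broker address, topic name, and log path are hypothetical placeholders, not details from any specific engagement:

from kafka import KafkaConsumer, KafkaProducer

# Hypothetical broker and topic used only for illustration.
BROKERS = "localhost:9092"
TOPIC = "app-logs"

# Producer side: ship raw log lines onto the topic as bytes.
producer = KafkaProducer(bootstrap_servers=BROKERS)
with open("/var/log/app/app.log", "rb") as log_file:
    for line in log_file:
        producer.send(TOPIC, value=line.rstrip())
producer.flush()

# Consumer side: read the log stream back; a real consumer would typically
# land the messages in HDFS or S3 rather than print them.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKERS,
    auto_offset_reset="earliest",
    consumer_timeout_ms=10_000,  # stop iterating when no new messages arrive
)
for message in consumer:
    print(message.offset, message.value.decode("utf-8", errors="replace"))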

Other Experience:
Experience working with web design tools like Adobe Dreamweaver CC, WordPress, and Joomla.
Proficient in manual, functional, and automation testing.
Also experienced in smoke, integration, regression, functional, front-end, and back-end testing.
Capable of developing/writing test plans, test cases, and test scripts based on user requirements and SAD documentation.
Highly experienced in writing and executing test cases in HP testing tools: Quality Center and QuickTest Professional (QTP).

Technical Skills:
Reporting Tools: Tableau and Looker
Big Data Ecosystem: HDFS, MapReduce, Oozie, Hive, Pig, Sqoop, Flume, ZooKeeper, HBase, CAWA, Spark, Spark SQL, Impala, MapR-DB, Azure, Oracle Big Data Discovery, Kafka, NiFi
Hadoop Ecosystems: MapR, Cloudera, AWS EMR, Hortonworks
Cloud Platforms: AWS, GCP, Azure
Servers: Application Servers (WAS, Tomcat), Web Servers (IIS 6/7, IHS)
Operating Systems: Windows 2003 Enterprise Server, XP, 2000, UNIX, Red Hat Enterprise Linux Server release 6.7
Databases: SQL Server 2005/2008, Oracle 9i/10g, DB2, MS Access 2003, Teradata, PostgreSQL
Languages: Python, Bash, SQL, XML, JSP/Servlets, Struts, Spring, HTML, PHP, JavaScript, jQuery, web services, Scala
Data Modelling: Star schema and Snowflake schema
ETL Tools: Knowledge of Informatica, IBM DataStage 8.1, and SSIS



Education:
Master of Information Technology & Management Studies, University of Ballarat, VIC, Australia, 2013
Bachelor of Information Technology, University of Ballarat, VIC, Australia, 2011
Board of Intermediate Education, Narayana Jr. College, Telangana, India, 2008
Board of Secondary Education, St. Ann's Grammar High School, Malkajgiri, Hyderabad, India, 2006


Work Experience:
iTech-Go, Clarkston, MI Apr 2022 - Till Date
Client: Palo Alto Networks
Sr Data Engineer

Responsibilities:
Designed and implemented robust data architectures on Google Cloud Platform (GCP), incorporating industry best practices and ensuring scalability and performance.
Conducted data modeling and schema design for efficient storage and retrieval, optimizing BigQuery performance for complex queries.
Leveraged BigQuery to process and analyze large datasets efficiently, optimizing query performance and reducing costs.
Implemented partitioning and clustering strategies to enhance BigQuery query efficiency and reduce data processing time.
Led data ingestion projects in BigQuery and Databricks, incorporating API, Gsheets, file, RDBMS, and SFTP sources.
Developed and maintained yearly and quarterly reports on BigQuery, contributing to the creation of intricate executive dashboards.
Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
Contributed to various agile projects, including Smart Recruiters pipeline, HR Dashboards, Tableau Automation through ServiceNow, IT360, SR Dashboards, and Internal Mobility, leveraging Databricks for enhanced data processing.
Acquired extensive knowledge in the HR domain, enhancing recruiting capabilities.
Loaded data from SAP into the Hadoop environment using Sqoop.
Responsible for Hadoop cluster monitoring using tools like Nagios, Ganglia, and Ambari.
Monitored Red Hat Linux server health (master, edge, and worker nodes) in Zabbix and troubleshot/resolved reported issues.
Planned, scheduled, and applied patches on Linux servers to mitigate security vulnerabilities.
Installed, configured, upgraded, and supported Red Hat Enterprise Linux servers and packages in a VMware vSphere/ESX environment.
Utilized GCP technologies, including BigQuery, Kubernetes Engine, GCP buckets, and Cloud Functions, alongside Databricks for advanced data processing and analytics.
Built and orchestrated hundreds of pipelines on Airflow and Databricks, ensuring near real-time availability of data and dashboard reporting.
Designed and exposed APIs in Java Spring Boot and Flask for ServiceNow Tableau access requests, facilitating data processing into the Tableau system.
Proficient in Python, SQL, Spark, and Databricks for data engineering tasks.
Implemented debugging and monitoring solutions using Airflow, Datadog, Grafana, Kibana, and Google Cloud Monitoring, with notifications via Datadog to Slack and email.
Played a pivotal role in designing and planning new solutions for data pipelines, ensuring seamless communication with Business, Data Analysts, Business Intelligence, Technical Directors, and Product teams.
Provided production job support, addressing issues, and enhancing features in an agile environment.
Demonstrated expertise in building and enhancing data pipelines, aligning with business requirements.
Engineered scalable systems that effectively meet project requirements, guaranteeing efficient data processing and handling.
Designed and implemented multiple ETL solutions with more than 50 data sources using extensive SQL scripting, ETL tools, Python, shell scripting, and scheduling tools, including Databricks.
Wrote scripts in BigQuery SQL and Spark to create complex tables with performance optimizations like partitioning, clustering, and skew handling.
Worked with Google Data Catalog, Databricks, and other Google Cloud APIs for monitoring, querying, and HR-related analysis of BigQuery and Databricks usage.
Created BigQuery authorized views for row-level security or exposing the data to other teams.
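As a rough sketch of the authorized-view pattern mentioned above, using the google-cloud-bigquery Python client; the project, dataset, table, and column names are hypothetical and used only for illustration:

from google.cloud import bigquery

client = bigquery.Client()  # assumes application-default credentials are configured

# Hypothetical dataset/table names.
source_dataset_id = "my-project.hr_private"
shared_dataset_id = "my-project.hr_shared"
view_id = f"{shared_dataset_id}.open_requisitions_view"

# 1. Create the view in the dataset that other teams can read.
view = bigquery.Table(view_id)
view.view_query = f"""
    SELECT req_id, department, status
    FROM `{source_dataset_id}.requisitions`
    WHERE status = 'OPEN'
"""
view = client.create_table(view, exists_ok=True)

# 2. Authorize the view against the private source dataset, so consumers of
#    the view never need direct access to the underlying table.
source_dataset = client.get_dataset(source_dataset_id)
entries = list(source_dataset.access_entries)
entries.append(bigquery.AccessEntry(None, "view", view.reference.to_api_repr()))
source_dataset.access_entries = entries
client.update_dataset(source_dataset, ["access_entries"])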


Cross Sense Analytics, Farmington Hills, MI Aug 2021 - Mar 2022
Client: State of Ohio (Bureau of Workers' Compensation)
Sr Data Engineer

Responsibilities:
Responsible for creating Technical Design documents, Source to Target mapping documents and Test Case documents to reflect the ELT process.
Extracted data from various source systems like Oracle, SQL Server, and flat files as per the requirements.
Installed and configured Hadoop MapReduce and developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
Wrote scripts for data cleansing, data validation, and data transformation for data coming from different source systems.
Worked on Hadoop cluster and data querying tools to store and retrieve data from the stored databases.
Wrote the shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
Worked on processing and testing data using Spark SQL, and on real-time processing with Spark Streaming and Kafka using Python (see the sketch at the end of this section).
Performed assorted Unix/Linux administration tasks, including daily audits on Linux, Solaris, VMware ESXi hosts, and SAN and NAS devices, and resolved reported errors.
Assisted in preparations for controlled changes introduced each weekend into the Linux, Solaris, and storage environments.
Scripted using Python and PowerShell for setting up baselines, branching, merging, and automation processes across the process using GIT.
Worked with different file formats like Parquet files and also Impala using PySpark for accessing the data and performed Spark Streaming with RDDs and Data Frames.
Worked on Data Integration for extracting, transforming, and loading processes for the designed packages.
Designed and deployed automated ETL workflows using AWS Lambda, organized and cleansed the data in S3 buckets using AWS Glue, and processed the data using Amazon Redshift.
Used Informatica admin tools to manage logs, user permissions, and domain reports; generated and uploaded node diagnostics; and monitored Data Integration Service jobs and applications. Domain objects included application services, nodes, grids, folders, database connections, and operating system profiles.
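A minimal PySpark Structured Streaming sketch of the Kafka-based real-time processing referenced above; the broker, topic, and claim-event schema are hypothetical, and the spark-sql-kafka connector is assumed to be available on the cluster:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructType, TimestampType

spark = SparkSession.builder.appName("claims-stream").getOrCreate()

# Hypothetical claim-event schema.
schema = (StructType()
          .add("claim_id", StringType())
          .add("status", StringType())
          .add("event_ts", TimestampType()))

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")
       .option("subscribe", "claim-events")
       .load())

# Kafka delivers the payload as bytes; parse the JSON value into columns.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("e"))
          .select("e.*"))

# Write the parsed stream out (console sink here just for illustration).
query = (events.writeStream
         .outputMode("append")
         .format("console")
         .start())
query.awaitTermination()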


GreenByte Technologies, Hyderabad, India Mar 2017 - Jun 2021
Sr Big Data Developer

Responsibilities:
Helped the client understand performance issues on the cluster by analysing Cloudera stats.
Designed and implemented Optum Data Extracts and HCG Grouper Extracts on AWS.
Improved memory and time performances for several existing pipelines.
Developed data ingestion modules (both real-time and batch data loads) to load data into various layers in S3, Redshift, and Snowflake using AWS Kinesis, AWS Glue, AWS Lambda, and AWS Step Functions.
Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python (see the sketch at the end of this section).
Used Bash shell scripting, Sqoop, Avro, Hive, Impala, HDP, Pig, Python, and MapReduce daily to develop ETL, batch processing, and data storage functionality.
Built pipelines using Spark, Spark SQL, Hive, and HBase, built pipelines using Airflow on AWS, and explored the power of distributed computing on AWS EMR.
Loaded processed data into different consumption points like Apache Solr, HBase, and AtScale cubes for visualization and search.
Automated the workflow using Talend Big Data.
Scheduled jobs using Autosys.
Experienced in managing and reviewing Hadoop log files.
Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
Developed Sqoop scripts to import/export data from relational sources and handled incremental loading of customer and transaction data by date.

Environment: AWS services, AWS S3, AWS Glue, Lambda, Oracle SQL, Cloudera, Spark, Python, SQL, Talend workload automation, Jenkins, Git, PostgreSQL
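A small pandas/NumPy sketch of the kind of data cleaning, feature scaling, and feature engineering referenced above; the file name and column names are hypothetical:

import numpy as np
import pandas as pd

# Hypothetical claims dataset; column names are illustrative only.
df = pd.read_csv("claims.csv")

# Basic cleaning: drop duplicates and fill missing numeric values with the median.
df = df.drop_duplicates()
df["claim_amount"] = df["claim_amount"].fillna(df["claim_amount"].median())

# Feature scaling: min-max normalise the claim amount into [0, 1].
amount = df["claim_amount"].to_numpy(dtype=float)
df["claim_amount_scaled"] = (amount - amount.min()) / (amount.max() - amount.min())

# Simple feature engineering: log-transform a skewed column and derive claim age in days.
df["claim_amount_log"] = np.log1p(amount)
df["claim_age_days"] = (pd.Timestamp.today() - pd.to_datetime(df["claim_date"])).dt.days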



The Australian health system, Melbourne, Australia Mar 2015 - Jan 2017
Sr Big Data Developer / Digital Transformation (Cloudera)

Responsibilities:
Designed and implemented data integration solutions to extract, transform, and load (ETL) Epic Electronic Health Record (EHR) data into data warehouses, enabling comprehensive reporting and analytics.
Developed data models to ensure accurate representation of clinical and operational data from Epic Systems, facilitating a better understanding of patient care and hospital performance.
Established robust data quality checks and validation procedures to ensure the integrity and accuracy of clinical and operational data transferred from Epic Systems to Foundry (see the sketch at the end of this section).
Ensured the secure handling of sensitive patient data by implementing data encryption, access controls, and adherence to healthcare compliance standards, such as HIPAA.
Tuned ETL workflows to improve data processing efficiency and performance, reducing data latency and ensuring timely access to healthcare data for analytics.
Established and maintained data warehousing infrastructure specifically tailored to Epic EHR data, optimizing data storage and retrieval for reporting and analysis.
Built and maintained data extraction processes from Epic Systems, including Clarity, Caboodle, Chronicles, and other Epic modules, ensuring data accuracy and consistency.
Automated ETL workflows using tools such as Informatica, Talend, or custom scripts, streamlining data processing from Epic sources to the data warehouse.
Implemented data quality checks and validation processes to ensure the accuracy and integrity of clinical and operational data from Epic Systems.
Designed and developed Epic-specific reports and dashboards for clinical and administrative teams using BI tools like Tableau, Power BI, or Cognos.
Tuned ETL processes and data warehouse structures to enhance query performance, reducing report generation time and improving user experience.
Implemented data governance policies, including data lineage, data dictionary, and data access controls, to maintain data consistency and ensure compliance with healthcare regulations.
Leveraged advanced analytics and machine learning techniques to extract insights from Epic data, aiding in clinical decision support, patient outcomes analysis, and operational improvement.
Ensured the security and privacy of patient data by implementing robust data encryption, access controls, and compliance with HIPAA regulations.
Worked closely with healthcare professionals and clinicians to understand their reporting and analytics needs, translating them into actionable data solutions.
Successfully managed data migration and transformation during Epic EHR system upgrades, ensuring continuity of data access and reporting capabilities.
Created comprehensive documentation and conducted training sessions for end-users and IT staff on the use of Epic data and BI tools.
Provided technical support and troubleshooting for Epic-related data issues and assisted in problem resolution, ensuring minimal disruptions to clinical operations.
Implemented data transformation processes to standardize and cleanse Epic data, making it ready for analysis and reporting within the Foundry environment.
Successfully managed data migration and transformation during upgrades to Epic EHR and Foundry data platform, maintaining data accessibility and reporting capabilities.
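A simplified Python sketch of the data quality checks referenced earlier in this section; the extract file and column names are hypothetical, and in real pipelines such checks would run inside the ETL workflows rather than ad hoc:

import pandas as pd

# Hypothetical extract of encounter records pulled from an Epic Clarity table.
encounters = pd.read_csv("clarity_encounters_extract.csv")

checks = {
    # Every encounter must have a patient identifier and an encounter date.
    "missing_patient_id": encounters["patient_id"].isna().sum(),
    "missing_encounter_date": encounters["encounter_date"].isna().sum(),
    # Encounter IDs must be unique in the extract.
    "duplicate_encounter_ids": encounters["encounter_id"].duplicated().sum(),
    # Discharge should never precede admission.
    "discharge_before_admit": (
        pd.to_datetime(encounters["discharge_date"])
        < pd.to_datetime(encounters["admit_date"])
    ).sum(),
}

failed = {name: count for name, count in checks.items() if count > 0}
if failed:
    # In a production load this would raise an alert or fail the job.
    raise ValueError(f"Data quality checks failed: {failed}")
print("All data quality checks passed.")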



Telstra, Melbourne, Australia Dec 2012 - Jan 2015
Sr Big Data Advanced Analytics Consultant

Responsibilities:
Worked collaboratively with MapR vendor and client to manage and build out of large data clusters.
Helped design big data clusters and administered them.
Worked both independently and as an integral part of the development team.
Communicated all issues and participated in weekly strategy meetings.
Administered back-end services and databases in the virtual environment.
Ran several benchmark tests on Hadoop SQL engines (Hive, Spark SQL, Impala) and on different data formats (Avro, SequenceFile, Parquet) using different compression codecs like gzip and Snappy (see the sketch at the end of this section).
Worked on sentiment analysis and structured content programs to create a text analytics app.
Created and Implemented applications on Oracle Big Data Discovery for Data visualization, Dashboard and Reports.
Collected data from different databases (e.g., Oracle, MySQL) into Hadoop. Used CA Workload Automation for workflow scheduling and monitoring.
Worked on designing and developing ETL workflows in Java for processing data in MapR-FS/HBase using Oozie.
Experienced in managing and reviewing Hadoop log files. Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
Developed Sqoop scripts to import/export data from relational sources such as Teradata and handled incremental loading of customer and transaction data by date.
Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
Worked on partitioning Hive tables and running scripts in parallel to reduce script run time. Worked on data serialization formats (Avro, Parquet, JSON, CSV) for converting complex objects into serialized byte sequences.
Responsible for analysing and cleansing raw data by performing Hive queries and running Pig scripts on the data. Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs.

Environment: MapR ecosystem, ODI, Oracle Endeca, Oracle Big Data Discovery, CA Workload Automation
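A rough PySpark sketch of the file-format and compression benchmarks mentioned above (Avro vs. Parquet with gzip/Snappy); the paths, the customer_id column, and the availability of the spark-avro package are assumptions for illustration only:

import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("format-benchmark").getOrCreate()

# Hypothetical source data; in the real benchmarks this came from Hive tables.
df = spark.read.parquet("/data/benchmark/source")

# Write the same data in several format/codec combinations and time a simple
# aggregation read back from each copy.
cases = [
    ("parquet", "snappy"),
    ("parquet", "gzip"),
    ("avro", "snappy"),   # requires the spark-avro package on the cluster
]

for fmt, codec in cases:
    path = f"/data/benchmark/out_{fmt}_{codec}"
    (df.write.mode("overwrite")
       .option("compression", codec)
       .format(fmt)
       .save(path))

    start = time.time()
    spark.read.format(fmt).load(path).groupBy("customer_id").count().collect()
    print(f"{fmt}/{codec}: scan + aggregate took {time.time() - start:.1f}s")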

Origin, Melbourne, Australia Jan 2011 - Oct 2011
Java Developer (Contract)

Responsibilities:
Designed and developed web services using Java/J2EE in a WebLogic environment. Developed web pages using Java Servlets, JSP, CSS, JavaScript, DHTML, HTML5, and HTML. Added extensive Struts validation.
Involved in the analysis, design, development, and testing of business requirements.
Developed business logic in JAVA/J2EE technology.
Implemented business logic and generated WSDL for those web services using SOAP.
Worked on developing JSP pages.
Implemented the Struts framework.
Developed business logic using Java/J2EE.
Modified stored procedures in the MySQL database.
Developed the application using Spring Web MVC framework.
Worked with Spring Configuration files to add new content to the website.
Worked on the Spring DAO module and ORM using Hibernate. Used Hibernate Template and HibernateDaoSupport for Spring-Hibernate Communication.
Configured association mappings such as one-to-one and one-to-many in Hibernate.
Worked with JavaScript calls, as the search is triggered through JS calls when a search key is entered in the search window.
Worked on analyzing other Search engines to make use of best practices.
Collaborated with the Business team to fix defects.
Worked on XML, XSL and XHTML files.
Interacted with project management to understand, learn and to perform analysis of the Search Techniques.
Used Ivy for dependency management.