
Akhila - Cloud Engineer
[email protected]
Location: Austin, Texas, USA
Relocation: Yes
Visa: GC
Over nine years of experience designing, developing, testing, and managing Big Data, Java, Python, multi-cloud, AI/ML, data migration, and cloud-based applications using Hadoop, Spark, PySpark, AWS, GCP, and Azure in the banking, healthcare, e-commerce, and transportation domains.
Solid understanding of Object-Oriented Programming (OOP) concepts. Familiar with Software Development Life Cycle (SDLC) and extensive experience with Agile and SCRUM.
Worked with AWS services such as Amazon EC2, RDS, S3, Kinesis, Lambda, and EMR, among others.
Worked with GCP services such as Compute Engine VMs, Workflows, Cloud Storage, Cloud SQL, Dataproc, BigQuery, and Cloud Functions, among others.
Worked with Azure services such as Azure Data Factory (ADF), Azure Functions, and Blob Storage.
Extensively involved in deployment activities with the DevOps team, resolving build- and deployment-related issues.
Good knowledge of Hadoop MRV1 and MRV2 (YARN) Architecture.
Experienced with Big Data solutions and Hadoop ecosystem technologies; well versed in Big Data solution planning, design, development, and POCs.
Experience working with Hadoop infrastructure components such as Hive, Sqoop, and HBase, using the YARN UI to check job status and Hue to submit Hadoop/Spark scripts.
Strong experience building real-time streaming pipelines using Kafka and Spark Structured Streaming.
Experience transferring data from RDBMS sources to HDFS and Hive tables using PySpark (see the sketch after this summary).
Experience building and orchestrating multiple data pipelines and end-to-end ETL and ELT processes for data ingestion, migration, and transformation in GCP, coordinating tasks across the team.
Hands-on experience building data pipelines using Python, PySpark, HiveQL, Presto, and BigQuery, and authoring Python DAGs in Apache Airflow.
Experience working with Hadoop clusters on Amazon EMR and in Cloudera environments.
Used Zeppelin with Spark for visualization.
Experience with Hadoop shell commands, writing MapReduce programs, and verifying, managing, and reviewing Hadoop log files.
Good understanding of HDFS design, daemons, federation, and high availability (HA).
Experience with dimensional and relational data modeling concepts such as star schema modeling, snowflake schema modeling, and slowly changing dimensions (SCD).
Working experience with visualization tools such as Tableau and Kibana.
Good knowledge of data science and machine learning libraries, including scikit-learn, NumPy, SciPy, Matplotlib, pandas, Seaborn, NLTK, and statistical models.
Experience with database client tools such as MySQL Workbench, SQL Developer, and SQLyog.
Technical expertise in extracting and working with data from multiple relational database sources such as Oracle, MS SQL Server, and MySQL.
Working experience with NoSQL databases such as HBase, Cassandra, MongoDB, and Elasticsearch.
Working experience with various file formats, including Parquet, Avro, JSON, delimited text, clickstream logs, Apache logs, XML, CSV, and ORC.
Expertise in developing and deploying applications using Tomcat and JBOSS.
Used Log4j, Ant, and Maven to increase productivity and software quality.
Experience using version control tools such as Git, CVS, and SVN.
Good analytical and communication skills and ability to work independently with minimal supervision and perform as part of a team.
Expertise working with Java frameworks such as Spring, Spring Boot, Hibernate, and JPA, and implementing various Java and J2EE design patterns.
Working experience with IDEs such as PyCharm, IPython/Jupyter notebooks, Google Colab notebooks, and Spyder for developing Python and PySpark applications.
Experience implementing RESTful APIs, managing the API life cycle, and consuming RESTful services using Spring Boot and Hibernate.
Good understanding of security concepts; familiar with DevOps tooling (Docker, Kubernetes, Ansible, Nagios) and practices, and worked closely with the DevOps team on deployment and cluster scalability operations.
Highly motivated team player with analytical, organizational, and technical skills, and unique ability to adapt quickly to challenges and changing environments.
Involved in support activities such as defect triaging, deployments to higher environments, bug fixing, and production support.
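
A minimal PySpark sketch of the RDBMS-to-Hive transfer pattern referenced above, assuming Hive support is configured on the cluster and a MySQL JDBC driver is on the classpath; the connection URL, credentials, schema, and table names are illustrative placeholders, not values from any project listed here.

    # Sketch: copy an RDBMS table into a partitioned Hive table with PySpark.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("rdbms_to_hive")
        .enableHiveSupport()          # requires a Hive metastore on the cluster
        .getOrCreate()
    )

    # Read the source table over JDBC (placeholder URL and credentials).
    orders = (
        spark.read.format("jdbc")
        .option("url", "jdbc:mysql://db-host:3306/sales")
        .option("dbtable", "orders")
        .option("user", "etl_user")
        .option("password", "etl_password")
        .option("fetchsize", "10000")
        .load()
    )

    # Light transformation, then write to a Hive table partitioned by load date.
    orders_clean = orders.withColumn("load_date", F.current_date())

    (
        orders_clean.write
        .mode("append")
        .partitionBy("load_date")
        .format("parquet")
        .saveAsTable("staging.orders")
    )
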


TECHNICAL SKILLS:
Programming Languages: C, C++, Java, Python, Shell Scripting, SQL, PL/SQL, HiveQL
Big Data Distributions, Reporting Tools & Ecosystems: Cloudera, Hortonworks, Apache Hadoop, HDFS, MapReduce, YARN, Spark, PySpark, Power BI, Tableau
Scheduling & Monitoring Tools: ZooKeeper, Oozie, Cloudera Manager, Apache Airflow
GCP Stack: Workflows, BigQuery, Cloud Storage, Dataflow, Composer, Dataproc, Cloud SQL, Cloud Functions, Cloud Pub/Sub
AWS Stack: S3, EMR, RDS, Lambda, Step Functions
Cloud Platforms: AWS, GCP, Databricks
Messaging Queue / Data Services: Hive, Sqoop, Flume, Kafka, Event Hubs, RabbitMQ
Relational & NoSQL Databases: Oracle, MySQL, MS SQL Server, Cassandra, HBase, MongoDB
IDE / Development Tools: PyCharm, Jupyter Notebook, Spyder, Google Colab, Eclipse, STS, Swagger, SQLyog, MySQL Workbench, SonarQube, Presto
Java & J2EE Technologies & Web Servers: Core Java, Servlets, JSP, JDBC, Spring, Spring Boot, Hibernate, Tomcat, JBoss
Other Software Tools: ELK Stack, Postman, TOAD, SQL Developer, ServiceNow, Jira
Version Control & Build Tools: CVS, SVN, Bitbucket (Git), Maven, Jenkins (DevOps), Ant, Gradle

WORK HISTORY
Sr. Data Engineer,
J&J
11/2022 - Current
Developed the cloud migration strategy and implemented best practices for moving on-premises workloads to the cloud using AWS Database Migration Service and AWS Server Migration Service.
Responsible for setting up and building AWS infrastructure (VPC, EC2, S3, DynamoDB, IAM, EBS, Route 53, SNS, SES, SQS, CloudWatch, CloudTrail, security groups, Auto Scaling, and RDS) using CloudFormation templates.
Hands-on experience with unified data analytics on Databricks: the Databricks workspace UI, managing Databricks notebooks, and Delta Lake with Python and Spark SQL.
Worked on ETL migration by developing and deploying AWS Lambda functions to build a serverless data pipeline that writes to the Glue Data Catalog and can be queried from Athena.
Worked with Spark-based AWS Glue jobs and used Glue crawlers and Data Catalog components to handle dynamic schemas for incremental datasets.
Clients included Technology Crossover Ventures, Redpoint Ventures, VantagePoint Venture Capital, and Jafco Ventures.
Developed Apache Airflow DAGs for various data jobs using the Bash and Python operators (a minimal DAG sketch follows this section).
Worked with different file formats like CSV, JSON, and Parquet.
Worked with Hive external tables over data in S3 buckets, partitioned by a date column according to arrival order.
Used RedPoint Interaction to generate automated emails to customers on a daily and weekly basis.
Extended Hive and Pig core functionality with custom user-defined functions (UDFs), user-defined table-generating functions (UDTFs), and user-defined aggregate functions (UDAFs).
Utilized Apache Kafka, Apache Spark, HDFS, and Apache Impala to build near real-time data pipelines that ingest, transform, store, and analyze clickstream data to provide a more personalized user experience.
Created a program in Python to handle PL/SQL constructs such as cursors and loops, which Snowflake does not support.
Handled ingestion of data from different sources into HDFS using Sqoop and Flume, and performed transformations using Hive.
Used Tableau as a front-end BI tool and Snowflake as a back-end database to design and develop dashboards, workbooks, and complex aggregate calculations.
Wrote scripts from scratch to create AWS infrastructure using Bash and Python; utilized many Jenkins plugins and the Jenkins API.
Produced reports and documentation for all testing efforts, results, activities, data, logging, and tracking.
Tracked defects, generated defect reports using management tools, and discussed technical issues with the QA team. Used the Spark DataFrame API to perform analytics on Hive data and DataFrame operations to perform required data validations.
Used Spark Streaming to divide streaming data into micro-batches as input to the Spark engine for batch processing.
Wrote Spark applications for data validation, cleansing, transformation, and custom aggregation; used the Spark engine and Spark SQL for data analysis and provided results to data scientists for further analysis.
Prepared scripts in PySpark and Scala to automate ingestion from various sources such as APIs, AWS S3, Teradata, and Snowflake.
Created Snowflake Cloud Data Warehouse tables, views, secure views, and user-defined functions.
Extracted CSV and JSON data from AWS S3 and loaded it into the Snowflake Cloud Data Warehouse.
Used Spark and Python for data cleaning, pre-processing, and modeling.
Implemented secured, real-time, data-driven REST APIs for data consumption using AWS (Lambda, API Gateway, Route 53, CloudWatch, Kinesis), Swagger, Okta, and Snowflake; developed automation scripts to transfer data from on-premises clusters.
Loaded file data from ADLS to Google Cloud Storage buckets and created Hive tables for end users.
Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster.

Involved in performance tuning and optimization of long-running Spark jobs and Hive/SQL queries. Implemented real-time streaming of AWS CloudWatch Logs to Splunk using Kinesis Firehose.
Developed, using object-oriented methodology, a dashboard to monitor network access points and network performance metrics using Django, Python, and JSON.
Involved in developing data ingestion pipelines on an Azure HDInsight Spark cluster using Azure Data Factory and Spark SQL. Also worked with Cosmos DB (SQL API and Mongo API).

Tech Stack: AWS, Python, data ingestion, Apache Airflow, Hadoop, RedPoint, OLTP data applications, RDM, RPI, Hive, HDFS, Pig, ZooKeeper, Oozie, Sqoop, Spark, PySpark, Impala, Kafka, Flume, MapReduce, Cassandra, Terraform, Snowflake, Netezza, Oracle, SQL, XML, JSON, CSV, Ansible, Jenkins, Tableau, Agile, Scrum, Databricks.
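
A minimal sketch of the Airflow DAG pattern mentioned in this section (Bash and Python operators), assuming Airflow 2.x; the DAG id, schedule, S3 path, and task logic are illustrative placeholders.

    # Sketch: a daily Airflow DAG chaining a Bash task and a Python task.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator


    def transform_partition(ds, **_):
        # 'ds' is the logical run date Airflow passes to the callable.
        print(f"transforming partition for {ds}")


    with DAG(
        dag_id="daily_ingest",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        land_files = BashOperator(
            task_id="land_files",
            bash_command="aws s3 sync s3://raw-bucket/incoming /data/landing",
        )

        transform = PythonOperator(
            task_id="transform_partition",
            python_callable=transform_partition,
        )

        land_files >> transform
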
Sr. Data Engineer,
Lab Corp, Worcester, MA
11/2021 - 11/2022
Developed strategies for handling large datasets using partitions, Spark SQL, broadcast joins, and performance tuning.
Deployed and monitored scalable infrastructure on Amazon web services (AWS) and configuration management instances.
Implemented new tools such as Kubernetes with Docker to assist with auto-scaling and continuous integration (CI), and uploaded Docker images to the registry so services are deployable through Kubernetes.
Used RedPoint Interaction to generate automated emails to customers on a daily and weekly basis.
Used RESTful web service APIs to connect to MapR tables and to the database.
Worked on Python OpenStack APIs and used Python libraries such as NumPy and Matplotlib.
Used Informatica Designer to create complex mappings using different transformations to move data to a Data Warehouse.
Managed mission-critical Hadoop clusters and Kafka at production scale, primarily on the Cloudera distribution.
Worked on migrating an on-premises data warehouse to GCP BigQuery.
Built and implemented performant data pipelines using Apache Spark on AWS EMR. Performed maintenance of data integration programs into Hadoop and RDBMS environments from structured and semi-structured data source systems.
Worked on analyzing Hadoop cluster and big data analytical and processing tools, including Sqoop, Hive, Spark, Kafka, Pig, and PySpark.
Managed servers on the Amazon Web Services (AWS) platform using Ansible configuration management tools. Created instances in AWS and migrated data to AWS from Data Center.
Created Tableau dashboard reports and heat-map charts, and supported numerous dashboards, pie charts, and heat-map charts built on the Teradata database.
Wrote multiple MapReduce programs for data extraction, transformation, and aggregation across numerous file formats, including XML, JSON, CSV, and other compressed formats.
Used the Spark-Cassandra Connector to load data to and from Cassandra (see the sketch after this section).
Performance-tuned existing Hive queries and UDFs used to analyze the data.
Deployed Azure infrastructure with Terraform via an Azure DevOps pipeline.
Provided daily reports to the development manager and participated in the design and development phases. Followed Agile methodology and the Scrum process. Deployed LAMP-based applications in the AWS environment, including provisioning MySQL RDS and establishing connectivity between EC2 instances and MySQL RDS via security groups.
Automated cloud deployments using AWS Cloud Formation Templates.
Implemented AWS solutions using EC2, S3, RDS, EBS, Elastic Load Balancing, and Auto Scaling groups; optimized volumes and EC2 instances.
Containerized servers with Docker for test and dev environment needs, and automated configuration using Docker containers.
Created AWS users and groups through IAM and used permissions to allow or deny access to AWS resources.
Tech Stack: AWS, GCP, Python, R, OpenStack, Informatica, Hadoop, RedPoint, Hive, HDFS, Pig, ZooKeeper, Sqoop, PySpark, Kafka, Flume, MapReduce, Cassandra, Terraform, Teradata, Spark, Oracle, SQL, XML, JSON, CSV, Tableau.
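
A minimal PySpark sketch of loading data to and from Cassandra with the Spark-Cassandra Connector, as mentioned in this section; the connector version, host, keyspace, and table names are assumptions for illustration only.

    # Sketch: read a Cassandra table, aggregate it, and write the result back.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("cassandra_io")
        .config("spark.jars.packages",
                "com.datastax.spark:spark-cassandra-connector_2.12:3.4.1")
        .config("spark.cassandra.connection.host", "cassandra-host")
        .getOrCreate()
    )

    # Read a Cassandra table into a DataFrame.
    events = (
        spark.read.format("org.apache.spark.sql.cassandra")
        .options(keyspace="analytics", table="events")
        .load()
    )

    # Aggregate and write the results to another Cassandra table.
    daily_counts = events.groupBy("event_date", "event_type").count()

    (
        daily_counts.write.format("org.apache.spark.sql.cassandra")
        .options(keyspace="analytics", table="daily_event_counts")
        .mode("append")
        .save()
    )
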
Data Engineer,
Client: Cerner Corporation, North Kansas City, MO
01/2020- 08/2021
Installed and maintained web servers Tomcat and Apache HTTP in UNIX.
Wrote ETL Scripts using PL/SQL to write the data into Hadoop and maintain the ETL Pipeline.
Designed and integrated a solution around Alerting using AWS SNS and integrated it with Spark and Kinesis.
Created stored procedures and SQL queries to pull data into the Power Pivot model. Worked with both Agile and Waterfall methodologies.
Developed and Tested dashboard and features using CSS, JavaScript, Django, and Bootstrap.
Wrote Stored Procedures in SQL and Scripts in Python for data loading.
Built interfaces between Django and Salesforce and between Django and REST APIs.
Collaborated with the team to build data pipelines and a UI for the website's data analysis modules using AWS and Git.
Created various types of data visualizations using R and Tableau.
Involved in developing the REST Web services to expose the business methods to external services in the project.
Design dimensional model, data lake architecture, and data vault on Snowflake.
Used agile scrum software JIRA to report progress on software projects.
Wrote Python modules to connect to and query the Apache Cassandra instance (see the sketch after this section).
Worked on automating builds using Maven with Jenkins/Hudson for CI/CD process.
Involved in debugging and troubleshooting issues and fixed many bugs in two main applications, the primary source of data for customers and the internal customer service team. Used Kafka for stream processing.
Tech Stack: Python, SQL, ETL, Hadoop, RPI, AWS, CSS, JavaScript, REST, GIT, JIRA, Agile, Jenkins, Maven, Snowflake
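
A minimal sketch of a Python module that connects to Apache Cassandra, in the spirit of the bullet above, using the DataStax cassandra-driver package; the host, keyspace, and table schema are hypothetical.

    # Sketch: connect to Cassandra and run a prepared query.
    from cassandra.cluster import Cluster


    def get_session(hosts=("127.0.0.1",), keyspace="patients"):
        """Open a session against the given keyspace."""
        cluster = Cluster(list(hosts))
        return cluster.connect(keyspace)


    def fetch_recent_results(session, patient_id):
        """Return recent lab results for one patient (placeholder schema)."""
        query = session.prepare(
            "SELECT test_code, result_value, recorded_at "
            "FROM lab_results WHERE patient_id = ? LIMIT 50"
        )
        return list(session.execute(query, [patient_id]))


    if __name__ == "__main__":
        session = get_session()
        for row in fetch_recent_results(session, "12345"):
            print(row.test_code, row.result_value, row.recorded_at)
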
Big Data Engineer,
Nimble Lifetech Pvt. Ltd, India
05/2018 - 11/2019
Responsible for gathering report requirements, collecting data using SQL Script, and creating Power BI Reports.
Analyzed and validated data with business owners to identify trends and problem areas and propose improvements.
Wrote various SQL scripts to create datasets for reporting.
Developed a big data web application in Scala using Agile methodology, taking advantage of Scala's combination of functional and object-oriented programming.
Used Spark to process data before ingesting it into HBase; created both batch and real-time Spark jobs in Scala.
Created on-demand and ad hoc Excel reports (pivot tables, charts, and VLOOKUPs).
Installed and set up the Power BI gateway inside a VM for data refresh.
Performed data query, extraction, compilation, and reporting tasks.
Used SBT to build the Scala project.
Scheduled SQL jobs using SQL Server Agent and report refresh jobs using the Power BI gateway.
Developed multiple Power BI dashboards for leadership decision-making.
Tech Stack: Scala, Spark, HBase, SBT, SQL Server, Power BI, Excel.
Hadoop Developer,
FinTech Software Solutions Pvt. Ltd, IND
01/2017 - 04/2018
Responsible for applying machine learning techniques (regression and classification) to predict outcomes.
Responsible for developing real-time data ingestion from Kafka to MongoDB (see the sketch after this section).
Involved in requirements gathering for new event definitions.
Developed Pig scripts for loading real-time data into Hive.
Developed Spark SQL queries and DataFrames to import data from data sources, perform transformations and read/write operations, and save results to output directories in HDFS.
Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
Developed a metadata-driven framework for generating batch events from Hive and publishing them to Kafka.
Worked on NiFi data pipelines to process large data sets and configured lookups for data validation and integrity.
Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
Involved in converting Hive/SQL queries into transformations using Python.
Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
Worked on HDP 2.2 and Enabled Name Node HA.
Deployed cluster with Hortonworks Ambari.
Involved in loading data to HDFS from various sources.
Responsible for onboarding new users to the Hadoop cluster (adding a user home directory and providing access to datasets).
Implemented Kerberos in the cluster to authenticate users.
Implemented high availability for NameNodes on the Hadoop cluster.
Backed up NameNode metadata via NFS.
Helped in setting up Rack topology in the cluster.
Installed Oozie workflow engine to run multiple Hive and pig jobs.
Maintained, audited, and built new clusters for testing purposes using Ambari.
Extensively involved in installing and configuring the Cloudera Hadoop distribution: NameNode, JobTracker, TaskTrackers, and DataNodes.
Installed and configured Hadoop ecosystem components such as HBase, Flume, Pig, and Sqoop.
Involved in Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data.
Managed and reviewed Hadoop log files.
Loaded log data into HDFS using Flume. Worked extensively on creating MapReduce jobs to power data for search and aggregation.

Environment: HDFS, Flume, HBase, Pig, Oozie, Kerberos, LDAP, YARN, Hortonworks, Cloudera, and Ambari
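
A minimal sketch of real-time ingestion from Kafka into MongoDB, as described in this section, using the kafka-python and pymongo packages; the broker, topic, database, and collection names are placeholders, and the original pipeline may have used different client libraries.

    # Sketch: consume JSON messages from Kafka and write each one to MongoDB.
    import json

    from kafka import KafkaConsumer
    from pymongo import MongoClient

    consumer = KafkaConsumer(
        "events",                                  # placeholder topic name
        bootstrap_servers=["kafka-broker:9092"],
        group_id="mongo-sink",
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
        auto_offset_reset="earliest",
    )

    collection = MongoClient("mongodb://mongo-host:27017")["ingest"]["events"]

    # Assumes each message value is a JSON object.
    for message in consumer:
        doc = message.value
        doc["_kafka_offset"] = message.offset      # keep the offset for traceability
        collection.insert_one(doc)
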
Big Data Developer,
Client: Karvy Data Management Services Ltd, India
01/2016 - 12/2016
Involved in the complete SDLC of a big data project, including requirements analysis, design, coding, testing, and production.
Extensively used Sqoop to import/export data between RDBMS and Hive tables, performed incremental imports, and created Sqoop jobs based on the last saved value.
Built custom MapReduce programs to analyze data and used Pig Latin to clean unwanted data.
Installed and configured Hive and wrote Hive UDFs to implement business requirements.
Involved in creating Hive tables, loading data into them, and writing Hive queries that execute as MapReduce jobs.
Experienced with compression techniques such as LZO and Snappy to save storage and optimize network transfer for Hive tables.
Implemented custom Flume interceptors to filter data and defined channel selectors to multiplex data into different sinks.
Worked with Spark SQL to process data in Hive tables (see the sketch after this section).
Developed scripts and Tidal jobs to schedule an Oozie bundle (a group of coordinators) consisting of various Hadoop programs.
Involved in writing test cases, implementing unit test cases.
Installed Oozie workflow engine to run multiple Hive and Pig jobs which run independently with time and data availability.
Hands-on experience accessing HBase data and performing CRUD operations using the Java API.
Analyzed data by running Hive queries and Pig scripts to understand user behavior.
Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala.
Developed Spark applications in Scala to ease the transition from Hadoop MapReduce.
Extensively used Hive queries to query data according to business requirements.
Used Pig to analyze large data sets and wrote the results back to HBase with Pig.
Environment: Hadoop, HDFS, MapReduce, Hive, Flume, Sqoop, Pig, MySQL, Ubuntu, ZooKeeper, CDH3/4, Eclipse, Oracle, shell scripting.
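
A minimal PySpark sketch of using Spark SQL to process data in Hive tables, as referenced in this section; the database, table, and column names are illustrative placeholders.

    # Sketch: query a Hive table with Spark SQL and persist the result.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hive_spark_sql")
        .enableHiveSupport()          # assumes Hive is configured on the cluster
        .getOrCreate()
    )

    # Query an existing Hive table with plain SQL.
    top_users = spark.sql("""
        SELECT user_id, COUNT(*) AS page_views
        FROM web.clickstream
        WHERE event_date = '2016-06-01'
        GROUP BY user_id
        ORDER BY page_views DESC
        LIMIT 100
    """)

    # Persist the result as another Hive table for downstream reporting.
    top_users.write.mode("overwrite").saveAsTable("reports.top_users_daily")
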

Data Analyst,
Client: Aceline Tech Solutions Pvt Ltd, India.
05/2014 - 12/2015
Developed end to end enterprise Applications using Spring MVC, REST and JDBC Template Modules.
Wrote well-designed, testable, and efficient Java code.
Understanding and analyzing complex issues and addressing challenges arising during the software development process, both conceptually and technically.
Implemented best practices of Automated Build, Test and Deployment.
Developed design patterns, data structures and algorithms based on project need.
Worked on multiple tools such as Toad, Eclipse, SVN, Apache and Tomcat.
Deployed models via APIs into applications or workflows.
Worked on User Interface technologies like HTML5, CSS/SCSS.
Wrote stored procedures and SQL queries based on project needs.
Deployed built jar into the application server.
Created Automated Unit Tests using Flexible/Open-Source Frameworks
Developed Multi-threaded and Transaction Handling code (JMS, Database).
Represented data using visualizations such as charts.
Developed the backend (Oracle) to fulfill UI requirements.
Used trend lines, reference lines, and statistical techniques to describe the data.
Used Measure Names and Measure Values fields to create visualizations with multiple measures and dimensions.
Responsible for dashboard design, look and feel, and development.
Used parameters and input controls to give users control over certain values.
Combined data sources by joining multiple tables and using data blending.
Understood the functional and technical specifications.
Deep experience with the design and development of Tableau visualization solutions.
Prepared dashboards using calculations and parameters in Tableau.
Environment: Java, Spring MVC, Hibernate, JMS, HTML5, CSS/SCSS, JUnit, Eclipse, and Oracle.
