
Lakshmipriya Rajendran - AWS DATA ENGINEER
[email protected] | 201-484-0055
Location: Nashville, Tennessee, USA
Relocation: Yes
Visa: H4

PROFESSIONAL SUMMARY:
AWS Certified Data Engineer with 11 years of experience in Amazon Web Services (AWS), including Elastic Beanstalk, DynamoDB, Lambda, S3, SQS, SNS, Step Functions, and EMR, with the technical expertise, business experience, and communication skills to drive high-impact business outcomes.
Extensively worked on AWS Cloud services such as EC2, VPC, IAM, RDS, ELB, EMR, Auto Scaling, S3, CloudFront, Glacier, Elastic Beanstalk, Lambda, ElastiCache, Route 53, OpsWorks, CloudWatch, CloudFormation, Redshift, DynamoDB, SNS, SQS, SES, Kinesis, Firehose, and Cognito.
Hands-on experience with Google Cloud Platform (GCP) big data products including BigQuery, Cloud Dataproc, Cloud Dataflow, Google Cloud Storage, Cloud Composer (managed Airflow), Dataprep, Data Fusion, Data Catalog, Cloud Pub/Sub, and Cloud Functions, as well as cloud provisioning tools such as Terraform and CloudFormation.
Expertise with Hadoop ecosystem components such as MapReduce, Pig, Hive, ZooKeeper, Sqoop, Oozie, Flume, Drill, and Spark for data storage and analysis, along with Scala, Python, SQL, and NoSQL databases such as HBase, MongoDB, and Cassandra.
Experience installing, configuring, administering, and managing Hadoop clusters and services using Cloudera Manager; assisted the deployment team in setting up Hadoop clusters and services.
Good experience with the Oozie framework and automating daily import jobs.
Highly experienced in importing and exporting data between HDFS and relational database management systems using Sqoop.
Collected log data from various sources and integrated it into HDFS using Flume.
Worked with various file formats such as CSV, JSON, XML, ORC, Avro, and Parquet.
Expertise in writing DDL and DML scripts in SQL and HQL for analytics applications on RDBMS and Hive.
Expertise in Hive optimization techniques such as partitioning, bucketing, vectorization, map-side joins, bucket-map joins, skew joins, and indexing.
Experience ingesting data into an AWS data lake from databases such as MySQL, Oracle, DB2, and SQL Server.
Experience designing visualizations and KPIs in Tableau, and publishing and presenting dashboards and stories on web and desktop platforms.
Experience in ETL processes involving data integration and reporting.
Foundational knowledge of data stream processing with Kafka, gained by analyzing application logs.
Experience with ETL architectures on AWS using PySpark on EMR workflows.
Proficient in design and data modeling for data warehouse and business intelligence applications.
Good experience running security and vulnerability scans in GitLab using Veracode, Checkmarx, and SonarQube.
Good experience with GitLab CI/CD pipelines supporting DevOps, DevSecOps, and GitOps strategies.

EDUCATION:
Master's in Software Engineering from Vellore Institute of Technology.

CERTIFICATION:
AWS Certified Cloud Practitioner


TECHNICAL SKILLS:
Big Data Technologies: HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Storm, Flume, Spark, Apache Kafka, ZooKeeper, Ambari, Oozie, MongoDB, Cassandra, Mahout, Puppet, Avro, Parquet, Snappy, Falcon
NoSQL Databases: Postgres, HBase, Cassandra, MongoDB, Amazon DynamoDB, Redis
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR, and Apache
Languages: Scala, Python, R, XML, XHTML, HTML, AJAX, CSS, SQL, PL/SQL, HiveQL, Unix, Shell Scripting
Source Code Control: GitHub, CVS, SVN, ClearCase
Cloud Computing Tools: Amazon AWS (S3, Elastic Beanstalk, EMR, EC2, Lambda, VPC, Route 53, CloudWatch, CloudFront), Microsoft Azure, GCP
GCP Cloud Platform: BigQuery, Cloud Dataproc, GCS Bucket, Cloud Functions, Apache Beam, Cloud Shell, gsutil, bq command line, Cloud Dataflow
Databases: Teradata, Snowflake, Microsoft SQL Server, MySQL, DB2
DB Languages: MySQL, PL/SQL, PostgreSQL, and Oracle
Build Tools: Jenkins, Maven, Ant, Log4j
Business Intelligence Tools: Tableau, Power BI
Development Tools: Eclipse, IntelliJ, Microsoft SQL Studio, Toad, NetBeans
ETL Tools: Talend, Pentaho, Informatica, Ab Initio, SSIS
Development Methodologies: Agile, Scrum, Waterfall, V-model, Spiral, UML


PROFESSIONAL EXPERIENCE:

McAfee, Remote Dec 2023 - Current
Sr. AWS Data Engineer
Responsibilities:
Designed and implemented a microservices architecture using Elastic Beanstalk.
Developed serverless components using AWS Lambda, DynamoDB, and SQS for event-driven services.
Created CI/CD pipelines for automated deployment and scaling using GitHub and GitHub Actions.
Developed ETL jobs using PySpark in Databricks for data extraction, transformation, and aggregation across multiple file formats and data sources, analyzing and transforming the data to uncover insights into customer usage patterns.
Developed a Django REST Framework service to implement an event-driven architecture for seamless data flow from the Elastic Beanstalk API to Databricks for processing.
Created a medallion architecture for subscription data; analyzed existing Alteryx workflows and converted them into Databricks notebooks.
Ingested raw data into the Bronze layer in Databricks Unity Catalog and developed a PySpark module to promote data from the Bronze layer to the Silver layer (an illustrative sketch follows this list).
Developed a SQL module to convert Silver-layer data to Gold and built Databricks workflows for the entire process.
Developed a Terraform module to deploy the workflows to the Databricks environment using GitHub Actions.
Automated Terraform deployment of workflows to dev/stg/prod environments based on the PR title in GitHub.
Migrated notebooks between Databricks workspaces using the Terraform Databricks provider with GitHub.
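
Illustrative sketch (not the production module): a minimal PySpark Bronze-to-Silver promotion of the kind described above, assuming a Databricks environment with Unity Catalog; the table names (main.bronze.subscriptions_raw, main.silver.subscriptions_clean) and column names are hypothetical.

from pyspark.sql import SparkSession, functions as F

# In Databricks a SparkSession is already provided; getOrCreate() reuses it.
spark = SparkSession.builder.getOrCreate()

# Read raw events from the (hypothetical) Bronze table in Unity Catalog.
bronze_df = spark.read.table("main.bronze.subscriptions_raw")

# Basic cleansing: drop duplicates, cast the event timestamp, remove bad rows.
silver_df = (
    bronze_df
    .dropDuplicates(["subscription_id", "event_ts"])
    .withColumn("event_ts", F.to_timestamp("event_ts"))
    .filter(F.col("subscription_id").isNotNull())
)

# Write the curated result to the (hypothetical) Silver table as Delta.
(silver_df.write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("main.silver.subscriptions_clean"))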

Verizon, Remote Dec 2021 - Nov 2023
Sr. AWS Data Engineer
Responsibilities:
Migrated existing on-premises applications to AWS, used services such as EC2 and S3 for dataset processing and storage, and maintained a Hadoop cluster on AWS EMR.
Built ETL pipelines for data ingestion, transformation, and validation on AWS.
Developed PySpark code for AWS Glue jobs and for EMR.
Migrated Informatica mappings, sessions, and workflows from dev to test, test to stage, and stage to prod environments.
Created AWS Lambda functions and assigned execution roles to run Python scripts.
Created, modified, and executed DDL on AWS Redshift and Snowflake tables to load data.
Wrote a program to download a SQL dump from the equipment maintenance site and load it into a GCS bucket, then restored the dump from GCS into MySQL (hosted in Google Cloud SQL) and loaded the data from MySQL into BigQuery using Python, Scala, Spark, and Dataproc.
Processed and loaded bounded and unbounded data from Google Pub/Sub to BigQuery using Cloud Dataflow with Python.
Developed and deployed Spark and Scala code on a Hadoop cluster running on GCP.
Built data pipelines in Airflow on GCP for ETL jobs using various Airflow operators.
Created a data ingestion framework in Snowflake for both batch and real-time data across file formats (XML, JSON, Avro) using Snowflake stages and Snowflake data pipelines.
Performed transformations such as sort, join, aggregation, and filter to derive datasets using Apache Spark.
Extracted appropriate features from datasets and handled bad, null, and partial records using Spark SQL.
Used Spark Streaming to receive real-time data from Kafka and persist the stream to HDFS and to NoSQL databases such as HBase and Cassandra using Python (an illustrative sketch follows at the end of this section).
Created ETL pipelines in Python that extract data from SQL Server using the pyodbc package and perform transformations using lambda and custom functions.
Handled file movement between HDFS and AWS S3 and worked extensively with S3 buckets.
Launched EC2 instances from various AMIs and integrated them with other AWS services using IAM roles.
Developed Apache Spark applications using RDD transformations, Spark Core, Spark MLlib, Spark Streaming, Spark SQL, and the Spark shell (Scala).
Actively involved in scrum meetings and followed Agile Methodology for implementation.
Environment: Snowflake, SnowSQL, SQL Assistant, HDFS, Sqoop, Flume, Apache Spark, Spark SQL, Kafka, Scala, Hive, MapReduce, HBase, Agile Methods, Linux, Cassandra, PySpark, AWS S3, AWS EMR, AWS EC2, AWS Glue, AWS Redshift, AWS Lambda, AWS CloudWatch.
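
Illustrative sketch: a minimal Kafka-to-HDFS streaming job of the kind described above, written with the Spark Structured Streaming API rather than the legacy DStream API, assuming Spark 3.x with the spark-sql-kafka package available; the broker address, topic name, and HDFS paths are hypothetical.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

# Subscribe to a (hypothetical) Kafka topic.
events = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "usage-events")
    .load())

# Kafka delivers keys/values as bytes; cast them to strings for downstream parsing.
parsed = events.selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

# Continuously append the stream to HDFS as Parquet, with a checkpoint for recovery.
query = (parsed.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/streams/usage_events")
    .option("checkpointLocation", "hdfs:///checkpoints/usage_events")
    .outputMode("append")
    .start())

query.awaitTermination()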



Anthem Blue Cross Blue Shield, Chicago, IL Jul 2019 - Nov 2021
Sr. Data Engineer
Responsibilities:
Developed Kafka producers and consumers, Cassandra clients, and PySpark jobs working with HDFS and Hive.
Designed, architected and implemented scalable cloud-based web applications using AWS.
Migrated Hive and MapReduce jobs from on-premises MapR to the AWS cloud using EMR.
Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
Provided release support for new Hadoop projects moving into production and performed data extracts on the Hadoop platform.
Developed Spark programs using Scala to compare the performance of Spark with Hive and SparkSQL.
Used REST APIs with Python to ingest data from external sources into BigQuery.
Built a program with Python and Apache Beam and executed it in Cloud Dataflow to run data validation between raw source files and BigQuery tables.
Migrated previously written cron jobs to Airflow on GCP (an illustrative DAG sketch follows at the end of this section).
Created Spark Streaming jobs using Python to read messages from Kafka and download JSON files from AWS S3 buckets.
Created AWS Lambda functions in Python for deployment management, and designed and implemented public-facing websites on AWS integrated with other application infrastructure.
Worked with the Snowflake cloud data warehouse and AWS S3 buckets to integrate data from multiple source systems, including loading nested JSON data into Snowflake tables.
Used Jupyter notebooks and spark-shell to develop, test and analyze Spark jobs before scheduling customized Spark jobs.
Modeled Hive partitions extensively for data separation and faster data processing and followed Hive best practices for tuning.
Extracted and exported data from DB2 into AWS for analysis, visualization, and report generation.
Used the AWS Glue Data Catalog with crawlers to catalog data in S3 and perform SQL query operations.
Implemented and integrated NoSQL databases such as HBase and Cassandra.
Developed dashboards and visualizations to help business users analyze data and to provide data insights to upper management, with a focus on Microsoft products such as SQL Server Reporting Services (SSRS) and Power BI.
Environment: Sqoop, Kafka, Spark, Scala, Pig, Oozie, Zookeeper, PySpark, Agile Methods, Linux, MySQL, HDFS, Hive, HBase, MapReduce, Cassandra, AWS S3, AWS EMR, AWS Glue, AWS Lambda, AWS Redshift, Snowflake, Cloudera
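
Illustrative sketch: a minimal example of a cron job migrated to an Airflow DAG as described above, assuming Airflow 2.x; the DAG id, schedule, and script path are hypothetical.

from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="nightly_extract",          # hypothetical DAG name
    start_date=datetime(2021, 1, 1),
    schedule_interval="0 2 * * *",     # the cron expression the job used before
    catchup=False,
) as dag:
    # Run the existing extract script unchanged, now scheduled by Airflow.
    run_extract = BashOperator(
        task_id="run_extract",
        bash_command="python /opt/jobs/nightly_extract.py",
    )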

The Walt Disney Company, Los Angeles, CA Sep 2018 - Jun 2019
Spark/Big Data Engineer
Responsibilities:
Designed a data workflow model to create a data lake in the Hadoop ecosystem so that reporting tools like Tableau can plug in to generate the necessary reports.
Created Source to Target Mappings (STM) for the required tables by understanding the business requirements for the reports.
Worked in the Snowflake environment to remove redundancy and load real-time data from various data sources into HDFS using Kafka.
Developed PySpark and SparkSQL code to process the data in Apache Spark on Amazon EMR to perform the necessary transformations based on the STMs developed.
Created Hive tables on HDFS to store the data processed by Apache Spark on the Cloudera Hadoop cluster in Parquet format.
Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation across multiple file formats including XML, JSON, CSV, and other compressed formats.
Loaded log data directly into HDFS using Flume.
Leveraged AWS S3 as a storage layer for HDFS.
Encoded and decoded JSON objects using PySpark to create and modify data frames in Apache Spark (an illustrative sketch follows at the end of this section).
Used Bitbucket as the code repository and frequently used Git commands such as clone, push, and pull against the repository.
Used the Hadoop Resource Manager to monitor jobs running on the Hadoop cluster.
Used Confluence to store design documents and the STMs.
Met with business and engineering teams on a regular basis to keep requirements in sync and deliver on them.
Used Jira as an agile tool to keep track of the stories that were worked on using the Agile methodology.
Environment: Spark, Hive, Pig, Flume, IntelliJ IDE, AWS CLI, AWS EMR, AWS S3, REST API, shell scripting, Git, PySpark, Spark SQL
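
Illustrative sketch: a minimal example of decoding and re-encoding JSON with PySpark as described above; the sample payload, schema, and column names are hypothetical.

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.getOrCreate()

# Hypothetical raw rows whose "payload" column holds JSON strings.
raw_df = spark.createDataFrame(
    [('{"title": "example", "views": 120}',)], ["payload"]
)

# Declare the expected structure of the JSON documents.
schema = StructType([
    StructField("title", StringType()),
    StructField("views", LongType()),
])

# Decode the JSON string into typed columns, then re-encode a subset back to JSON.
decoded = raw_df.withColumn("doc", F.from_json("payload", schema)).select("doc.*")
reencoded = decoded.withColumn("json_out", F.to_json(F.struct("title", "views")))
reencoded.show(truncate=False)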

Cognizant, Chennai, India June 2014 - Jul 2016
Data Engineer - Anthem
Responsibilities:
Built a data-driven ETL architecture in AWS using AWS Step Functions state machines.
Developed a PySpark ingestion script to bring data from upstream data sources into Amazon S3.
Developed Python AWS Lambda functions integrated with Amazon API Gateway to expose serverless RESTful APIs (an illustrative handler sketch follows at the end of this section).
Designed and set up an enterprise data lake in AWS to provide on-demand data support for the cloud platform.
Imported large datasets into the AWS data lake from sources such as HDFS via Spark RDDs, and created a Spark job that performs ingestion from on-prem to AWS S3 using PySpark.
Designed, built, and coordinated an automated build-and-release CI/CD process using GitLab, with Terraform Enterprise to deploy the infrastructure in the AWS cloud.
Designed and developed AWS Lambda functions to integrate with on-prem web services and send responses to SQS and SNS.
Developed a Step Functions workflow orchestrating multiple Lambdas to capture events, and used API Gateway to create serverless API endpoints.
Responsible for provisioning, design, architecture, and support of enterprise cloud across multiple platforms using Terraform.
Environment: AWS, SQL, AWS services (AWS S3, AWS Lambda, Step Functions, API Gateway, SQS, RDS), Terraform, GitLab, Python, EMR.
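
Illustrative sketch: a minimal Python Lambda handler behind an API Gateway proxy integration of the kind described above; the payload fields are hypothetical.

import json

def lambda_handler(event, context):
    # API Gateway proxy integration passes the HTTP request body as a JSON string.
    body = json.loads(event.get("body") or "{}")

    # Illustrative processing step: count items in the (hypothetical) payload.
    result = {"received_items": len(body.get("items", []))}

    # Respond in the shape API Gateway's proxy integration expects.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(result),
    }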

Cognizant, Chennai, India May 2011 - June 2014
Data Analyst - Kohls
Responsibilities:
Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools using Python.
Gathered required data from multiple data sources and created datasets to be used in the analysis.
Extracted data using SQL from data sources and performed Exploratory Data Analysis (EDA) and Data Visualizations using Tableau.
Worked with data governance, data quality, data lineage, and data architects to design various models and processes.
Independently coded new programs and designed tables to load and test them effectively for the given POCs using Big Data/Hadoop.
Designed data models and data flow diagrams using MS Visio.
Developed, implemented, and maintained conceptual, logical, and physical data models using Erwin for forward- and reverse-engineered databases.
Environment: SQL, Informatica, ODS, OLTP, Oracle 10g, Hive, OLAP, Excel, MS Visio, Hadoop
