Divya Nalla - Data Engineer
[email protected]
Location: Burbank, Illinois, USA
Relocation:
Visa: H1B
Divya N
Herndon, VA 20170 | +1 5713531573 | [email protected]

Summary
- 9+ years of professional IT experience in Data Integration and Data Warehouse/Data Mart projects using ETL technologies: Python, AWS, PySpark, Scala, Big Data, GCP, BigQuery, Airflow, Dataproc, Bigtable, Azure, Data Factory, ETL, Informatica.
- Extensive experience with AWS cloud components: Lake Formation, CloudWatch, EventBridge, Glue ETL, EMR, S3, SQS, SNS, Glue Catalog, Athena, QuickSight, Kinesis, MSK, AMQ, Lambda, Step Functions, Redshift, MongoDB, RDS.
- Solid experience with GCP (Dataproc, GCS, Cloud Functions, BigQuery, Bigtable) and Azure (Data Factory, Databricks).
- Strong experience building RESTful APIs using Lambda functions, Python, and API Gateway (a brief illustrative sketch follows this summary).
- Experience using Python and PySpark DataFrames, Datasets, and RDDs to process and integrate data from heterogeneous sources.
- Experience scheduling jobs with Airflow and managing Redshift connections.
- Strong experience building data pipelines using Databricks and Python.
- Experience building pipelines with data formats such as CSV, XML, Parquet, Iceberg, AVRO, JSON, and JSON-LD.
- Experience with security concepts such as Lake Formation, IAM, service roles, encryption, KMS, and Secrets Manager.
- Used AWS Glue to build and run ETL jobs in AWS; loaded the Redshift warehouse using AWS Glue.
- Experience analyzing data collected from Kinesis streams; used the Spark Streaming API to read batches of data from Kinesis streams.
- Good understanding of networking concepts (VPCs, subnets, NAT, route tables) and containers (EKS, EC2).
- Experience deploying Linux scripts and PySpark scripts using Jenkins, Git, CI/CD, JIRA, and TDD.
- Expertise in ETL framework design and implementation and real-time data integration; used Spark Streaming to collect data from different systems into a reporting system.
- Experience with Boto3 scripting and coding Lambda functions.
- Experienced in data integration from various systems using Sqoop.
- Performed complex querying and analytics with Amazon Redshift.
- Good knowledge of API management.
- Experience with AWS EMR using Spark and Java.
- Worked on creating AWS components with Terraform and migrating the components to portals.
- Expertise in Amazon RDS (Oracle, Postgres) and the non-relational database Amazon DynamoDB.
- Experience writing complex SQL and provisioning AWS scripts, Glue Jobs, Lambdas, and other AWS modules with Terraform and Git.
- Expertise in Snowflake, including Snowpipe, SnowSQL, Time Travel, zero-copy cloning, and Snowpark.
- Effectively worked on SOAP, RESTful API, and microservices integration with AWS.
- Developed end-to-end ETL using technologies such as Informatica, UNIX, Java, and Autosys.
- Strong experience with the HBase NoSQL database and DynamoDB.
- Experience using Kafka for real-time data processing.
- Strong experience documenting Low-Level Designs, unit test plans, unit test cases, and deployment documents (Auto flow diagrams, Auto flow templates, UTC, UTR, LDMS, and CII).
- Strong experience with Agile methodology; involved in sprint planning, stand-ups, and retrospective meetings.
- Interacted with clients on their issues and business requirements.
- Extensive experience with the Hadoop ecosystem: HDFS, Spark, Java, Python, Kafka, Hive, Sqoop, Pig, Oozie, RDBMS, and MapReduce.
- Worked on Oozie workflows to trigger a Spark application that reads data from an HDFS or S3 location for processing.
- Loaded data sets into HDFS and Hive, processed the batch data using Hive and Pig, and automated the workflow using Oozie.
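Illustrative sketch (not code from any engagement above): a minimal Python Lambda handler behind an API Gateway Lambda proxy integration, of the kind described in the summary. The DynamoDB table name "orders" and the route shape are hypothetical placeholders.

    import json
    import boto3

    # Hypothetical table name, for illustration only.
    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("orders")

    def lambda_handler(event, context):
        """Handle GET /orders/{order_id} from an API Gateway Lambda proxy integration."""
        order_id = (event.get("pathParameters") or {}).get("order_id")
        if not order_id:
            return {"statusCode": 400, "body": json.dumps({"error": "order_id is required"})}

        item = table.get_item(Key={"order_id": order_id}).get("Item")
        if item is None:
            return {"statusCode": 404, "body": json.dumps({"error": "not found"})}

        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps(item, default=str),
        }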
Skills
Big Data: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Oozie, Kafka, Cassandra, NiFi
Cloud: AWS S3, EC2, Glue, Lambda, SQS, SNS, Redshift, Kinesis, Athena, Azure Data Factory (ADF)
Scripting/Languages: Python, UNIX Shell Scripting, SQL/PLSQL, T-SQL, VBA, Spark, Java
ETL/BI: Informatica PowerCenter
Databases: DB2, Oracle, SQL Server, Snowflake, PostgreSQL, Teradata, NoSQL
Data Modeling: Dimensional and Relational Modeling, Star Schema, Snowflake Schema
Mainframes: COBOL, JCL, VSAM, Tandem Screen COBOL
Scheduling Tools: Autosys, Control-M, Airflow
Other Tools: GitHub, ServiceNow, Jira, IntelliJ, Confluence, Postman

Experience

SR. DATA ENGINEER/AWS/PYTHON | 07/2022 - Current
Deloitte - McLean, Virginia
- Conduct stakeholder meetings to gather requirements and perform in-depth data analysis to uncover new business insights.
- Develop RESTful APIs using Lambda functions, Python, and API Gateway to facilitate seamless data communication.
- Use Sqoop to efficiently extract data from relational databases and load it into S3/Lake Formation.
- Employ Terraform to provision AWS modules and scripts, including Glue Jobs and Lambda functions, streamlining infrastructure deployment.
- Manage version control and build processes using Git repositories, Jenkins, and Airflow, ensuring efficient deployment.
- Leverage Python within Spark for data collection and analysis, using libraries such as NumPy, pandas, and SciPy for sophisticated data operations.
- Use SQS and SNS for streaming processes and email notifications, enhancing real-time data handling.
- Execute ETL operations using AWS Glue, creating and monitoring ETL workflows and triggers (see the sketch following this list).
- Develop Lambdas for various business cases and events, ensuring a seamless event-driven architecture.
- Build pipelines in AWS EMR for loading data into Redshift and deploy PySpark compute artifacts.
- Schedule and execute compute tasks in Airflow, ensuring efficient task management.
- Orchestrate the migration of S3 data into AWS Redshift tables and Snowflake tables and views.
- Implement Snowflake solutions using Snowpipe, SnowSQL, Time Travel, zero-copy cloning, and Snowpark, optimizing data handling.
- Use Impala to process file types such as Avro and Parquet across different systems.
- Create and migrate AWS components using Terraform, ensuring infrastructure reliability.
- Design and execute Spark applications for caching, mapping, reducing by key, and other data processing tasks.
- Integrate data from diverse systems, including AWS, GCS, and Big Data systems, for comprehensive analysis.
- Collaborate with the Data Science team to implement advanced analytical models on large datasets in a Hadoop cluster.
- Develop Python and Spark applications tailored to specific functional requirements.
- Design and execute PySpark jobs for ETL/ELT transformations, with a focus on error handling and data enrichment.
- Import and export data from various databases and file types into HDFS and S3 using Sqoop and AWS Transfer Family (SFTP).
- Develop Spark jobs to handle a variety of file formats, including CSV, XML, Parquet, Iceberg, AVRO, JSON, JSON-LD, and fixed-width files, ensuring comprehensive data processing capabilities.
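Sketch referenced above: a minimal AWS Glue PySpark job of the kind described in this role. The Glue Catalog database "sales_db", table "raw_orders", column names, and the S3 output path are placeholder assumptions, not project code.

    import sys
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.dynamicframe import DynamicFrame
    from awsglue.job import Job
    from pyspark.context import SparkContext

    # Standard Glue job bootstrap.
    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glueContext = GlueContext(SparkContext())
    job = Job(glueContext)
    job.init(args["JOB_NAME"], args)

    # Read from the Glue Data Catalog (database and table names are placeholders).
    orders = glueContext.create_dynamic_frame.from_catalog(
        database="sales_db", table_name="raw_orders"
    )

    # Light cleanup in Spark, then convert back to a DynamicFrame.
    cleaned_df = orders.toDF().dropDuplicates(["order_id"]).filter("order_total > 0")
    cleaned = DynamicFrame.fromDF(cleaned_df, glueContext, "cleaned_orders")

    # Write partitioned Parquet to S3 (bucket and prefix are placeholders).
    glueContext.write_dynamic_frame.from_options(
        frame=cleaned,
        connection_type="s3",
        connection_options={
            "path": "s3://example-bucket/curated/orders/",
            "partitionKeys": ["order_date"],
        },
        format="parquet",
    )

    job.commit()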
DATA ENGINEER | 09/2019 - 04/2022
Centene Corporation Buffalo - Hyderabad, India
- Met with various stakeholders for requirement gathering and data analysis for new business insights.
- Performed analysis and feasibility studies of requirements.
- Developed the application using Python and Spark based on the existing functionality.
- Worked effectively with Git repositories.
- Worked on Databricks/Spark, building bronze, silver, and gold layers for data analytics.
- Worked effectively on Python/PySpark programming for collecting and analyzing data.
- Implemented best practices for error handling logic in Python scripts.
- Worked on building pipelines with AWS Glue and executing ETL jobs in AWS.
- Performed ETL operations with AWS Glue and Lambda; strong experience with workflows and triggers in AWS Glue.
- Integrated data from different sources into RDS; worked on loading data to RDS with AWS Glue.
- Worked on creating workflows with AWS Step Functions.
- Migrated on-premises data to DynamoDB, worked on all DynamoDB operations, and supported the analytics team with DynamoDB data.
- Worked on building a pipeline in AWS EMR to load data into Redshift.
- Developed analytic data solutions on AWS with AWS services (Lambda, S3, Glue, DynamoDB).

DATA ENGINEER | 10/2016 - 09/2019
Transplace, Inc. - Hyderabad, India
- Worked on the creation, development, and deployment of SSRS reports and Tableau dashboards.
- Involved in technical decisions for business requirements and in interactions with Business Analysts, the client team, and the development team.
- Worked on the creation/review of functional requirement specifications and supporting documents for business systems.
- Worked on analyzing the Hadoop stack and different big data analytic tools, including Pig, Hive, the HBase database, and Sqoop.
- Wrote multiple MapReduce programs for extraction, transformation, and aggregation of data from more than 20 sources with multiple file formats, including XML, JSON, CSV, and other compressed file formats.
- Implemented Spark Core in Scala to process data in memory.
- Implemented Lambda to configure the DynamoDB auto scaling feature and implemented a data access layer to access AWS DynamoDB data.
- Performed job functions using Spark APIs in Scala for real-time analysis and fast querying.
- Involved in creating Spark applications in Scala using cache, map, reduceByKey, and other functions to process data.
- Created Oozie workflows for Hadoop-based jobs, including Sqoop, Hive, and Pig.
- Created Hive external tables, loaded data into the tables, and queried data using HQL.
- Migrated AWS S3 file data to Snowflake tables and views.
- Migrated AWS Redshift tables to S3, then moved the data to Snowflake.
- Worked on Kafka-Snowflake integration.
- Worked on creating a Delta Lake and performed ETL operations with Apache Spark; after processing and refining the data using Delta Lake, loaded the transformed data into Snowflake's data warehouse (see the sketch following this section).
- Loaded the Redshift warehouse using AWS Glue; performed ETL operations with AWS Glue.
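Sketch referenced above: a minimal PySpark/Delta Lake flow following the bronze/silver/gold (medallion) pattern described in the Centene and Transplace work, assuming a Databricks-style, Delta-enabled Spark environment. The S3 paths, schema, and column names are placeholder assumptions, not project code.

    from pyspark.sql import SparkSession, functions as F

    # Assumes a Databricks or otherwise Delta-enabled Spark session; paths are placeholders.
    spark = SparkSession.builder.appName("medallion-sketch").getOrCreate()

    # Bronze: land raw JSON as-is, with ingestion metadata.
    bronze = (
        spark.read.json("s3://example-bucket/raw/claims/")
        .withColumn("ingest_ts", F.current_timestamp())
    )
    bronze.write.format("delta").mode("append").save("s3://example-bucket/bronze/claims/")

    # Silver: de-duplicate and apply basic quality rules.
    silver = (
        spark.read.format("delta").load("s3://example-bucket/bronze/claims/")
        .dropDuplicates(["claim_id"])
        .filter(F.col("claim_amount") > 0)
    )
    silver.write.format("delta").mode("overwrite").save("s3://example-bucket/silver/claims/")

    # Gold: aggregate for analytics and reporting consumers.
    gold = (
        silver.groupBy("member_id")
        .agg(F.sum("claim_amount").alias("total_claim_amount"),
             F.count("claim_id").alias("claim_count"))
    )
    gold.write.format("delta").mode("overwrite").save("s3://example-bucket/gold/claims_by_member/")

Loading the refined layer into Snowflake, as described in the Transplace bullets, would typically go through the Snowflake Spark connector or Snowpipe; that step is omitted here.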
BIG DATA DEVELOPER & DATA ENGINEER/PYTHON | 09/2015 - 10/2016
Eastman - Hyderabad, India
- Developed the application using Scala and Spark based on the existing functionality.
- Performed analysis and feasibility studies of requirements; ran proofs of concept to validate the feasibility of the design.
- Imported and exported data from different databases (Mainframes, Oracle, MySQL) and various file types into HDFS and S3 using Sqoop and AWS Transfer Family (SFTP).
- Wrote PySpark scripts to load data from different file formats.
- Wrote all transformation-related error handling logic in Python scripts.
- Wrote Hive queries and UDFs for several business requirements.
- Wrote shell scripts to run the PySpark scripts, with error handling in the shell scripts.
- Worked with RDDs and DataFrames in the transformation process.
- Created staging tables in Hive and lookup tables in HBase; established integration between Hive and HBase.
- Performed transformations on data as per the requirements.

Education
JNTU - Bachelor of Technology, Electronics and Communications Engineering, 02/2015