Rajashekar - Data Engineer | [email protected] | Location: New Brunswick, New Jersey, USA | Relocation: NO | Visa: H1B
Rajashekhar Reddy
+18485000844 | [email protected] | https://www.linkedin.com/in/rajashekhark8-/

Summary
Senior Data Engineer with extensive experience designing and building robust data solutions. Expertise in AWS services and Hadoop-based technologies, used to automate data migrations and build efficient ETL processes. Proven ability to orchestrate complex data pipelines with tools such as AWS Glue, achieving significant performance improvements. Draws on substantial experience to design data engineering solutions tailored to product and user requirements.

Skills
Data Engineering, AWS, Azure, Hadoop, Apache Spark, Hive, Python, PySpark, AWS Glue, ETL, Data Warehousing, SQL, Unix Shell Scripting, Tableau, Power BI, Agile, Jira, Data Migration, Data Optimization, Data Management, Data Analysis, Big Data Ecosystem, HDFS, Kafka, HBase, Sqoop, AWS Services, Data Streaming, Data Frameworks, Data Storage, Data Transformation, Data Quality, CI/CD, Git, Terraform, DynamoDB, Snowflake, Google BigQuery, RDBMS, PostgreSQL, Oracle, Scala, Oozie, MapReduce, Airflow, CloudFormation, CloudWatch, Lambda Functions, JSON, XML, CSV, Data Modeling, Data Profiling, Data Cleaning, Data Enrichment, Data Standardization, Data Catalog, Microservices, Kubernetes, Docker, AWS X-Ray, Looker, Informatica, Ab Initio, PL/SQL, NumPy, Data Lake development, Data Warehouse development, ETL/ELT tools, AWS cloud architecture, Relational databases, Data cleansing, Data validation, Data wrangling, Scripting

Work Experience

Verizon | Basking Ridge, NJ | AWS Data Engineer | Sep 2023 - Present
- Designed and developed ETL processes in AWS Glue to migrate and transform data from external sources such as S3 and Parquet/text files into Redshift, applying Data Lake and Data Warehouse development skills (see the illustrative PySpark/Glue sketch after the UBS role below).
- Implemented data extraction, aggregation, and consolidation with PySpark within AWS Glue, underpinning the AWS cloud architecture.
- Deployed AWS Lambda functions, optimizing data-processing workflows using both pre-built and custom Lambda libraries.
- Created efficient data transformations and integrations with Glue Studio, improving ETL development and workflow effectiveness.
- Designed relational database structures in Hive and loaded and processed data through HDFS using Sqoop.
- Managed and executed data-processing tasks on JSON, CSV, and XML file formats in AWS to improve data storage and retrieval.
- Implemented Python-based data cleansing and validation scripts to ensure metadata integrity and correctness in the AWS infrastructure.

UBS | Weehawken, NJ | AWS Data Engineer | Jul 2022 - Aug 2023
- Collaborated with an Agile team to review business requirements and produce precise source-to-target data mappings aligned with user needs and expectations.
- Optimized data processing with PySpark and AWS Glue, improving data throughput and addressing complex data validation and transformation challenges.
- Created comprehensive reports and dashboards in Tableau and Power BI, using SQL for data analysis to communicate clear insights to stakeholders.
- Built robust data pipelines and ETL frameworks on AWS, handling varied transformations and ensuring efficient Big Data processing with EC2, S3, and EMR.
- Implemented secure, high-performance data processing with PySpark on managed AWS services for improved workload management.
- Used SQL and scripting to define effective relational database solutions in Redshift, enhancing data manipulation and storage.
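The Verizon and UBS roles above center on Glue-based ETL that reads files from S3, transforms them with PySpark, and loads the results into Redshift. The following is a minimal, illustrative sketch of that pattern only; the bucket paths, column names, database/table names, and the Glue connection name are hypothetical placeholders, not details taken from these projects.

```python
# Illustrative AWS Glue job sketch (PySpark): S3 Parquet -> transform -> Redshift.
# All bucket, column, table, and connection names are hypothetical placeholders.
import sys

from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw Parquet files from S3 into a Spark DataFrame.
raw_df = spark.read.parquet("s3://example-raw-bucket/campaigns/")  # hypothetical path

# Basic cleansing/transformation: drop duplicates and stamp a load date.
clean_df = (
    raw_df.dropDuplicates(["campaign_id"])        # assumed key column
          .withColumn("load_date", F.current_date())
)

# Convert back to a DynamicFrame and write to Redshift through a Glue connection.
dyf = DynamicFrame.fromDF(clean_df, glue_context, "clean_campaigns")
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=dyf,
    catalog_connection="redshift-conn",            # hypothetical Glue connection
    connection_options={"dbtable": "analytics.campaigns", "database": "dev"},
    redshift_tmp_dir="s3://example-temp-bucket/glue-staging/",
)

job.commit()
```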
Merck Pharma | Branchburg, NJ | AWS Data Engineer | Sep 2021 - Jun 2022
- Designed and implemented end-to-end data pipelines and ETL/ELT processes in AWS, using Git for version control to ensure code integrity and collaboration.
- Developed Informatica Designer mappings, using various transformations to streamline data processing.
- Automated data migration workflows with AWS Lambda and AWS Step Functions, reducing manual intervention and improving efficiency.
- Designed and developed core API services using Python and Spark for seamless data flow within the organization.
- Used Ab Initio's graphical development environment to visually design and build complex data integration workflows, ensuring scalability and maintainability.
- Integrated Looker with ETL processes for streamlined data extraction, transformation, and loading.
- Conducted in-depth data profiling to assess data quality, identifying and resolving anomalies and improving overall data accuracy.
- Created and managed Databricks jobs for scheduled and automated data-processing tasks.
- Developed and optimized PL/SQL stored procedures, triggers, and functions to streamline data-processing tasks.
- Automated data quality checks with PySpark scripts that verify partition existence and partition statistics.
- Executed SQL queries and ETL processes on RDBMS and NoSQL databases for effective data manipulation.
- Leveraged Scala and Python to optimize code performance and develop real-time data processing applications.
- Integrated microservices with cloud platforms such as AWS for seamless, scalable data processing.
- Designed and implemented ETL processes in Databricks for data extraction, transformation, and loading.
- Implemented end-to-end data-processing pipelines in Databricks, handling large volumes of data efficiently.
- Created Databricks Spark jobs with PySpark to perform table-to-table operations.
- Managed the data catalog using AWS Glue, ensuring accurate and consistent metadata for data in the AWS Data Lake.
- Integrated Databricks with cloud platforms such as AWS for seamless, scalable data processing.
- Implemented microservices scaling using Kubernetes and Docker.
- Designed, developed, and implemented performant ETL pipelines using the Python API of Apache Spark.
- Managed ETL pipelines in and out of the data warehouse, combining Python and Snowflake SnowSQL for data processing.
- Implemented pipeline automation and CI/CD workflows for faster, more reliable pipeline development and deployment.
- Designed and implemented core API services using Python and Spark.
- Proficient in managing file permissions, users, and groups in Unix/Linux environments.

Audintel Inc. | Hyderabad, India | AWS Data Engineer | Jun 2018 - Jun 2021
- Implemented a serverless architecture using API Gateway, Lambda, and DynamoDB, and deployed AWS Lambda code from Amazon S3 buckets.
- Created a Lambda deployment function and configured it to receive events from an S3 bucket.
- Designed data models for data-intensive AWS Lambda applications performing complex analysis, producing analytical reports for end-to-end traceability, lineage, and definitions of key business elements from Aurora.
- Wrote code to optimize the performance of AWS services used by application teams and provided code-level application security for clients (IAM roles, credentials, encryption, etc.).
- Created AWS Lambda functions in Python for deployment management in AWS; designed, investigated, and implemented public-facing websites on Amazon Web Services and integrated them with other application infrastructure.
- Created AWS Lambda functions and API Gateway endpoints so that data submitted via API Gateway is accessible to the Lambda functions.
- Analyzed SQL scripts and designed solutions for implementation in PySpark.
- Built CloudFormation templates for SNS, SQS, Elasticsearch, DynamoDB, Lambda, EC2, VPC, RDS, S3, IAM, and CloudWatch, and integrated them with Service Catalog.
- Performed regular monitoring of Unix/Linux servers (log verification, CPU usage, memory, load, and disk-space checks) to ensure application availability and performance using CloudWatch and AWS X-Ray.
- Implemented the AWS X-Ray service inside Confidential, allowing development teams to visually detect node and edge latency distribution directly from the service map.
- Developed data-processing tasks in PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading into target destinations.
- Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 and ORC/Parquet/DB2/COBOL/text files into AWS Redshift.
- Developed file cleaners using Python libraries.
- Experienced in building Snowpipe, with in-depth knowledge of data sharing in Snowflake and of database, schema, and table structures.
- Explored DAGs, their dependencies, and logs using Airflow pipelines for automation.
- Designed and implemented a fully operational, production-grade, large-scale data solution on Snowflake.
- Utilized Python libraries such as NumPy on AWS.
- Used Amazon EMR for MapReduce jobs and tested locally using Jenkins.
- Created external tables with partitions using Hive, AWS Athena, and Redshift.
- Developed PySpark code for AWS Glue jobs and for EMR.
- Designed and developed ETL jobs to extract data from a Salesforce replica and load it into a data mart in Redshift.
- Responsible for logical and physical data modeling for data sources on Confidential Redshift.
- Experienced with event-driven and scheduled AWS Lambda functions to trigger various AWS resources (see the illustrative Lambda sketch after the Education section).
- Integrated Lambda with SQS and DynamoDB via Step Functions to iterate through lists of messages and update statuses in a DynamoDB table.

Education
Clark University | Master's, Information Technology
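A minimal sketch of the event-driven Lambda pattern referenced in the Audintel role: an S3-triggered function that records each incoming object in a DynamoDB status table. The table name, key schema, and status values are hypothetical assumptions for illustration only, not details from the original projects.

```python
# Illustrative S3-triggered Lambda handler writing object metadata to DynamoDB.
# Table name, attribute names, and status values are hypothetical placeholders.
import json
import urllib.parse

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("ingestion_status")  # hypothetical tracking table


def lambda_handler(event, context):
    """Record each newly arrived S3 object in a DynamoDB status table."""
    records = event.get("Records", [])
    for record in records:
        bucket = record["s3"]["bucket"]["name"]
        # S3 event keys are URL-encoded; decode before storing.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        table.put_item(
            Item={
                "object_key": key,  # assumed partition key
                "bucket": bucket,
                "size_bytes": record["s3"]["object"].get("size", 0),
                "status": "RECEIVED",
            }
        )
    return {"statusCode": 200, "body": json.dumps({"processed": len(records)})}
```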