Mupalaneni - Sr Data Engineer
[email protected]
Location: Dallas, Texas, USA
Relocation: ANY
Visa: H1B
Career Objective:
Technology-savvy professional with nearly 13 years of rich, multi-functional IT experience in diverse domains and a strong record of contributions driving operational excellence in global environments, targeting next-level assignments in IT Project Management, IT Infrastructure Services, and Cloud Infrastructure & Service Delivery Management with an organization of high repute, for mutual growth.

PROFESSIONAL SUMMARY:
- 13+ years of experience in data handling, data migration, data warehousing, and building data pipelines for reputed organizations, using technologies such as Python, PySpark, Snowflake, Microsoft SQL Server, PostgreSQL, Oracle, MySQL, MongoDB, AWS, Hadoop, Hive, Apache Airflow, and Tableau.
- Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
- Experience with cloud-based AWS services such as EMR, EC2, S3, Athena, Glue, EKS, RDS, Kinesis, CodeBuild, SNS, SQS, and Redshift, working in distributed data models that provide fast, efficient processing of big data.
- Recommended best practices for Solr collections in SaaS scenarios involving a large number of tenants, each with a large number of documents (1 million+).
- Experience building Aerospike clusters in non-prod and production environments.
- Expertise in data masking, data subsetting, synthetic test data generation, and data archiving using the Informatica TDM/ILM suite.
- Developed custom MapReduce jobs to extract, transform, and load (ETL) data from various sources into the Hadoop ecosystem on Cloudera.
- Experience tuning Solr documents for best relevance.
- Implemented Dynamic Data Masking in Azure SQL Database and Azure Synapse Analytics with different masking functions and datatypes, using the Azure portal and T-SQL commands (see the first sketch after this list).
- Experience providing 24x7 production support for Aerospike and PCF.
- Hands-on experience with the Snowflake cloud data warehouse for integrating data from multiple source systems, including loading nested JSON-formatted data into Snowflake tables (see the second sketch after this list).
- Generated AWS Glue scripts to transfer data and used Glue to run ETL jobs and aggregations in PySpark code (see the third sketch after this list).
- Created and configured masking rules using different masking techniques to perform in-place and in-stream masking in Informatica TDM, protecting sensitive PHI and PII data.
- Experience configuring AWS EC2 instances and EMR clusters with S3 buckets, Auto Scaling groups, and CloudWatch.
- Hands-on experience with Solr schema creation, data indexing, and Elasticsearch.
- Worked with Azure Data Factory (ADF), including Integration Runtime (IR), file-system data ingestion, and relational data ingestion.
- Implemented Apache Spark using Scala and Spark SQL for faster testing and processing of data.
- Hands-on experience designing and managing data warehouses on Snowflake, including schema design, table creation, and optimization.
- Experienced in processing big data on the Apache Hadoop framework using MapReduce programs.
- Performed data visualization to present insights using Tableau, Excel, and Power BI.
- Knowledge and experience with Continuous Integration and Continuous Deployment (CI/CD) using containerization and automation technologies such as Docker, Kubernetes, and Jenkins.
- Experience creating, scheduling, and monitoring pipelines using Apache Airflow (see the fourth sketch after this list).
- Hands-on experience with NoSQL using MongoDB.
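First sketch: the Dynamic Data Masking item above refers to standard T-SQL masking functions; a minimal sketch of issuing them from Python via pyodbc follows. The connection string, the dbo.Customers table, and its columns are hypothetical, and the original work also used the Azure portal rather than Python alone.

```python
# A minimal sketch of Azure SQL Dynamic Data Masking applied via T-SQL
# from pyodbc; connection string, table, and columns are hypothetical.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver.database.windows.net;DATABASE=mydb;UID=admin;PWD=..."
)
cur = conn.cursor()

# partial() keeps 0 leading and 4 trailing characters, padding the middle.
cur.execute("""
    ALTER TABLE dbo.Customers
    ALTER COLUMN Phone ADD MASKED WITH (FUNCTION = 'partial(0,"XXX-XXX-",4)')
""")

# email() is a built-in mask exposing only the first letter and the suffix.
cur.execute("""
    ALTER TABLE dbo.Customers
    ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()')
""")
conn.commit()
```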
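Second sketch: for the nested-JSON loading item, a minimal sketch using the snowflake-connector-python package follows; the account details, stage, table, and JSON paths are hypothetical.

```python
# A minimal sketch of loading nested JSON into Snowflake; credentials,
# stage, and table names are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="...",
    warehouse="ETL_WH", database="ANALYTICS", schema="RAW",
)
cur = conn.cursor()

# Land the raw documents in a VARIANT column.
cur.execute("CREATE TABLE IF NOT EXISTS raw_events (payload VARIANT)")
cur.execute("""
    COPY INTO raw_events
    FROM @events_stage/2024/          -- hypothetical external stage
    FILE_FORMAT = (TYPE = 'JSON')
""")

# Flatten the nested array into a relational shape for downstream use.
cur.execute("""
    SELECT e.payload:user_id::STRING,
           f.value:name::STRING
    FROM raw_events e,
         LATERAL FLATTEN(input => e.payload:items) f
""")
```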
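Third sketch: for the AWS Glue item, a minimal sketch of a Glue PySpark ETL script with a simple aggregation follows; the catalog database, table, columns, and S3 output path are hypothetical.

```python
# A minimal sketch of an AWS Glue PySpark ETL job; catalog names,
# columns, and the S3 path are hypothetical.
import sys

from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

# Read the standard job arguments Glue passes to a script.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())

# Read a Data Catalog table as a DynamicFrame, then convert to a
# Spark DataFrame for aggregation.
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db",   # hypothetical catalog database
    table_name="orders",   # hypothetical catalog table
)
df = dyf.toDF()

# Aggregate daily revenue per region.
daily = df.groupBy("region", "order_date").agg(F.sum("amount").alias("revenue"))

# Write the result back to S3 as partitioned Parquet.
daily.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-bucket/curated/daily_revenue/"
)
```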
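Fourth sketch: for the Airflow item, a minimal sketch of a scheduled DAG follows, assuming Airflow 2.x; the DAG id and the two task callables are hypothetical placeholders.

```python
# A minimal sketch of an Airflow 2.x DAG; DAG id and callables are
# hypothetical placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("extracting from the source system")  # placeholder extract step


def load():
    print("loading into the warehouse")  # placeholder load step


with DAG(
    dag_id="daily_etl_pipeline",   # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",    # run once per day
    catchup=False,                 # do not backfill past runs
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task      # load runs only after extract succeeds
```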
- Performed data analysis and data validation by writing complex SQL queries.
- Experience in data cleaning and data manipulation using Python and SQL.
- Strong expertise in data analysis, developing SQL queries, and fine-tuning existing code.
- Good experience with Python, Django, Flask, Pyramid, Ansible, Docker, and CI/CD.
- In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, and Spark Streaming with PySpark and the pandas library.
- Experience with configuration and development on multiple Hadoop distribution platforms, such as Cloudera and Hortonworks (on-premises).
- Experienced in managing and optimizing Linux/Unix environments to ensure reliable, secure operation of cloud-based systems.
- Ability to automate web interactions using Python Selenium, with proficiency in data scraping.
- Documented processes and requirements, and resolved conflicts or ambiguities.
- Experience handling various REST APIs, performing GET and POST operations (see the sketch after this list).
- Experienced in Agile methodologies, Scrum stories, and sprints.
- Worked across all levels of the SDLC (System Development Life Cycle), covering analysis, design, development, testing, implementation, and support, with extensive database exposure.
- A self-starter with a positive attitude, a willingness to learn new concepts, and an acceptance of challenges.
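For the REST API item, a minimal sketch of GET and POST operations using the requests library follows; the base URL, endpoints, and payload are hypothetical.

```python
# A minimal sketch of REST API GET/POST calls; the service URL and
# payload are hypothetical.
import requests

BASE_URL = "https://api.example.com/v1"   # hypothetical service

# GET: fetch a filtered resource list and fail fast on HTTP errors.
resp = requests.get(f"{BASE_URL}/jobs", params={"status": "running"}, timeout=30)
resp.raise_for_status()
jobs = resp.json()

# POST: create a resource from a JSON body.
resp = requests.post(
    f"{BASE_URL}/jobs",
    json={"name": "nightly_load", "schedule": "0 2 * * *"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```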