Gautham Rallapalli - Data Engineer |
[email protected] |
Location: Mechanicsburg, Pennsylvania, USA |
Relocation: Yes |
Visa: H1B |
Rallapalli G
Email: [email protected] | Ph No: +1 469 459 6394

Professional Summary:
Experienced Data Engineer with over 11 years of IT expertise, specializing in Snowflake data warehousing, cloud computing, and data integration. Proven track record in designing scalable ETL processes, optimizing Snowflake environments, and ensuring data security and compliance.
ETL Applications: Crafted scalable Extract, Transform, Load (ETL) applications for handling structured and unstructured data sources. Used Python and SQL to develop and optimize ETL processes.
Real-time Processing: Streamlined real-time data processing and stream analytics using Spark Streaming and Kafka. Managed data flow, monitoring, and recovery scenarios for 24/7 data availability (an illustrative sketch follows this summary).
Cloud Computing: Implemented data processing and storage solutions using AWS services such as S3, EMR, EC2, Glue, Lambda, Athena, CloudWatch, and Redshift, ensuring scalability and performance.
Data Warehouse: Managed large-scale data storage using Snowflake, optimized SQL queries for efficient data retrieval, and ensured secure data handling. Involved in data migration from on-premise systems to the Snowflake cloud data warehouse.
Expert in implementing and managing Snowflake environments, with hands-on experience in SnowSQL, Snowpipe, and RBAC design to ensure secure, efficient, and scalable data solutions.
Proficient in implementing and managing data warehouse solutions using tools like Snowflake and Amazon Redshift to organize, store, interpret, and retrieve large amounts of data efficiently.
Developed and maintained data pipelines using Apache Spark, Hadoop, and Kafka to handle large-scale data processing tasks.
Proficient in Python, with a strong understanding of libraries such as Pandas for data manipulation, NumPy for numerical processing, and Matplotlib and Seaborn for data visualization.
Strong background in web application development using Python, Django, Amazon, HTML, CSS, JavaScript, and PostgreSQL.
Proficient in combining data from many heterogeneous data sources and building linked services; experienced in populating Dimension and Fact tables in data marts and warehouses, and in cleansing and standardizing data from OLTP and OLAP databases with SSIS and Azure Data Factory.
Experience in building frameworks and automating complex workflows using Python for test automation.
Software Development: Followed Agile/Scrum methodologies for software development, ensuring high productivity and timely delivery of user stories. Frequently used Jira for project management and bug tracking.
AWS CloudFormation: Used AWS CloudFormation for provisioning and managing collections of AWS resources, automating infrastructure setup and improving deployment speed and reliability.
Collaborated closely with data science teams to provide clean, usable data for machine learning models, facilitating data-driven decision-making processes.
AI Aware: Conceptual and functional understanding of Large Language Models (Llama 2, Claude 2, Amazon Bedrock, Gemini), NLP, and OpenAI.
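The following is a minimal, illustrative sketch of the Spark Structured Streaming and Kafka pattern described above; the broker address, topic, schema, and output paths are placeholder assumptions, and the spark-sql-kafka connector package is assumed to be available on the cluster.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    # Illustrative only: broker, topic, schema, and paths are placeholders.
    spark = SparkSession.builder.appName("clickstream-ingest").getOrCreate()

    event_schema = StructType([
        StructField("user_id", StringType()),
        StructField("page", StringType()),
        StructField("event_time", TimestampType()),
    ])

    # Subscribe to the Kafka topic and parse each JSON payload into columns.
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")
           .option("subscribe", "clickstream")
           .option("startingOffsets", "latest")
           .load())

    events = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(from_json(col("json"), event_schema).alias("e"))
              .select("e.*"))

    # Land micro-batches as Parquet with a checkpoint location for recovery.
    query = (events.writeStream
             .format("parquet")
             .option("path", "s3a://example-bucket/clickstream/")
             .option("checkpointLocation", "s3a://example-bucket/checkpoints/clickstream/")
             .start())
    query.awaitTermination()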
TECHNICAL SKILLS
Programming Languages: Python 2.7/3.7, PySpark, Java
Machine Learning: Regression Analysis, Decision Tree, Random Forests, Support Vector Machine, K-Means Clustering, KNN
Big Data Ecosystem: Apache Spark, Apache Hadoop, Apache Hive, Kafka
Cloud Computing: AWS: EC2, S3, RDS, EMR, Glue, Lambda, CloudWatch, CloudFormation, Athena, IAM, VPC, SNS, SQS; Azure: Blob Storage, Data Factory, HDInsight, DevOps, Cosmos DB, SSIS
Databases: MySQL, Oracle 11g, Postgres
Data Warehousing: Snowflake, Amazon Redshift, Databricks
NoSQL Databases: MongoDB
Data Visualization: PowerBI, Matplotlib, Seaborn, ggplot2
Others: Visual Studio, MS Project, GitHub, Apache Kafka, Terraform, Docker, SnowSQL, Snowpipe, RBAC Design

CERTIFICATIONS
Professional Scrum Master I
Certified Snowflake Professional: SnowPro Core

EDUCATION
Executive PG Diploma in Data Science from International Institute of Information Technology, Bangalore (IIIT-B), India, 2023
Bachelor of Technology & Science from Raghu Engineering College, India, 2013

EXPERIENCE
CoreTek Labs | Jul 2022 - Present
Data Engineer
Responsibilities:
Documented and standardized data architecture processes, enhancing knowledge sharing and ensuring future scalability.
Architected Snowflake data environments, optimizing virtual warehouses, databases, and schemas for performance and cost-efficiency.
Developed and maintained scalable data pipelines and ETL processes using SQL and SnowSQL, resulting in a 25% increase in data processing efficiency.
Performed data cleansing, transformation, and validation to ensure accuracy and consistency.
Monitored and troubleshot data pipelines using Datadog to identify and resolve issues in a timely manner.
Utilized GitHub for source code management and version control of Spark applications and ETL pipelines.
Managed and supported key AWS services including EC2, Lambda, and IAM, ensuring high availability and security compliance.
Provisioned and managed AWS resources using Terraform, automating infrastructure setup and improving deployment speed.
Supported data science teams in applying ML models by providing clean, usable data, facilitating data-driven decision-making processes.
Applied Python data structures, object-oriented programming, decorators, generators, list comprehensions, threading, and Pytest across data engineering tasks.
Automated the collection of vast amounts of structured and unstructured data from various sources including APIs, databases, and file systems using Python.
Integrated Snowflake with third-party APIs for automated data ingestion, enhancing data availability.
Implemented and managed data governance policies ensuring data quality, consistency, and integrity.
Enforced security measures to protect data within Snowflake, ensuring compliance with industry standards.
Developed SnowSQL scripts to automate data extraction and transformation processes, improving efficiency and data accuracy.
Utilized SnowSQL for complex query operations, ensuring high performance and reliability in data retrieval and manipulation.
Implemented Snowpipe to streamline real-time data ingestion, reducing latency and enhancing data pipeline performance (an illustrative sketch follows this section).
Developed and maintained automated data pipelines that significantly reduced manual data handling, improving data quality and availability for decision-making processes.
Established data quality monitoring protocols using Python scripts, ensuring high data accuracy and integrity across multiple data sources and systems.
Designed and enforced RBAC policies within Snowflake to ensure secure data access and compliance with regulatory standards.
Regularly reviewed and updated RBAC configurations to accommodate changes in team structure and data access requirements, maintaining a secure and efficient data environment.
Created Dockerfiles to automate the build and deployment processes, streamlining the development workflow and reducing manual interventions.
Utilized Docker Compose to define and manage multi-container Docker applications, ensuring seamless integration between services.
Integrated Docker with version control systems such as Git, enabling automated builds and deployments triggered by code changes.
Created visually engaging reports and dashboards for senior management, facilitating data-driven decision-making and strategy development.
Environment: Python 2.7, Flask, HTML5/CSS, PostgreSQL, MySQL, Jupyter Notebook, PyCharm, JIRA, PowerBI, Terraform, AWS CloudWatch, AWS S3, Docker, Snowflake, SnowSQL, Snowpipe
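A minimal sketch of the SnowSQL-style automation and RBAC work described above, using the snowflake-connector-python package; the account, credentials, warehouse, stage, table, and role names are placeholder assumptions rather than the actual environment.

    import snowflake.connector

    # Illustrative only: connection parameters and object names are placeholders.
    conn = snowflake.connector.connect(
        account="my_account",
        user="etl_user",
        password="********",
        warehouse="ETL_WH",
        database="ANALYTICS",
        schema="RAW",
        role="SYSADMIN",
    )

    cur = conn.cursor()
    try:
        # Bulk-load staged files into a raw table (the COPY pattern that Snowpipe automates).
        cur.execute("""
            COPY INTO RAW.ORDERS
            FROM @RAW.ORDERS_STAGE
            FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
            ON_ERROR = 'CONTINUE'
        """)
        # Example RBAC grants: read-only access for a hypothetical ANALYST role.
        cur.execute("GRANT USAGE ON DATABASE ANALYTICS TO ROLE ANALYST")
        cur.execute("GRANT USAGE ON SCHEMA ANALYTICS.RAW TO ROLE ANALYST")
        cur.execute("GRANT SELECT ON ALL TABLES IN SCHEMA ANALYTICS.RAW TO ROLE ANALYST")
    finally:
        cur.close()
        conn.close()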
JGC Corporation, Yokohama, Japan | Mar 2016 - Jun 2022
Data Engineer
Project 1: Development of Advanced Work Packaging System (AWP)
Responsibilities:
Automated Data Pipelines: Created automated pipelines that regularly collect, clean, transform, and store data, significantly reducing manual effort and error rates.
Data Quality Monitoring: Implemented scripts that continuously monitor data quality and alert the team to potential issues.
Advanced Analytics and Reporting: Used the cleaned and structured data to build predictive models and generate insights that inform strategic decisions.
Leveraged Python libraries such as Pandas, NumPy, and Matplotlib for complex data manipulation, analysis, and visualization tasks, handling datasets of varying sizes and complexity.
Contributed to a 30% increase in operational efficiency through the automation of data-related tasks, allowing the analytics team to focus on higher-value activities.
Contributed to the creation of predictive models and analytics tools that leveraged cleaned and structured data, providing insights into customer behavior and market trends and informing data-driven strategic planning.
Engineered a variety of Spark applications to cleanse, transform, and enrich clickstream data for more effective analysis and reporting.
Actively involved in data cleansing, event enrichment, data aggregation, de-normalization, and data preparation for machine learning and reporting.
Enhanced error tolerance and reliability of Spark applications through systematic troubleshooting.
Fine-tuned Spark applications/jobs, improving overall processing efficiency and reducing processing time.
Enhanced data pipelines with automated ML model scoring and evaluation using Spark MLlib, improving real-time analytics.
Developed Spark Streaming applications to consume data from Kafka topics.
Maximized data handling capabilities by leveraging Spark's in-memory capabilities and broadcast variables for efficient joins and transformations.
Gained substantial experience working with EMR clusters and S3 in the AWS cloud, enhancing cloud-based data handling efficiency.
Orchestrated the continuous integration of applications using Airflow, improving deployment workflow and minimizing deployment downtime.
Constructed ETL data pipelines for ingestion of structured, semi-structured, and heterogeneous data into Snowflake, using containerized Airflow on AWS ECS (an illustrative sketch follows this project).
Coordinated closely with the infrastructure, network, database, and application teams to ensure data quality and availability.
Adhered to Agile methodologies throughout the project for improved deliverables and timelines.
Environment: Python 2.7, Flask, Django, HTML5/CSS, PostgreSQL, MySQL, Jupyter Notebook, PyCharm, JIRA, PowerBI, Spark, AWS S3, Snowflake, Agile Methodologies
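A minimal Airflow DAG sketch of the containerized Airflow-to-Snowflake ingestion pattern referenced above, expressed with plain PythonOperator tasks; the DAG name, schedule, and the extract/transform/load helpers are illustrative placeholders rather than the production pipeline.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # Illustrative task callables: real versions would pull files from S3,
    # clean them, and load them into Snowflake (e.g. via COPY INTO).
    def extract_files(**context):
        return ["raw/orders_2022-07-01.csv"]  # placeholder object keys

    def transform_files(**context):
        pass  # validate, clean, and convert staged files to a load-ready format

    def load_to_snowflake(**context):
        pass  # run COPY INTO against the target Snowflake table

    with DAG(
        dag_id="snowflake_ingest",          # placeholder name
        start_date=datetime(2022, 7, 1),
        schedule_interval="@hourly",
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract", python_callable=extract_files)
        transform = PythonOperator(task_id="transform", python_callable=transform_files)
        load = PythonOperator(task_id="load", python_callable=load_to_snowflake)

        extract >> transform >> load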
Project 2: Conceptual Study & Development of DATA Hub
Responsibilities:
Contributed to the conceptual study and system design of the initial development of the Data Hub in accordance with industry standards and the DMBOK.
Collaborated with cross-functional teams to define data requirements and deliver actionable insights.
Assisted in creating documents such as the Data Management Procedure, workflows, implementation procedures, and swimlane charts.
Worked extensively on cleaning, transforming, and normalizing large datasets to ensure accuracy and readiness for analysis.
Expanded website functionality, using the Flask framework in Python to control the web application logic.
Built automation scripts using a confidential API and Python BeautifulSoup to scrape data from social networks and other websites.
Developed views and templates with the Django view controller and template language to create a user-friendly website interface.
Handled missing data, detected outliers, and corrected inconsistencies in datasets using Python (an illustrative sketch follows this project).
Used Pandas for complex data manipulations, including merging, concatenating, pivoting, and reshaping datasets.
Wrote efficient Python code to automate data processing tasks, enhancing productivity and data accuracy.
Implemented advanced data cleaning techniques to handle inconsistencies, duplicates, and missing values, improving data accuracy by 60%.
Utilized Python to perform statistical analysis, identify trends, and extract insights from data.
Utilized Python to ingest data from multiple sources, standardizing formats and structures, and performed aggregation to support comprehensive data analysis and insights generation.
Built strategic, analytical, and operational dashboards using PowerBI.
Responsible for building scalable distributed data solutions using Azure Data Lake, Azure Databricks, Azure HDInsight, and Azure Cosmos DB.
Familiar with using SSIS for incremental data loading, data cleansing, and data validation.
Proficient in performance-tuning SSIS packages, including optimizing data flows and using appropriate data types.
Created pipelines in Azure Data Factory using datasets and pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, Databricks, and Azure SQL Data Warehouse.
Worked on Azure Data Factory to extract data from relational sources to the Data Lake.
Collaborated with cross-functional teams to identify data integration requirements and ensure data accuracy.
Developed and deployed serverless applications using Azure Functions to automate data processing tasks and improve scalability.
Designed and implemented real-time data pipelines with Databricks Structured Streaming and Delta Lake.
Developed and deployed Databricks notebooks for data transformation and analytics.
Created interactive dashboards using PowerBI and Databricks SQL, providing actionable insights to stakeholders.
Collaborated with data scientists to integrate machine learning models into the Databricks environment for real-time scoring and predictions.
Environment: Python 2.7, Flask, HTML5/CSS, PostgreSQL, MySQL, Jupyter Notebook, PyCharm, JIRA, PowerBI, Azure Data Factory, Azure Data Lake, Databricks, Azure Blob Storage, Azure HDInsight, Cosmos DB, SSIS
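A minimal Pandas sketch of the data-cleaning steps described above (duplicate removal, missing-value handling, and simple IQR-based outlier flagging); the input file and column names are placeholder assumptions.

    import pandas as pd

    # Illustrative only: the input file and column names are placeholders.
    df = pd.read_csv("source_extract.csv")

    # Drop exact duplicate rows and standardize column names.
    df = df.drop_duplicates()
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

    # Fill missing values: numeric columns with the median, text columns with a marker.
    numeric_cols = df.select_dtypes(include="number").columns
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
    df = df.fillna("unknown")

    # Flag outliers on one numeric column with a simple 1.5 * IQR rule.
    q1, q3 = df["amount"].quantile([0.25, 0.75])
    iqr = q3 - q1
    df["amount_outlier"] = ~df["amount"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

    df.to_csv("cleaned_extract.csv", index=False)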
Project 3: Integrated Construction Management System (I-CMS)
Responsibilities:
Used the Python Flask framework to build modular and maintainable applications.
Automated data movements using Python scripts.
Involved in splitting, validating, and processing files.
Created a core Python API which was used across multiple modules.
Created a unit test/regression test framework for existing and new code.
Developed and designed a Python-based API (RESTful web service) to interact with the company's website.
Successfully implemented the Django framework to design server applications.
Built and tested functionality within the production pipeline.
Used Django configuration to manage URLs and application parameters.
Developed merge jobs in Python to extract and load data into a MySQL database.
Created a Git repository and added the project to GitHub.
Utilized Agile methodologies and JIRA to track sprint cycles.
Collaborated with cross-functional teams to translate business requirements into technical specifications and mentored junior developers in best practices.
Leveraged Databricks Delta Lake for scalable and high-performance data lake solutions, ensuring ACID transactions and data reliability.
Deployed workflows and orchestrated ETL processes, improving data flow and processing efficiency.
Designed and implemented data pipelines using Databricks to integrate and process construction data, improving project management and operational efficiency.
Developed automated data quality checks within Databricks, ensuring high data accuracy and reducing manual data validation efforts.
Integrated Databricks with Azure Data Factory to orchestrate end-to-end data workflows, enhancing data processing and management capabilities.
Environment: Python 2.7, Flask, Django, HTML5/CSS, MySQL, Jupyter Notebook, PyCharm, JIRA, PowerBI, GitHub, Databricks

Exxon Mobil, Qatar | Sep 2013 - Feb 2016
Project Engineer
Responsibilities:
Organized and managed project engineering issues for the assigned facilities.
Coordinated all interface and change management (MOC) activities for the project.
Organized and facilitated design and safety reviews, HAZOP, SIL, and P&ID reviews.
Held discussions with the client and developed critical spare parts requirements.
Participated in 3D model reviews (maintenance study) and closed out comments with the client.
Maintained, supervised changes to, and distributed controlled documents to the corresponding departments.
Responsible for compiling reports, analyzing contracts, and ensuring documents are filled out accurately.
Maintained a filing system of all change orders, both electronically and as physical copies.
Assisted with the development and setup of the EDMS database for the project and maintained user access.
Assisted the project team with updating drawings, RFI logs, NCR logs, transmittal logs, and action items.
Verified that vendor supply documentation was accurate by comparing the BOM to the drawings.
Trained and assisted team members in document control procedures and tools.
Handled document archival and retrieval from the off-site storage facility.
Prepared and maintained the Project Asset Register.
Handled project close-out and warranty claim management.
Developed internal project management, control, and completion systems.