Looking for Python/PySpark Developer at Remote, USA |
Email: [email protected] |
Title: Python/PySpark Developer
Location: Whippany, NJ

Key Responsibilities:
- Develop, optimize, and maintain ETL pipelines using PySpark to process large-scale datasets across distributed environments.
- Design and implement complex data transformation logic using PySpark and other Big Data tools.
- Work with various Big Data technologies such as Hadoop, Hive, HBase, Kafka, and Spark to build robust, scalable data systems.
- Collaborate with data engineers and data scientists to integrate data from multiple sources and create unified datasets.
- Write efficient, reusable, and scalable Python code to handle both batch and real-time data processing tasks.
- Ensure data quality, consistency, and reliability by implementing data validation, monitoring, and error handling.
- Fine-tune and optimize PySpark jobs to improve performance in distributed environments.
- Manage and maintain data flows in HDFS, ensuring scalability and fault tolerance.
- Perform data extraction, aggregation, and reporting using SQL and NoSQL databases.
- Participate in system design discussions and provide recommendations for architecture and performance improvements.

Keywords: information technology, New Jersey |
Tue Oct 08 19:17:00 UTC 2024 |