PySpark Developer, Whippany, NJ (Remote, USA) |
Email: [email protected] |
Role Name: PySpark Developer
Location: Whippany, NJ (Remote, USA)

Job Summary:
6-8 years of experience. We are looking for a highly skilled Python/PySpark Developer with hands-on experience in Big Data technologies to join our dynamic data engineering team. The ideal candidate will have a strong background in building scalable, distributed data processing systems with PySpark and in working with large datasets. You will collaborate with data scientists, engineers, and other stakeholders to design and implement efficient data pipelines.

Key Responsibilities:
- Develop, optimize, and maintain ETL pipelines using PySpark to process large-scale datasets across distributed environments.
- Design and implement complex data transformation logic using PySpark and other Big Data tools.
- Work with Big Data technologies such as Hadoop, Hive, HBase, Kafka, and Spark to build robust, scalable data systems.
- Collaborate with data engineers and data scientists to integrate data from multiple sources and create unified datasets.
- Write efficient, reusable, and scalable Python code to handle both batch and real-time data processing tasks.
- Ensure data quality, consistency, and reliability by implementing data validation, monitoring, and error handling.
- Fine-tune and optimize PySpark jobs to improve performance in distributed environments.
- Manage and maintain data flows in HDFS, ensuring scalability and fault tolerance.
- Perform data extraction, aggregation, and reporting using SQL and NoSQL databases.
- Participate in system design discussions and provide recommendations for architecture and performance improvements. |
Tue Oct 29 20:31:00 UTC 2024 |