Python PySpark Developer || Dallas, TX/Pittsburgh, PA at Dallas, Texas, USA |
Email: [email protected] |
Title: Python PySpark Developer
Location: Dallas, TX/Pittsburgh, PA (Onsite)
Job Type: W2 Contract/FTE

Job Description

Job Summary:
We are seeking an experienced Python Developer with a strong background in PySpark to join our data engineering team. The ideal candidate will have a robust understanding of big data processing, experience with Apache Spark, and a proven track record in Python programming. You will be responsible for developing scalable data processing and analytics solutions in a cloud environment.

Key Responsibilities:
Design, build, and maintain scalable and efficient data processing pipelines using PySpark.
Develop high-performance algorithms, predictive models, and proof-of-concept prototypes.
Work closely with data scientists and analysts to transform data into actionable insights.
Write reusable, testable, and efficient Python code.
Optimize data retrieval and develop dashboards and reports for business stakeholders.
Implement data ingestion, data cleansing, deduplication, and data consolidation processes.
Leverage cloud-based big data services and architectures (AWS, Azure, or GCP) for processing large datasets.
Collaborate with cross-functional teams to define and refine data and analytics requirements.
Ensure systems meet business requirements and industry practices for security and privacy.
Stay updated with the latest innovations in big data technologies and PySpark enhancements.

Required Skills and Qualifications:
Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
Minimum of 3 years of experience in Python development.
Strong experience with Apache Spark and its components (Spark SQL, Streaming, MLlib, GraphX) using PySpark.
Demonstrated ability to write efficient, complex queries against large data sets.
Knowledge of data warehousing principles and data modeling concepts.
Proficient understanding of distributed computing principles.
Experience with at least one cloud provider (AWS, Azure, GCP), including their big data processing services.
Strong problem-solving skills and ability to work under tight deadlines.
Excellent communication and collaboration abilities.

Preferred Skills:
Experience with additional big data tools like Hadoop, Kafka, or similar technologies.
Familiarity with machine learning frameworks and libraries.
Experience with data visualization tools and libraries.
Knowledge of containerization and orchestration technologies (Docker, Kubernetes).
Contributions to open-source projects or a strong GitHub portfolio showcasing relevant projects.

Thanks & Regards
Alok Ranjan Pathak | Sr Technical Recruiter
Email: [email protected]
Ampstek LLC Global IT Partner | www.ampstek.com
Mon Jul 15 21:48:00 UTC 2024 |