Hiring for Certified Databricks Engineer, Remote :: Certification Is a Must (Remote, USA) |
Email: [email protected] |
Title: Certified Databricks Data Engineer
Location: Remote (Pacific Time)
Duration: Four-month contract
Requirements: Background check (candidates must have a LinkedIn profile with a picture or be able to share a photo ID with the submittal). Databricks certification is a must.
Key Skills: Databricks certification, Spark, performance tuning, data ingestion, large-scale data warehouse (20 TB or above).

Job Description:
We are seeking an exceptional Senior Data Engineer specializing in Databricks and Apache Spark, with a strong focus on performance optimization and data architecture. The ideal candidate will have deep technical expertise in tuning and optimizing large-scale data processing systems and a solid understanding of the data architecture principles that improve performance. This role is critical to our data infrastructure, ensuring our pipelines are efficient, scalable, and reliable.

Key Responsibilities:

Performance Optimization and Tuning:
Lead efforts to optimize Spark job performance, including tuning memory management, resource allocation, and execution plans for maximum efficiency and throughput.
Analyze performance metrics and logs to identify areas for improvement and implement best practices for Spark optimization.
Develop and implement advanced partitioning, caching strategies, and data layout optimizations to minimize processing time and cost.

Advanced Pipeline Development and Data Architecture:
Architect, develop, and optimize complex data pipelines and ETL workflows using Databricks and Spark, ensuring they are performant, scalable, and reliable.
Design and implement data architectures that support high performance, including choosing appropriate data storage formats, indexing strategies, and data partitioning schemes.
Collaborate on the design and implementation of data models that support efficient querying and data retrieval.

Technical Leadership in Performance Engineering:
Serve as the go-to expert on Spark performance optimization, guiding the team in adopting advanced techniques and tools for performance improvement.
Provide mentorship and training to the engineering team on performance tuning, data architecture, and best practices in Spark and Databricks.

Performance Monitoring and Troubleshooting:
Set up and maintain comprehensive monitoring and alerting systems for Spark applications to proactively detect performance issues.
Diagnose and resolve complex performance problems, implementing preventive measures to avoid recurrence.

Collaboration and Stakeholder Engagement:
Collaborate closely with data scientists, data analysts, and other engineering teams to understand performance requirements and ensure optimal data processing capabilities.
Communicate complex performance findings and recommendations effectively to technical and non-technical stakeholders.

Required Skills and Experience:

Expert-Level Knowledge in Apache Spark Performance:
Extensive experience with Spark internals, including an in-depth understanding of the Catalyst optimizer, the Tungsten execution engine, and Spark SQL query execution.
Proven track record of optimizing Spark job performance in production environments, particularly in reducing latency and improving resource efficiency.

Databricks Expertise:
Deep familiarity with Databricks architecture, cluster management, and advanced configuration settings that affect performance.
Experience optimizing Databricks notebooks and workflows, leveraging the platform's features for performance gains.

Data Architecture Skills:
Strong expertise in data architecture, including designing data models, data flows, and data storage solutions that optimize performance.
Knowledge of best practices in data management, including data partitioning, indexing, and the use of data lakes or data warehouses.

Strong Programming and Optimization Skills:
Proficient in Python and/or Scala for Spark development, with a strong emphasis on writing efficient, high-performance code.
Expertise in SQL query optimization and tuning.

Comprehensive Knowledge of Big Data Ecosystem:
Familiarity with data storage formats (e.g., Parquet, ORC), distributed computing principles, and cloud infrastructure services (AWS, Azure, GCP).

Innovative Problem-Solving:
Ability to creatively address and solve performance issues, leveraging deep technical knowledge and innovative approaches.

Preferred Qualifications:

Certifications and Professional Recognition:
Databricks Certified Professional Data Engineer or equivalent certifications focused on performance and optimization.
Contributions to the community through blogs, talks, or publications on Spark performance and data architecture topics.

Experience with Performance Benchmarks and Load Testing:
Experience designing and conducting performance benchmarks and load tests to validate system performance under various conditions.

Thanks & Regards
Manish Kumar
Sr. Technical Recruiter
Email: [email protected]
LinkedIn: Manish Chaudhary Jaat
ANVETA, Inc
Address: 1333 Corporate Drive, Suite #108, Irving, TX 75038, USA
Fri Oct 04 19:27:00 UTC 2024 |