Need Site Reliability Engineer at Remote, USA |
Email: [email protected] |
From: ayush, Scalable Systems [email protected]
Reply to: [email protected]

Job Title: Site Reliability Engineer (SRE) - HDInsight
Location: Bellevue, WA

Job Description:
We are seeking a highly motivated and skilled Site Reliability Engineer to join our HDInsight team. In this role, you will play a critical part in ensuring the stability, performance, and reliability of our big data platform. You will work closely with the HDInsight product team to provide exceptional SRE support, helping customers resolve complex issues and ensuring smooth operations.

Responsibilities:
- Customer Support: Provide top-notch support to customers, troubleshooting and resolving HDInsight-related issues via Incident Communication Management (ICM).
- Performance Optimization: Analyze and optimize the performance of Spark, Hive, and Hadoop jobs, ensuring efficient and scalable big data processing.
- Root Cause Analysis: Investigate production incidents, identify root causes, and implement effective mitigations to prevent future occurrences.
- Tool Development: Build and maintain tools and services that enhance the debuggability and supportability of HDInsight.
- Proactive Monitoring: Monitor the health of clusters for key customers, identifying potential problems before they impact operations.
- Migration Support: Assist in the migration of big data workloads, leveraging your expertise to ensure seamless transitions.

Essential Skills:
- Deep understanding of big data technologies and Hadoop ecosystem (HBase, Kafka, etc.)
- Hands-on experience with Hadoop administration and troubleshooting
- Proficiency in cloud technologies, particularly AWS
- Strong problem-solving and analytical skills
- Excellent communication and customer service skills
- Experience with Hortonworks Data Platform (HDP) is a plus

Desirable Skills:
- Knowledge of Spark, Hive, and other big data processing frameworks
- Familiarity with performance tuning techniques for big data workloads
- Experience with scripting and automation (e.g., Python, Bash)
- Understanding of DevOps principles and practices

Qualifications:
- Bachelor's degree in Computer Science or a related field
- 3+ years of experience in Site Reliability Engineering or a similar role
- Proven track record of supporting and troubleshooting large-scale distributed systems

Benefits:
- Competitive salary and benefits package
- Opportunity to work on cutting-edge big data technologies
- Collaborative and innovative work environment
- Potential for professional growth and development

If you are passionate about big data and have a strong desire to help customers succeed, we encourage you to apply!

Keywords: Site Reliability Engineer, SRE, Big Data, Hadoop, HBase, Kafka, AWS, Cloud, Hortonworks, HDInsight, Spark, Hive, Performance Tuning, Troubleshooting, Customer Support.
Keywords: Washington
Wed Jun 05 06:42:00 UTC 2024 |