Need Site Reliability Engineer at Remote, Remote, USA

From:

Ayush

Scalable Systems

ayush.yadav@scalable-systems.com

Reply to: ayush.yadav@scalable-systems.com

Job Title:

Site Reliability Engineer (SRE) - HDInsight

Location:

Bellevue, WA

Job Description:

We are seeking a highly motivated and skilled Site Reliability Engineer to join our HDInsight team. In this role, you will play a critical part in ensuring the stability, performance, and reliability of our big data platform. You will work closely with the HDInsight product team to provide exceptional SRE support, helping customers resolve complex issues and ensuring smooth operations.

Responsibilities:

Customer Support:

Provide top-notch support to customers, troubleshooting and resolving HDInsight-related issues via Incident Communication Management (ICM).

Performance Optimization:

Analyze and optimize the performance of Spark, Hive, and Hadoop jobs, ensuring efficient and scalable big data processing.

Root Cause Analysis:

Investigate production incidents, identify root causes, and implement effective mitigations to prevent future occurrences.

Tool Development:

Build and maintain tools and services that enhance the debuggability and supportability of HDInsight.

Proactive Monitoring:

Monitor the health of clusters for key customers, identifying potential problems before they impact operations.

Migration Support:

Assist in the migration of big data workloads, leveraging your expertise to ensure seamless transitions.

Essential Skills:

Deep understanding of big data technologies and the Hadoop ecosystem (HBase, Kafka, etc.)

Hands-on experience with Hadoop administration and troubleshooting

Proficiency in cloud technologies, particularly AWS

Strong problem-solving and analytical skills

Excellent communication and customer service skills

Experience with Hortonworks Data Platform (HDP) is a plus

Desirable Skills:

Knowledge of Spark, Hive, and other big data processing frameworks

Familiarity with performance tuning techniques for big data workloads

Experience with scripting and automation (e.g., Python, Bash)

Understanding of DevOps principles and practices

Qualifications:

Bachelor's degree in Computer Science or a related field

3+ years of experience in Site Reliability Engineering or a similar role

Proven track record of supporting and troubleshooting large-scale distributed systems

Benefits:

Competitive salary and benefits package

Opportunity to work on cutting-edge big data technologies

Collaborative and innovative work environment

Potential for professional growth and development

If you are passionate about big data and have a strong desire to help customers succeed, we encourage you to apply!

Keywords:

Site Reliability Engineer, SRE, Big Data, Hadoop, HBase, Kafka, AWS, Cloud, Hortonworks, HDInsight, Spark, Hive, Performance Tuning, Troubleshooting, Customer Support.

Wed Jun 05 06:42:00 UTC 2024
