Job Details

Looking for Site Reliability Engineer || Remote, USA

Email: srijan.roy93@gmail.com

Title: Site Reliability Engineer

Location: Remote

Terms: Contract

Job Details:

Mandatory skill sets needed:

GitHub Actions

AWS CloudFormation

AWS CodePipeline

In-depth understanding of operationalizing secure coding practices

Job Summary:

We are looking for a skilled Site Reliability Engineer to join our Site Reliability Engineering (SRE) team. The primary focus of this role is to ensure the reliability, availability, and performance of our systems and services. You will work closely with software engineers, DevOps teams, and other SREs to build and maintain resilient systems that meet our service level objectives (SLOs). Your expertise will help us identify potential reliability risks, automate processes, and improve our incident response capabilities.

Key Responsibilities:

Reliability Engineering:

Design and implement strategies to improve the reliability and availability of our services.

Develop and maintain service level objectives (SLOs), service level indicators (SLIs), and service level agreements (SLAs) to measure and ensure system reliability.

Identify and mitigate potential risks to system reliability through proactive measures, including redundancy, fault tolerance, and capacity planning.

Monitoring and Alerting:

Set up and fine-tune monitoring and alerting systems to detect anomalies and issues in real time.

Implement service level objectives (SLOs), service level indicators (SLIs), and service level agreements (SLAs) to measure system reliability and performance.

Performance and Reliability Analysis:

Analyze system performance data to identify bottlenecks, trends, and potential issues.

Work with development and operations teams to optimize application performance and improve system reliability.

Automation and Tooling:

Automate the collection and processing of observability data to reduce manual effort and improve accuracy.

Develop custom tools and scripts to extend observability capabilities as needed.

Collaboration:

Work closely with development teams to integrate observability best practices into the software development lifecycle.

Collaborate with security teams to ensure observability tools and practices align with security and compliance requirements.

Qualifications:

Experience:

5+ years of experience in Site Reliability Engineering, DevOps, or a related role with a focus on system reliability and performance.

Strong background in monitoring, alerting, and incident management tools and practices.

Experience with cloud platforms (AWS, Azure, GCP) and with running containerized workloads (e.g., Docker, Kubernetes).

Skills:

Proficiency in scripting and automation languages (e.g., Python, Bash, Go).

Strong understanding of networking, system performance, and reliability principles.

Knowledge of service level management, including SLOs, SLIs, and SLAs.

Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes).

Soft Skills:

Excellent problem-solving and analytical skills, with a proactive approach to identifying and addressing system vulnerabilities.

Strong communication skills, with the ability to work effectively with cross-functional teams.

A commitment to continuous learning and staying current with the latest industry trends and technologies.

Regards,

Srijan Roy

Cynet Systems

--

Keywords: information technology golang

Thu Sep 12 02:48:00 UTC 2024
