Job Details

URGENT NEED: Site Reliability Engineer (SRE), Onsite, Locals Only, at Alpharetta, Georgia, USA

Email: alex@navtechusa.com

From: Alex Keylor, NAVTECH INC (alex@navtechusa.com)

Reply to: alex@navtechusa.com

I have an opportunity for a Site Reliability Engineer (SRE), onsite in Alpharetta, GA (local candidates only), and I am looking for a candidate who can join immediately. If you are interested, reply to me with your updated resume; if you could refer someone, I would really appreciate it.

Position : Site Reliability Engineer (SRE)

Location : Alpharetta, GA (Onsite)

Duration : 6+ Months

Visa Status : Any - No OPT

Candidates must be local to GA; a face-to-face (F2F) interview is required.

Job Description:

Seeking an experienced Site Reliability Engineer who can operate independently with limited guidance and oversight. This individual will be passionate about end-user experience and will be part of a tight-knit, distributed engineering team developing and delivering a comprehensive data operations management solution for the Client's Data Fabric Platform. The SRE plays a critical role across the entire SDLC, from coding and scaling to ensuring production stability, which includes responding to on-call incidents.

Data Fabric is a GCP cloud-native, modern data management platform that allows the Client to acquire and curate data, provide entity resolution, and ingest data into a single environment. It is deployed globally in multiple regions, is highly secured, and complies with regional and internal regulatory controls under strict governance and oversight. Business units, data scientists, and many other stakeholders use APIs to consume data managed by the Data Fabric and operate data exchanges to monetize data through B2B and B2C channels.

The data operations management solution consists of:

         A web portal UI/UX that provides a single point of access to all data management and data reliability engineering

         A suite of backend API services that serve the UI and integrate with low-level Data Fabric and other third-party system APIs

         Modern data lakehouse (data lake, data warehouse, batch and streaming ELT pipelines)

The data operations roadmap envisions a set of rich management capabilities including:

         Serving a large community of geographically dispersed data operations stakeholders

         Data quality and observability management to detect, alert, and prevent data anomalies

         Troubleshooting, triaging and resolving data and data pipeline issues

         OLAP, batch and streaming big data processing, and BI reporting

         MLOps

         Real-time dashboards, alerting and notifications, case management, user/group management, AuthZ, and many other foundational capabilities

Tech Stack

         Frontend: Angular 17+, JavaScript, TypeScript, HTML, SCSS, Webpack Module Federation, Tailwind CSS, Angular Material, Angular Elements

         Backend: Java (JDK 17+), Spring Framework 6.X.X, Spring Boot 3.X.X, NestJS 10.X.X, REST and GraphQL microservices, NodeJS

         Tools & Frameworks: Nx build management, Monorepo architecture, Jenkins CI/CD, Fortify, Sonar, GitHub

         Cloud & Data: GCP (GKE, Composer + Airflow, Dataflow + Apache Beam, BigQuery, Bigtable, Firestore, GCS, Pub/Sub, Vertex AI), Terraform, Helm Charts, GitOps

         Other Technologies: Websockets, SSE, event-driven architecture

Environment

         Culture: Fast-paced, creative, results-oriented

         Team Structure: Agile, working in 2-week sprints using Aha and Jira for project management

         Expectations: Self-starters who can work independently with limited guidance, delivering solutions that end-users value and love

General Responsibilities

         Contribute to Development Activities: The SRE is expected to participate in SDLC activities including design, development, testing, deployment, and operations, covering both frontend and backend

         Cross-Functional Work: Collaborate with global teams to integrate with existing internal systems and GCP cloud

         Issue Resolution: Triage and resolve product or system issues, ensuring quality and performance

         Documentation: Write technical documentation, support guides, and run books

         Agile Practices: Participate in sprint planning, retrospectives, and other agile activities

         Compliance: Ensure software meets secure development guidelines and engineering standards

SRE Accountability

         General: Use coding, automation, and software engineering principles to deliver scalability, performance, and reliability efficiently and with minimal toil

         IAC: Build infrastructure as code (IAC) patterns that meet security and engineering standards using one or more technologies (Terraform, scripting with cloud CLI, and programming with cloud SDK)

         CI/CD: Build CI/CD pipelines for build, test and deployment of application and cloud architecture patterns, using platform (Jenkins) and cloud-native toolchains

         Automation: Build automated tooling to deploy service requests that push changes into production. Build comprehensive, detailed runbooks to detect, remediate, and restore services

         Change Management: Work closely with the dev team to ensure all DevSecOps issues are addressed in a timely manner, in compliance with Equifax security policies and in adherence to the Engineering Handbook

         Incident management: Solve problems and triage issues across complex distributed architecture service maps. Be on call for high-severity application incidents, and refine runbooks to improve MTTR

         RCA and postmortem: Lead root cause analyses and blameless postmortems, and own the action items needed to prevent recurrences

         Customer Focus: Address service disruptions and downtime, ensuring end-customer needs are met, and drive processes for a flawless customer experience

         Reliability and Availability: Ensure the SRE golden signals are monitored and that SLOs, SLIs, and SLAs are honoured within error budgets. Work closely with devs, QE, POs, and other stakeholders, providing continuous feedback on uptime, scalability, and reliability, and influence best practices with the aim of providing excellent operational experiences

         Reliability roadmap: Own the reliability roadmap by taking a holistic view of all data operations management capabilities, including participating in Production Readiness Reviews (PRRs) and working with stakeholders to ensure DR plans are in place

Must-Have Skills

         General experience: 5-7 years of experience in software engineering, systems administration, database administration, and networking. System administration skills, including automation and orchestration of Linux/Windows using Terraform, Chef, Ansible and/or containers (Docker, Kubernetes), and shell scripting

         Cloud-Native Application Development: 3+ years. Solid experience with developing and supporting cloud-native applications. Experience with cloud-based security: IAM, AuthZ

         End-user Application Experience: 3+ years of experience as an SRE supporting an end-user-facing application, e.g., a web, mobile, or desktop app that includes UI, APIs, and backend systems

         Development Experience: 2+ years of general proficiency with Java, or JavaScript/NodeJS

         Frontend Experience: Experience with Angular, JavaScript, TypeScript, or modern web application development frameworks

         Architecture Knowledge: Understanding of modular systems, performance, scalability, security

         Agile Experience: Agile development mindset and experience

         Service-Oriented Architecture: Knowledge of RESTful web services, JSON, AVRO

         Application Troubleshooting: Debugging, performance tuning, production support

         Documentation Skills: Strong written and verbal communication

         General SDLC: Experience with CI/CD and release management concepts, and with tools such as Jenkins or Bamboo. Understanding of GCP big data services such as BigQuery, Dataflow, Pub/Sub, GCS, and Composer/Airflow, or of similar AWS services such as Redshift, SNS, SQS, S3, and Kinesis

Nice-to-Have Skills

         Big Data Processing: ETL/ELT experience

         Scripting Languages: Groovy, Python

         Cloud Certification: Relevant certifications in cloud technologies

Regards,

Alex K.

NAVTECH INC

P: (224) 348-1340

E: alex@navtechusa.com

1600 Golf Road, Suite 1200, Rolling Meadows, IL 60008

www.Navtechusa.com

E-Verified Company

Wed Jul 03 02:05:00 UTC 2024
