
Data Engineer, 100% Remote (USA)
Email: [email protected]
From: Abhishek, StellentIT

[email protected]

Reply to: [email protected]

Data Engineer

100% Remote

12+ Months

Phone + Skype

Responsibilities:
Day to day:

This role will focus on workflow management for genomic datasets. The candidate should be able to create workflow pipelines through data management and engineering.
Utilize Python, AWS, and Kubernetes to design, develop, optimize, and maintain scalable bioinformatics workflows for processing and analyzing large-scale genomics datasets in the cloud and in-house
Build a flexible, modular architecture into the workflows so that analysis components and algorithms can be exchanged
Implement the bioinformatics data processing pipelines using workflow management tools and programming languages such as Python
Work with team members to perform quality control and validation of pipelines to ensure accuracy and reproducibility of results
Document the development processes, including code, workflows, data flow diagrams, and standard operating procedures, following software development and DataOps best practices

Qualifications: (Recruiter)
Must Haves:
Bachelor's degree or higher in Engineering (preferably someone from outside Biology/the sciences)
Open on years of experience, as long as the candidate has the following:
Ability to create robust, scalable data workflows/pipelines
Python
AWS
Kubernetes

Plus Haves:
Life Sciences/Bioinformatics/Genomics background

Interview Process: (Account Manager)
45-minute skills assessment in which the candidate will access a GitHub file/doc; focused on Python skills
Teams call with Maurcio; panel with his boss and 2 engineers on the team

Ending Questions: (Account Manager)
When can we put some time on the calendar to walk through the candidates we are coming across? Thursday
Are there any other recruiting companies or internal HR working on this role? Yes, lots
Are there any other people in process? No. Will be setting up interviews next week

Project Scope and Brief Description:

The position is for work in the bioinformatics space, principally writing new bioinformatics workflows and pipelines and/or maintaining existing ones, such as a Eukaryote Genome Annotation Pipeline. As such, the role requires knowledge of cloud technologies (AWS, Kubernetes, container orchestration) as well as experience with industry-level scientific workflow management.


Skills / Experience:

Required Qualifications
Previous experience developing industrial-scale scientific data workflows.
Strong programming skills in Python, including data science libraries such as NumPy, Pandas, NetworkX, and Matplotlib.
Working knowledge of container technologies (such as Docker, containerd, or Podman) and container orchestration.
Experience with data pipeline tools (such as Argo, Ray, Airflow, redun, or Nextflow).
Familiarity with the AWS platform (IAM, EC2, S3, CloudWatch, Spot Instances) and with Kubernetes, EKS, ECS, AWS Batch, or other cloud compute architectures.
Ability to work both independently and collaboratively, with good communication skills. Interest in learning new technologies.

Preferred Qualifications
Specific experience analyzing large genomics datasets
Familiarity with common bioinformatics tools and data types for the analysis of next-generation sequencing (NGS) data
Familiarity with statistical analysis methods and tools commonly used in bioinformatics, such as gene expression or ChIP-seq analysis
Knowledge of any additional programming languages such as C, Rust, Perl, R, Unix Shell or others

Tue Jun 18 19:24:00 UTC 2024
