Data Engineer, 100% Remote at Remote, Remote, USA
From: Abhishek, StellentIT

Data Engineer, 100% Remote
Duration: 12+ months
Interview: Phone + Skype

Responsibilities (Day to Day):
This role focuses on workflow management for data sets in the genomic space, including the ability to create workflow pipelines through data management and engineering.
- Use Python, AWS, and Kubernetes to design, develop, optimize, and maintain scalable bioinformatics workflows for processing and analyzing large-scale genomics datasets, both in the cloud and in-house.
- Build a flexible, modular architecture into the workflows so that analysis components and algorithms can be swapped.
- Implement bioinformatics data processing pipelines using workflow management tools and programming languages such as Python.
- Work with team members on quality control and validation of pipelines to ensure accurate, reproducible results.
- Document development processes, including code, workflows, data flow diagrams, and standard operating procedures, following software development and DataOps best practices.

Qualifications (Recruiter):
Must Haves:
- Bachelor's degree or higher in Engineering (preferably someone from outside Biology/sciences)
- Open on years of experience, as long as the candidate has the following:
  - Ability to create robust, scalable data workflows/pipelines
  - Python
  - AWS
  - Kubernetes
Plus Haves:
- Life Sciences/Bioinformatics/Genomics background
Perfect Fit:
Disqualifiers:

Interview Process (Account Manager):
- 45-minute skills assessment in which the candidate accesses a GitHub file/doc; focused on Python skills
- Teams call with Maurcio, then a panel with his boss and 2 engineers on the team

Ending Questions (Account Manager):
- When can we put some time on the calendar to walk through the candidates we are coming across? Thursday.
- Are there any other recruiting companies or internal HR working on this role? Yes, lots.
- Are there any other people in process? No.
Will be setting up interviews next week.

Project Scope and Brief Description:
The position is for work in the bioinformatics space, principally writing new and/or maintaining existing bioinformatics workflows and pipelines, such as a Eukaryote Genome Annotation Pipeline. The role therefore requires knowledge of cloud technologies (AWS, Kubernetes, container orchestration) as well as experience with industry-level scientific workflow management.

Responsibilities:
- Design, develop, optimize, and maintain scalable bioinformatics workflows for processing and analyzing large-scale genomics datasets, both in the cloud and in-house.
- Build a flexible, modular architecture into the workflows so that analysis components and algorithms can be swapped.
- Implement bioinformatics data processing pipelines using workflow management tools and programming languages such as Python.
- Work with team members on quality control and validation of pipelines to ensure accurate, reproducible results.
- Document development processes, including code, workflows, data flow diagrams, and standard operating procedures, following software development and DataOps best practices.

Skills / Experience:
Required Qualifications:
- Previous experience developing industrial-scale scientific data workflows.
- Strong programming skills in Python, including data science libraries such as NumPy, pandas, NetworkX, and Matplotlib.
- Working knowledge of container technologies (such as Docker, containerd, or Podman) and container orchestration.
- Experience with data pipeline tools (such as Argo, Ray, Airflow, redun, or Nextflow).
- Familiarity with the AWS platform (IAM, EC2, S3, CloudWatch, Spot Instances) and with Kubernetes, EKS, ECS, AWS Batch, or other cloud compute architectures.
- Ability to work both independently and collaboratively, with good communication skills.
- Interest in learning new technologies.

Preferred Qualifications:
- Specific experience analyzing large genomics datasets.
- Familiarity with common bioinformatics tools and data types for analyzing next-generation sequencing (NGS) data.
- Familiarity with statistical analysis methods and tools commonly used in bioinformatics, such as gene expression or ChIP-seq analysis.
- Knowledge of additional programming languages such as C, Rust, Perl, R, Unix shell, or others.
Tue Jun 18 19:24:00 UTC 2024