HPC engineer / architect | New York City, NY (Hybrid - 2 days a week, 3 days remote)
Email: pavish.b@vdartinc.com
UID: dd1dd941648c4892835306dd684e8841
From: Pavish, VDart inc pavish.b@vdartinc.com
Reply to: pavish.b@vdartinc.com

Hi,

Job Title: HPC engineer / architect
Location: New York City, NY (Hybrid - 2 days a week, 3 days remote)
Duration: Contract

Job Description:
- Support day-to-day operations of large-scale parallel file systems
- Deploy and maintain Linux HPC infrastructure across multiple datacenters
- Assist HPC engineers and architects with day-to-day operations and tickets

Skills:
- Experience working in a large-scale, research-based HPC environment
- Proven experience working with distributed file storage solutions (e.g., GPFS)
- Experience deploying and troubleshooting Linux operating systems (RHEL/CentOS)
- Experience with scripting and automation (Ansible, Python, scripting)
- Solid understanding of job schedulers (LSF/SLURM)
- Experience with GPU-based compute infrastructure (including CUDA)

HPC engineer/architect Responsibilities:
- Design, architect, and oversee implementation of Linux-based HPC clusters and storage
- Deploy physical hardware using HPC deployment tools and configuration and orchestration tools (Ansible)
- Tune, monitor, and troubleshoot parallel file system (GPFS) performance
- Perform systems benchmarking and develop automated tests for the HPC environment, ensuring the reliability and efficiency of our computational infrastructure
- Maintain and troubleshoot the InfiniBand network
- Automate and monitor the HPC user lifecycle process
- Install, configure, performance-tune, and troubleshoot Slurm
- Plan, design, and implement a transition from the LSF scheduler to Slurm
- Manage the Slurm scheduler and translate research policies into scheduler configurations
- Consult with faculty and students to develop research pipelines for use on the HPC cluster
- Develop and maintain the user lifecycle software suite in Python and implement a CI/CD pipeline
- Test and automate upgrades of critical system applications using Ansible and scripts

The ability to communicate effectively with clinicians, researchers, and other team members to develop technological solutions is key.

Thanks and regards,

Keywords: continuous integration, continuous deployment, New York
01:25 AM 12-Feb-25 |