
AI Infrastructure Engineer at Remote, Remote, USA
Email: [email protected]
From: Vijay, 3ktechnologies, [email protected]

Reply to: [email protected]

Role: AI Infrastructure Engineer

Location:  San Diego (Onsite)

Job Summary:

We are seeking an experienced AI Infrastructure Engineer to set up and manage GPU hardware in our data center, optimizing it for AI workloads and high-performance computing. This role involves designing, implementing, and maintaining the infrastructure needed to support large-scale machine learning models and AI applications. The ideal candidate will have a strong background in GPU architecture, data center operations, and cloud infrastructure, with hands-on experience configuring and managing high-performance GPUs for AI.

Key Responsibilities:

Design and implement scalable GPU-based infrastructure for AI/ML workloads in a data center environment.

Configure, install, and maintain GPU clusters and nodes, ensuring optimal performance and resource allocation.

Set up GPU hardware, firmware, and software layers (drivers, libraries, frameworks like CUDA, cuDNN, and TensorRT).

Collaborate with AI/ML teams to understand workload requirements and tailor infrastructure for performance and efficiency.

Monitor and manage GPU performance, resource usage, and scalability to support AI operations.

Implement solutions for GPU orchestration and job scheduling (e.g., Kubernetes, Slurm).

Ensure network connectivity, security, and redundancy for seamless GPU operations in the data center.

Troubleshoot hardware and software issues related to GPUs and provide support for system upgrades and maintenance.

Optimize power consumption, cooling, and resource utilization in the data center.

Document infrastructure setup, configurations, and standard operating procedures.

Required Skills and Qualifications:

Bachelor's degree in Computer Science, Electrical Engineering, or a related field.

3+ years of experience in GPU infrastructure design, implementation, and management in data centers.

Expertise with GPU hardware (e.g., NVIDIA, AMD), parallel computing, and AI frameworks (e.g., TensorFlow, PyTorch).

Strong knowledge of GPU programming models like CUDA, and experience with GPU performance tuning.

Familiarity with cloud-based infrastructure (AWS, GCP, Azure) and hybrid cloud architectures.

Experience with containerization and orchestration tools (Docker, Kubernetes) and with distributed computing.

Knowledge of networking protocols, data center infrastructure, power management, and cooling systems.

Excellent troubleshooting, problem-solving, and communication skills.

Preferred Qualifications:

Experience with AI/ML workload optimization and deep learning pipeline support.

Knowledge of storage solutions and distributed file systems for AI datasets.

Familiarity with automation tools (Ansible, Terraform) for infrastructure provisioning and management.

Certifications in data center management or cloud platforms.

Thanks & Regards

Vijay Kumar M

Technical Recruiter 

3K Technologies, LLC 

www.3ktechnologies.com    
[email protected]

Tue Oct 29 23:34:00 UTC 2024
