AI Infrastructure Engineer at San Diego, CA (Onsite)
Email: [email protected]
From:

Jaya Krishna,

3K Technologies

[email protected]

Reply to:   [email protected]

AI Infrastructure Engineer

Location: On-site (in San Diego, CA)

Duration: Long term

Job Summary:

We are seeking an experienced AI Infrastructure Engineer to set up and manage GPU hardware in our data center, optimizing it for AI workloads and high-performance computing. This role involves designing, implementing, and maintaining the infrastructure needed to support large-scale machine learning models and AI applications. The ideal candidate will have a strong background in GPU architecture, data center operations, and cloud infrastructure, with hands-on experience configuring and managing high-performance GPUs for AI.

Key Responsibilities:

Design and implement scalable GPU-based infrastructure for AI/ML workloads in a data center environment.

Configure, install, and maintain GPU clusters and nodes, ensuring optimal performance and resource allocation.

Set up GPU hardware, firmware, and software layers (drivers, libraries, and toolkits such as CUDA, cuDNN, and TensorRT).

Collaborate with AI/ML teams to understand workload requirements and tailor infrastructure for performance and efficiency.

Monitor and manage GPU performance, resource usage, and scalability to support AI operations.

Implement solutions for GPU orchestration and job scheduling (e.g., Kubernetes, Slurm).

Ensure network connectivity, security, and redundancy for seamless GPU operations in the data center.

Troubleshoot hardware and software issues related to GPUs and provide support for system upgrades and maintenance.

Optimize power consumption, cooling, and resource utilization in the data center.

Document infrastructure setup, configurations, and standard operating procedures.

Required Skills and Qualifications:

Bachelor's degree in Computer Science, Electrical Engineering, or a related field.

3+ years of experience in GPU infrastructure design, implementation, and management in data centers.

Expertise with GPU hardware (e.g., NVIDIA, AMD), parallel computing, and AI frameworks (e.g., TensorFlow, PyTorch).

Strong knowledge of GPU programming models such as CUDA and experience with GPU performance tuning.

Familiarity with cloud-based infrastructure (AWS, GCP, Azure) and hybrid cloud architectures.

Experience with containerization (Docker), orchestration tools (Kubernetes), and distributed computing.

Knowledge of networking protocols, data center infrastructure, power management, and cooling systems.

Excellent troubleshooting, problem-solving, and communication skills.

Preferred Qualifications:

Experience with AI/ML workload optimization and deep learning pipeline support.

Knowledge of storage solutions and distributed file systems for AI datasets.

Familiarity with automation tools (Ansible, Terraform) for infrastructure provisioning and management.

Certifications in data center management or cloud platforms.

Thanks & Regards 

Jaya Krishna

1114 Cadillac Ct, Milpitas, CA 95035

www.3ktechnologies.com

[email protected]

Gmail: Jayakrishnatalasila | Yahoo: jaya3kt

Analytics | BI | Big Data | Cloud | Software Engg.

t: +1 (408) 713-6640

Tue Oct 29 22:21:00 UTC 2024
