ML/AI Infrastructure Architect at Remote, USA |
Email: [email protected] |
From: Saloni Chaurasia, tekinspirations [email protected]
Reply to: [email protected]

Hi,

I hope you are doing great. Please find the position below and let me know if you have a matching candidate for the requirement. Please send me the updated resume along with the candidate's information.

ML/AI Infrastructure Architect
Remote
Need DL, visa, and LinkedIn at submission time.

Position Summary:

Responsibilities:
- Architect and deploy robust, scalable GPU clusters for AI and ML workloads across multitenant and single-tenant infrastructures.
- Design and optimize InfiniBand networks and NVIDIA/Mellanox/Cumulus solutions to meet the high-performance requirements of diverse AI applications.
- Engage directly with clients to gather requirements, provide expert advice, and tailor infrastructure solutions that align with their specific AI and ML projects.
- Lead collaborative discussions with AI researchers, developers, and client stakeholders to ensure infrastructure capabilities are fully leveraged.
- Troubleshoot and resolve complex infrastructure issues, ensuring high availability and performance for all clients.
- Continuously assess and integrate new technologies and methodologies to enhance the infrastructure's capabilities and efficiency.
- Develop comprehensive documentation and training materials for clients, enhancing their understanding and effective use of the infrastructure.
- Conduct workshops and training sessions for clients, focusing on best practices for AI and ML infrastructure utilization.
- Uphold the highest standards of data protection and comply with all regulatory requirements, ensuring a secure environment for client data.

Required Qualifications:
- Experience with or knowledge of terabyte/petabyte SAN and data pipelines.
- Deep knowledge of DGX platform details, e.g., SuperPOD/BasePOD differences.
- Proven experience in managing multitenant infrastructures, particularly for AI/ML workloads.
- Expert knowledge of NVIDIA/Mellanox/Cumulus networking technologies, InfiniBand architecture, and GPU cluster management.
- Exceptional problem-solving skills and the ability to adapt solutions to meet individual client needs.
- Strong consultative skills with a focus on client engagement and stakeholder management.
- Excellent communication and interpersonal skills, with the ability to convey technical concepts to non-technical audiences.
- Experience in training and mentorship, with the ability to co-develop and deliver educational content for clients.

Preferred Qualifications:
- Certifications relevant to network engineering, system administration, or cybersecurity.
- Knowledge of TensorFlow, PyTorch, and other frameworks.
- Knowledge of vector, graph, and traditional OLAP/OLTP databases.
- Knowledge of Snowflake, Databricks, and other data warehousing products.
- Knowledge of applied AI data analysis and traditional data analytics.
- Knowledge of non-NVIDIA networking vendors, e.g., Juniper, Cisco, Nokia.
- Experience with cloud services and technologies relevant to AI and ML deployment.
- Knowledge of containerization and orchestration technologies (e.g., Docker, Kubernetes).

Interview Process:
One-phase interview with the EVP of Engineering and Operations, as well as additional engineers to validate skillset.

Regards,
Saloni Chaurasia
{ Technical Recruiter }
TEK Inspirations LLC Pvt. Ltd. | 13573 Tabasco Cat Trail, Frisco, TX 75035, United States
E-Mail: [email protected]

Keywords: artificial intelligence, machine learning, Colorado, Texas, ML AI Infrastructure Architect |
Wed Apr 10 02:34:00 UTC 2024 |