AI/Machine Learning & HPC Specialization) - Remote at Remote, Remote, USA |
(AI/Machine Learning & HPC Specialization)
Type: Contract
Location: USA Remote

Key Responsibilities:
- Design, implement, and manage cloud-based infrastructure that supports AI/ML workflows.
- Collaborate with data scientists and ML engineers to deploy scalable machine learning models into production.
- Ensure the security, scalability, and reliability of AI/ML systems in the cloud.
- Optimize cloud resources for cost-effective and efficient use.
- Stay current with the latest in cloud services, AI/ML tools, and industry best practices.
- Provide technical leadership and guidance in cloud and AI/ML architecture.
- Develop and maintain CI/CD pipelines for AI/ML model training and deployment.
- Monitor and troubleshoot AI/ML applications and cloud environments.
- Document system design and operational procedures.
- Collaborate with AI/ML and HPC teams to understand their computing and storage needs.

Qualifications:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Proven experience in cloud computing (AWS, Azure, GCP) and cloud architecture.
- Strong background in AI/ML technologies, with experience deploying ML models.
- Proficiency in scripting languages (Python, Bash) and containerization technologies (Docker, Kubernetes).
- Proficiency with virtual compute environments (EC2).
- Hands-on experience with High Performance Computing (HPC) and server node cluster management.
- Strong knowledge of Linux/Unix operating systems (RHEL/Ubuntu).
- Experience with job schedulers (such as SLURM, PBS), resource management, and system monitoring tools (Dynatrace).
- Understanding of storage solutions and file systems used in HPC (such as Lustre, GPFS).
- Experience with infrastructure as code (IaC) tools such as Terraform or CloudFormation.
- Knowledge of networking, security, and database technologies in a cloud environment.
- Excellent problem-solving, communication, and team collaboration skills.
Preferred Skills:
- Familiarity with machine learning frameworks (TensorFlow, PyTorch) and data pipelines.
- Certifications in cloud architecture (AWS Certified Solutions Architect, Google Cloud Professional Cloud Architect, etc.).
- Experience in an Agile development environment.
- Prior work with distributed computing and big data technologies (Hadoop, Spark).
- Operational experience running large-scale platforms, including AI/ML platforms.

--
Thanks and Regards,
Praveen J
Talent Acquisition Specialist
http://adepttechservices.com
11340 Lakefield Dr., Suite 200, Johns Creek, GA 30097
Ph: (678)-785-3342 |
Fri Feb 02 22:04:00 UTC 2024 |