Links
This article explains Slonk, a system developed at Character.ai that combines Slurm and Kubernetes to manage GPU research clusters. It addresses the challenge of giving researchers a reliable scheduling environment while keeping the operational benefits of Kubernetes. The open-source snapshot provides tools and configurations that others can use to build similar systems.
Together Instant GPU Clusters offer self-service access to high-performance NVIDIA GPU clusters for AI workloads, letting teams deploy resources quickly without long-term commitments. The service supports Kubernetes and Slurm for orchestration and provides high-bandwidth networking through NVIDIA Quantum-2 InfiniBand and NVLink. Customers have full control over their software environment and can provision clusters for short-term projects.