Full tutorial: https://www.youtube.com/watch?v=yOj9PYq3XYM
NVFP4 models have finally arrived in ComfyUI, and therefore SwarmUI, with CUDA 13. NVFP4 models are more than 100% faster (over twice as fast) with minimal impact on quality. I have done grid quality comparisons to show you the difference between the NVFP4 versions of FLUX 2, Z Image Turbo, and FLUX 1. To make CUDA 13 work, I have compiled Flash Attention, Sage Attention, and xFormers for both Windows and Linux with all CUDA archs, supporting virtually all GPUs from the GTX 1650 series through the RTX 2000, 3000, 4000, and 5000 series and beyond.
In this full tutorial, I will show you how to upgrade your ComfyUI, and therefore SwarmUI, to the latest CUDA 13 with the latest libraries and Torch 2.9.1. Moreover, our compiled libraries such as Sage Attention work with all models on all GPUs without producing black images or videos, including Qwen Image and Wan 2.2 models. LTX 2 presets and a tutorial are hopefully coming soon too. Finally, I introduce a new private cloud GPU platform called SimplePod, similar to RunPod. It offers the same features as RunPod but is much faster and cheaper.
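If you want to sanity-check the upgrade yourself, here is a minimal Python sketch (assumption: the compiled Sage Attention wheel installs as the "sageattention" package; adjust to your build):

# Minimal environment check for the CUDA 13 / Torch 2.9.1 upgrade.
# The "sageattention" package name is an assumption; adjust to your install.
import torch
print("Torch:", torch.__version__)   # expect 2.9.1
print("CUDA:", torch.version.cuda)   # expect a 13.x build
print("GPU available:", torch.cuda.is_available())
try:
    import sageattention  # compiled attention library from the installer
    print("Sage Attention: importable")
except ImportError:
    print("Sage Attention: not installed")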
📂 Resources & Links:
ComfyUI Installers: [ https://www.patreon.com/posts/ComfyUI-Installers-105023709 ]
SimplePod: [ https://simplepod.ai/ref?user=secourses ]
SwarmUI Installer, Model Auto Downloader and Presets: [ https://www.patreon.com/posts/SwarmUI-Install-Download-Models-Presets-114517862 ]
How to Use SwarmUI Presets & Workflows in ComfyUI + Custom Model Paths Setup for ComfyUI & SwarmUI Tutorial: [ https://youtu.be/EqFilBM3i7s ]
SECourses Discord Channel for 24/7 Support: [ https://discord.com/invite/software-engineering-courses-secourses-772774097734074388 ]
NVIDIA NVFP4 Blog Post: [ https://developer.nvidia.com/blog/introducing-nvfp4-for-efficient-and-accurate-low-precision-inference/ ]
⏱️ Video Chapters:
- 00:00:00 New ComfyUI installer (CUDA 13, Torch 2.9.1, Triton + attention libs)
- 00:00:19 NVFP4 speedup claims vs real tests; why CUDA 13 enables new models
- 00:00:34 Prebuilt FlashAttention/SageAttention/xFormers for many GPUs (Windows + Linux)
- 00:01:00 Quality roadmap: FLUX2 Dev, Z Image Turbo, FLUX Dev (BF16/FP8/GGUF/NVFP4)
- 00:01:23 Downloader adds NVFP4: FLUX2 Dev, FLUX Dev (Context/Dev), Z Image Turbo
- 00:01:51 SimplePod AI intro: RunPod-style pods, cheaper rates, permanent storage
- 00:02:36 Musubi Tuner FP8 Scaled: quality myths vs GGUF + why scaled matters
- 00:03:10 Quantization & precision (FP32/BF16/FP8/GGUF) + Qwen3 low-VRAM encoders
- 00:03:34 ComfyUI v73 zip: CUDA 13 included; update NVIDIA drivers only (v72 deprecated)
- 00:04:13 Update steps: overwrite zip, delete venv, run install/update .bat
- 00:05:02 Python: 3.10 recommended (supports 3.10-3.13); fresh vs update
- 00:06:02 New installer flow: uv speed, standalone use, backend libs detected
- 00:07:12 Stability flags: --cache-none vs --disable-smart-memory (OOM/stuck fixes; see the launch sketch after the chapter list)
- 00:07:54 SwarmUI presets: 32 presets supported; drag/drop + auto model downloader
- 00:08:25 Update SwarmUI model-downloader zip (extract + overwrite)
- 00:08:49 Download bundles/models (Z Image Turbo Core + NVFP4 options)
- 00:09:25 Update/launch SwarmUI; point to updated ComfyUI backend + set args
- 00:10:32 Live gen test: Z Image Turbo BF16 @1536x1536
- 00:11:29 Switch to NVFP4: VRAM cache behavior; 1024x1024
- 00:12:36 FLUX2 Dev quality: FP8 Scaled vs NVFP4 side-by-side comparisons
- 00:13:33 Speed chart: FLUX2 NVFP4 about 193% faster than FP8 Scaled
- 00:14:10 Z Image Turbo quality: BF16 vs NVFP4 vs FP8 Scaled (quant method)
- 00:15:25 FLUX Dev: FP8 Scaled approx GGUF Q8; NVFP4 currently shows degradation
- 00:16:45 What precision means + model size examples (FP32/BF16/FP8 Scaled/NVFP4)
- 00:18:07 Practical recommendations: BF16 best; avoid FP16; raw FP8 vs FP8 Scaled
- 00:19:43 GGUF explained: block quant, slower runtime; use only when RAM is too low
- 00:21:36 Precision hierarchy recap + when to pick FP8 mixed/scaled over GGUF
- 00:21:58 SimplePod setup: register, add credits, open template link
- 00:22:31 Template config + RunPod price comparison (disk, ports, GPU selection)
- 00:24:02 Persistent volume: create + mount to /workspace
- 00:25:11 Launch RTX Pro 6000 pod; SimplePod vs RunPod pricing differences
- 00:26:29 Temp vs persistent disk: deleting the instance wipes temp data; back up first!
- 00:26:55 JupyterLab: upload zips, apt install zip, unzip ComfyUI in workspace
- 00:27:48 Run install script; unzip SwarmUI; start the model downloader
- 00:29:02 Downloader path for ComfyUI + folder structure; download Z Image Turbo bundle
- 00:30:08 Start ComfyUI; confirm CUDA 13 + Torch 2.9.1; connect via port 3000 Direct
- 00:31:08 Preset demo: Z Image Turbo Quality 1; fix VAE path; monitor VRAM
- 00:33:18 File Browser Direct: download outputs/models fast; upload files back
- 00:34:41 Restart server; install/start SwarmUI; open Cloudflared URL
- 00:36:26 SwarmUI backend: /workspace/ComfyUI/main.py + args; import presets
- 00:37:27 Download FLUX2 Core + NVFP4; share model paths between SwarmUI & ComfyUI
- 00:39:27 FLUX2 NVFP4 generation @2048x2048; VRAM usage + step speed
- 00:40:43 Cloud GPU pitfall: diagnosing a power-capped GPU
- 00:41:28 Resume: re-run template w/ volume; reconnect fast
- 00:45:02 Wrap-up: SimplePod pros (direct/secure, cheaper storage)
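For the stability flags covered at 00:07:12, here is a hedged Python launcher sketch; both flag names come from ComfyUI's CLI, but confirm them against python main.py --help on your build:

# Launcher sketch for the stability flags discussed at 00:07:12.
# --cache-none: re-load models each run (lowest RAM use, slower).
# --disable-smart-memory: aggressively move idle models from VRAM to RAM.
import subprocess
import sys
subprocess.run([sys.executable, "main.py", "--disable-smart-memory"], check=True)

Try one flag at a time: --cache-none trades speed for the lowest memory footprint, while --disable-smart-memory mainly helps with out-of-memory and stuck-generation issues.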