with NVIDIA GPU passthrough*, obviously.
I’m running Proxmox 9 on an Intel i5–12400 with an RTX 3050 8GB. As AI/LLM rigs go, it’s not that impressive, but it’s enough to play with smaller models for basic text generation, rudimentary code completion, and many embedding models, without sending my data to third-party APIs. And running vLLM in an LXC allows me to share that one GPU with other services (e.g. Immich, Jellyfin, Frigate).
Setting up vLLM isn’t necessarily difficult but, even after doing it several times, it can be tedious. These are the notes I’ve taken along the way, if only for my own reference next weekend.
*️⃣ Pedantically, this isn't GPU passthrough, it's NVIDIA GPU access via device passthrough (not PCIe/VFIO passthrough). Accurate or not, "GPU passthrough" is how this is often referenced. Besides, the title was already too long.
Installing NVIDIA Drivers on the Proxmox Host
First, a huge shout out to Yomi for writing Using an Nvidia GPU with Proxmox LXC, which got me through my initial setup. My process has evolved over the last few months, but the underlying concepts are the same.
The NVIDIA drivers need to be installed on the Proxmox host, but pve-headers has to be installed first. These are the kernel headers for the exact Proxmox kernel in use, and they're required for the NVIDIA installer to build its kernel modules; without matching headers, the driver can't compile or load properly.
apt install pve-headers-$(uname -r)
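If that command can't find a matching package, it's usually because the running kernel and the available header packages are out of sync. A quick sanity check I'd run first (assuming a stock apt-based Proxmox install):

# The kernel currently running
uname -r

# Header packages already installed for it
dpkg -l | grep headers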
For the next step we’ll need the graphics card model. I use lspci to confirm which GPU is in this Proxmox node. On my machine, both the integrated GPU and NVIDIA GPU (RTX 3050) show up like this:
lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation Alder Lake-S GT1 [UHD Graphics 730] (rev 0c)
01:00.0 VGA compatible controller: NVIDIA Corporation GA106 [GeForce RTX 3050] (rev a1)
Now it's time to find the latest drivers from NVIDIA. (I'm using version 580.95.05 here, but make sure to find the correct driver for your GPU.) Copy the link to the driver installer, use wget to download it, then make it executable and run the installer with the --dkms flag.
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/580.95.05/NVIDIA-Linux-x86_64-580.95.05.run
chmod +x NVIDIA-Linux-x86_64-580.95.05.run
./NVIDIA-Linux-x86_64-580.95.05.run --dkms
When prompted for the kernel module type, select "NVIDIA Proprietary". From there, all of the default options should be fine.
Select the NVIDIA Proprietary driver
After the driver installation completes, I run nvidia-smi to verify the driver is fully installed. If it doesn’t immediately work, try rebooting.
Screenshot of my nvidia-smi output on the Proxmox host
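If nvidia-smi still complains after a reboot, the next thing I'd check is whether the kernel module actually loaded and whether DKMS built it for the running kernel. A couple of quick checks on the host (the exact module names can vary a bit by driver version):

# Confirm the NVIDIA kernel modules are loaded
lsmod | grep nvidia

# DKMS should list the nvidia module as installed for the current kernel
dkms status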
ℹ️ Fun fact: nvidia-smi stands for NVIDIA System Management Interface, and it's built on top of the NVIDIA Management Library (NVML).
Setting up a new Ubuntu LXC for vLLM
If you want to add GPU passthrough to an existing LXC, jump to the next section.
For spinning up a new LXC, Proxmox VE Helper-Scripts is an amazing resource, and I use their script for creating an Ubuntu LXC. It handles creating the LXC and even configures GPU passthrough via a simple wizard. Run the script and, when prompted, select "Advanced Install".
Many of the defaults are fine, but these are the options that definitely need to change:
Confirmation screen with selected options
- HOSTNAME: Okay, I lied. You don’t have to change this one but you probably should. I named mine “vllm”.
- DISK SIZE: 32GB
- CPU CORES: 4
- RAM SIZE: 8GiB (8192 MiB)
- GPU PASSTHROUGH: <Yes> “Which GPU type to passthrough?”
In my case, the installer detects the integrated GPU from my Intel CPU and asks “which GPU type to passthrough?” Type nvidia and hit enter. Once the script is done running, a new LXC will have access to the GPU but it won’t be able to use it just yet.
⚠️ Aside: While setting this up I ran into a bug with the installer. It would create the base LXC but then get stuck on a “no network in LXC” error, eventually failing and terminating the installation. Turns out, this was related to running Tailscale on the Proxmox host. I summarized my findings and the possible fixes in community-scripts/ProxmoxVE#8706.
Manually configuring an existing LXC
GPU passthrough can also be configured manually by updating the LXC config file. It should be as simple as appending these six lines:
nano /etc/pve/lxc/{lxc id}.conf
dev0: /dev/nvidia0,gid=44
dev1: /dev/nvidiactl,gid=44
dev2: /dev/nvidia-uvm,gid=44
dev3: /dev/nvidia-uvm-tools,gid=44
dev4: /dev/nvidia-caps/nvidia-cap1,gid=44
dev5: /dev/nvidia-caps/nvidia-cap2,gid=44
Aside: Initially, I set this up using raw lxc.* directives for cgroups and bind mounts. After looking at the configuration generated by Proxmox Helper Scripts, I switched to using dev entries instead, which are more robust and resolved a GPU passthrough issue I was seeing after container and host restarts.
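For what it's worth, the gid=44 in those dev entries maps the devices to the video group, which is GID 44 by default on Debian/Ubuntu; if your host uses a different GID, adjust accordingly. A quick way to confirm that assumption, and that the device nodes actually exist on the host:

# GID 44 should resolve to the "video" group on a default Debian/Proxmox install
getent group video

# The device nodes referenced by the dev entries should exist on the host
ls -l /dev/nvidia* /dev/nvidia-caps/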
Installing NVIDIA libraries on the LXC
Once GPU passthrough is set up, the NVIDIA user-space libraries need to be installed on the LXC. From the Proxmox host, use pct push to copy the installer to the LXC filesystem. Substitute your Container ID and driver filename, of course:
pct push {lxc id} ./NVIDIA-Linux-x86_64-580.95.05.run /root/NVIDIA-Linux-x86_64-580.95.05.run
Now, from the LXC shell, make the driver installer executable, and run it with the --no-kernel-module flag. When done, nvidia-smi should now work in the LXC too. (If not, try rebooting the LXC.)
cd /root
chmod +x NVIDIA-Linux-x86_64-580.95.05.run
./NVIDIA-Linux-x86_64-580.95.05.run --no-kernel-module
nvidia-smi
Screenshot of my nvidia-smi output on the LXC
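Beyond eyeballing the full table, nvidia-smi can also be queried for specific fields, which I find handy for a quick scripted check that the LXC really sees the card. For example (these are standard --query-gpu fields):

# Print just the GPU name, driver version, and total VRAM as CSV
nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv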
Excellent. We now have an Ubuntu LXC on Proxmox 9 with NVIDIA GPU passthrough (aka device passthrough).
Install vLLM
From this point forwards, all commands should be run in the LXC shell. First, a few more packages need to be installed. It’s likely going to take several minutes, so maybe take the opportunity to get up, move around, stretch.
apt update
apt install -y pkg-config build-essential python3-dev nvidia-cuda-toolkit
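Once that finishes, it's worth confirming the CUDA toolkit actually landed before moving on. (The Ubuntu-packaged toolkit may report an older CUDA version than the driver supports, which should be fine here since the PyTorch wheels vLLM pulls in bundle their own CUDA runtime.)

# Verify the CUDA compiler installed and note its version
nvcc --version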
vLLM is built on Python, so the next step is to install uv.
ℹ️ uv is "an extremely fast Python package and project manager, written in Rust" and "a single tool to replace pip, pip-tools, pipx, poetry, pyenv, twine, virtualenv, and more".
For the most up to date steps, head over to the uv site for the installation instructions. Like the Ubuntu LXC installer, they also have a one-liner install option.
uv installation script
I followed the instructions to install uv and restart the shell, then confirmed it was successfully installed by checking the version of uv.
uv --version
After installing uv, I created a new Python environment and used it to install vLLM. I verified it was successfully installed by checking the version of vLLM. This install is likely to also take several minutes. Drink water.
uv venv --python 3.12 --seed
uv pip install vllm --torch-backend=auto
uv run vllm --version
0.12.0
That’s it! Now vLLM is installed!
Hello world! (CLI)
To see it in action, I wrote a short Python script to call an LLM. There are thousands of text generation models available on Hugging Face that work with vLLM, and for this test I chose openai-community/gpt2, just because it's pretty small.
nano hello.py
from vllm import LLM, SamplingParams

print(
    LLM("openai-community/gpt2", seed=5)
    .generate("Hello world!", SamplingParams(max_tokens=50, temperature=0.3, stop=["."]))[0]
    .outputs[0]
    .text
)
Save that file and then run it in the uv environment.
uv run hello.py
The first time this is run it will automatically download the requested model, which may take some time. Eventually, the script should end with a response from the model.
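One thing to keep in mind: by default, vLLM downloads models through the Hugging Face hub cache, which lives under ~/.cache/huggingface unless HF_HOME points somewhere else. With only a 32GB root disk on this LXC, it's worth keeping an eye on how much space that cache is using:

# Check how much disk the downloaded models are taking up
du -sh ~/.cache/huggingface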
Hello world! (API)
vLLM also ships with an OpenAI-compatible server. The server can be started by running this from the command line:
uv run vllm serve openai-community/gpt2
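On a card as small as the RTX 3050, it can help to rein in vLLM's memory use explicitly. The flags below are real vllm serve options, but the values are just illustrative starting points for an 8GB GPU, not tuned recommendations:

# Cap VRAM usage and context length so the server fits comfortably in 8GB
uv run vllm serve openai-community/gpt2 \
  --gpu-memory-utilization 0.85 \
  --max-model-len 512 \
  --port 8000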
And use curl to call the API:
curl http://{ip address}:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-community/gpt2",
    "prompt": "Hello world!",
    "max_tokens": 50,
    "temperature": 0.3,
    "stop": ["."],
    "seed": 5
  }' | python3 -m json.tool
{
    "id": "***",
    "object": "text_completion",
    "created": 1765777169,
    "model": "openai-community/gpt2",
    "choices": [
        {
            "index": 0,
            "text": "\n\nThis is a new chapter for me, and I'm so happy to finally be able to share it with you",
            "logprobs": null,
            "finish_reason": "stop",
            "stop_reason": ".",
            "token_ids": null,
            "prompt_logprobs": null,
            "prompt_token_ids": null
        }
    ],
    "service_tier": null,
    "system_fingerprint": null,
    "usage": {
        "prompt_tokens": 3,
        "total_tokens": 28,
        "completion_tokens": 25,
        "prompt_tokens_details": null
    },
    "kv_transfer_params": null
}
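If you only care about the generated text rather than the full response object, piping the JSON through a tiny python3 snippet instead of json.tool works too (same request as above, just a different last step in the pipe):

curl -s http://{ip address}:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{ "model": "openai-community/gpt2", "prompt": "Hello world!", "max_tokens": 50, "temperature": 0.3, "stop": ["."], "seed": 5 }' \
  | python3 -c "import sys, json; print(json.load(sys.stdin)['choices'][0]['text'])"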
Wrap Up
One more thing. When configuring an LLM, it can be helpful to see what the GPU is doing. In addition to nvidia-smi, I like to use nvtop to get real-time stats on GPU usage. I've found it really helpful for dialing in exact token limits, plus it's just fun to watch each API call happening in real time.
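nvtop isn't installed by default, but it's available in the standard Ubuntu repositories, so adding it to the LXC is quick (this assumes a recent Ubuntu release where the nvtop package exists):

# Install and launch nvtop for a live view of GPU utilization and VRAM
apt install -y nvtop
nvtop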
That’s it! Notice any bugs? Have a better way of doing this? Did this help you build something cool? Let me know!