15 Nov, 2025
I recently had to build a system to share two Nvidia L40S GPUs (each with 48 GB of memory) among a group of users. The L40S does not support MIG (Multi-Instance GPU), so hardware slicing an individual GPU was not an option. Nvidia's vGPU was another option, but it would require separate per-GPU or per-user licensing, which did not make sense for a small academic setup.
How can I share a couple of GPUs with a small group of users without MIG or vGPU, and without giving everyone root access on the host?
The TLDR answer is:
Genv for soft GPU partitioning and enforcement
Docker for per-user environments
Some XFS tricks for disk quotas
A lot of glue scripts
This post walks you through what I built, what worked, and what still needs improvement.
What is genv and why did I use it?
Genv is an open source GPU environment and cluster manager. You can think of it as "virtual environments for GPUs".
It allows you to define environments with:
A limited GPU memory capacity
A limited number of GPUs
It also ships an enforcer that terminates processes in environments that exceed their allocated memory.
This was perfect for running training/inference scripts on GPUs. As an admin, I had a way to enforce the memory capacity that was allocated for that environment without needing MIG or vGPU licenses. The soft partitioning was good enough for a small set of trustworthy users.
High-level architecture
Bare metal Ubuntu 24.04 host
Nvidia drivers and CUDA installed
Nvidia container toolkit configured for Docker
Genv installed and configured
genv-docker as a Docker runtime
Docker data root on an XFS loopback image with project quotas
genv enforce running as a systemd service
One long-lived container per user
Image built from nvidia/cuda:12.9.1-runtime-ubuntu24.04
Python, PyTorch, TensorFlow, Jupyter, etc, baked in
GPU access via genv runtime
Limited GPU memory per container
Limited disk usage per container
SSH login
Users SSH into the host
A ForceCommand script auto-starts or attaches them to their container
Their home directory is mounted inside the container
From the user's perspective, this is a VM they can SSH into. In reality, they land inside their own GPU-ready container, with the necessary libraries and tools pre-installed and their external drive mounted, so they can access their larger files and data.
Host side setup
On the host, I started with the basic Nvidia container stack:
Install GPU drivers and CUDA - I used ubuntu-drivers autoinstall to get an appropriate driver and then followed Nvidia's guide to install the container toolkit and hook it into Docker.
Install genv and genv-docker - Following the genv docs, I installed the core genv tool and the Docker integration, and added the genv runtime into Docker's daemon.json instead of passing it via dockerd flags.
Build a base GPU image - Below is a stripped-down version that gives you a ~12 GB base image.
FROM nvidia/cuda:12.9.1-runtime-ubuntu24.04
# 1. Install OS packages & SSH server
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
python3 python3-venv python3-pip python3-dev \
build-essential git curl ca-certificates \
libsm6 libxext6 libnss-sss sssd-common \
sudo vim wget \
&& rm -rf /var/lib/apt/lists/*
# 2. Optimize pip usage
ENV PIP_DISABLE_PIP_VERSION_CHECK=1 \
PIP_NO_CACHE_DIR=1 \
PYTHONDONTWRITEBYTECODE=1 \
PIP_BREAK_SYSTEM_PACKAGES=1
# 3. Install ML stack
RUN python3 -m pip install --no-cache-dir \
numpy pandas scipy scikit-learn matplotlib seaborn \
scikit-image transformers \
&& python3 -m pip install --no-cache-dir \
torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 \
--index-url https://download.pytorch.org/whl/cu124 \
&& python3 -m pip install --no-cache-dir tensorflow==2.16.*
WORKDIR /workspace
CMD ["sleep", "infinity"]
The goal is to provide users with a “batteries included” ML environment, so they do not need to install additional software (they can still use apt-get install).
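To build and sanity-check the image (gpu-cont is the tag reused in the provisioning example later, and this assumes genv-docker forwards standard docker run flags such as --rm):
docker build -t gpu-cont .
# The genv runtime should report a capped memory size in nvidia-smi inside the container
genv-docker run --rm --gpus 1 --gpu-memory 8410mi gpu-cont nvidia-smi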
Disk quotas with XFS project quotas
Since all the containers run on a single host, a hard cap on each container's disk usage is non-negotiable; otherwise, a single user can fill up the host's entire disk.
You could define disk quotas using Docker’s overlay2.size option. However, this only works if the underlying file system is XFS. So I had to create a 1 TB loopback image, format it as XFS (ftype=1), and mount it on /var/lib/docker with the pquota flag.
Below are the relevant commands:
sudo fallocate -l 1T /home/docker-data.img
LOOP=$(sudo losetup -f --show /home/docker-data.img)
sudo mkfs.xfs -n ftype=1 "$LOOP"
sudo losetup -d "$LOOP"
echo '/home/docker-data.img /var/lib/docker xfs loop,pquota 0 0' | sudo tee -a /etc/fstab
sudo mount -a
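Before pointing Docker at the new mount, it is worth confirming that the loop device and project quota support are actually in place:
findmnt /var/lib/docker                              # should show the loop device with a pquota/prjquota option
sudo xfs_quota -x -c 'report -p' /var/lib/docker     # project quota report, empty for now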
Then in /etc/docker/daemon.json
{
"storage-driver": "overlay2",
"storage-opts": ["overlay2.size=100G"],
"log-driver": "json-file",
"log-opts": { "max-size": "50m", "max-file": "3" }
}
This gives:
A global upper bound for Docker’s data
A default 100 GB upper bound per container
Log rotation so JSON logs do not eat the disk
You can also attach XFS project quotas to specific named volumes to set per-volume caps.
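For example, a per-volume cap looks roughly like this; the volume name, the project ID 42, and the 20g limit are all made up for illustration:
docker volume create shared-datasets
sudo xfs_quota -x -c 'project -s -p /var/lib/docker/volumes/shared-datasets/_data 42' /var/lib/docker
sudo xfs_quota -x -c 'limit -p bhard=20g 42' /var/lib/docker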
Auto-launching a personal container on SSH
I wanted the user experience to be: SSH to the host, and you are automatically dropped into your personal GPU container. To get there:
The user must first be created on the host machine.
Then provision a container for that user. I used the naming convention gpu-{USERNAME}.
Configure SSH with a Match User block that sets ForceCommand /usr/local/bin/docker_shell for the allowed users, so that the docker_shell script I wrote runs for that user on login.
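The sshd_config side is only a few lines, shown here with the example usernames from this post:
cat <<'EOF' | sudo tee -a /etc/ssh/sshd_config
# GPU users are dropped straight into their container on login
Match User alice,johndoe
    ForceCommand /usr/local/bin/docker_shell
EOF
sudo systemctl restart ssh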
The docker_shell script does the following:
Derives the container name from the username, for example, gpu-alice.
Looks up the path of the user's external drive (each user is also given one based on their username) and mounts it at /external_drive in the container.
Uses genv-docker run with:
--runtime genv
--user uid:gid
A host path bind for /var/lib/sss and a sudoers snippet (to give the user sudo access inside the container)
--gpus 1
--gpu-memory set to a slice of VRAM (for example, 8410mi)
--network host
If the container already exists, the script starts it or attaches to it instead of recreating it. If it has never been created, the script runs genv-docker run to spin up a fresh one.
From the user’s point of view, this feels like a personal VM, but in reality, they are inside a container with a GPU slice and disk limit.
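For reference, here is a stripped-down sketch of docker_shell. The drive path is a placeholder for how external drives are laid out on my host, the real script has more error handling, and it assumes genv-docker forwards standard docker run flags:
#!/usr/bin/env bash
set -euo pipefail

USER_NAME="$(id -un)"
CONTAINER="gpu-${USER_NAME}"
DRIVE="/mnt/drives/${USER_NAME}"   # placeholder for the user's external drive

if docker container inspect "$CONTAINER" >/dev/null 2>&1; then
    # Already provisioned: make sure it is running, then attach below
    docker start "$CONTAINER" >/dev/null
else
    # First login: create the container with a GPU slice and the home/drive mounts
    genv-docker run -d --name "$CONTAINER" \
        --runtime genv --gpus 1 --gpu-memory 8410mi \
        --user "$(id -u):$(id -g)" \
        --network host \
        -v "/home/${USER_NAME}:/home/${USER_NAME}" \
        -v "${DRIVE}:/external_drive" \
        -v /var/lib/sss:/var/lib/sss:ro \
        gpu-cont
fi

# Drop the user into an interactive shell inside their container
exec docker exec -it "$CONTAINER" bash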
Enforcing GPU memory limits with genv-enforce
Everything so far is about wiring. The enforcement itself comes from genv enforce.
There are two key facts:
Inside each container, nvidia-smi is wrapped by genv and shows a reduced “visible” memory, for example, 8 GiB, even though the physical L40S has 48 GB.
Framework APIs like PyTorch's torch.cuda.get_device_properties(0).total_memory still see the real device size and may try to use the entire GPU memory, because the device is not actually partitioned in hardware.
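You can see the mismatch from inside any of the containers (PyTorch is already in the base image):
# The genv-wrapped nvidia-smi shows the reduced "visible" memory, for example ~8 GiB
nvidia-smi
# PyTorch still reports the full physical device, ~48 GB on an L40S
python3 -c "import torch; print(torch.cuda.get_device_properties(0).total_memory / 2**30, 'GiB')"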
To make the limits real, I run this on the host as a systemd service: genv enforce --env-memory --non-env-processes --interval 1 with a unit file like:
[Unit]
Description=genv enforce watchdog
After=network-online.target local-fs.target
[Service]
Type=simple
ExecStart=/usr/local/bin/genv enforce --env-memory --non-env-processes --interval 1
StandardOutput=append:/var/log/genv-enforce.log
StandardError=append:/var/log/genv-enforce.log
Restart=always
RestartSec=2s
[Install]
WantedBy=multi-user.target
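Assuming the unit above is saved as /etc/systemd/system/genv-enforce.service (the unit name is my choice, matching the genv-enforce references later in this post):
sudo systemctl daemon-reload
sudo systemctl enable --now genv-enforce.service
sudo tail -f /var/log/genv-enforce.log   # watch what the watchdog is doing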
This loop looks at GPU usage and kills:
Any process in a genv environment that exceeds its GPU memory quota
Any process touching the GPU that is not associated with a genv environment at all.
It is a software cop, not a hardware partition, but for friendly users whom you can trust, it is enough.
Provisioning and housekeeping
To create a new user environment, I wrote a small provisioning script (a stripped-down sketch follows the breakdown below):
./provision_gpu_container.sh <username> <gpu_memory_gb> <gpus> [disk_gb] [image]
For example:
./provision_gpu_container.sh johndoe 6 1 40 gpu-cont
This:
Sets up their Unix account and Docker group membership
Configures sudo for their numeric UID
Starts a container named gpu-johndoe with:
1 GPU
6 GB GPU memory
40 GB disk space
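A stripped-down sketch of the script's core, skipping the sudoers and SSSD pieces (the account handling below assumes a local user, and gpu-cont is the base image from earlier):
#!/usr/bin/env bash
set -euo pipefail

USERNAME="$1"; GPU_MEM_GB="$2"; GPUS="$3"; DISK_GB="${4:-100}"; IMAGE="${5:-gpu-cont}"

# Unix account and docker group membership
id "$USERNAME" >/dev/null 2>&1 || sudo useradd -m "$USERNAME"
sudo usermod -aG docker "$USERNAME"

# Per-container disk cap overrides the 100 GB default from daemon.json
genv-docker run -d --name "gpu-${USERNAME}" \
    --runtime genv --gpus "$GPUS" --gpu-memory "$((GPU_MEM_GB * 1024))mi" \
    --user "$(id -u "$USERNAME"):$(id -g "$USERNAME")" \
    --network host \
    --storage-opt size="${DISK_GB}G" \
    -v "/home/${USERNAME}:/home/${USERNAME}" \
    "$IMAGE"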
On the maintenance side, I have:
A simple command to restart any gpu-* containers after a host reboot (shown after this list).
Checks to confirm genv-enforce is running.
A note to users that any running processes inside containers die when the host reboots, but their files remain.
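In practice the maintenance boils down to a couple of commands; the filter relies on the gpu- naming convention, and genv-enforce is the systemd unit from the enforcement section:
docker start $(docker ps -aq --filter "name=^gpu-")   # bring every per-user container back up
systemctl is-active genv-enforce                      # confirm the watchdog survived the reboot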
What worked well
A few things turned out nicely:
Usability - Users SSH once and land in a GPU-ready shell. No need to think about Docker or genv.
Per user containment - Each user lives in their own container with its own Python env and disk quota. Users cannot easily interfere with each other’s packages.
Soft GPU partitioning - genv plus genv-docker gave me fine-grained control over how much VRAM each user can consume, and genv enforce gave me a safety net when someone inevitably went over.
Reuse of existing identities and storage - I reused our existing SSSD setup for Unix users and mounted their campus drive into the container so they did not need to learn a new storage system.
What was painful and what I would fix
You can feel the rough edges in the system. A few examples:
Memory debugging is hard - Users cannot easily see how close they are to their genv quota. They only see "process killed" and have to guess that it was an OOM kill. Per-environment monitoring or a simple CLI that displays "you are currently using X of Y MiB" would help.
File movement - Copying data in and out of the container is not straightforward, even with the external drive mounted. A helper script wrapping rsync or scp would probably be enough.
No scheduling - Right now, this is "first-come, first-served". No scheduler decides who gets a GPU slice when. For a small group, this is fine, but if more users show up, I would need a real scheduler or move to Kubernetes or Slurm with GPU support.
No comprehensive observability or monitoring - Right now I only have genv envs, which shows which environments are actively in use. A small agent inside each container could monitor GPU usage and report it back, and a control plane could then use that data to manage environments without relying on the CLI or ad hoc scripts.
When this system makes sense
This setup is not a universal pattern, but it can work well if:
You have a small number of users who mostly trust each other.
You have a handful of large non-MIG supported GPUs like L40S, and you want to slice them into several usable chunks.
vGPU licensing is overkill for your budget and complexity tolerance.
You want users to feel like they have "their own machine" with a stable environment.
If you have dozens or hundreds of users, strict multi-tenant requirements, or want strong isolation, hardware options such as MIG-capable GPUs, full vGPU, or a proper GPU-aware cluster scheduler may be a better fit.
Core takeaway
If you are stuck with GPUs that cannot do MIG and vGPU is not practical, you can still get a long way with:
genv environments for soft VRAM slices
genv-docker to attach containers to those environments
Docker containers per user instead of full VMs
A small amount of scripting for SSH integration and quotas
