15 Nov, 2025
I recently had to build a system to share two Nvidia L40S GPUs (each with 48 GB of memory) among a group of users. The L40S does not support MIG (Multi-Instance GPU), so hardware slicing an individual GPU was not an option. Nvidia's vGPU was another option, but it would require separate per-GPU or per-user licensing, which did not make sense for a small academic setup.
How can I share a couple of GPUs with a small group of users without MIG or vGPU, and without giving everyone root access on the host?
The TLDR answer is:
Genv for soft GPU partitioning and enforcement
Docker for per-user environments
Some XFS tricks for disk quotas
A lot of glue scripts
This post walks you through what I built, what worked, and what still needs improvement.
What is genv and why did I use it?
Genv is an open source GPU environment and cluster manager. You can think of it as "virtual environments for GPUs".
It allows you to define environments with:
A limited GPU memory capacity
A limited number of GPUs
It also ships an enforcer that terminates processes in environments that exceed their allocated memory.
This was perfect for running training/inference scripts on GPUs. As an admin, I had a way to enforce the memory capacity that was allocated for that environment without needing MIG or vGPU licenses. The soft partitioning was good enough for a small set of trustworthy users.
High-level architecture
Bare metal Ubuntu 24.04 host
Nvidia drivers and CUDA installed
Nvidia container toolkit configured for Docker
Genv installed and configured
genv-docker as a Docker runtime
Docker data root on an XFS loopback image with project quotas
genv enforce running as a systemd service
One long-lived container per user
Image built from nvidia/cuda:12.9.1-runtime-ubuntu24.04
Python, PyTorch, TensorFlow, Jupyter, etc, baked in
GPU access via genv runtime
Limited GPU memory per container
Limited disk usage per container
SSH login
Users SSH into the host
A ForceCommand script auto-starts or attaches them to their container
Their home directory is mounted inside the container
From the user's perspective, this is a VM they can SSH into. In reality, they land inside their own GPU-ready container, with the necessary libraries and tools pre-installed and their external drive mounted, so they can access their larger files and data.
Host side setup
On the host, I started with the basic Nvidia container stack:
Install GPU drivers and CUDA - I used ubuntu-drivers autoinstall to get an appropriate driver and then followed Nvidia's guide to install the container toolkit and hook it into Docker.
Install genv and genv-docker - Following the genv docs, I installed the core genv tool and the Docker integration, and added the genv runtime into Docker's daemon.json instead of passing it via dockerd flags.
Build a base GPU image - Below is a stripped-down version that gives you a ~12 GB base image.
FROM nvidia/cuda:12.9.1-runtime-ubuntu24.04
# 1. Install OS packages & SSH server
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
python3 python3-venv python3-pip python3-dev \
build-essential git curl ca-certificates \
libsm6 libxext6 libnss-sss sssd-common \
sudo vim wget \
&& rm -rf /var/lib/apt/lists/*
# 2. Optimize pip usage
ENV PIP_DISABLE_PIP_VERSION_CHECK=1 \
PIP_NO_CACHE_DIR=1 \
PYTHONDONTWRITEBYTECODE=1 \
PIP_BREAK_SYSTEM_PACKAGES=1
# 3. Install ML stack
RUN python3 -m pip install --no-cache-dir \
numpy pandas scipy scikit-learn matplotlib seaborn \
scikit-image transformers \
&& python3 -m pip install --no-cache-dir \
torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 \
--index-url https://download.pytorch.org/whl/cu124 \
&& python3 -m pip install --no-cache-dir tensorflow==2.16.*
WORKDIR /workspace
CMD ["sleep", "infinity"]
The goal is to provide users with a “batteries included” ML environment, so they do not need to install additional software (they can still use apt-get install).
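To build and sanity-check the image (gpu-cont is the tag reused in the provisioning example later, and this assumes genv-docker forwards standard docker run flags such as --rm):
docker build -t gpu-cont .
# The genv runtime should report a capped memory size in nvidia-smi inside the container
genv-docker run --rm --gpus 1 --gpu-memory 8410mi gpu-cont nvidia-smi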
Disk quotas with XFS project quotas
Since all the containers run on a single host, a hard cap on each container's disk usage is non-negotiable; otherwise, a single user can fill up the host's entire disk.
You could define disk quotas using Docker’s overlay2.size option. However, this only works if the underlying file system is XFS. So I had to create a 1 TB loopback image, format it as XFS (ftype=1), and mount it on /var/lib/docker with the pquota flag.
Below are the relevant commands:
sudo fallocate -l 1T /home/docker-data.img
LOOP=$(sudo losetup -f --show /home/docker-data.img)
sudo mkfs.xfs -n ftype=1 "$LOOP"
sudo losetup -d "$LOOP"
echo '/home/docker-data.img /var/lib/docker xfs loop,pquota 0 0' | sudo tee -a /etc/fstab
sudo mount -a
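Before pointing Docker at the new mount, it is worth confirming that the loop device and project quota support are actually in place:
findmnt /var/lib/docker                              # should show the loop device with a pquota/prjquota option
sudo xfs_quota -x -c 'report -p' /var/lib/docker     # project quota report, empty for now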
Then in /etc/docker/daemon.json
{
"storage-driver": "overlay2",
"storage-opts": ["overlay2.size=100G"],
"log-driver": "json-file",
"log-opts": { "max-size": "50m", "max-file": "3" }
}
This gives:
A global upper bound for Docker’s data
A default 100 GB upper bound per container
Log rotation so JSON logs do not eat the disk
You can also attach XFS project quotas to specific named volumes to set per-volume caps.
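For example, a per-volume cap looks roughly like this; the volume name, the project ID 42, and the 20g limit are all made up for illustration:
docker volume create shared-datasets
sudo xfs_quota -x -c 'project -s -p /var/lib/docker/volumes/shared-datasets/_data 42' /var/lib/docker
sudo xfs_quota -x -c 'limit -p bhard=20g 42' /var/lib/docker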
Auto-launching a personal container on SSH
I wanted the user experience to be: SSH to the host, and you are automatically dropped into your personal GPU container. To get there:
The user must first be created on the host machine.
Then provision a container for that user. I used the naming convention gpu-{USERNAME}.
Configure SSH with a Match User block that sets ForceCommand /usr/local/bin/docker_shell for the allowed users, so that the docker_shell script I wrote runs for that user on login.
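The sshd_config side is only a few lines, shown here with the example usernames from this post:
cat <<'EOF' | sudo tee -a /etc/ssh/sshd_config
# GPU users are dropped straight into their container on login
Match User alice,johndoe
    ForceCommand /usr/local/bin/docker_shell
EOF
sudo systemctl restart ssh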
The docker_shell script does the following:
Derives the container name from the username, for example, gpu-alice.
Looks up the path of the user's external drive (each user is also given one based on their username) and mounts it at /external_drive in the container.
Uses genv-docker run with:
--runtime genv
--user uid:gid
A host path bind for /var/lib/sss and a sudoers snippet (to give the user sudo access inside the container)
--gpus 1
--gpu-memory set to a slice of VRAM (for example, 8410mi)
--network host
If the container already exists, the script starts it or attaches to it instead of recreating it. If it has never been created, the script runs genv-docker run to spin up a fresh one.
From the user’s point of view, this feels like a personal VM, but in reality, they are inside a container with a GPU slice and disk limit.
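For reference, here is a stripped-down sketch of docker_shell. The drive path is a placeholder for how external drives are laid out on my host, the real script has more error handling, and it assumes genv-docker forwards standard docker run flags:
#!/usr/bin/env bash
set -euo pipefail

USER_NAME="$(id -un)"
CONTAINER="gpu-${USER_NAME}"
DRIVE="/mnt/drives/${USER_NAME}"   # placeholder for the user's external drive

if docker container inspect "$CONTAINER" >/dev/null 2>&1; then
    # Already provisioned: make sure it is running, then attach below
    docker start "$CONTAINER" >/dev/null
else
    # First login: create the container with a GPU slice and the home/drive mounts
    genv-docker run -d --name "$CONTAINER" \
        --runtime genv --gpus 1 --gpu-memory 8410mi \
        --user "$(id -u):$(id -g)" \
        --network host \
        -v "/home/${USER_NAME}:/home/${USER_NAME}" \
        -v "${DRIVE}:/external_drive" \
        -v /var/lib/sss:/var/lib/sss:ro \
        gpu-cont
fi

# Drop the user into an interactive shell inside their container
exec docker exec -it "$CONTAINER" bash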
Enforcing GPU memory limits with genv-enforce
Everything so far is about wiring. The enforcement itself comes from genv enforce.
There are two key facts:
Inside each container, nvidia-smi is wrapped by genv and shows a reduced “visible” memory, for example, 8 GiB, even though the physical L40S has 48 GB.
Framework APIs like PyTorch's torch.cuda.get_device_properties(0).total_memory still see the real device size and may try to use the entire GPU memory, because the device is not actually partitioned in hardware.
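You can see the mismatch from inside any of the containers (PyTorch is already in the base image):
# The genv-wrapped nvidia-smi shows the reduced "visible" memory, for example ~8 GiB
nvidia-smi
# PyTorch still reports the full physical device, ~48 GB on an L40S
python3 -c "import torch; print(torch.cuda.get_device_properties(0).total_memory / 2**30, 'GiB')"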
To make the limits real, I run this on the host as a systemd service: genv enforce --env-memory --non-env-processes --interval 1 with a unit file like:
[Unit]
Description=genv enforce watchdog
After=network-online.target local-fs.target
[Service]
Type=simple
ExecStart=/usr/local/bin/genv enforce --env-memory --non-env-processes --interval 1
StandardOutput=append:/var/log/genv-enforce.log
StandardError=append:/var/log/genv-enforce.log
Restart=always
RestartSec=2s
[Install]
WantedBy=multi-user.target
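Assuming the unit above is saved as /etc/systemd/system/genv-enforce.service (the unit name is my choice, matching the genv-enforce references later in this post):
sudo systemctl daemon-reload
sudo systemctl enable --now genv-enforce.service
sudo tail -f /var/log/genv-enforce.log   # watch what the watchdog is doing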
This loop looks at GPU usage and kills:
Any process in a genv environment that exceeds its GPU memory quota
Any process touching the GPU that is not associated with a genv environment at all.
It is a software cop, not a hardware partition, but for friendly users whom you can trust, it is enough.
Provisioning and housekeeping
To create a new user environment, I wrote a small provisioning script (a stripped-down sketch follows the breakdown below):
./provision_gpu_container.sh <username> <gpu_memory_gb> <gpus> [disk_gb] [image]
For example:
./provision_gpu_container.sh johndoe 6 1 40 gpu-cont
This:
Sets up their Unix account and Docker group membership
Configures sudo for their numeric UID
Starts a container named gpu-johndoe with:
1 GPU
6 GB GPU memory
40 GB disk space
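A stripped-down sketch of the script's core, skipping the sudoers and SSSD pieces (the account handling below assumes a local user, and gpu-cont is the base image from earlier):
#!/usr/bin/env bash
set -euo pipefail

USERNAME="$1"; GPU_MEM_GB="$2"; GPUS="$3"; DISK_GB="${4:-100}"; IMAGE="${5:-gpu-cont}"

# Unix account and docker group membership
id "$USERNAME" >/dev/null 2>&1 || sudo useradd -m "$USERNAME"
sudo usermod -aG docker "$USERNAME"

# Per-container disk cap overrides the 100 GB default from daemon.json
genv-docker run -d --name "gpu-${USERNAME}" \
    --runtime genv --gpus "$GPUS" --gpu-memory "$((GPU_MEM_GB * 1024))mi" \
    --user "$(id -u "$USERNAME"):$(id -g "$USERNAME")" \
    --network host \
    --storage-opt size="${DISK_GB}G" \
    -v "/home/${USERNAME}:/home/${USERNAME}" \
    "$IMAGE"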
On the maintenance side, I have:
A simple command to restart any gpu-* containers after a host reboot (shown after this list).
Checks to confirm genv-enforce is running.
A note to users that any running processes inside containers die when the host reboots, but their files remain.
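In practice the maintenance boils down to a couple of commands; the filter relies on the gpu- naming convention, and genv-enforce is the systemd unit from the enforcement section:
docker start $(docker ps -aq --filter "name=^gpu-")   # bring every per-user container back up
systemctl is-active genv-enforce                      # confirm the watchdog survived the reboot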
What worked well
A few things turned out nicely:
Usability - Users SSH once and land in a GPU-ready shell. No need to think about Docker or genv.
Per user containment - Each user lives in their own container with its own Python env and disk quota. Users cannot easily interfere with each other’s packages.
Soft GPU partitioning - genv plus genv-docker gave me fine-grained control over how much VRAM each user can consume, and genv enforce gave me a safety net when someone inevitably went over.
Reuse of existing identities and storage - I reused our existing SSSD setup for Unix users and mounted their campus drive into the container so they did not need to learn a new storage system.
What was painful and what I would fix
You can feel the rough edges in the system. A few examples:
Memory debugging is hard - Users cannot easily see how close they are to their genv quota. They only see "process killed" and have to guess that it was an OOM kill. Per-environment monitoring or a simple CLI that displays "you are currently using X of Y MiB" would help.
File movement - Copying data in and out of the container is not straightforward, even with the external drive mounted. A helper script wrapping rsync or scp would probably be enough.
No scheduling - Right now, this is "first-come, first-served". No scheduler decides who gets a GPU slice when. For a small group, this is fine, but if more users show up, I would need a real scheduler or move to Kubernetes or Slurm with GPU support.
No comprehensive observability or monitoring - Right now I only have genv envs, which shows which environments are actively in use. A small agent inside each container could monitor GPU usage and report it back, and a control plane could then use that data to manage environments without relying on the CLI or ad hoc scripts.
When this system makes sense
This setup is not a universal pattern, but it can work well if:
You have a small number of users who mostly trust each other.
You have a handful of large non-MIG supported GPUs like L40S, and you want to slice them into several usable chunks.
vGPU licensing is overkill for your budget and complexity tolerance.
You want users to feel like they have "their own machine" with a stable environment.
If you have dozens or hundreds of users, strict multi-tenant requirements, or want strong isolation, hardware options such as MIG-capable GPUs, full vGPU, or a proper GPU-aware cluster scheduler may be a better fit.
Core takeaway
If you are stuck with GPUs that cannot do MIG and vGPU is not practical, you can still get a long way with:
genv environments for soft VRAM slices
genv-docker to attach containers to those environments
Docker containers per user instead of full VMs
A small amount of scripting for SSH integration and quotas
