Hey all. I wrote a blog post recently that was described as reading like black magic. I just went back and read it, and it's definitely densely packed. Sorry about that. This is an attempt at explaining what TinyKVM is for, and what problems it solves.
TinyKVM is a light-weight sandboxing and virtualization library that hosts regular Linux programs. However, while it can run normal programs, it is more focused on computation and request-response workloads. It can reset a program in record time back to a previous state, and the intention is for the reset to be used on every single request. There probably aren't that many of us who are interested in per-request isolation. It's kind of a niche security feature. But on the off-chance that you are interested, this blog post is for you.
And, you can still run simple programs inside it:
$ kvmserver --allow-read run uname -a
Linux tinykvm 3.5.0  x86_64 x86_64 x86_64 GNU/Linux
$ kvmserver --allow-read run perl /usr/games/cowsay Hello TinyKVM World!
 ______________________
< Hello TinyKVM World! >
 ----------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
The overall architecture is designed for per-request isolation and scalability through super-light VM forks.
Per-request isolation
Normally when you host a web service, each time a request comes in, that request will be handled inside a stateful application that can remember things over time. The drawback is that a break-in can be made to affect future requests. The break-in might never be able to escape the sandbox, but it will still have access to the application and it could make the application start doing shady things with future client requests (or backend services). Per-request isolation reduces the fallout effects by resetting the whole environment back to a known good state after every single request, no matter what. This generally lowers the blast radius of an attack and eliminates all types of temporal attacks. It also removes garbage collection from the equation, as there is never any opportunity for it to run. We can say that each request is being handled in an ephemeral request VM.
An example:
let state = 0;
Deno.serve({ port: 8000 }, (_req) => {
  state += 1;
  return new Response(String(state));
});
We’ll run this program with Deno in the TinyKVM CLI:
export DENO_V8_FLAGS=--predictable,--max-old-space-size=64,--max-semi-space-size=64
export DENO_NO_UPDATE_CHECK=1
kvmserver -e -t 1 --warmup 1000 --allow-all run deno run --allow-all local.ts
We assume deno is in PATH. The -e argument means ephemeral, and enables per-request isolation, while -t is the number of VM forks that make up the concurrency of our isolated application. During startup it will send 1000 warmup requests, which will activate the V8 JIT. After that it becomes ephemeral and starts listening for requests. Let's send the 1001st request:
$ curl -D - http://127.0.0.1:8000
HTTP/1.1 200 OK
content-type: text/plain;charset=UTF-8
vary: Accept-Encoding
content-length: 5
date: Mon, 27 Oct 2025 17:40:25 GMT

1001
$ curl -D - http://127.0.0.1:8000
HTTP/1.1 200 OK
content-type: text/plain;charset=UTF-8
vary: Accept-Encoding
content-length: 5
date: Mon, 27 Oct 2025 17:40:49 GMT

1001
We sent two requests, and yet the number didn’t get incremented. We know that it was incremented during warmup because we did 1000 warmup requests and it’s now 1001. It remains the same because the request VM was completely reset in between my two requests.
Gated persistence for per-request isolation
While we cannot have persistence in the request VMs with per-request isolation, we can have a dedicated program that acts as gated persistence. Since we’re using Deno we cannot directly call a function in a completely separate program, but we can instead resume our persistence program with a zero-copy buffer. We’ll write our request and the answer into the same buffer. It will be bi-directional true zero-copy IPC. Here’s our example persisted program:
import { connect } from "jsr:@db/redis";

const kvmserverguest = Deno.dlopen("libkvmserverguest.so", {
  kvmserverguest_storage_wait_paused: {
    parameters: ["buffer", "isize"],
    result: "isize",
  },
});

const bufptrptrbuf = new BigUint64Array(1);
const bufptrptr = Deno.UnsafePointer.of(bufptrptrbuf);
const bufptrptrview = new Deno.UnsafePointerView(bufptrptr!);

function waitForRemoteBuffer(result: number): Uint8Array {
  // Wait for a Uint8Array buffer from C
  const buflen = Number(
    kvmserverguest.symbols.kvmserverguest_storage_wait_paused(
      bufptrptrbuf, BigInt(result)),
  );
  const bufptr = bufptrptrview.getPointer(0);
  if (bufptr === null || buflen < 0) {
    return new Uint8Array(0);
  }
  // View it as a Uint8Array
  const arrayBuffer = Deno.UnsafePointerView.getArrayBuffer(bufptr, buflen);
  return new Uint8Array(arrayBuffer);
}

const redisClient = await connect({ hostname: "127.0.0.1", port: 6379 });
await redisClient.set("value", "0");

let result = 0;
while (true) {
  const buffer = waitForRemoteBuffer(result);
  if (buffer.length === 0) {
    result = -1;
    continue;
  }
  const redis_answer = await redisClient.incr("value");
  const response = "Hello, " + redis_answer + " from Persisted Deno!";
  const { read, written } = new TextEncoder().encodeInto(response, buffer);
  result = read < response.length ? -1 : written;
}
Our separate program with persistence sits in an endless loop waiting for a buffer. When we go back around, we call waitForRemoteBuffer again with our current result. The result is handed back to the caller, which is currently waiting in a paused state. Any changes made to the buffer we received are also visible to the caller. The waitForRemoteBuffer function calls an FFI function that is compiled automatically by KVM server and made available through a fixed libkvmserverguest.so filename. So, all persistence programs use the same API.
Our request program now looks something like this:
// Open the same guest FFI library as the storage program.
// (The ("buffer", "isize") -> "isize" signature is assumed to match the storage-side call.)
const kvmserverguest = Deno.dlopen("libkvmserverguest.so", {
  kvmserverguest_remote_resume: {
    parameters: ["buffer", "isize"],
    result: "isize",
  },
});

Deno.serve({ port: 8000 }, (_req) => {
  const remote_buffer = new Uint8Array(256);
  const len = Number(kvmserverguest.symbols.kvmserverguest_remote_resume(
    remote_buffer, BigInt(remote_buffer.byteLength),
  ));
  if (len < 0) {
    return new Response("Internal Server Error", { status: 500 });
  }
  const remote_str = new TextDecoder().decode(
    new Uint8Array(remote_buffer.buffer, 0, len),
  );
  return new Response(remote_str);
});
This program is the one handling the request, and it will be reset after the request concludes. We’re accessing our persistence program through the kvmserverguest_remote_resume FFI-function.
We can load both programs at the same time in KVM server, the so-called storage program alongside the main request program, like so:
export DENO_V8_FLAGS=--predictable,--max-old-space-size=64,--max-semi-space-size=64
export DENO_NO_UPDATE_CHECK=1
kvmserver -e -t 1 --warmup 1000 --allow-all storage deno run --allow-all remote.ts ++ run deno run --allow-all local.ts
They are now completely separate from each other, living in separate VMs. In this example, requests can only access persistence through a single buffer.
You should see an extra line when starting KVM server now:
Storage VM initialized. init=416ms
Listening on http://0.0.0.0:8000/ (http://localhost:8000/)
Warming up the guest VM listening on 0.0.0.0:8000 (1 threads * 1000 connections * 1 requests)
Program 'deno' loaded. epoll vm=1 ephemeral-kwm huge=0/0 init=161ms warmup=55ms rss=73MB
A storage VM has been initialized from a completely separate program. The storage VM has persistence and won’t get reset unless it crashes. Since we’re using a KV-store as our storage, we aren’t losing any data even if there’s a mistake that crashes and restarts our persistent VM. Now when we make cURL requests we will see that we appear to have persistence again, while having added only a meager 50MB to RSS. And most importantly we maintain a reduced blast radius for any attacks.
We could use Redis directly from the request VM, but then any break-in will also have access to it. Only having access to persistence through a bottleneck that you can apply scrutiny to is a good defense strategy.
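To make that concrete, here is a minimal sketch of what such a bottleneck could look like on the storage side: the loop only accepts a couple of whitelisted commands from the shared buffer before it ever touches Redis. It reuses the waitForRemoteBuffer helper and redisClient from the storage example above; the command names and the fixed counter key are made up for illustration and are not part of the KVM server API.

// Sketch: a gated storage loop that only allows two whitelisted operations.
// Assumes waitForRemoteBuffer() and redisClient from the storage example above.
const COUNTER_KEY = "value"; // hypothetical fixed key; requests cannot pick keys

async function handleCommand(cmd: string): Promise<string> {
  switch (cmd) {
    case "INCR": // the only write we allow from request VMs
      return String(await redisClient.incr(COUNTER_KEY));
    case "GET": // read-only access to the same key
      return (await redisClient.get(COUNTER_KEY)) ?? "0";
    default: // anything else coming from a request VM is rejected
      return "ERR unknown command";
  }
}

let result = 0;
while (true) {
  const buffer = waitForRemoteBuffer(result);
  if (buffer.length === 0) {
    result = -1;
    continue;
  }
  // The request VM wrote a plain-text command into the shared buffer.
  const cmd = new TextDecoder().decode(buffer).replace(/\0+$/, "").trim();
  const answer = await handleCommand(cmd);
  const { read, written } = new TextEncoder().encodeInto(answer, buffer);
  result = read < answer.length ? -1 : written;
}

A compromised request VM can then only ever ask for those two operations, no matter what it writes into the buffer.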
Let’s send some cURL requests now and see what happens:
$ curl -D - http://127.0.0.1:8000
HTTP/1.1 200 OK
content-type: text/plain;charset=UTF-8
vary: Accept-Encoding
content-length: 32
date: Mon, 27 Oct 2025 18:23:47 GMT

Hello, 1001 from Persisted Deno!
$ curl -D - http://127.0.0.1:8000
HTTP/1.1 200 OK
content-type: text/plain;charset=UTF-8
vary: Accept-Encoding
content-length: 32
date: Mon, 27 Oct 2025 18:23:48 GMT

Hello, 1002 from Persisted Deno!
This time the requests were able to count upwards. This persistence is achieved through the new sandbox-to-sandbox RPC mechanism that I tried very hard to explain in the last blog post.
What is the storage/persistence program for?
The storage/persistence program is entirely optional. Its main purpose is to allow a tenant to remember things across requests while still benefiting from per-request isolation, where you can't store anything at all. It's fairly painless to whip up two JS programs that talk to each other, as shown above.
The main benefit of the persisted program (the storage VM) is that it can naturally have more privileges, like access to a database or being able to connect to other services. Yet it is also sandboxed, and doesn’t have full access to the host system. Meanwhile the main program (in the request VMs) can have heavily reduced privileges, and should perhaps not have any filesystem access at all and only very limited network access, if any.
Another use-case for the persistent program is database connection pooling. It can realistically only be done from the persisted program: Request VM ← → Storage VM (pooled) ← → Database.
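As a rough sketch of what that could look like, the storage program could hold a handful of long-lived Redis connections opened at startup and hand them out round-robin. This assumes the jsr:@db/redis client from earlier; the pool size, the getConnection helper and the round-robin strategy are just illustrative choices, nothing KVM server dictates.

import { connect } from "jsr:@db/redis";

// Sketch: a tiny round-robin connection pool living in the persistent storage VM.
// The request VMs never see these connections; they only see the shared buffer.
type RedisConn = Awaited<ReturnType<typeof connect>>;

const POOL_SIZE = 4; // illustrative
const pool: RedisConn[] = [];
for (let i = 0; i < POOL_SIZE; i++) {
  pool.push(await connect({ hostname: "127.0.0.1", port: 6379 }));
}

let next = 0;
function getConnection(): RedisConn {
  // Connections are opened once and reused for the lifetime of the storage VM.
  const conn = pool[next];
  next = (next + 1) % pool.length;
  return conn;
}

// Inside the storage loop, instead of the single client:
//   const redis_answer = await getConnection().incr("value");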
Validation
So, does all of this really work?
$ ./wrk -c1 -t1 -L http://127.0.0.1:8000 -H "Connection: close"
Running 10s test @ http://127.0.0.1:8000
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   119.77us   94.24us   4.47ms   99.21%
    Req/Sec     8.01k   373.70     8.30k    88.12%
  Latency Distribution
     50%  111.00us
     75%  115.00us
     90%  126.00us
     99%  196.00us
  80554 requests in 10.10s, 14.67MB read
Requests/sec:   7975.70
Transfer/sec:      1.45MB
Looks OK to me. It should be accessing our persistent program, which in turn accesses Redis, on every single request. We also close the connection, as that is required in KVM server.
$ curl -D - http://127.0.0.1:8000
HTTP/1.1 200 OK
content-type: text/plain;charset=UTF-8
vary: Accept-Encoding
content-length: 34
date: Wed, 29 Oct 2025 12:12:25 GMT

Hello, 369859 from Persisted Deno!
Yep, it’s been counting! Not bad!
Benchmarks
I did some benchmarks against TinyKVM in a specialized server that avoids emulating I/O (and doesn't need Connection: close). You might think that I should be benchmarking against state-of-the-art IPC memory-sharing solutions like iceoryx2, but that is not possible, as we can't share memory while the caller VM is executing at the same time. The caller VM could write garbage into the shared memory while the other side is using it, crashing it (or worse). Since that is completely off the table, and we have to write the full request into a buffer anyway, we might as well use Redis. The difference between pipes and a TCP socket is very small.
Accessing storage is fast. We have a good baseline.
The first column is our unique RPC method. The point is not to say that accessing Redis is somehow bad here. It’s actually quite good. When the server is not busy, the two methods are largely the same. Rather, the point is to show that accessing storage also isn’t expensive. It is a VM-to-VM bi-directional communication method with shared memory and low latency that is safe to use because the caller is paused.
The third benchmark shows the unfortunate reality for other implementations of per-request isolation: you can't reuse connections normally, since a fresh connection has to be opened on every request (the request VM gets fully reset). In TinyKVM, however, you can reuse them through storage access. Storage is non-ephemeral, which lets us keep the Redis connection open and land in the second column instead, avoiding around 30% of the overhead.
Finally, we will make the system busy using schbench:
VM-to-VM RPC in TinyKVM is scheduler independent.
Since the RPC method avoids the scheduler, it has predictable and low p99 latency even when the server is busy. We can see that just reading and writing from a socket (this was connection reuse) incurs unbounded latency when things are busy around us.
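If you want a rough number of your own, one hypothetical way is to time the resume call from inside the request program (for example during warmup, before the VM turns ephemeral). This assumes the kvmserverguest FFI handle shown in the request program example; the iteration count is arbitrary, and each call does a full round-trip including the Redis access on the storage side, so treat it as a sketch rather than a proper benchmark.

// Sketch: timing the VM-to-VM resume call from inside the request program.
// Assumes the `kvmserverguest` dlopen handle from the request example above.
const buf = new Uint8Array(256);
const iterations = 10_000; // arbitrary

const start = performance.now();
for (let i = 0; i < iterations; i++) {
  kvmserverguest.symbols.kvmserverguest_remote_resume(buf, BigInt(buf.byteLength));
}
const elapsed = performance.now() - start;
console.log(`avg round-trip: ${((elapsed * 1000) / iterations).toFixed(1)} µs`);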
So, I hope these benchmarks explain why we do the weird things that we do in per-request isolation land. We're dealing with adversarial tenants and have to architect things carefully in order to safely access other services. To sum it up:
- Every request is ephemeral and gets completely wiped after conclusion
- We cannot have stateful things like open connections in ephemeral VMs
- We compensate with a persistent storage VM, which is also just a program
- We have easy access to the storage VM through bi-directional communication and shared memory
- It's faster to access Redis through a connection pool in our storage VM program than to open a new connection on every request
Thanks for reading!
-gonzo