Show HN: Building Memory as a Service with Memrun

Deploy data-heavy Python services with warm memory, sticky routing, and zero infrastructure code.


What memrun is

Memrun is a Python SDK and deployment platform for services that are memory-bound, not CPU-bound: the kind where you load a 2GB dataset or a large ML model, then serve many requests against it, and the expensive part is getting data into memory, not processing it.

You write a handler function with a decorator. Memrun handles everything else: provisioning workers on Hetzner VMs, routing related requests to the same worker via Kafka partition keys, managing a 600GB NVMe-backed cache per worker, bounding concurrency with semaphores, and reporting health via heartbeats.

from memrun import MemoryService

svc = MemoryService(
    name="doc-search",
    memory="32Gi",
    disk="600Gi",
    max_workers=10,
    concurrency=16,
    timeout_seconds=300,
)

@svc.handler(sticky_key="corpus_id")
async def handle(ctx, req):
    embeddings = await ctx.get_or_fetch(req["embeddings_path"])
    results = search(embeddings, req["query"], top_k=req.get("top_k", 10))
    return {"matches": results}

Deploy with one command:

memrun deploy handler.py --name doc-search --memory 32Gi --disk 600Gi

That’s it. No Dockerfiles, no Kubernetes manifests, no Terraform. The platform boots VMs, installs your handler, and starts serving.

The problem it solves

Every time I built a data-intensive service on Lambda or Cloud Run, the same pattern emerged:

  1. Request arrives
  2. Fetch 1-3GB from S3 (200-800ms)
  3. Deserialize into working structures (100-500ms)
  4. Do the actual computation (50-200ms)
  5. Return result
  6. Container gets killed
  7. Next request: repeat from step 1

Steps 2-3 dominate runtime. Steps 6-7 make it worse. The actual work is step 4, but you’re paying for steps 2-3 on every single request.
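Taking the midpoints of the latency ranges above (illustrative figures only, not measurements), the ratio is easy to put in numbers:

```python
# Midpoints of the latency ranges above, in milliseconds (illustrative only).
fetch = (200 + 800) / 2        # step 2: pull 1-3GB from S3
deserialize = (100 + 500) / 2  # step 3: decode into working structures
compute = (50 + 200) / 2       # step 4: the actual computation

cold = fetch + deserialize + compute
warm = compute  # data already resident, so steps 2-3 are skipped

print(f"cold request:   {cold:.0f} ms")
print(f"warm request:   {warm:.0f} ms")
print(f"overhead share: {100 * (cold - warm) / cold:.0f}%")
```

On these midpoints a cold request costs 925 ms, of which roughly 86% is fetch-and-deserialize overhead rather than work.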

With memrun, steps 2-3 happen once. The LRUCache stores fetched data on NVMe, the SharedWorkerContext keeps decoded structures in memory across requests, and Kafka's sticky routing ensures the same user/dataset combination always lands on the same worker. After the first request, subsequent requests skip straight to step 4.
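Sticky routing is easy to picture as hash-mod-partition (a simplification: Kafka's default partitioner actually uses murmur2, and memrun's exact key-to-worker mapping isn't shown here, so the partition count and function below are assumptions):

```python
import hashlib

NUM_PARTITIONS = 10  # hypothetical partition count for the service's topic

def partition_for(sticky_key: str) -> int:
    # Stable hash -> partition. The same corpus_id always maps to the same
    # partition, and each partition is consumed by a single worker, so the
    # worker that already holds that corpus in memory keeps serving it.
    digest = hashlib.md5(sticky_key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# The same key lands on the same worker on every request...
assert partition_for("corpus-42") == partition_for("corpus-42")
# ...while different keys spread across the partition space.
print({k: partition_for(k) for k in ("corpus-1", "corpus-2", "corpus-3")})
```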

How it works, concretely

The SDK

You define a MemoryService with resource declarations:

svc = MemoryService(
    name="analytics",             # DNS-compatible service name
    memory="32Gi",                # RAM per worker
    disk="600Gi",                 # NVMe cache per worker
    max_workers=50,               # Maximum worker count
    min_workers=2,                # Minimum (always-on) workers
    concurrency=16,               # Max concurrent requests per worker
    timeout_seconds=300,          # Per-request timeout
    env={"MODEL_VERSION": "v3"},  # Environment variables
)

Init handlers

Load expensive resources once per worker lifetime:
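Memrun's exact init-handler API isn't reproduced in this excerpt, but the pattern it enables is plain memoization. A minimal stand-in sketch (all names here are hypothetical, shown as self-contained Python rather than the real decorator):

```python
import functools

@functools.lru_cache(maxsize=1)
def load_model():
    # Runs once per worker process; every later call returns the cached
    # object instead of re-reading gigabytes from disk or S3.
    print("loading model...")  # fires exactly once
    return {"weights": list(range(1000))}  # stand-in for a real model

def handle(req):
    model = load_model()  # cheap after the first call
    return len(model["weights"])

handle({})
handle({})  # "loading model..." is printed only once
```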
