I am torn about writing this. AsyncIO in Python is always a mess, protobuf is another, and gRPC is the worst of them all, with all that boilerplate code that does nothing but cause trouble.
The task in front of me: integrating a mesh API server built on our internal codebase.
non-gRPC options
I mean, gRPC is just h2 + protobuf, how hard could it be? Even uWSGI has had h2 for ages.
Turns out the options are quite limited: uWSGI's h2 support is major versions behind, and SPDYv3 never took off. And gRPC isn't plain h2 either; it layers its own message framing and trailer handling on top of h2 streams, so the frames are marked and handled differently.
So it's either Hypercorn or falling back to plain gRPC. To avoid further mess, I decided to stick with gRPC.
infectious async/await
Now I face another challenge: the existing business logic is written in async/await style (cue the FastAPI fad).
I carefully studied the gRPC AsyncIO hello world example.
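Roughly what that example boils down to, with the stock helloworld stubs (helloworld_pb2 / helloworld_pb2_grpc) standing in for the real service:

```python
import asyncio

import grpc
import helloworld_pb2
import helloworld_pb2_grpc


class Greeter(helloworld_pb2_grpc.GreeterServicer):
    async def SayHello(self, request, context):
        # async handler, scheduled on the grpc.aio event loop
        return helloworld_pb2.HelloReply(message=f"Hello, {request.name}!")


async def serve():
    server = grpc.aio.server()
    helloworld_pb2_grpc.add_GreeterServicer_to_server(Greeter(), server)
    server.add_insecure_port("[::]:50051")
    await server.start()
    await server.wait_for_termination()


if __name__ == "__main__":
    asyncio.run(serve())
```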
Everything ran great, except for the notorious GIL: my gRPC server runs, but only on one single CPU.
multiprocess
Old-school solution to the GIL: spawn many processes, one worker per CPU. Easy? There's an official multiprocessing example.
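Condensed, the pattern in that example looks roughly like this (the toy Greeter is repeated so the sketch is self-contained; grpc.so_reuseport is the channel option the real example relies on):

```python
import multiprocessing
from concurrent import futures

import grpc
import helloworld_pb2
import helloworld_pb2_grpc


class Greeter(helloworld_pb2_grpc.GreeterServicer):
    def SayHello(self, request, context):
        return helloworld_pb2.HelloReply(message=f"Hello, {request.name}!")


def _run_server(bind_address):
    # each worker process runs its own sync server, all binding the
    # same port thanks to SO_REUSEPORT
    server = grpc.server(
        futures.ThreadPoolExecutor(max_workers=1),
        options=[("grpc.so_reuseport", 1)],
    )
    helloworld_pb2_grpc.add_GreeterServicer_to_server(Greeter(), server)
    server.add_insecure_port(bind_address)
    server.start()
    server.wait_for_termination()


if __name__ == "__main__":
    workers = [
        multiprocessing.Process(target=_run_server, args=("[::]:50051",))
        for _ in range(multiprocessing.cpu_count())
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
```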
It worked... until it didn't. The major selling point of h2 is connection multiplexing: one TCP connection serves all the concurrency. And our mesh client is so good at this that a single worker eats 100% of one CPU while the rest simply idle. 🤣
SO_REUSEPORT
I also tried to implement a prefork worker on my own. Let's get rid of the master: partly for political correctness, but mostly because we have SO_REUSEPORT already.
Unfortunately, it didn't work at all, again because of h2's multiplexing nature: the kernel balances connections across listeners, not requests, so a single connection never spreads beyond one worker.
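For reference, a hypothetical sketch of what that master-less prefork amounts to (the port number and the serving stub are made up):

```python
import os
import socket

PORT = 50051  # hypothetical

# fork one child per remaining CPU; every process, parent included,
# falls through and listens on the same port
for _ in range(os.cpu_count() - 1):
    if os.fork() == 0:
        break

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
sock.bind(("0.0.0.0", PORT))
sock.listen()
# ... hand the listening socket to the h2/gRPC machinery here ...
# The kernel load-balances incoming *connections* across these
# listeners; with one multiplexed connection there is nothing to balance.
```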
ProcessPoolExecutor
I looked closely and found how the gRPC server is initialized:
grpc.server(futures.ThreadPoolExecutor(max_workers=10))
Maybe just swap it with ProcessPoolExecutor()?
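For the record, the naive swap is literally one substitution (this is the variant that hung for me):

```python
from concurrent.futures import ProcessPoolExecutor

import grpc

# naive executor swap — this is what goes dead with a timeout
server = grpc.server(ProcessPoolExecutor(max_workers=10))
```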
Nope, the server went dead with a timeout. I don't have time to dig into the C/C++ details. Nope.
It seems gRPC only allows ThreadPoolExecutor().
Why does Google even allow it as a parameter then?
The apply_async() hallucination
Out of despair, I next asked ChatGPT. The advanced AI model said: just use multiprocessing inside your handlers.
Yeah, why not. So how do I run async code under multiprocessing?
ChatGPT hallucinated: use apply_async. I believed that shit at first, only to find that the "async" there means the call returns an AsyncResult object immediately; it has nothing to do with running async/await code. By the way, I found that .apply() is just a shortcut for .apply_async().get()
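A minimal demonstration of the mix-up; the asyncio.run() inside the worker is my addition, since someone still has to drive the coroutine:

```python
import asyncio
from multiprocessing import Pool


async def work():
    return 42


def run_work():
    # apply_async ships a *sync* callable to the worker; passing an
    # async def would just produce an un-awaited coroutine object,
    # so the worker has to run an event loop itself
    return asyncio.run(work())


if __name__ == "__main__":
    with Pool(2) as pool:
        result = pool.apply_async(run_work)  # returns AsyncResult immediately
        print(result.get())  # and .apply() is just .apply_async().get()
```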
Putting it together
I got the mess to work eventually.
- Create a normal gRPC server with `add_generic_rpc_handlers` and stuff
- Create a `pool = ProcessPoolExecutor(...)` before the `unary_unary_rpc_method_handler`, with an `initializer` that spawns a global `loop = asyncio.new_event_loop()`. It had to be global because `concurrent.futures` only allows it this way
- Run `loop.run_until_complete()` inside `pool.submit()` (see the sketch after this list)
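A minimal sketch of the whole arrangement, with the stock helloworld stubs standing in for the real service, and the generated servicer wrappers used in place of raw add_generic_rpc_handlers for brevity:

```python
import asyncio
from concurrent import futures

import grpc
import helloworld_pb2
import helloworld_pb2_grpc

loop = None  # one event loop per worker process, set by the initializer


def _init_worker():
    # concurrent.futures gives workers no per-task context, so the loop
    # has to live in a module-level global, created once per process
    global loop
    loop = asyncio.new_event_loop()


async def handle(request):
    # stand-in for the existing async/await business logic
    return helloworld_pb2.HelloReply(message=f"Hello, {request.name}!")


def _run_in_worker(request_bytes):
    # runs inside a pool process: drive the coroutine to completion
    request = helloworld_pb2.HelloRequest.FromString(request_bytes)
    reply = loop.run_until_complete(handle(request))
    return reply.SerializeToString()


class Greeter(helloworld_pb2_grpc.GreeterServicer):
    def __init__(self, pool):
        self._pool = pool

    def SayHello(self, request, context):
        # ship serialized bytes across the process boundary
        data = self._pool.submit(
            _run_in_worker, request.SerializeToString()
        ).result()
        return helloworld_pb2.HelloReply.FromString(data)


def serve():
    pool = futures.ProcessPoolExecutor(max_workers=8, initializer=_init_worker)
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    helloworld_pb2_grpc.add_GreeterServicer_to_server(Greeter(pool), server)
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()


if __name__ == "__main__":
    serve()
```

The thread pool stays as the gRPC executor; it only blocks on `pool.submit(...).result()`, while the actual work (and the event loop) lives in the process pool, one loop per CPU.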
lessons learned
If you aren’t a try-hard:
avoid async
avoid gRPC