By the end of 2025, it’s hard to find a web application that doesn’t have AI-powered features (or that hasn’t tried to incorporate them). And when reflecting AI-generated content in a UI, LLM response streaming capabilities are essential. They enable us to provide feedback quickly, reduce the perceived slowness of AI, and thus improve the UX. But even though frameworks and libraries offer ready-made solutions for implementing streaming updates, the world of real-time hides many pitfalls. Let me reveal these (and how to avoid them) in Ruby on Rails AI-powered applications.
Ruby on Rails includes a component to deliver live updates to users: Action Cable. It uses WebSockets as a transport and can rely on various backends for horizontal scalability (databases, Redis, or whatever you want via custom adapters). Add Hotwire and RubyLLM to the mix, and you get a complete solution for streaming LLM responses to your users, with just a few lines of code.
A minimal example would consist of an HTML template and a bit of Ruby code:
<%= turbo_stream_from "chat_42" %>

<div id="chat">
  Thinking...
</div>
RubyLLM.chat.ask("What are the pitfalls of real-time HTTP transports?") do |chunk|
  Turbo::StreamsChannel.broadcast_append_to("chat_42", target: "chat", html: chunk.content)
end
The recipe is simple, but the result is… well, not so predictable. To better demonstrate potential UI hallucinations, I’ve created a simple AI-powered Rails application, Proposer, that helps prepare a conference talk proposal which follows selected best practices. Pure rails new (ver. 8.1 w/ Hotwire) spiced with RubyLLM. Nothing fancy (just yet).
Wanna jump straight to the code? It’s already on GitHub: palkan/proposer.
Using Proposer as a playground, we’ll walk through the following topics:

- Streaming is going off rails
- Restoring sanity with AnyCable
- Look into the future: Durable Streams
Streaming is going off rails
Let’s start with a quick overview of how our application works. The flow is as follows:
- A user submits a proposal generation request via an HTML form
- A generation workflow is enqueued for background execution (via Active Job). The workflow is backed by the new Active Job Continuations API and consists of several steps, one for each part of the proposal (title, abstract, details)
- At each workflow step, we perform a streaming request to an LLM and broadcast chunks over a Turbo Stream to the user’s browser
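Sketched out, the workflow job could look something like this (a rough sketch, not the exact Proposer code: the job class and prompt helpers are illustrative):

class GenerateProposalJob < ApplicationJob
  include ActiveJob::Continuable

  def perform(proposal)
    @proposal = proposal

    # Each step is a checkpoint: if the job is interrupted (deploy, restart),
    # it resumes from the first incomplete step instead of starting over.
    step :title do
      generate_field(:title, title_prompt)
    end

    step :abstract do
      generate_field(:abstract, abstract_prompt)
    end

    step :details do
      generate_field(:details, details_prompt)
    end
  end
end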
The final step (the most interesting for us) looks like this:
def generate_field(field, prompt)
  response = chat.ask(prompt) do |chunk|
    next if chunk.content.blank?

    Turbo::StreamsChannel.broadcast_append_to(
      [proposal, field],
      target: dom_id(proposal, field),
      html: chunk.content
    )
  end

  proposal.update!(field => response.content)
end
We broadcast the chunks as we receive them and store the final generation result in the database (the #update! call).
What will the user see in the browser? Let’s consider a couple of examples.
No law, no order
An example of out-of-order streaming
No, the video above isn’t AI-generated, it’s real: data arrives in the browser out of order; there is no first-in-first-out guarantee. That’s how Action Cable reveals its threaded nature. Under the hood, Action Cable uses thread pools to distribute broadcast work (by calling connection#transmit for each subscribed connection). When you broadcast, say, 100 messages to the same client in a row, four Ruby threads (the default worker pool size) pick them up and transmit them to the client concurrently. Oops!
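You can reproduce the effect without any Rails at all. Here’s a toy model (purely illustrative, not Action Cable’s actual code): four threads draining a shared queue, like a broadcast worker pool, don’t preserve enqueue order on the receiving end.

# Toy model: N workers drain a shared queue concurrently; the receiving
# side observes completion order, not enqueue order.
queue = Thread::Queue.new
100.times { |i| queue << i }
queue.close

received = []
lock = Mutex.new

workers = Array.new(4) do
  Thread.new do
    # Queue#pop returns nil once the queue is closed and empty
    while (i = queue.pop)
      lock.synchronize { received << i }
    end
  end
end
workers.each(&:join)

puts(received == received.sort) # => usually false: order is not preserved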
It turns out that even natural delays between chunks (which are not sent simultaneously) are not enough to guarantee proper ordering. How can we achieve this? There are plenty of options to try:
Send accumulated data instead of chunks. In our case, we can use #broadcast_update_to instead of #broadcast_append_to. This wouldn’t eliminate hallucinations completely (the user will see some jittering), but the result will look correct. However, I’d still hesitate to go this way for efficiency reasons: sending full messages when only a few bytes have been added is a waste of resources (and increases network traffic).
Throttling. Accumulate chunks and broadcast them at most once every 100ms (see the sketch after this list). However, 100ms may not be enough under higher load (when broadcast messages could pile up).
Using faster pub/sub adapters. Today, Rails suggests using Solid Cable by default, a database-backed adapter for Action Cable. The video above was recorded using this setup. When using Redis for pub/sub, such hallucinations occurred less often per client but were still noticeable when the broadcast rate was higher.
Using Action Cable Next or Async Cable. The next-gen version of Action Cable, actioncable-next, is currently available as a gem (but expected to be merged into Rails eventually). It provides a fastlane mode for broadcasts, suitable for Turbo, that eliminates one of the thread pools and makes broadcasts much faster (so they can be processed before chunks arrive). Similarly, async-cable also implements a faster broadcast loop.
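The throttling option could be sketched like this, reusing the broadcasting call from the minimal example above (the 100ms window and the buffer handling are illustrative):

# Accumulate chunks and flush them at most once per interval
buffer = +""
last_flush = Process.clock_gettime(Process::CLOCK_MONOTONIC)

flush = -> do
  unless buffer.empty?
    Turbo::StreamsChannel.broadcast_append_to("chat_42", target: "chat", html: buffer.dup)
    buffer.clear
  end
end

chat.ask(prompt) do |chunk|
  buffer << chunk.content.to_s
  now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  if now - last_flush >= 0.1 # flush at most once every 100ms
    flush.call
    last_flush = now
  end
end

flush.call # don't forget the tail after the stream completes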
However, none of the above provides a 100% order guarantee, since the Curse of Threads still holds and may reveal itself under load. Implementing re-ordering on the client side would be more robust, but at the same time would require much more work (though a custom Turbo Stream action doesn’t sound like a bad idea 🤔).
Ordering is not the only guarantee missing from Action Cable.
There is no such thing as a reliable network
An example of network issues during streaming
The second example demonstrates a very typical pitfall of real-time communication—failing to account for connection losses. Look at the online status indicator (green circle—“on air”, red—no signal), and what happens when it turns red: chunks get lost, the UI state gets corrupted.
Network issues are inevitable. Most WebSocket clients, including Action Cable, support automatic reconnection, but they do not automatically catch up on missed messages. Chunks of LLM responses are lost forever because we don’t store them; they’re ephemeral. Still, from the UI perspective, we need a way to access them for clients that got disconnected for a moment—how can we do that?
Action Cable doesn’t provide any solution for that (although Solid Cable could, in theory, given that it stores messages in the database). In other words, Action Cable comes with an “at-most once” delivery guarantee—not enough for reliable streaming of LLM responses. We need “at-least once” at minimum!
We can emulate “at-least once” delivery by sending accumulated data instead of chunks and requesting the current committed state on reconnect. However, there is a better alternative that doesn’t require rethinking the streaming implementation or worrying about potential edge cases—we can use a more suitable real-time server that provides reliability out of the box.
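(For completeness, here’s roughly what the DIY route could look like: a channel that replays a snapshot of the committed state on every subscription. The ProposalChannel name and the payload shape are hypothetical.)

class ProposalChannel < ApplicationCable::Channel
  def subscribed
    proposal = Proposal.find(params[:id])
    stream_for proposal

    # Replay a snapshot of the committed state on every (re)subscription,
    # so a briefly disconnected client can repaint from a consistent state.
    transmit({event: "snapshot", fields: proposal.slice(:title, :abstract, :details)})
  end
end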
Restoring sanity with AnyCable
tl;dr AnyCable provides both message ordering and “at-least once” delivery guarantees.
Yes, we can kill two birds with one stone by simply switching from one server implementation to another, from Action Cable to AnyCable. No hacks, no workarounds—just reliable streaming of LLM responses to clients with a couple of terminal commands:
bundle add anycable-rails
bin/rails g anycable:setup
Let’s see the process of AnyCable-ifying the Proposer app in action:
AnyCable Rails installer in action
That’s it! (Well, almost: you can see the list of TODO items to take care of yourself or delegate to your AI companion.)
Now, we no longer need to worry about the intricacies of real-time data streaming—the server takes care of it:
A demonstration of AnyCable recovery after a connection loss
The application code stays the same, and it works in a WYSIWYG manner: “I broadcast LLM response chunks one by one and the client receives all of them in the same order.”
Want to learn more about the technical side of AnyCable’s reliability? Go check our docs! Okay, we know that our documentation kinda sucks 🫤 (the upgrade is coming 🤞). Let me highlight the most important bits in the form of a list:
- The server stores publications (broadcasted messages) in log-like structures (in-memory for single-node installations and in Redis Streams in Pro clusters)
- The server attaches position metadata to each message
- The client (our beloved @anycable/web) keeps track of subscribed streams and positions and catches up from the last seen position on reconnect automatically
- (Optionally) The client may request historical messages for a given timeframe during the initial subscription (so we can eliminate race conditions during page loads or, in the case of Proposer, survive page reloads)
Thus, the guarantees are baked into the AnyCable server and the Action Cable Extended communication protocol. Yes, we had to invent a custom protocol on top of WebSockets and the existing Action Cable protocol to bring better delivery guarantees. (Why on Mars is there no common standard for implementing reliable real-time communication? I don’t know. What I know is that there is a chance that it will change in the future.)
Look into the future: Durable Streams
Recently, our friends from ElectricSQL announced a new initiative: an HTTP protocol for reliable data streaming called Durable Streams.
The goal is to standardize the communication language between clients and servers that consume and produce data streams, respectively. No restrictions on data format (as long as it can be represented as an append-only log of bytes), no opinionated authentication/authorization schemas (use your own), no custom application-level protocols (well, like we do at AnyCable 😄). The only requirement is the transport: HTTP (one-shot, polling, or SSE).
The Durable Streams protocol and AnyCable’s reliability design have many things in common:
- Both assume storing streams as logs (ordered and accessible from any position), one log per stream (not per client-stream pair)
- Both use stream offsets tracked by clients for resumability
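To make that shared model concrete, here’s a toy, in-memory version of the core abstraction both designs rely on (purely illustrative, not any actual API): an append-only log addressable by offset.

# A toy model of a durable stream: an append-only log where every message
# has an offset, and a client can (re)read from any offset it has seen.
class ToyStreamLog
  def initialize
    @log = []
  end

  # Append a message and return its offset (the position metadata
  # a server would attach to the delivered message)
  def append(message)
    @log << message
    @log.size - 1
  end

  # Read everything after the given offset: what a client does on
  # reconnect to catch up without re-receiving what it already has
  def read(after_offset: -1)
    @log[(after_offset + 1)..] || []
  end
end

log = ToyStreamLog.new
log.append("chunk 1")     # => 0
log.append("chunk 2")     # => 1
log.read(after_offset: 0) # => ["chunk 2"]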
What’s the point of Durable Streams if we already have existing solutions (such as AnyCable)? Interoperability. In this post, we demonstrated how AnyCable’s interoperability with Action Cable makes it a no-brainer to have reliable real-time streaming in Rails applications. Still, we had to switch the client implementation—just a few lines of code, but anyway.
Now imagine not having to worry about client or server vendor lock-in and designing your applications on open standards. You can use any compatible client on any platform, and upgrade servers as needed: start small with an embedded implementation and an in-memory store, switch to a managed solution, or build your own highly available cluster as you reach higher loads.
This philosophy closely aligns with ours at AnyCable. The “Any” in our name implies being helpful to anyone: any language, any platform, and… any protocol!
Yes, you got it right! AnyCable is gradually adopting Durable Streams. We’ll start by implementing the “read” part of the protocol (so you’ll be able to consume durable streams but still publish data using the AnyCable API), and we’ll see how it goes.
Stay tuned!