6 comments

  • e1g 1 day ago
    In JS land, this problem (streaming, resuming, recovering, multi-client, etc) has been fully solved by https://durablestreams.com - and it can be self-hosted, or managed via Cloudflare DO.
    • zknill 1 day ago
      Cloudflare Sessions API and Anthropic Routines have a really similar model, where they host the 'session store' for you and give you access to it over long-polling (or sometimes websockets).

      It's a bit harder to do agent presence ('is the agent still there') with this model without heartbeats, but possible.

      It's good to see the industry starting to address the "durable sessions" problem, because it sucks.
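      To illustrate the heartbeat approach to agent presence mentioned above, here's a minimal sketch (all names are illustrative, not any vendor's API): the agent is only considered "there" if a heartbeat arrived within a timeout window.

```typescript
// Hypothetical presence tracker: an agent counts as present only if its
// last heartbeat is within `timeoutMs`. The clock is injectable for testing.
class PresenceTracker {
  private lastBeat = new Map<string, number>();

  constructor(
    private timeoutMs: number,
    private now: () => number = Date.now,
  ) {}

  // Called whenever a heartbeat arrives from the agent.
  heartbeat(agentId: string): void {
    this.lastBeat.set(agentId, this.now());
  }

  // "Is the agent still there?" — true only within the timeout window.
  isPresent(agentId: string): boolean {
    const last = this.lastBeat.get(agentId);
    return last !== undefined && this.now() - last <= this.timeoutMs;
  }
}
```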

  • jhancock 1 day ago
    I built this Clojure lib for robust, high-scale LLM calls where the consumer is usually an HTTP request waiting on an SSE stream. https://github.com/jhancock/aimee

    The article states: "Most applications are built on an architecture like the one above, where there are a number of stateless horizontally scaleable server replicas that can handle client requests."

    Using the library I built, I have yet to worry about this: Clojure's core.async, the HTTP libraries, and the JVM are so rock solid that I don't have a fragile set of stateless servers. Sure, at some point there are rare edge cases, but it's nice to get very far along without worrying about them.

    • dgellow 1 day ago
      I'm not sure what you mean by fragile stateless servers. If they are stateless, what is fragile about them?
  • the_gipsy 1 day ago
    > Stop reading here if you just wanted the how-to. Because I’m going to talk about what I think is better, and that is probably too ‘commercial’ for some folks.

    > I work for Ably, and I’m building a dedicated transport for AI applications that...

    • vintagedave 1 day ago
      It's honest. If they genuinely think it's better, it's fair to say so. The article up to that point seems well written across the domain (I've solved much of the same set of problems).
      • the_gipsy 1 day ago
        Should be at the top of the post, not at the end.
      • dgellow 1 day ago
        Agreed. I found it informative and appreciated the read, even with the assumption it was AI generated.
  • _pdp_ 1 day ago
    This is way too complex!

    We have developed a simple API that can produce tokens and events in various formats like jsonl, sse, even csv. Cancellation happens automatically when the socket is closed mid-stream, or when you push an event from another endpoint to stop the stream midway.

    Background tasks are also subscribable and cancellable.

    See https://cbk.ai

    • zknill 1 day ago
      > This is way too complex!

      100% - the argument of the article is that building any feature beyond chat-based demos on HTTP SSE streaming is super complex. But a lot of folks still want to do it, because that's what their tech stack supports. I think it's still valuable to talk about how you might do that.

  • TodorGrudev 1 day ago
    Cancellation was the painful one for me. I have SSE streaming in a React Native app and there's no proper AbortController — ended up with a ref + interval hack to detect when the user navigates away mid-stream. Still don't have a good answer for "connection dropped, show partial response and let them retry from where it left off." Would've loved something like this.
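    The "ref + interval" workaround described above can be sketched roughly like this (all names are illustrative, not a real React Native API): a mutable ref is polled on an interval, and when the user navigates away the flag flips, the consumer stops reading, and the partial text is kept so the UI can show it.

```typescript
// Sketch of a "ref + interval" cancellation hack, assuming an async-iterable
// token source standing in for the SSE stream. Names are hypothetical.
type CancelRef = { current: boolean };

async function consumeUntilCancelled(
  stream: AsyncIterable<string>,
  cancelled: CancelRef,
): Promise<string> {
  let partial = "";
  for await (const token of stream) {
    if (cancelled.current) break; // flipped by the interval below
    partial += token;
  }
  return partial; // the partial response survives cancellation
}

// The interval side of the hack: poll some "did we navigate away?" signal,
// since there's no AbortController to wire up directly.
function watchForNavigation(
  cancelled: CancelRef,
  navigatedAway: () => boolean,
) {
  const timer = setInterval(() => {
    if (navigatedAway()) {
      cancelled.current = true;
      clearInterval(timer);
    }
  }, 250);
  return timer;
}
```

    Keeping the accumulated `partial` around is also the starting point for the "show partial response and retry" case: the retry request can tell the server how much was already received.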
  • ekojs 1 day ago
    > HTTP is just not a good transport for streaming LLM tokens and for building async agentic applications

    I don't know if I agree that this is a problem with SSE or HTTP. Something like a Redis Streams-backed SSE would solve most of the 'challenges' presented in the post.
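    The core of that idea can be sketched with an in-memory stand-in for a Redis Stream (mimicking XADD/XREAD-style semantics; this is not the Redis client API): tokens are appended with monotonically increasing IDs, so a reconnecting client can resume from its last-seen ID instead of losing the partial response.

```typescript
// In-memory stand-in for a Redis Stream backing an SSE endpoint. A real
// implementation would use XADD to append tokens and XREAD to resume;
// here we only model the resumable-read property.
class TokenStream {
  private entries: { id: number; token: string }[] = [];
  private nextId = 1;

  // Append a token, like XADD with an auto-generated ID.
  add(token: string): number {
    const id = this.nextId++;
    this.entries.push({ id, token });
    return id;
  }

  // Read everything after a given ID, like XREAD from a client's
  // Last-Event-ID on reconnect.
  readAfter(lastId: number): { id: number; token: string }[] {
    return this.entries.filter((e) => e.id > lastId);
  }
}
```

    Because SSE already carries a `Last-Event-ID` header on reconnect, the server just replays from that ID and the client sees a seamless stream.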