Show HN: Sparrow-1 – Audio-native model for human-level turn-taking without ASR

(tavus.io)

61 points | by code_brian 13 hours ago

9 comments

ttul 1 hour ago
I tried talking to Claude today. What a nightmare. It constantly interrupts you. I don’t mind if Claude wants to spend ten seconds thinking about its reply, but at least let ME finish my thought. Without decent turn-taking, the AI seems impolite and it’s just an icky experience. I hope tech like this gets widely distributed soon because there are so many situations in which I would love to talk with a model. If only it worked.
- mavamaarten 58 minutes ago
  Agreed. English is not my native language. I do speak it well, but sometimes I need a second to think mid-sentence. None of the live chat models out there handle this well. Claude just starts answering before I've even had the chance to finish a sentence.
cuuupid 1 hour ago
The first time I met Tavus, their engineers (incl. Brian!) were perfectly willing to sit down and build their own better Infiniband to get more juice out of H100s. There is pretty much nobody working on latency and realtime at the level they are. Sparrow-1 would be a defining achievement for most startups but will just be one of dozens for Tavus :)
dfajgljsldkjag 1 hour ago
I am always skeptical of benchmarks that show perfect scores, especially when they come from the company selling the product. It feels like everyone claims to have solved conversational timing these days. I guess we will see if it is actually any good.
- fudged71 1 hour ago
  Different industry, but our marketing guy once said "You know what this [perfect] metric means? We can never use it in marketing because it's not believable"
  - khalic 48 minutes ago
    Just include some noise, it’s like the most available resource in the universe
nextaccountic 1 hour ago
> Non-verbal cues are invisible to text: Transcription-based models discard sighs, throat-clearing, hesitation sounds, and other non-verbal vocalizations that carry critical conversational-flow information. Sparrow-1 hears what ASR ignores.
Could Sparrow instead be used to produce high-quality transcription that incorporates non-verbal cues?
Or even use Sparrow AND another existing transcription/ASR tool to augment the transcription with non-verbal cues?
randyburden 1 hour ago
Awesome. We've been using Sparrow-0 in our platform since launch, and I'm excited to move to Sparrow-1 over the next few days. Our training and interview pre-screening products rely heavily on Tavus's AI avatars, and this upgrade (based on the video in your blog post) looks like it addresses some real pain points we've run into. Really nice work.
mentalgear 44 minutes ago
Metric | Sparrow-1 Precision 100% Recall 100%
Common ...
- reubenmorais 19 minutes ago
  If you watch the demo video you can see how they would get this: the model is not aggressive enough. While it doesn't cut you off, which is nice, it also always waits an uncanny amount of time to chime in.
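The trade-off described in this reply can be illustrated with a toy sketch. It assumes a plain silence-threshold rule, which is not how Sparrow-1 works (the thread does not describe its internals); the function name `evaluate` and the pause durations are made up for illustration. A longer wait removes interruptions and pushes precision to 100%, at the cost of a slower response.

```python
# Hypothetical illustration only: a fixed silence-threshold endpointer,
# NOT Sparrow-1's actual method. All numbers below are invented.

def evaluate(threshold_s, mid_turn_pauses_s, end_of_turn_count):
    """Score a rule that takes the floor after `threshold_s` seconds of silence.

    mid_turn_pauses_s: pauses where the speaker was not finished yet.
    end_of_turn_count: genuine turn ends, assumed to be followed by
                       silence longer than any threshold tried here.
    Returns (precision, response_latency_s).
    """
    # A mid-turn pause at least as long as the threshold causes an interruption.
    false_triggers = sum(1 for p in mid_turn_pauses_s if p >= threshold_s)
    # Under the assumption above, every genuine turn end is eventually detected.
    true_triggers = end_of_turn_count
    precision = true_triggers / (true_triggers + false_triggers)
    # The rule can only respond after waiting out the full threshold.
    return precision, threshold_s


pauses = [0.3, 0.6, 1.2, 0.4]  # invented mid-sentence hesitations, in seconds

for threshold in (0.5, 1.0, 1.5):
    precision, latency = evaluate(threshold, pauses, end_of_turn_count=4)
    print(f"threshold={threshold:.1f}s precision={precision:.2f} latency={latency:.1f}s")
```

In this toy setup recall is 100% at every threshold, so a perfect precision score says nothing about how long the model made the speaker wait.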
orliesaurus 2 hours ago
Literally no way to sign up to try. I put in my email and password and it put me on some waitlist, despite the video saying I could try the model today. What makes me mad about these kinds of releases is that the marketing and the product don't talk to each other.
- qfavret 22 minutes ago
  Try signing up for the API platform on the site. You can access it there.
nubg 2 hours ago
Any examples available? Sounds amazing.
krautburglar 41 minutes ago
Such things were doing a good-enough job scamming the elderly as it was, even with the silence-based delays.