Every "wait, let me say that again": caught.
Sapari's AI reads your transcript and identifies when you restarted a sentence, trailed off, or redid a line. Each one becomes a reviewable card on the timeline.
7 days · 30 AI minutes · No credit card
So the way I think about this is — actually, the way I think about this is you need to decide what matters first.
The problem
Silence removal doesn't catch these.
A false start isn't silent. The audio is full and the transcript has words, but those words aren't the take you're keeping. Finding them by hand means scrubbing the audio and rereading the transcript, which is exactly the post-production work you're trying to avoid.
45-min recording → 15–25 false starts.
Manual time to find them: 30–60 min.
Sapari time: under 10 min.
How it works
Semantic, not signal processing.
Word-level transcript
Speech-to-text produces every word with timing.
LLM reads in chunks
A language model reads overlapping chunks and flags moments where the speaker restarted the same thought.
Confidence threshold
Each detection gets a score. The aggressiveness slider sets the threshold.
This isn't pattern-matching on "um" or "uh" — it's semantic. The model understands that "the way I think — actually, the way I think" is one restart, not two separate sentences.
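As a rough illustration of the "overlapping chunks" step, here is a minimal Python sketch. It is not Sapari's implementation: the `Word` type, the `overlapping_chunks` function, and the chunk and overlap sizes are all assumptions made for the example. The point is only that an overlap lets a restart that straddles a chunk boundary appear whole in at least one chunk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One transcript word with timing (hypothetical shape)."""
    text: str
    start: float  # seconds
    end: float    # seconds


def overlapping_chunks(words, size=200, overlap=40):
    """Split a word list into windows of `size` words that share `overlap` words.

    The sizes are illustrative defaults, not values from the product.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for i in range(0, len(words), step):
        chunks.append(words[i:i + size])
        if i + size >= len(words):
            break  # this window already reaches the end of the transcript
    return chunks
```

Each chunk would then be handed to the language model with its word timings, so a flagged restart can be mapped back to a span on the timeline.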
Controls
Aggressiveness slider.
Long false starts (10+ seconds) get a duration penalty so the model doesn't aggressively cut chunks of real content that happen to look like a restart.
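The interaction between the slider and the duration penalty can be sketched as follows. Everything numeric here is an assumption for illustration: the threshold values, the "conservative" level name, and the flat 0.8 multiplier are hypothetical, and the page only states that detections of 10 seconds or longer are penalized and that the slider sets a confidence threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """A candidate false start (hypothetical shape)."""
    start: float       # seconds
    end: float         # seconds
    confidence: float  # model score in [0, 1]


# Hypothetical slider levels and thresholds, chosen only for the example.
THRESHOLDS = {"conservative": 0.85, "moderate": 0.70, "aggressive": 0.50}
LONG_SECONDS = 10.0   # "10+ seconds" from the page
LONG_PENALTY = 0.8    # assumed multiplier; the real penalty is not documented


def adjusted_confidence(d):
    """Scale down the score of long detections so they need stronger evidence."""
    if d.end - d.start >= LONG_SECONDS:
        return d.confidence * LONG_PENALTY
    return d.confidence


def cards_for(detections, level):
    """Return the detections that would become review cards at a slider level."""
    threshold = THRESHOLDS[level]
    return [d for d in detections if adjusted_confidence(d) >= threshold]
```

Under these assumed numbers, a 12-second detection scored 0.75 drops to about 0.6, so it would be hidden at Moderate and shown only at Aggressive, while a 2-second detection with the same score appears at both.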
Review on the timeline
Purple cards.
You decide.
"so the way I — actually, the way I think"
"let me — let me start that over"
"and if you — wait, sorry, if you want this to"
Why this is rare
Most AI tools only cut silence.
Detecting false starts requires a language model that understands intent — not a signal-processing pass that measures volume.
Sapari runs this as part of the same analysis as silence, captions, audio, and B-roll. One pipeline, one review.
See the full pipeline →
Before you ask
Common questions.
Is it accurate?
In typical recordings, the Moderate setting catches most true restarts with few false positives. Confidence scoring means you rarely get surprised — obvious restarts score high, ambiguous ones score low and only appear at Aggressive.
What if I want the restart in the final cut?
Dismiss the card and the restart stays. The AI suggests, you decide.
Does it work in non-English?
Yes. Transcription supports English, Spanish, Portuguese, and French, and the LLM handles all four. English is the most extensively tested.
Does it catch um and uh?
Not directly — false start detection focuses on restarted thoughts, not filler. But silence removal at Hyper picks them up because Hyper is tuned to remove anything that isn't speech.
What about comedic or intentional restarts?
Dismiss the card and the AI leaves it alone. Same review model as every other detection.