How much to cut depends on what you're making.
Pacing isn't universal. A podcast audience tolerates long pauses because they're listening while doing something else, and the pauses give them space to process. A TikTok audience won't tolerate much silence at all because the next video is one thumb-flick away. The right threshold depends on format, viewer context, and content density.
Starting thresholds by genre. These are working ranges from creator communities, not platform-published standards. Test and adjust from your own retention graphs.
Adjust from your analytics. YouTube Studio's "Intro" metric tells you what percentage of viewers made it past the first 30 seconds. If that number is dropping, tighter pacing in the opener usually helps.
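If you copy retention points out of your analytics by hand, a few lines are enough to estimate the share of viewers still watching at the 30-second mark and compare it across uploads. This is a rough sketch, not a YouTube API call: the `retention_at` helper and the sample curve are made up for illustration, and the curve is assumed to be a list of (second, fraction still watching) pairs with strictly increasing seconds.

```python
def retention_at(curve, t):
    """Linearly interpolate the fraction of viewers still watching at second t.

    curve: list of (second, fraction) pairs with strictly increasing seconds.
    """
    if t <= curve[0][0]:
        return curve[0][1]
    for (t0, r0), (t1, r1) in zip(curve, curve[1:]):
        if t0 <= t <= t1:
            return r0 + (r1 - r0) * (t - t0) / (t1 - t0)
    # Past the last sample: fall back to the final known value.
    return curve[-1][1]


# Made-up sample points, for illustration only.
sample_curve = [(0, 1.0), (20, 0.8), (40, 0.6), (60, 0.55)]
intro_share = retention_at(sample_curve, 30)
print(f"Still watching at 30 s: {intro_share:.0%}")
```

Tracking this one number per upload makes it easier to see whether a pacing change in the opener moved anything.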
Not every pause is dead air. The pause before a punchline is the joke, and the pause after a strong claim lets the viewer register it. An automatic cut can remove both, which is why any tool worth using has a review step.
What silence removal doesn't fix.
Silence is one edit. A finished video usually needs more. False starts and stumbles ("wait, let me say that again") survive the cut, because there is sound the whole way through and only the words are wrong. Filler words like "um" usually slip through too, unless the threshold is aggressive. Host/guest volume mismatch, background noise, and missing captions are all untouched, and captions aren't optional on feeds where videos autoplay muted by default.
Silence removal alone leaves a lot of the post-production undone. Purpose-built tools, Sapari among them, handle the rest of the pipeline in the same pass, which is why most serious creators don't run silence removal in isolation.
For the time math: manual silence cutting in a pro editor takes a few minutes of work per minute of final video, which adds up to an afternoon for a 45-minute recording. AI silence removal cuts that to a few minutes of analysis plus review. Tools in this category charge $7–31/month for enough capacity to do several videos a week. At that price, the subscription pays for itself in saved editing time for anyone publishing more than once a month.
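To make the time math concrete, here is a small calculation. It is a sketch under stated assumptions: the 3 minutes of manual work per finished minute and the 10 minutes of AI analysis plus review are illustrative stand-ins for "a few minutes", not measured figures, and the function names are hypothetical.

```python
def manual_hours(final_minutes, work_per_minute=3.0):
    """Hours of manual cutting, assuming work_per_minute minutes of editing
    per minute of finished video (an illustrative figure)."""
    return final_minutes * work_per_minute / 60


def hours_saved_per_month(videos_per_month, final_minutes,
                          work_per_minute=3.0, ai_minutes_per_video=10.0):
    """Editing hours saved per month by switching from manual cuts to
    AI analysis plus review (ai_minutes_per_video is an assumption)."""
    manual = manual_hours(final_minutes, work_per_minute)
    assisted = ai_minutes_per_video / 60
    return videos_per_month * (manual - assisted)


def break_even_hourly_value(monthly_price, hours_saved):
    """Hourly value of your time at which the subscription pays for itself."""
    return monthly_price / hours_saved


saved = hours_saved_per_month(videos_per_month=1, final_minutes=45)
print(f"Manual cut of a 45-minute video: {manual_hours(45):.2f} h")
print(f"Hours saved per month at one video: {saved:.2f} h")
print(f"Break-even at $31/month: ${break_even_hourly_value(31, saved):.2f}/h")
```

Under these assumptions a single 45-minute video a month already saves about two hours, so even the top of the price range breaks even at a modest hourly value. Swap in your own figures to see where your break-even sits.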
How to do it in Sapari.
Upload your recording
MP4, MOV, or other common video formats.
Pick a pacing setting
Slider runs from Off (keeps natural pauses) to Hyper/TikTok (removes anything that isn't speech). Start at Balanced for general YouTube, Natural/Podcast for podcast audio, Hyper for short-form.
Wait for analysis
Captions, false starts, and audio cleanup run in the same pass.
Review the cards
Every detected silence is an orange card on the timeline. Dismiss the pauses worth keeping; drag the boundaries on any pause you want shortened rather than removed.
Export
16:9, 9:16, 1:1, or a custom aspect ratio from the same timeline.
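The analyze-then-review flow in the steps above can be sketched in a few lines. This is a minimal illustration of the general technique, not Sapari's implementation: it assumes per-frame loudness values in dBFS are already available, and the names (`SilenceCard`, `detect_silences`, `kept_segments`), the -40 dB floor, and the 0.5-second minimum pause are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SilenceCard:
    start: float             # seconds
    end: float               # seconds
    dismissed: bool = False  # True means the editor chose to keep this pause


def detect_silences(levels_db, frame_seconds, floor_db=-40.0, min_pause=0.5):
    """Return one card per run of quiet frames lasting at least min_pause seconds.

    floor_db and min_pause are illustrative defaults, not product settings.
    """
    cards = []
    run_start = None
    # A loud sentinel frame (0 dBFS) closes any quiet run that reaches the end.
    for i, level in enumerate(list(levels_db) + [0.0]):
        quiet = level < floor_db
        if quiet and run_start is None:
            run_start = i
        elif not quiet and run_start is not None:
            start, end = run_start * frame_seconds, i * frame_seconds
            if end - start >= min_pause:
                cards.append(SilenceCard(start, end))
            run_start = None
    return cards


def kept_segments(duration, cards):
    """Parts of the recording that survive the cut; dismissed cards are kept."""
    segments = []
    cursor = 0.0
    for card in sorted(cards, key=lambda c: c.start):
        if card.dismissed:
            continue
        if card.start > cursor:
            segments.append((cursor, card.start))
        cursor = max(cursor, card.end)
    if cursor < duration:
        segments.append((cursor, duration))
    return segments


# Twelve made-up frames of 0.25 s each (3 seconds of audio).
levels = [-20, -20, -60, -60, -60, -20, -60, -20, -60, -60, -60, -60]
cards = detect_silences(levels, frame_seconds=0.25)
print(kept_segments(3.0, cards))  # both detected pauses cut
cards[0].dismissed = True         # review step: keep the first pause
print(kept_segments(3.0, cards))
```

The `min_pause` parameter plays the role of the pacing slider: lowering it flags shorter gaps, and the `dismissed` flag is the review step that puts a deliberate pause back.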
Common questions.
How do I know if I'm cutting too aggressively?
Watch the first 60 seconds at full speed. If it feels breathless, loosen. If sentences trail off and you lose interest, tighten. YouTube Studio's retention graph is the longer-term signal. A sharp drop-off in the first 30 seconds usually points to a hook-pacing problem.
Should I cut differently in different parts of the same video?
For hook-heavy formats, yes. Tighten the first 15–30 seconds to survive the scroll-past window, loosen for the body. Most tools don't support section-level pacing today. The workaround is an aggressive overall setting plus manually dismissing over-cuts in the body.
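If your tool lets you export the detected pauses, the workaround can be approximated with a short script that applies a tight threshold inside the hook window and a looser one afterwards. This is a sketch under assumptions: the `pauses_to_cut` function, the 30-second hook window, and the 0.3-second and 0.8-second thresholds are hypothetical values chosen for illustration.

```python
def pauses_to_cut(pauses, hook_seconds=30.0,
                  hook_min_pause=0.3, body_min_pause=0.8):
    """Pick which detected pauses to cut.

    pauses: list of (start, end) tuples in seconds.
    Pauses starting inside the hook window are cut at hook_min_pause or longer;
    later pauses are cut only at body_min_pause or longer.
    """
    selected = []
    for start, end in pauses:
        limit = hook_min_pause if start < hook_seconds else body_min_pause
        if end - start >= limit:
            selected.append((start, end))
    return selected


# Made-up pauses: two in the hook, two in the body.
detected = [(4.0, 4.5), (12.0, 12.25), (45.0, 45.5), (90.0, 91.0)]
print(pauses_to_cut(detected))
```

Here the half-second pause at 4 seconds is cut because it sits in the hook, while the same-length pause at 45 seconds is left alone, which is the effect of dismissing over-cuts in the body by hand.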
What does "Hyper" sound like?
Words run into each other and breaths get cut. It's the TikTok default, and it's unnerving on long-form.
Can I remove silence from a video I've already edited?
Yes, but the results are worse: you've already made pacing decisions the AI doesn't know about. Run silence removal on the raw recording instead.