Audio-podcast editing doesn't translate.
Audio podcast editing is relatively hands-off. You cut dead air, maybe a cough, maybe a bad stumble. You leave in long thought-pauses, conversational asides, the occasional tangent. Audio audiences tolerate looseness because they're listening while doing something else (driving, cooking, working out), and the pauses feel natural.
Video audiences don't. They're watching. Long pauses register as dead air, tangents feel like filler, and the visual emphasis on silent moments (two people sitting there) amplifies every slow patch. A 90-minute audio podcast often works fine. The 90-minute video version of the same podcast is 20 minutes of editing away from being watchable.
What to cut that you'd leave in audio.
Loudness reference: YouTube, Spotify, Tidal, TikTok, Instagram, and Amazon Music all target roughly −14 LUFS integrated.
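To make the −14 LUFS target concrete, here is a minimal sketch of the arithmetic behind loudness normalization. It assumes you already have an integrated loudness reading from a meter; the −19.5 LUFS value and the function name `gain_to_target` are illustrative, not taken from any particular tool.

```python
def gain_to_target(measured_lufs: float, target_lufs: float = -14.0) -> tuple[float, float]:
    """Return (gain in dB, linear amplitude multiplier) needed to reach the target."""
    gain_db = target_lufs - measured_lufs   # LUFS differences are plain dB offsets
    linear = 10 ** (gain_db / 20)           # convert dB to an amplitude factor
    return gain_db, linear


# Hypothetical reading: a quiet recording measured at -19.5 LUFS integrated.
gain_db, factor = gain_to_target(-19.5)
print(f"{gain_db:+.1f} dB, x{factor:.2f}")  # +5.5 dB, x1.88
```

Raising gain this way can push peaks into clipping, so a real normalization pass typically pairs the gain change with a limiter.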
What to leave that a silence remover might cut.
Aggressive silence removal kills moments that work specifically because the camera is on you. Keep these.
Editing a weekly show in a day.
Serious video podcasters record weekly. The edit needs to fit into roughly a day, not three. Most of the bottleneck is the mechanical tax (silence, false starts, audio normalization, captions), all of which compound over 60–90 minutes of recording.
- 01 · Record
In Riverside, Zencastr, or similar. Capture local-quality, per-track recordings so audio mixing is possible later if needed.
- 02 · Mix down
Single MP4 with all participants merged. Riverside does this automatically.
- 03 · Run full post in one pass
Silence at Natural/Podcast pacing, false start detection, audio normalization to −14 LUFS, captions.
- 04 · Review on a timeline
Dismiss the pauses you want to keep, accept the stumbles and volume corrections.
- 05 · Export
16:9 for the main video. 9:16 ranges for short-form clips from the same edit.
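For the export step, the 9:16 clip is a vertical window cut out of the 16:9 frame. The sketch below shows that geometry as an ffmpeg `crop` filter string (`crop=w:h:x:y`). The helper name `vertical_crop` and the centered placement are assumptions for illustration; with two speakers side by side, the x offset would need to follow whoever is talking.

```python
def vertical_crop(width: int, height: int) -> str:
    """Build an ffmpeg crop filter for a centered 9:16 window of a landscape frame."""
    crop_w = int(height * 9 / 16) // 2 * 2   # round down to an even width
    x = (width - crop_w) // 2                # center the window horizontally
    return f"crop={crop_w}:{height}:{x}:0"


print(vertical_crop(1920, 1080))  # crop=606:1080:657:0
```

The width is rounded down to an even number because common 4:2:0 video encodes expect even dimensions.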
How to do it in Sapari.
Upload the mixed MP4
Single file from your recorder (Riverside, Zencastr, etc., merged).
Set pacing to Natural / Podcast
Preserves conversational breath, cuts the obvious dead air.
Set false start detection to Moderate
Aggressive catches too many intentional restarts for conversational content.
Toggle Clean Sweep on
Normalizes audio across the recording to −14 LUFS.
Review the cards
Most of your time goes to the pauses you might want to keep.
Export 16:9 + mark 9:16 ranges
Main video at 16:9 for YouTube/Spotify Video. Mark short-form clip ranges from the same timeline.
Per-track processing (separate host/guest tracks) is on the roadmap. Today Sapari edits the mixed recording.
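For intuition about what a pacing setting controls, here is a minimal sketch of silence trimming that shortens long pauses instead of deleting them. It is not Sapari's algorithm. The function name `plan_cuts`, the −40 dB threshold, and the 0.8 s kept pause are all assumptions, and the input is taken to be a list of per-frame audio levels in dB.

```python
from typing import List, Tuple


def plan_cuts(levels_db: List[float], frame_s: float,
              silence_db: float = -40.0,
              keep_pause_s: float = 0.8) -> List[Tuple[float, float]]:
    """Return (start, end) spans in seconds to cut, leaving keep_pause_s of each long pause."""
    cuts = []
    run_start = None
    # A loud sentinel frame at the end closes any silent run that reaches the last frame.
    for i, level in enumerate(levels_db + [0.0]):
        if level < silence_db:
            if run_start is None:
                run_start = i                # a silent run begins here
        elif run_start is not None:
            start_s, end_s = run_start * frame_s, i * frame_s
            if end_s - start_s > keep_pause_s:
                # Keep the first keep_pause_s of the pause, cut the rest.
                cuts.append((start_s + keep_pause_s, end_s))
            run_start = None
    return cuts


# Half-second frames: one 2-second pause and one half-second pause.
levels = [-20, -50, -50, -50, -50, -20, -50, -20]
print(plan_cuts(levels, frame_s=0.5, keep_pause_s=1.0))  # [(1.5, 2.5)]
```

A larger `keep_pause_s` behaves like a looser, more conversational pacing; a smaller one cuts tighter. The short pause in the example is left alone because it never exceeds the kept length.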
Common questions.
Should I cut the pre-roll chatter?
Yes, for the video version. Audio listeners tolerate pre-roll; video viewers scroll past it. If there's banter worth keeping, pull it into the middle of the episode.
What about b-roll or cutaways?
If you have them (product shots, screen recordings, visual references), place them at the moments being discussed. Most video podcasts don't use them. The format is just two talking heads, and that's fine as long as the pacing carries.
How long should a video podcast be?
YouTube retention data suggests videos in the 5–10 minute range have the highest average retention, but podcast audiences have different expectations. For new audiences, 20–45 minutes is a better target than 90. For established audiences, match the length to your audience's actual listening habits. Check your retention graph.
Should I record differently for video?
A little. Look at the camera occasionally. Avoid long paper-rustling or note-reading moments. Don't wear a shirt that strobes on camera.