How to edit a podcast for the video version.

Video podcasts have eaten the category. Most serious podcasts now publish a video version alongside audio, and many new podcasts launch video-first. The problem: most podcasters still edit their shows the way they edited audio. Video has different goals, different tolerances, and a different audience. Edit it like audio and the video version underperforms.

Audio-podcast editing doesn't translate.

Audio podcast editing is relatively hands-off. You cut dead air, maybe a cough, maybe a bad stumble. You leave in long thought-pauses, conversational asides, the occasional tangent. Audio audiences tolerate looseness because they're listening while doing something else (driving, cooking, working out), and the pauses feel natural.

Video audiences don't. They're watching. Long pauses register as dead air, tangents feel like filler, and the camera lingering on silent moments (two people sitting there) amplifies every slow patch. A 90-minute audio podcast often works fine. The 90-minute video version of the same podcast usually needs about 20 minutes cut before it is watchable.

What to cut that you'd leave in audio.

Long thought-pauses (over ~1.5s)
In audio they breathe. On video they're dead air.
Host/guest volume mismatch
Normalize to −14 LUFS, the loudness target across YouTube, Spotify, and most major platforms.
The tangent that doesn't resolve
If it doesn't tie back, cut it. Audio listeners will ride out a tangent in the background; video viewers are more likely to click away.
The first minute of pleasantries
"Hey, how's it going, thanks for having me" plays fine on audio. On video, viewers scroll past.
The pre-recording recap
Audio listeners don't care. Video viewers can see you glancing at notes and it reads as unprofessional.
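
The ~1.5s pause threshold above can be applied mechanically if you have a word-level transcript. This is a minimal sketch, assuming each word comes with start and end times in seconds; the `Word` type and `find_long_pauses` function are hypothetical names for illustration, not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording


def find_long_pauses(words, threshold=1.5):
    """Return (gap_start, gap_end) for each gap between consecutive
    words that is longer than `threshold` seconds."""
    pauses = []
    for prev, nxt in zip(words, words[1:]):
        gap = nxt.start - prev.end
        if gap > threshold:
            pauses.append((prev.end, nxt.start))
    return pauses


# Illustrative timings only (not from a real recording).
words = [
    Word("So", 0.0, 0.3),
    Word("anyway", 0.4, 0.9),
    Word("the", 3.1, 3.2),    # roughly 2.2 s of silence before this word
    Word("launch", 3.3, 3.8),
]
print(find_long_pauses(words))
```

The result is a list of cut candidates, not a list of cuts. Some of those pauses are worth keeping on video, so each one still deserves a look.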

Loudness reference: YouTube, Spotify, Tidal, TikTok, Instagram, Amazon Music all target −14 LUFS integrated.
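
The arithmetic behind a host/guest mismatch is simple enough to sketch. This assumes you already have a measured integrated loudness for each speaker from a loudness meter; the function names and the example readings are hypothetical. It shows only the static gain needed to reach −14 LUFS, and a real normalizer would also handle true peaks and limiting.

```python
TARGET_LUFS = -14.0


def gain_to_target(measured_lufs, target_lufs=TARGET_LUFS):
    """Gain in dB needed to move a measured integrated loudness to the target."""
    return target_lufs - measured_lufs


def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)


# Hypothetical meter readings for two speakers.
readings = {"host": -16.5, "guest": -22.0}

for name, lufs in readings.items():
    gain = gain_to_target(lufs)
    print(f"{name}: {gain:+.1f} dB (x{db_to_linear(gain):.2f} amplitude)")
```

In this made-up case the guest needs 5.5 dB more gain than the host, which is the kind of gap viewers notice immediately.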

What to leave that a silence remover might cut.

Aggressive silence removal kills moments that work specifically because the camera is on you. Keep these.

The pause after a hard question
"So, why did you leave?" ... three seconds ... "That's a good question." The pause is the answer's setup. Don't cut it.
The laugh
Video shows reactions, and shared laughter is retention gold. Audio podcasts often cut laughter for pacing; video should keep it.
The interruption that lands
If one person cuts the other off and it works (conflict, surprise, punchline), keep it.
The visible thinking
A speaker looking up while formulating a complex answer reads as thoughtful on video. The same moment in audio is just silence.
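
One way to keep these moments is to treat every long pause as a cut candidate and then exempt the ones an editor has marked as keepers. A minimal sketch under that assumption; `split_pauses`, the time ranges, and the idea of hand-marked keep ranges are illustrative, not a description of how any specific product works.

```python
def split_pauses(pauses, keep_ranges):
    """Partition pauses into (cut, keep).

    A pause is kept if it overlaps any editor-marked keep range.
    All times are (start, end) tuples in seconds.
    """
    cut, keep = [], []
    for start, end in pauses:
        protected = any(
            start < k_end and k_start < end
            for k_start, k_end in keep_ranges
        )
        (keep if protected else cut).append((start, end))
    return cut, keep


# Illustrative values only.
pauses = [(12.0, 14.1), (95.4, 98.6), (301.0, 303.0)]
keep_ranges = [(95.0, 99.0)]  # e.g. the silence after a hard question

cut, keep = split_pauses(pauses, keep_ranges)
print("cut:", cut)
print("keep:", keep)
```

Marking what to protect first and cutting everything else tends to be faster than auditing every cut afterwards.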

Editing a weekly show in a day.

Serious video podcasters record weekly. The edit needs to fit into roughly a day, not three. Most of the bottleneck is the mechanical tax (silence, false starts, audio normalization, captions), all of which compound over 60–90 minutes of recording.

  1. Record

    Record in Riverside, Zencastr, or a similar tool. Capture local-quality audio per track so you can remix later if needed.

  2. Mix down

    Single MP4 with all participants merged. Riverside does this automatically.

  3. Run full post in one pass

    Silence removal at Natural/Podcast pacing, false start detection, audio normalization to −14 LUFS, and captions.

  4. Review on a timeline

    Dismiss the pauses you want to keep, accept the stumbles and volume corrections.

  5. Export

    16:9 for the main video. 9:16 ranges for short-form clips from the same edit.
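
Cutting 9:16 clips from a 16:9 master comes down to crop geometry. The sketch below assumes a simple centered crop with even pixel dimensions (which many encoders expect); `center_crop_9x16` is a hypothetical helper, and in practice you may want to offset the crop toward whoever is speaking.

```python
def center_crop_9x16(src_w, src_h):
    """Return (width, height, x, y) of the largest centered 9:16 crop
    that fits in a landscape frame, rounded down to even dimensions."""
    crop_h = src_h - (src_h % 2)
    crop_w = crop_h * 9 // 16
    crop_w -= crop_w % 2
    if crop_w > src_w:
        raise ValueError("source frame is narrower than 9:16")
    x = (src_w - crop_w) // 2
    y = (src_h - crop_h) // 2
    return crop_w, crop_h, x, y


print(center_crop_9x16(1920, 1080))
print(center_crop_9x16(3840, 2160))
```

For a 1920×1080 source this keeps a 606-pixel-wide strip, under a third of the frame, which is why framing for vertical clips is worth thinking about while recording.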

How to do it in Sapari.

01

Upload the mixed MP4

Single file from your recorder (Riverside, Zencastr, etc., merged).

02

Set pacing to Natural / Podcast

Preserves conversational breath, cuts the obvious dead air.

03

Set false start detection to Moderate

Aggressive catches too many intentional restarts for conversational content.

04

Toggle Clean Sweep on

Normalizes audio across the recording to −14 LUFS.

05

Review the cards

Most of your time goes to the pauses you might want to keep (see the list above).

06

Export 16:9 + mark 9:16 ranges

Main video at 16:9 for YouTube/Spotify Video. Mark short-form clip ranges from the same timeline.

Per-track processing (separate host/guest tracks) is on the roadmap. Today Sapari edits the mixed recording.

Common questions.

Should I cut the pre-roll chatter?

Yes, for the video version. Audio listeners tolerate pre-roll; video viewers scroll past it. If there's banter worth keeping, pull it into the middle of the episode.

What about b-roll or cutaways?

If you have them (product shots, screen recordings, visual references), place them at the moments being discussed. Most video podcasts don't use them. The format is just two talking heads, and that's fine as long as the pacing carries.

How long should a video podcast be?

YouTube retention data shows videos in the 5–10 minute range have the highest average retention, but podcast audiences have different expectations. For new audiences, 20–45 minutes is a better target than 90. For established audiences, match the length to how long your viewers actually stay, which your retention graph will show.

Should I record differently for video?

A little. Look at the camera occasionally. Avoid long paper-rustling or note-reading moments. Don't wear a shirt that strobes on camera.

A video version that doesn't eat your week.

7 days. 30 AI minutes. No credit card.

Start free trial