Auto captions

Captions in the same pass as the edit.

Sapari transcribes your audio word by word, aligns each word to its timestamp, and generates captions that stay in sync after every cut. The caption margin adapts automatically to each aspect ratio.

Start free trial

7 days · 30 AI minutes · No credit card

Same caption · Margin auto-adapts (sample caption: "YOU NEED TO DECIDE")
16:9 · 5% margin
1:1 · 6% margin
9:16 · 12% margin · clears UI

Why captions matter

Most platforms auto-mute on scroll.

A video without captions can lose viewers in the first second. Captioning by hand means syncing every word, and nobody has time for that. Running a separate captioning tool means another upload, another export, and another sync issue.

Sapari generates captions as part of the main analysis. No separate step.

How it works

Word-by-word. Locked to time.

01 · Transcribe

Speech-to-text

Word-level precision. Every word recognized with start and end timestamps.

02 · Align

Bound to ms

Every word is locked to its exact start and end timestamps, so captions stay in sync after silence and false-start cuts.

03 · Render

Burn or toggle

Captions burn into the export or stay toggleable based on your setting.
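
To illustrate why word-level timestamps keep captions in sync after cuts, here is a minimal sketch of the idea, not Sapari's actual implementation. The `Word` class, the `remap_after_cuts` function, and the choice to drop any word that overlaps a cut are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start_ms: int  # position in the source recording
    end_ms: int


def remap_after_cuts(words, cuts):
    """Move word timestamps onto the edited timeline.

    `cuts` is a list of non-overlapping (start_ms, end_ms) ranges removed
    from the source. Words overlapping a cut are dropped (an assumption).
    """
    kept = []
    for w in words:
        if any(w.start_ms < c_end and w.end_ms > c_start for c_start, c_end in cuts):
            continue  # the word was spoken inside removed footage
        # Total duration removed before this word begins.
        removed = sum(c_end - c_start for c_start, c_end in cuts if c_end <= w.start_ms)
        kept.append(Word(w.text, w.start_ms - removed, w.end_ms - removed))
    return kept


words = [Word("you", 0, 300), Word("um", 500, 800), Word("need", 2000, 2400)]
print(remap_after_cuts(words, cuts=[(400, 1900)]))
```

Because each word carries its own timestamps, a cut only shifts the words that follow it; nothing has to be re-transcribed or re-synced by hand.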

Styling

What you can change.

Position: Top · Center · Bottom
Style: Default · Minimal · Bold
Font: Sans · Serif · Mono
Length: Short (~3–4 words) · Medium (~5–6) · Long (~7–8)
Color: Custom hex
Background box: On / off · color · opacity

Save your preferred combination as a preset and reuse it across projects.
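
As a rough illustration of the Length setting, the sketch below groups timed words into caption lines. It assumes each preset maps to a simple maximum word count (4, 6, and 8, taken from the upper end of the ranges above); the real grouping may also consider pauses or punctuation. The names `MAX_WORDS` and `group_captions` are hypothetical.

```python
# Assumed upper bounds for each Length preset (hypothetical mapping).
MAX_WORDS = {"short": 4, "medium": 6, "long": 8}


def group_captions(words, length="medium"):
    """Group (text, start_ms, end_ms) words into caption lines.

    Each line spans from its first word's start to its last word's end.
    """
    size = MAX_WORDS[length]
    lines = []
    for i in range(0, len(words), size):
        chunk = words[i:i + size]
        text = " ".join(w[0] for w in chunk)
        lines.append((text, chunk[0][1], chunk[-1][2]))
    return lines


words = [("you", 0, 200), ("need", 200, 450), ("to", 450, 550),
         ("decide", 550, 900), ("now", 950, 1200)]
for line in group_captions(words, "short"):
    print(line)
```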

Margin auto-adapts

One setting, three ratios.

Sapari scales the caption's vertical margin per aspect ratio so captions clear platform UI on 9:16, with no need to adjust position settings for each export.

Aspect · Auto margin
16:9 · 5% margin
1:1 · 6% margin
9:16 · 12% margin · clears platform UI
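
For a sense of what these percentages mean in pixels, here is a small sketch. It assumes the margin is a percentage of frame height measured from the nearest edge, which the page does not state; `AUTO_MARGIN_PERCENT` and `caption_margin_px` are hypothetical names.

```python
# Percentages from the table above; the lookup itself is illustrative.
AUTO_MARGIN_PERCENT = {"16:9": 5, "1:1": 6, "9:16": 12}


def caption_margin_px(aspect, frame_height_px):
    """Return the caption margin in pixels, assuming it scales with frame height."""
    return round(frame_height_px * AUTO_MARGIN_PERCENT[aspect] / 100)


for aspect, height in [("16:9", 1080), ("1:1", 1080), ("9:16", 1920)]:
    print(aspect, caption_margin_px(aspect, height), "px")
```

Under that assumption, a 1080×1920 vertical export gets a margin several times larger than a 1920×1080 landscape export, which is what keeps captions clear of platform UI.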

Languages

Four today.

English
Spanish
Portuguese
French

Caption generation uses the language detected in your audio.

Profanity in captions

Three modes.

Off: Caption every word as spoken.
Partial: Censor mid-word, e.g. "f**k".
Full: Censor the whole word, e.g. "****".

Independent of audio censoring. Bleep audio and leave captions clean — or the other way around. More on profanity filter →
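
The three modes can be sketched as a small masking function. This is an illustration only: the blocklist, the function names, and the rule that Partial keeps the first and last letter are assumptions inferred from the "f**k" and "****" examples above. A mild placeholder word stands in for real profanity.

```python
def censor_word(word, mode):
    """Mask one word according to the caption profanity mode."""
    if mode == "off":
        return word
    if mode == "full":
        return "*" * len(word)
    if mode == "partial":
        if len(word) <= 2:
            return "*" * len(word)  # nothing left to show mid-word
        return word[0] + "*" * (len(word) - 2) + word[-1]
    raise ValueError(f"unknown mode: {mode}")


def censor_caption(text, mode, blocklist):
    """Apply censor_word to every blocklisted word in a caption line."""
    return " ".join(
        censor_word(tok, mode) if tok.lower() in blocklist else tok
        for tok in text.split()
    )


blocklist = {"darn"}  # hypothetical blocklist entry
for mode in ("off", "partial", "full"):
    print(mode, "->", censor_caption("well darn it", mode, blocklist))
```

Because this operates on caption text only, it can run with or without audio bleeping, matching the independence described above.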

Credit cost

Half a credit if that's all you need.

1.0 credit per minute: Full AI pipeline (captions + silence + false starts + audio + B-roll).

0.5 credit per minute: Captions-only mode (transcription + captions).

0 credits, always: Manual editing and rendering. Always free.
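
A quick way to estimate cost from these rates is sketched below. It assumes credits scale linearly with video minutes and ignores any rounding Sapari may apply; the mode names and `credits_for` are hypothetical.

```python
# Rates from the list above; mode names are illustrative.
CREDITS_PER_MINUTE = {"full": 1.0, "captions_only": 0.5, "manual": 0.0}


def credits_for(minutes, mode):
    """Estimate credits for a video, assuming linear per-minute billing."""
    return minutes * CREDITS_PER_MINUTE[mode]


for mode in CREDITS_PER_MINUTE:
    print(mode, credits_for(12, mode))
```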

Full pricing →

Before you ask

Common questions.

Are captions included in every plan?

Yes, including the free trial. They're part of the analysis, not a premium add-on.

Can I edit the text after generation?

Yes. Every caption line is editable — fix a misheard word, change capitalization, tweak punctuation.

Do captions work with thick accents or background noise?

Accent support is good; heavy background noise hurts accuracy. Running Clean Sweep in the same analysis helps — it denoises before transcription.

Can I download the transcript separately?

Not yet. SRT export is on the roadmap; today, captions render into the exported video.

What's the maximum video length?

Same as the rest of Sapari — no hard upper limit. Long-form podcasters run 2-hour episodes.

Captions that move with the edit.
