Use case · How-to

Caption styles and strategies for video.

Captions aren't optional. Most social feeds autoplay video muted, which means a video without readable on-screen text loses most of its audience in the first few seconds. LinkedIn explicitly recommends captions for all feed video, and Meta's autoplay across Facebook and Instagram is muted by default. What's less obvious is that the style of the captions matters as much as having them. Badly sized or badly styled captions can perform worse than no captions at all, because they signal sloppiness.

Size and position by platform.

The biggest mistake is using the same caption style across every platform. Phone feed and desktop feed are different reading contexts. Captions that look fine on YouTube desktop are unreadable at 9:16 on a phone, and captions sized for TikTok look oversized on YouTube.

Working defaults used across creator communities (these are general conventions, not platform-published specs; adjust for your audience):

9:16 (TikTok, Reels, Shorts)
64–80px, centered or slightly above center, one or two words per line, high contrast. Short-form captions are often the design, not just the accessibility layer.
1:1 (LinkedIn, Instagram feed)
48–60px, lower-third, two to four words per line. Sized so they're readable on mobile without dominating the frame.
16:9 (YouTube, desktop web)
36–44px, bottom-left or bottom-center, full phrases per line. Desktop viewers mostly watch with sound on, so captions are a support layer.
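If you render captions from a script, the defaults above can be kept as data and looked up by frame size. This is a minimal sketch, not a platform spec: `CaptionPreset`, `PRESETS`, and `preset_for` are hypothetical names, and the numbers are simply the conventions listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptionPreset:
    label: str
    aspect: float               # width / height
    size_px: tuple[int, int]    # (min, max) font size from the list above
    position: str
    words_per_line: str


# The working defaults from the list above, stored as plain data.
PRESETS = [
    CaptionPreset("9:16", 9 / 16, (64, 80), "center or slightly above center", "1-2 words"),
    CaptionPreset("1:1", 1.0, (48, 60), "lower third", "2-4 words"),
    CaptionPreset("16:9", 16 / 9, (36, 44), "bottom-left or bottom-center", "full phrases"),
]


def preset_for(width: int, height: int) -> CaptionPreset:
    """Return the preset whose aspect ratio is closest to width / height."""
    ratio = width / height
    return min(PRESETS, key=lambda p: abs(p.aspect - ratio))
```

With this lookup, a 1080x1350 (4:5) frame falls back to the 1:1 preset because that is the nearest ratio. Treat the result as a starting point and adjust for your audience.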

Font weight matters more than font choice. Bold sans-serif (Inter Bold, Montserrat SemiBold, Helvetica Black) reads against busy backgrounds on mobile; thin fonts disappear. White text with a thick black stroke works on almost anything. LinkedIn's own ad guidance recommends 4:5 vertical or 1:1 square for feed performance, so the 1:1 sizing above is the safer default for professional audiences.

Burned-in vs toggleable captions.

Most platforms support soft captions: toggleable closed captions the viewer has to enable. Most creators still burn captions into the video, for three specific reasons:

Soft captions default to off on most platforms
Burned-in captions show up for everyone regardless of settings.
Soft captions don't match your style
The viewer gets the platform's default font and size, not yours. LinkedIn's auto-caption feature works but produces LinkedIn's default styling.
Soft captions have parsing failures
SRT files don't always display correctly on every device or in every viewing context.

Given that a substantial majority of social video is watched with sound off, burning in captions trades flexibility for guaranteed display.
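For anyone burning captions in by hand, one common route is ffmpeg's `subtitles` filter, which draws an SRT file into the picture and accepts ASS style overrides through `force_style`. The sketch below only builds the command. It assumes an ffmpeg build with libass, simple file names that need no filter escaping, and a font that is installed on the machine. `burn_in_command` is a hypothetical helper name.

```python
def burn_in_command(video: str, srt: str, output: str, font: str = "Inter") -> list[str]:
    """Build an ffmpeg command that renders an SRT file into the video frames."""
    # ASS style overrides: bold white text with a thick black outline.
    style = ",".join([
        f"FontName={font}",
        "Bold=1",
        "PrimaryColour=&H00FFFFFF",   # white (ASS colours are &HAABBGGRR)
        "OutlineColour=&H00000000",   # black
        "BorderStyle=1",              # outline and shadow, not an opaque box
        "Outline=3",
    ])
    return [
        "ffmpeg", "-i", video,
        "-vf", f"subtitles={srt}:force_style='{style}'",
        "-c:a", "copy",               # leave the audio stream untouched
        output,
    ]
```

The returned list can be passed to `subprocess.run`. Font size is left out on purpose: the subtitle renderer scales it relative to its own script resolution, so the value does not necessarily equal on-screen pixels and is best tuned by eye.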

One exception: YouTube long-form. Upload soft captions (a .srt file) for accessibility and SEO. YouTube indexes the transcript for search. Still burn in visual captions for the first 30 seconds, because that's the retention-critical window where you can't trust closed captions to be enabled.
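An .srt file is plain text: a running index, a `start --> end` line with `HH:MM:SS,mmm` timestamps, the caption text, then a blank line. A minimal sketch of writing one from timed cues follows. `srt_timestamp` and `to_srt` are hypothetical helper names, and the cue list is assumed to come from whatever transcription step you already use.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) cues as the contents of a .srt file."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # Blocks are separated by a blank line.
    return "\n".join(blocks)
```

Write the returned string to a UTF-8 file with a `.srt` extension and upload it alongside the video.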

What caption style says about you.

Captions are the most visible design element in most creator videos. If yours look templated (the default Submagic bouncy word-by-word, the default CapCut font), viewers register that you used a tool. That's not necessarily bad, but it's not distinctive. Captions that match your brand (your font, your color, your rhythm) read as considered in a way default presets don't.

Animation is separate from style. The word-by-word animated captions that dominate TikTok are a short-form convention: they look right on short-form and overdesigned on everything else. If you're publishing across platforms from the same source, static styled captions travel better than animated ones.

How to do it in Sapari.

01

Upload the recording

Captions generate in the same analysis as silence, false starts, and audio cleanup.

02

Pick the aspect ratio first

Defaults adapt automatically to each format.

03

Override if needed

Font, color, position, background, and per-word highlighting are editable.

04

Export with captions burned in

SRT export is on the roadmap; today captions render into the video.

Captions regenerate automatically after every cut, so silence removal and false start dismissal don't break timing.
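To see why regeneration matters, here is a rough illustration of what keeping captions in sync after a cut involves. It is a generic sketch, not Sapari's implementation: `Word` and `retime_after_cuts` are hypothetical names, and the cuts are assumed to be non-overlapping ranges that fall on word boundaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    start: float  # seconds in the original recording
    end: float


def retime_after_cuts(words: list[Word], cuts: list[tuple[float, float]]) -> list[Word]:
    """Drop words that fall inside removed ranges and shift the rest earlier."""
    kept = []
    for word in words:
        midpoint = (word.start + word.end) / 2
        # A word whose midpoint lies inside a cut is treated as removed.
        if any(cut_start <= midpoint < cut_end for cut_start, cut_end in cuts):
            continue
        # Total duration removed before this word starts.
        removed = sum(
            min(cut_end, word.start) - cut_start
            for cut_start, cut_end in cuts
            if cut_start < word.start
        )
        kept.append(Word(word.text, word.start - removed, word.end - removed))
    return kept
```

Cutting an "um" that spans 0.5s to 1.0s removes that word from the caption track and pulls every later word half a second earlier, which is the behavior you want from any pipeline that edits and captions together.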

Common questions.

Should I keep "um" in the captions if I cut it from the audio?

No. If the audio cuts it, the caption should too. Tools that run silence removal and captioning in separate passes sometimes get out of sync here. A good pipeline regenerates captions after the edit.

What about non-English audio?

Caption it in the spoken language. Auto-translate is a separate decision. Platforms handle multilingual audiences differently and the right answer depends on your audience.

Do I need to worry about line breaks?

For short-form, yes: one to three words per line reads better than wrapping. For long-form, automatic line breaks at phrase boundaries are fine.
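As a rough sketch of the short-form rule, a transcript can be split into fixed-size word groups before rendering. `short_form_lines` is a hypothetical helper, and the default of three words is just the upper end of the range above.

```python
def short_form_lines(text: str, max_words: int = 3) -> list[str]:
    """Split text into caption lines of at most max_words words each."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
```

This splits purely by word count. Long-form captions are better served by breaks at phrase boundaries, which needs punctuation or pause information this sketch ignores.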

What's the smallest caption I can get away with?

On a phone feed at 9:16, 56px is the practical floor. Below that, older viewers can't read the captions comfortably.
