Caption Styles and Strategies for Video

Size and position by platform.

The biggest mistake is using the same caption style on every platform. A phone feed and a desktop feed are different reading contexts. Captions that look fine on YouTube desktop are unreadable at 9:16 on a phone, and captions sized for TikTok look oversized on YouTube.

Working defaults used across creator communities. These are general conventions, not platform-published specs, so adjust for your audience:

9:16 (TikTok, Reels, Shorts)

64–80px, centered or slightly above center, one or two words per line, high contrast. Short-form captions are often the design, not just the accessibility layer.

1:1 (LinkedIn, Instagram feed)

48–60px, lower-third, two to four words per line. Sized so they're readable on mobile without dominating the frame.

16:9 (YouTube, desktop web)

36–44px, bottom-left or bottom-center, full phrases per line. Desktop viewers mostly watch with sound on, so captions are a support layer.
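If you script your exports, the conventions above can be written down as a small lookup table. This is a minimal sketch, not a platform spec: the names `CaptionDefaults`, `DEFAULTS`, and `defaults_for` are hypothetical, the values are copied from the list above, and the nearest-ratio matching is an assumption about how you might handle in-between frame sizes such as 4:5.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CaptionDefaults:
    min_px: int
    max_px: int
    position: str
    # (min, max) words per line; None means "full phrases per line"
    words_per_line: Optional[Tuple[int, int]]


# Community conventions from the list above, keyed by aspect ratio (width:height).
DEFAULTS = {
    "9:16": CaptionDefaults(64, 80, "center or slightly above center", (1, 2)),
    "1:1": CaptionDefaults(48, 60, "lower third", (2, 4)),
    "16:9": CaptionDefaults(36, 44, "bottom-left or bottom-center", None),
}


def defaults_for(width: int, height: int) -> CaptionDefaults:
    """Return the defaults for the listed aspect ratio closest to the frame."""
    ratio = width / height

    def distance(key: str) -> float:
        w, h = key.split(":")
        # Compare in log space so 9:16 and 16:9 are equally far from 1:1.
        return abs(math.log(ratio) - math.log(int(w) / int(h)))

    return DEFAULTS[min(DEFAULTS, key=distance)]


if __name__ == "__main__":
    print(defaults_for(1080, 1920))  # a vertical short-form frame
```

Keeping the table in one place makes it easy to override a single value for your audience without touching the rest.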

Font weight matters more than font choice. Bold sans-serif (Inter Bold, Montserrat SemiBold, Helvetica Black) reads against busy backgrounds on mobile; thin fonts disappear. White text with a thick black stroke works on almost anything.

LinkedIn's own ad guidance recommends 4:5 vertical or 1:1 square for feed performance, so the 1:1 sizing above is the safer default for professional audiences.

Burned-in vs toggleable captions.

Most platforms support soft captions: toggleable closed captions the viewer has to enable. Most creators still burn captions into the video, for three specific reasons:

Soft captions default to off on most platforms

Burned-in captions show up for everyone regardless of settings.

Soft captions don't match your style

The viewer gets the platform's default font and size, not yours. LinkedIn's auto-caption feature works but produces LinkedIn's default styling.

Soft captions have parsing failures

SRT files don't always display correctly on every device or in every viewing context.

Given that a substantial majority of social video is watched with sound off, burning captions in trades flexibility for guaranteed display.

One exception: YouTube long-form. Upload soft captions (a .srt file) for accessibility and SEO. YouTube indexes the transcript for search. Still burn in visual captions for the first 30 seconds, because that's the retention-critical window where you can't trust closed captions to be enabled.
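For the soft-caption half of that workflow, an .srt file is plain text: a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, the caption text, and a blank line between cues. The sketch below writes that format from a list of cues and picks out the cues that start in the first 30 seconds. The cue tuples and the function names (`srt_timestamp`, `to_srt`, `first_window`) are illustrative assumptions, not any tool's API.

```python
from typing import List, Tuple

Cue = Tuple[float, float, str]  # (start seconds, end seconds, text)


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 01:01:01,500."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: List[Cue]) -> str:
    """Serialize cues as SRT blocks separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)


def first_window(cues: List[Cue], seconds: float = 30.0) -> List[Cue]:
    """Cues that start inside the opening window (candidates for burn-in)."""
    return [cue for cue in cues if cue[0] < seconds]


if __name__ == "__main__":
    cues = [(0.0, 1.2, "Hello"), (1.2, 2.5, "world"), (31.0, 33.0, "later")]
    print(to_srt(cues))
    print(first_window(cues))
```

The full list goes into the uploaded .srt; only the opening-window cues need to be rendered into the picture.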

What caption style says about you.

Captions are the most visible design element in most creator videos. If yours look templated (the default Submagic bouncy word-by-word, the default CapCut font), viewers register that you used a tool. That's not necessarily bad, but it's not distinctive. Captions that match your brand (your font, your color, your rhythm) read as considered in a way default presets don't.

Animation is separate from style. The word-by-word animated captions that dominate TikTok are a short-form convention: they look right on short-form and overdesigned on everything else. If you're publishing across platforms from the same source, static styled captions travel better than animated ones.

How to do it in Sapari.

Upload the recording

Captions are generated in the same analysis pass that handles silence, false starts, and audio cleanup.

Pick the aspect ratio first

Defaults adapt automatically to each format.

Override if needed

Font, color, position, background, and per-word highlighting are editable.

Export with captions burned in

SRT export is on the roadmap; today captions render into the video.

Captions regenerate automatically after every cut, so silence removal and false start dismissal don't break timing.


Common questions.

Should I keep "um" in the captions if I cut it from the audio?

No. If the audio cuts it, the caption should too. Tools that run silence removal and captioning in separate passes sometimes get out of sync here. A good pipeline regenerates captions after the edit.
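To make "regenerate captions after the edit" concrete, here is a minimal sketch of the retiming step, assuming you have word-level timestamps and a list of removed time ranges that do not overlap each other. The function name `retime` and the tuple shapes are hypothetical; this is not how any particular tool implements it.

```python
from typing import List, Tuple

Word = Tuple[float, float, str]  # (start seconds, end seconds, text)
Cut = Tuple[float, float]        # (start seconds, end seconds) removed from the audio


def retime(words: List[Word], cuts: List[Cut]) -> List[Word]:
    """Drop words that fall inside a cut and shift the rest earlier."""
    cuts = sorted(cuts)
    kept = []
    for start, end, text in words:
        # A word that overlaps any removed range was cut from the audio too.
        if any(start < c_end and end > c_start for c_start, c_end in cuts):
            continue
        # Shift by the total duration removed before this word begins.
        removed = sum(c_end - c_start for c_start, c_end in cuts if c_end <= start)
        kept.append((start - removed, end - removed, text))
    return kept


if __name__ == "__main__":
    words = [(0.0, 0.4, "So"), (0.5, 0.9, "um"), (1.0, 1.5, "captions")]
    print(retime(words, cuts=[(0.5, 1.0)]))
```

In this example the "um" between 0.5s and 1.0s disappears from both audio and captions, and the following word moves half a second earlier.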

What about non-English audio?

Caption it in the spoken language. Auto-translate is a separate decision. Platforms handle multilingual audiences differently and the right answer depends on your audience.

Do I need to worry about line breaks?

For short-form, yes: one to three words per line reads better than wrapping. For long-form, automatic line breaks at phrase boundaries are fine.
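Both approaches are easy to sketch. The first helper below groups a transcript into fixed-size chunks of a few words for short-form; the second breaks after punctuation as a rough stand-in for phrase boundaries in long-form. Both function names are hypothetical, and punctuation-only splitting is a simplifying assumption: real phrase detection usually also considers pauses and line length.

```python
import re
from typing import List


def chunk_words(text: str, max_words: int = 3) -> List[str]:
    """Short-form: split into lines of at most max_words words each."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def break_at_phrases(text: str) -> List[str]:
    """Long-form: break after punctuation marks that usually end a phrase."""
    return [part for part in re.split(r"(?<=[,.;:!?])\s+", text.strip()) if part]


if __name__ == "__main__":
    print(chunk_words("burned in captions show up for everyone"))
    print(break_at_phrases("If the audio cuts it, the caption should too."))
```

Lowering `max_words` to 1 or 2 gives the tighter rhythm common on vertical video.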

What's the smallest caption I can get away with?

On a phone feed at 9:16, 56px is the practical floor. Below that, older viewers can't read them comfortably.
