Captions in seconds, not hours.

Caption Toolkit promo video

Context

Captions are essential for accessibility, clarity, and platform compliance, but within After Effects, adding them has always been painfully manual.

At Vidsy, designers could spend hours typing, syncing, and splitting text by hand just to make a short video accessible.
The Caption Toolkit set out to change that.

Process

I built the extension around the idea of one click to captions.

The tool integrates with OpenAI's Whisper for natural, conversation-paced line breaks, and with AssemblyAI for precise word-by-word timestamps.

In either mode, it automatically transcribes the composition's voiceover into accurately timed text layers, directly in the After Effects timeline, in seconds.
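To make the timing step concrete, here is a minimal Python sketch of how word-level timestamps could be grouped into captions. It is an illustration only, not the extension's actual code: the `Word` and `Caption` types, the `group_words` function, and the `max_words` and `max_gap` parameters are all hypothetical, and the sample timings are made up.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the composition
    end: float


@dataclass
class Caption:
    text: str
    start: float
    end: float


def _to_caption(words):
    # A caption spans from its first word's start to its last word's end.
    return Caption(" ".join(w.text for w in words), words[0].start, words[-1].end)


def group_words(words, max_words=1, max_gap=0.6):
    """Group timed words into captions (hypothetical helper).

    A new caption starts when the current one already holds max_words,
    or when the pause before the next word is longer than max_gap seconds.
    """
    captions = []
    current = []
    for word in words:
        if current and (
            len(current) >= max_words or word.start - current[-1].end > max_gap
        ):
            captions.append(_to_caption(current))
            current = []
        current.append(word)
    if current:
        captions.append(_to_caption(current))
    return captions


# Made-up timestamps for illustration.
words = [
    Word("Captions", 0.00, 0.45),
    Word("in", 0.50, 0.60),
    Word("seconds,", 0.65, 1.10),
    Word("not", 1.90, 2.10),
    Word("hours.", 2.15, 2.60),
]

one_word = group_words(words, max_words=1)  # one caption per word
lines = group_words(words, max_words=4)     # short lines, split at pauses
```

With `max_words=1` every word becomes its own caption, which mirrors a one-word-at-a-time style. A larger `max_words` yields short lines that also break wherever the speaker pauses. Each resulting caption's start and end times would then set the in and out points of one text layer.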

Outcome

What once took hours now takes seconds.

Designers can select a VO layer, or the entire composition, hit transcribe, and within seconds have captions ready to go as native After Effects text layers.

The one-word-at-a-time mode has been especially popular, since that style would be almost impossible to create manually.

Beyond the time saved, it removed a tedious barrier and let motion designers focus on craft instead of busywork.

Gallery

Caption Toolkit incorporated into an existing Vidsy toolkit
