The Captions

Closed caption transcripts accompany 97.9% of the video files in the dataset. However, it is common for caption text to be missing during commercial segments. Captions are time-aligned to the video's audio track at word granularity. This fine-scale time information enables queries to select video segments exactly when a word is spoken. For example, the following are the results of aligning the phrase "I'M CHRIS CUOMO. WELCOME TO PRIMETIME.":


00:00:10,730 --> 00:00:10,960
I'M

00:00:10,960 --> 00:00:11,190
CHRIS

00:00:11,210 --> 00:00:11,280
CUOMO.

00:00:11,280 --> 00:00:11,449
WELCOME

00:00:11,869 --> 00:00:11,939
TO

00:00:11,939 --> 00:00:12,509
PRIMETIME.
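
To illustrate how word-level alignment supports such queries, the following is a minimal sketch that parses cues in the layout shown above and looks up the time span of a spoken phrase. It assumes one word per cue and timestamps of the form `HH:MM:SS,mmm`. The names `WordCue`, `parse_word_cues`, and `find_phrase` are hypothetical and are not part of the dataset's tooling.

```python
import re
from typing import List, NamedTuple, Tuple


class WordCue(NamedTuple):
    """One aligned word (hypothetical container; times in milliseconds)."""
    start_ms: int
    end_ms: int
    word: str


_TS = r"(\d{2}):(\d{2}):(\d{2}),(\d{3})"
# A timing line "start --> end" followed by a single word on the next line.
_CUE_RE = re.compile(_TS + r"\s*-->\s*" + _TS + r"\s+(\S+)")


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert timestamp fields to integer milliseconds (avoids float error)."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_word_cues(text: str) -> List[WordCue]:
    """Parse word-level cues, tolerating stray blank lines between them."""
    cues = []
    for match in _CUE_RE.finditer(text):
        g = match.groups()
        cues.append(WordCue(_to_ms(*g[0:4]), _to_ms(*g[4:8]), g[8]))
    return cues


def _normalize(word: str) -> str:
    """Uppercase and drop punctuation except apostrophes ("CUOMO." -> "CUOMO")."""
    return re.sub(r"[^\w']", "", word).upper()


def find_phrase(cues: List[WordCue], phrase: str) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) for each occurrence of the phrase."""
    target = [_normalize(w) for w in phrase.split()]
    if not target:
        return []
    words = [_normalize(c.word) for c in cues]
    n = len(target)
    return [
        (cues[i].start_ms, cues[i + n - 1].end_ms)
        for i in range(len(words) - n + 1)
        if words[i:i + n] == target
    ]


SAMPLE = """\
00:00:10,730 --> 00:00:10,960
I'M

00:00:10,960 --> 00:00:11,190
CHRIS

00:00:11,210 --> 00:00:11,280
CUOMO.

00:00:11,280 --> 00:00:11,449
WELCOME

00:00:11,869 --> 00:00:11,939
TO

00:00:11,939 --> 00:00:12,509
PRIMETIME.
"""

if __name__ == "__main__":
    cues = parse_word_cues(SAMPLE)
    print(find_phrase(cues, "welcome to primetime"))  # [(11280, 12509)]
```

In this sketch the phrase "welcome to primetime" maps to the interval from 11.280 s to 12.509 s. Matching is exact on normalized tokens, so a real query system would likely need more robust tokenization and an index rather than a linear scan.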