The Captions

97.8% of the video files in the dataset have accompanying closed caption transcripts. However, it is common for caption text to be missing during commercial segments. Captions are time-aligned to the video's audio track at word granularity, and this fine-scale time information enables queries to select the exact video segments in which a word is spoken. For example, aligning the phrase "I'M CHRIS CUOMO. WELCOME TO PRIMETIME." produces the following result:

00:00:10,730 --> 00:00:10,960
I'M
 
00:00:10,960 --> 00:00:11,190
CHRIS
 
00:00:11,210 --> 00:00:11,280
CUOMO.
 
00:00:11,280 --> 00:00:11,449
WELCOME
 
00:00:11,869 --> 00:00:11,939
TO
 
00:00:11,939 --> 00:00:12,509
PRIMETIME.
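
To illustrate how word-level alignment supports such queries, here is a minimal Python sketch. It assumes the aligned captions are stored as SRT-style text, as in the excerpt above. The names `Word`, `parse_word_captions`, and `find_word` are hypothetical helpers written for this illustration and are not part of the dataset's tooling. Times are kept as integer milliseconds to avoid floating-point rounding.

```python
import re
from typing import List, NamedTuple, Tuple


class Word(NamedTuple):
    start_ms: int  # time the word starts, in milliseconds
    end_ms: int    # time the word ends, in milliseconds
    text: str      # caption token as it appears in the transcript


# Matches a timing line such as "00:00:10,730 --> 00:00:10,960".
TIME_RE = re.compile(
    r"(\d+):(\d+):(\d+),(\d+)\s*-->\s*(\d+):(\d+):(\d+),(\d+)"
)


def to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert hour/minute/second/millisecond fields to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_word_captions(text: str) -> List[Word]:
    """Parse word-aligned, SRT-style captions into a list of Word records."""
    words: List[Word] = []
    pending = None  # (start_ms, end_ms) waiting for its word line
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank separator between entries
        match = TIME_RE.fullmatch(line)
        if match:
            g = match.groups()
            pending = (to_ms(*g[:4]), to_ms(*g[4:]))
        elif pending is not None:
            words.append(Word(pending[0], pending[1], line))
            pending = None
    return words


def find_word(words: List[Word], query: str) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) for every occurrence of `query`.

    Matching ignores case and surrounding punctuation (an assumption
    made for this sketch, not a documented rule of the dataset).
    """
    target = query.upper().strip(".,!?")
    return [
        (w.start_ms, w.end_ms)
        for w in words
        if w.text.upper().strip(".,!?") == target
    ]


SAMPLE = """\
00:00:10,730 --> 00:00:10,960
I'M

00:00:10,960 --> 00:00:11,190
CHRIS

00:00:11,210 --> 00:00:11,280
CUOMO.

00:00:11,280 --> 00:00:11,449
WELCOME

00:00:11,869 --> 00:00:11,939
TO

00:00:11,939 --> 00:00:12,509
PRIMETIME.
"""

words = parse_word_captions(SAMPLE)
print(find_word(words, "cuomo"))  # [(11210, 11280)]
```

A query for a multi-word phrase could be built on the same records by checking that consecutive words match and taking the start of the first word and the end of the last.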