Closed caption transcripts accompany 97.1% of the video files in the dataset. However, it is common for caption text to be missing during commercial segments. Captions are time-aligned to the video's audio track at word granularity. This fine-scale time information enables queries to select the exact video segment in which a word is spoken. For example, the following are the results of aligning the phrase "I'M CHRIS CUOMO. WELCOME TO PRIMETIME.":
00:00:10,730 --> 00:00:10,960  I'M
00:00:10,960 --> 00:00:11,190  CHRIS
00:00:11,210 --> 00:00:11,280  CUOMO.
00:00:11,280 --> 00:00:11,449  WELCOME
00:00:11,869 --> 00:00:11,939  TO
00:00:11,939 --> 00:00:12,509  PRIMETIME.
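
To illustrate how word-level alignment supports such queries, the following is a minimal Python sketch, not the dataset's actual tooling. It assumes the alignment is available as SRT-style text with one word per entry, as in the example above. The names `parse_alignment` and `find_word` are hypothetical helpers introduced here for illustration. Times are kept as integer milliseconds to avoid floating-point rounding.

```python
import re
from typing import List, Tuple

# Word-level alignment for the example phrase (timestamps as HH:MM:SS,mmm).
ALIGNED = """\
00:00:10,730 --> 00:00:10,960 I'M
00:00:10,960 --> 00:00:11,190 CHRIS
00:00:11,210 --> 00:00:11,280 CUOMO.
00:00:11,280 --> 00:00:11,449 WELCOME
00:00:11,869 --> 00:00:11,939 TO
00:00:11,939 --> 00:00:12,509 PRIMETIME.
"""

TIMESTAMP = r"(\d{2}):(\d{2}):(\d{2}),(\d{3})"
ENTRY = re.compile(TIMESTAMP + r"\s*-->\s*" + TIMESTAMP + r"\s+(\S+)")

Word = Tuple[int, int, str]  # (start_ms, end_ms, token)


def to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert timestamp fields to integer milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_alignment(text: str) -> List[Word]:
    """Parse 'start --> end WORD' entries into (start_ms, end_ms, token)."""
    words = []
    for match in ENTRY.finditer(text):
        g = match.groups()
        words.append((to_ms(*g[0:4]), to_ms(*g[4:8]), g[8]))
    return words


def find_word(words: List[Word], query: str) -> List[Tuple[int, int]]:
    """Return the (start_ms, end_ms) interval of every occurrence of query.

    Matching ignores case and trailing punctuation (an assumption of this
    sketch, not a documented property of the dataset).
    """
    target = query.upper()
    return [
        (start, end)
        for start, end, token in words
        if token.rstrip(".,!?").upper() == target
    ]


words = parse_alignment(ALIGNED)
welcome_intervals = find_word(words, "welcome")
```

Here `welcome_intervals` holds the interval during which "WELCOME" is spoken, which a query could then use to select the matching video segment.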