The Cable TV News Dataset

The dataset available for analysis using the Stanford Cable TV News Analyzer was provided by the Internet Archive's TV News Archive. The dataset includes near-24/7 recordings of CNN, Fox News, and MSNBC between January 1, 2010 and May 28, 2024. (The dataset updates daily, with a lag of approximately 24 to 36 hours from the original content's air date.) In total, the dataset consists of over 370,000 hours of video and includes both TV news programming and commercial segments.
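As a rough sanity check on these figures, the sketch below computes the maximum number of hours that three channels could air over the stated date range and compares it with the reported total. It is only illustrative arithmetic: the function names are hypothetical, and it assumes all three channels were recorded for the whole period. Under that assumption the upper bound is roughly 379,000 hours, so "over 370,000 hours" corresponds to coverage of at least about 97.7%, consistent with near-24/7 recording.

```python
from datetime import date

# Values taken from the description above.
CHANNELS = ("CNN", "Fox News", "MSNBC")
START = date(2010, 1, 1)
END = date(2024, 5, 28)


def max_possible_hours(start=START, end=END, n_channels=len(CHANNELS)):
    """Upper bound on recorded hours if every channel aired 24 hours a day."""
    days = (end - start).days + 1  # count both endpoints
    return days * 24 * n_channels


def implied_coverage(reported_hours, start=START, end=END):
    """Fraction of the upper bound that a reported hour count represents."""
    return reported_hours / max_possible_hours(start, end)


if __name__ == "__main__":
    print(max_possible_hours())
    print(round(implied_coverage(370_000), 3))
```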

The Internet Archive's TV News Archive provided the dataset as a collection of video files, with each video corresponding to one airing of a news program (most videos are approximately one hour long). Per-video metadata includes the name of the news program, the date and time it aired, and the channel on which it aired. Video frames range in resolution from 640x360 to 858x480. All videos include audio. 97.8% of the video files have accompanying closed caption transcripts.
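One way to picture the per-video metadata is as a small record type. The sketch below is an assumption made for illustration, not the archive's actual schema: the class name, field names, and helper functions are all hypothetical, and only the kinds of information (program name, air date/time, channel, resolution, caption availability) come from the description above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class VideoRecord:
    """Hypothetical per-video metadata record (field names are assumptions)."""
    show: str            # name of the news program
    channel: str         # channel on which it aired
    air_time: datetime   # date and time of the airing
    width: int           # frame width in pixels
    height: int          # frame height in pixels
    has_captions: bool   # whether a closed caption transcript accompanies the video

    def resolution_in_range(self) -> bool:
        """True if the frame size lies within the 640x360 to 858x480 range."""
        return 640 <= self.width <= 858 and 360 <= self.height <= 480


def caption_fraction(records):
    """Fraction of records with a closed caption transcript (0.0 if empty)."""
    records = list(records)
    if not records:
        return 0.0
    return sum(r.has_captions for r in records) / len(records)
```

Computed over the full collection, a helper like `caption_fraction` would be expected to return a value near the 97.8% figure quoted above.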

From this source dataset, we compute additional labels describing each video's contents, as detailed on the methodology page.

Use the following links to browse the contents of the dataset.

The Videos

The Shows

The Captions

The People

The Tags (for faces)