Our goal is to provide the public with computational tools that enable large-scale, data-driven analysis of the contents of cable TV news. We believe the ability to quantitatively measure who is in the news and what is talked about will increase transparency about editorial decisions, serve as a powerful mechanism to identify forms of bias, and identify trends in an important information source that reaches millions of Americans each day.
The Stanford Cable TV News Analyzer uses automated face recognition technology provided by the Amazon Rekognition Celebrity Recognition API to identify individuals on cable TV news and compute their screen time. Face recognition, particularly when performed en masse on large image databases, is a controversial technology because of its potential to cause harm through errors and bias, to erode personal privacy, and to be misused by governments and law enforcement. Due to these concerns, the City of San Francisco has banned face recognition technology for law enforcement, Amazon announced a moratorium on police use of its face recognition services, and IBM recently announced that it is sunsetting its face identification services entirely.
At the same time, applying face recognition to large image databases plays a role in efforts such as fighting human trafficking and identifying missing children. We believe aiding public understanding of the share of screen time given to specific public figures on cable TV news programs is a new application of face recognition technology for which the potential for harm is low.
Individual privacy concerns. The Stanford Cable TV News Analyzer only applies face recognition to publicly aired cable TV news video. Our database contains only individuals identified by the Amazon Rekognition Celebrity Recognition API, which identifies only public figures. (Amazon does not disclose its definition of "public figure".) In addition, the Stanford Cable TV News Analyzer only permits screen time queries for individuals who have received at least ten hours of screen time as of August 1, 2020 (according to the Celebrity Recognition API results). This is a total of 1,543 individuals. The full set of individuals identified in our dataset is given on our dataset page. Individuals who wish to be removed from the dataset should email us at email@example.com.
Accuracy and bias concerns. Automated face recognition will have errors, and studies of other face recognition services have demonstrated accuracy biases by gender and race. It is not feasible to validate the accuracy of all face identifications in our dataset, but we provide results from a number of validation efforts on our methodology page. We also point users to recent studies of the accuracy of the Amazon Rekognition service. To help users build trust in the accuracy of their own query results and to identify face identification errors, the Stanford Cable TV News Viewer provides the ability to directly view the video clips selected by queries.
The Stanford Cable TV News Analyzer does not attempt to determine the race of individuals. We are unaware of any computational model that can accurately estimate an individual’s race from their appearance. In the future, it may be possible to use external data sources to link public figure identities to the individual’s self-reported race. Such approaches would enable a new set of queries that could assist studies of representation in cable TV news that concern the subject of race.
We document our data labeling algorithms and provide an assessment of their accuracy on the methodology page.
To request removal from the dataset, please email firstname.lastname@example.org.
The Stanford Cable TV News Analyzer uses Amazon’s Celebrity Recognition service to identify faces. This service is designed to identify only celebrities and public individuals. In addition, the site only displays the names of individuals who have appeared on screen for at least ten hours as of August 1, 2020 (according to the Celebrity Recognition detections). Individuals not identified by Amazon’s Celebrity Recognition service, or individuals who are identified but appear on screen only briefly, will not be shown on the site.
Note that regardless of the success of face identification, text transcripts can always be queried for an individual’s name even if the individual’s face is not identified on screen.
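As a rough illustration of the ten-hour display threshold described above, the following Python sketch filters a table of per-person screen time totals. The function name, the data layout, and the example totals are assumptions made for illustration only; they do not describe the site's actual implementation or data.

```python
from datetime import timedelta

# Threshold stated on this page: at least ten hours of screen time.
MIN_SCREEN_TIME = timedelta(hours=10)


def queryable_individuals(screen_time_by_name):
    """Return the names whose total screen time meets the threshold, sorted.

    `screen_time_by_name` is a hypothetical mapping of name -> timedelta.
    """
    return sorted(
        name
        for name, total in screen_time_by_name.items()
        if total >= MIN_SCREEN_TIME
    )


# Hypothetical totals, for illustration only (not real data).
example_totals = {
    "Person A": timedelta(hours=42, minutes=30),
    "Person B": timedelta(hours=10),
    "Person C": timedelta(hours=9, minutes=59),
}

print(queryable_individuals(example_totals))
```

In this sketch, a person with exactly ten hours is included because the page states the threshold as "at least ten hours".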
Due to the scale of our dataset, we will not be able to correct all labeling errors. However, we welcome comments and feedback via email@example.com.
The Internet Archive makes video data available to the Stanford Cable TV News Analyzer after a 24-hour delay. Because of this delay, as well as the processing time of video analysis, new results appear on the Stanford Cable TV News Analyzer approximately 24 to 36 hours after a program's original air time. Please see our methodology page for more detail.
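To make the timing concrete, the short Python sketch below computes the approximate window in which results for a program might appear, using the 24 to 36 hour figure stated above. The function name and the example air time are hypothetical, and the window is an approximation rather than a guarantee.

```python
from datetime import datetime, timedelta

# Approximate bounds stated on this page.
MIN_DELAY = timedelta(hours=24)
MAX_DELAY = timedelta(hours=36)


def expected_availability(air_time):
    """Return (earliest, latest) approximate times at which results may appear."""
    return air_time + MIN_DELAY, air_time + MAX_DELAY


# Hypothetical program airing at 8:00 PM on August 1, 2020.
earliest, latest = expected_availability(datetime(2020, 8, 1, 20, 0))
print(earliest, latest)
```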