Frequently Asked Questions

What are your goals in creating the Stanford Cable TV News Analyzer?
Why does the Stanford Cable TV News Analyzer use automated face recognition technology?
Does the Stanford Cable TV News Analyzer attempt to identify the race of an individual?
What algorithms does the Stanford Cable TV News Analyzer use to annotate the dataset?
How do I request to be removed from the dataset?
I recognize an individual in the video, but the face in the video was not labeled with the individual’s name. Why were they not identified?
How do I report errors in video annotations?
How quickly after a program airs is the Stanford Cable TV News Analyzer updated to reflect new content?
What are some known confounds in the dataset?

What are your goals in creating the Stanford Cable TV News Analyzer?

Our goal is to provide the public with computational tools that enable large-scale, data-driven analysis of the contents of cable TV news. We believe the ability to quantitatively measure who is in the news and what is talked about will increase transparency about editorial decisions, serve as a powerful mechanism to identify forms of bias, and identify trends in an important information source that reaches millions of Americans each day.

Automated face recognition has been shown to have errors and bias, and it has the potential to cause harm. Why does the Stanford Cable TV News Analyzer use this technology?

The Stanford Cable TV News Analyzer uses automated face recognition technology provided by the Amazon Rekognition Celebrity Recognition API to identify individuals on cable TV news and compute their screen time. Face recognition, particularly when performed en masse on large image databases, is a controversial technology because it can cause harm through errors and bias, erode personal privacy, and be misused by governments and law enforcement. Due to these concerns, the City of San Francisco has banned face recognition technology for law enforcement, Amazon announced a moratorium on police use of its face recognition services, and IBM recently announced that it is sunsetting its face identification services entirely.

At the same time, applying face recognition to large image databases plays a role in efforts such as fighting human trafficking and identifying missing children. We believe aiding public understanding of the share of screen time given to specific public figures on cable TV news programs is a new application of face recognition technology for which the potential for harm is low.

Individual privacy concerns. The Stanford Cable TV News Analyzer only applies face recognition to publicly aired broadcast cable TV news video. Our database contains only individuals identified by the Amazon Rekognition Celebrity Recognition API, which identifies only public figures. (Amazon does not disclose its definition of "public figure".) In addition, the Stanford Cable TV News Analyzer only permits screen time queries for individuals who have received at least ten hours of screen time as of December 31, 2021 (according to the Celebrity Recognition API results). This is a total of 1,990 individuals. The full set of individuals identified in our dataset is given on our dataset page. Individuals who wish to be removed from the dataset should email us at tvnews-project@stanford.edu.

Accuracy and bias concerns. Automated facial recognition will have errors, and studies of other face recognition services have demonstrated accuracy biases by gender and race. It is not feasible to validate the accuracy of all face identifications in our dataset, but we provide results from a number of validation efforts on our methodology page. We also point users to recent studies of the accuracy of the Amazon Rekognition service. In addition, to help users build trust in the accuracy of their own query results and to identify face identification errors, the Stanford Cable TV News Viewer provides the ability to directly view the video clips selected by queries.

Does the Stanford Cable TV News Analyzer attempt to identify the race of an individual?

The Stanford Cable TV News Analyzer does not attempt to determine the race of individuals. We are unaware of any computational model that can accurately estimate an individual’s race from their appearance. In the future, it may be possible to use external data sources to link public figure identities to the individual’s self-reported race. Such approaches would enable a new set of queries that could assist studies of representation in cable TV news that concern the subject of race.

I am concerned about possible sources of error and bias in my query results. What algorithms does the Stanford Cable TV News Analyzer use to annotate the dataset?

We document our data labeling algorithms and provide an assessment of their accuracy on the methodology page.

How do I request to be removed from the dataset?

To request removal from the dataset, please email tvnews-project@stanford.edu.

I recognize an individual in the video, but the face in the video was not labeled with the individual’s name. Why were they not identified?

The Stanford Cable TV News Analyzer uses Amazon’s Celebrity Recognition service to identify faces. This service is designed to identify only celebrities and public individuals. In addition, the site only displays the names of individuals who have appeared on screen for at least ten hours by December 31, 2021 (according to the celebrity recognition detections). Individuals not identified by Amazon’s Celebrity Recognition service, or individuals who are identified but appear on screen only briefly, will not be shown on the site.

Note that text transcripts can always be queried for an individual’s name, even if the individual’s face is not identified on screen.

I see a missed face detection or a misidentified face in the dataset. How do I report the error?

Due to the scale of our dataset, we will not be able to correct all labeling errors. However, we welcome comments and feedback via tvnews-project@stanford.edu.

How quickly after a program airs is the site updated to reflect new content?

The Internet Archive makes video data available to the Stanford Cable TV News Analyzer after a 24-hour delay. Because of this delay, as well as the processing time of video analysis, new results appear on the Stanford Cable TV News Analyzer approximately 24-36 hours after a program's original air time. Please see our methodology page for more detail.

What are some known confounds in the dataset?

This list is by no means comprehensive; it is intended as a set of examples of the kinds of confounds that one should be aware of when using this tool. We highly recommend using the video playback functions provided by the tool to validate the results of any query.

Variation in caption spellings. The canonical spelling of a word or phrase may not be consistently reflected in the captions. For example, the word "Obamacare" often appears as "Obama care" (two separate words). "Email" is also often written as "E mail".
Changes in the face identification models. This tool relies on the Amazon Rekognition Celebrity Recognition API for face identification. Amazon's API may change without notice. For example, Laura Ingraham, a Fox News host, is no longer detected after September 2021. Similarly, when a newly added individual begins receiving screen time, this does not guarantee that the individual was absent from the historical data; they may have appeared earlier but gone unrecognized when face identification was performed.
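
For readers who analyze caption text on their own, the caption-spelling confound above can be partly handled by matching all spacing variants of a term at once. The following is a minimal Python sketch, not part of the Stanford Cable TV News Analyzer and not its query syntax. The helper names variant_pattern and count_mentions and the sample sentence are hypothetical, and the sketch assumes you already have caption text as a plain string.

```python
import re

def variant_pattern(parts):
    """Build a case-insensitive regex that matches the given word parts
    written together, separated by one whitespace character, or hyphenated
    (e.g. "Obamacare", "Obama care", "Obama-care")."""
    joined = r"[\s-]?".join(re.escape(p) for p in parts)
    return re.compile(r"\b" + joined + r"\b", re.IGNORECASE)

def count_mentions(caption_text, parts):
    """Count occurrences of any spelling variant in the caption text."""
    return len(variant_pattern(parts).findall(caption_text))

# Hypothetical caption snippet, used for illustration only.
sample = ("Critics of Obamacare spoke first. Later, Obama care was debated. "
          "Check your E mail or email.")

obamacare_count = count_mentions(sample, ["Obama", "care"])
email_count = count_mentions(sample, ["e", "mail"])
print(obamacare_count, email_count)
```

Counting only the canonical spelling would miss half of the mentions in this sample. Other confounds, such as misspelled names, are not addressed by this approach and still call for checking the underlying video.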

Our research paper provides additional details on other patterns in the dataset.