Our goal is to provide the public with computational tools that enable large-scale, data-driven analysis of the contents of cable TV news. We believe the ability to quantitatively measure who is in the news and what is talked about will increase transparency about editorial decisions, serve as a powerful mechanism to identify forms of bias, and identify trends in an important information source that reaches millions of Americans each day.
The Stanford Cable TV News Analyzer uses automated face recognition technology provided by the Amazon Rekognition Celebrity Recognition API to identify individuals on cable TV news and compute their screen time. Face recognition, particularly when performed en masse on large image databases, is a controversial technology because of its potential to cause harm through errors and bias, to erode personal privacy, and to be misused by governments and law enforcement. Due to these concerns, the City of San Francisco has banned face recognition technology for law enforcement, Amazon Web Services (AWS) announced a moratorium on police use of its face recognition services, and IBM recently announced that it is sunsetting its face identification services entirely.
At the same time, applying face recognition to large image databases plays a role in efforts such as fighting human trafficking and identifying missing children. We believe aiding public understanding of the share of screen time given to specific public figures on cable TV news programs is a new application of face recognition technology for which the potential for harm is low.
Individual privacy concerns. The Stanford Cable TV News Analyzer only applies face recognition to publicly aired cable TV news video. Additionally, our database contains only individuals identified by the Amazon Rekognition Celebrity Recognition API, which identifies only public figures. (Amazon does not disclose its definition of "public figure".) Furthermore, the Stanford Cable TV News Analyzer only permits screen time queries for individuals who have received at least ten hours of screen time as of August 1, 2020 (according to the Celebrity Recognition API results). This is a total of 1,490 individuals. The full set of individuals identified in our dataset is given on our dataset page. Individuals who wish to be removed from the dataset should email us at firstname.lastname@example.org.
Accuracy and bias concerns. Automated facial recognition will have errors, and studies of other face recognition services have demonstrated accuracy biases by gender and race. It is not feasible to validate the accuracy of all face identifications in our dataset, but we provide results from a number of validation efforts on our methodology page. We also point users to recent studies of the accuracy of the Amazon Rekognition service. In addition, to help users build trust in the accuracy of their own query results and to identify face identification errors, the Stanford Cable TV News Viewer provides the ability to directly view the video clips selected by queries.
The Stanford Cable TV News Analyzer uses computer vision to make a binary assessment of an individual’s presented gender based on the appearance of their face (see our methodology page for the algorithms used to do this). Gender presentation (also referred to as gender expression) reflects an individual’s external expression of their gender (through cues such as facial features, makeup, hairstyle, and clothing), which may differ from their gender identity, their birth sex, or both. When an individual’s presented gender differs from their actual gender identity, algorithmic attempts to infer gender identity from facial appearance will fail.
We recognize that treating an individual’s gender as a binary quantity, as well as assessing gender solely from an individual's appearance, is a grossly simplified treatment of a complex topic. Further, we recognize that perpetuating the notion of binary gender can cause harm to non-binary individuals. However, we believe that binary classification of presented gender still provides useful insights into the presentation of cable TV news, and illuminates important biases in the screen time given to male- and female-presenting groups. We believe these benefits justify the inclusion of binary gender labels in the tool.
Mitigating potential for harm. Automatic gender recognition can result in automated misgendering, which can be distressing and harmful, especially to transgender individuals. Prior studies of existing automatic gender recognition systems have found that error rates are higher for dark-skinned individuals and transgender individuals. We provide statistics on the accuracy of our gender classifier on our methodology page.
To mitigate potential harm due to automated misgendering, the Stanford Cable TV News Analyzer does not present gender labels in the user interface unless a user specifically opts in to see these annotations by using "tag=male" or "tag=female" predicates in their search query.
We also provide the ability to report misgendered individuals via email@example.com.
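As an illustration only, the opt-in rule for gender labels described above can be sketched in a few lines of Python. This is not the Analyzer's actual implementation: the function name gender_labels_enabled and the regular expression are hypothetical, and the pattern assumes a predicate may appear either as tag=male or with the value in quotes.

```python
import re

# Hypothetical pattern (an assumption, not the site's real query grammar):
# matches a tag=male or tag=female predicate, with optional quotes and spaces.
_GENDER_TAG = re.compile(r"""\btag\s*=\s*["']?(male|female)\b""", re.IGNORECASE)


def gender_labels_enabled(query: str) -> bool:
    """Return True only if the query explicitly opts in to gender labels."""
    return _GENDER_TAG.search(query) is not None


if __name__ == "__main__":
    # A query without a gender predicate never surfaces gender labels.
    print(gender_labels_enabled('text="election"'))
    # A query with an explicit predicate opts in.
    print(gender_labels_enabled('tag=female AND text="election"'))
```

The design point is that labels are off by default: nothing in a query enables them implicitly, so a user only sees gender annotations after asking for them by name.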
The Stanford Cable TV News Analyzer does not attempt to determine the race of individuals. We are unaware of any computational model that can accurately estimate an individual’s race from their appearance. In the future, it may be possible to use external data sources to link public figure identities to the individual’s self-reported race. Such approaches would enable a new set of queries that could assist studies of representation in cable TV news that concern the subject of race.
We document our data labeling algorithms and provide an assessment of their accuracy on the methodology page.
To request removal from the dataset, please email firstname.lastname@example.org.
The Stanford Cable TV News Analyzer uses Amazon’s Celebrity Recognition service to identify faces. This service is designed to identify only celebrities and public individuals. In addition, the site only displays the names of individuals who have appeared on screen for at least ten hours as of August 1, 2020 (according to the celebrity recognition detections). Individuals not identified by Amazon’s Celebrity Recognition service, or individuals who are identified but appear on screen only briefly, will not be shown on the site.
Note that regardless of the success of face identification, text transcripts can always be queried for an individual’s name even if the individual’s face is not identified on screen.
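The ten-hour display rule described above amounts to a simple threshold filter. The following Python sketch shows the idea under stated assumptions: the function displayable_names, the input mapping, and the placeholder names are all hypothetical, and the real pipeline computes screen time from the celebrity recognition detections in ways documented on the methodology page.

```python
from datetime import timedelta

# Threshold stated on this page: at least ten hours of screen time.
MIN_SCREEN_TIME = timedelta(hours=10)


def displayable_names(screen_time_by_name):
    """Return, in sorted order, the names whose total screen time meets the threshold.

    screen_time_by_name is a hypothetical mapping from a person's name to
    their total detected screen time (a timedelta) as of the cutoff date.
    """
    return sorted(
        name
        for name, total in screen_time_by_name.items()
        if total >= MIN_SCREEN_TIME
    )


if __name__ == "__main__":
    # Placeholder data for illustration, not real measurements.
    totals = {
        "Person A": timedelta(hours=12),
        "Person B": timedelta(hours=9, minutes=59),
        "Person C": timedelta(hours=10),
    }
    print(displayable_names(totals))
```

Someone identified by the service but falling just under the threshold, like the second placeholder entry, would not be listed, although their name could still be found through a transcript search.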
Due to the scale of our dataset, we will not be able to correct all labeling errors. However, we welcome comments and feedback via email@example.com.
The Internet Archive makes video data available to the Stanford Cable TV News Analyzer after a 24-hour delay. Because of this delay, as well as the processing time of video analysis, new results appear on the Stanford Cable TV News Analyzer approximately 24 to 36 hours after a program's original air time. Please see our methodology page for more detail.
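To make the timing concrete, the approximate 24 to 36 hour window can be worked out for any air time. The Python sketch below is illustrative arithmetic only: the function expected_availability is hypothetical, and the bounds are the approximate figures quoted above rather than a guarantee.

```python
from datetime import datetime, timedelta, timezone

# Approximate bounds quoted on this page; actual processing time varies.
MIN_DELAY = timedelta(hours=24)
MAX_DELAY = timedelta(hours=36)


def expected_availability(air_time):
    """Return the (earliest, latest) approximate times results should appear."""
    return air_time + MIN_DELAY, air_time + MAX_DELAY


if __name__ == "__main__":
    # Example air time chosen for illustration.
    air_time = datetime(2020, 8, 1, 20, 0, tzinfo=timezone.utc)
    earliest, latest = expected_availability(air_time)
    print(earliest.isoformat(), latest.isoformat())
```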