This page documents the full query language supported by the Stanford Cable TV News Analyzer. Before reading this documentation, we recommend that you read the getting started tutorial.
All queries compute the total time of video segments in the dataset that match the query's filters. Screen time queries are broken into several parts. A basic query consists of filters separated by "AND"s.
For example, the following query computes the screen time of Kamala Harris on CNN:
name="Kamala Harris" AND channel="CNN"
Likewise, "OR" is also supported. The following query counts the total video time on CNN or MSNBC:
channel="CNN" OR channel="MSNBC"
Queries can combine "AND" and "OR", using parentheses to construct more complex queries. If no parentheses are specified, AND takes precedence over OR. Putting the examples above together, the following query computes the screen time of Kamala Harris on CNN or MSNBC:
name="Kamala Harris" AND (channel="CNN" OR channel="MSNBC")
If the query is left blank, then no filters are applied and all of the data is counted.
Filters on Entire Videos | ||
channel | description | name of the channel |
values | CNN, FOX, MSNBC | |
default | all | |
example | channel="CNN" |
|
show | description | name of the show |
values | list of shows | |
default | all | |
example | show="CNN Newsroom" |
|
hour | description | range (inclusive) of hours in 24h format, in US Eastern Time (UTC-5:00 in standard time and UTC-4:00 during daylight saving time) |
values | 0-23 | |
default | 0-23 | |
example | hour="10" |
|
dayofweek | description | a single day of the week, or a range (inclusive) of days |
values | mon, tue, wed, thu, fri, sat, sun | |
default | mon-sun | |
example | dayofweek="mon" |
|
Filters on Detected Faces | ||
name | description | face of the person with the specified name is on screen |
values | name (See the people page for a complete list of people.) | |
default | n/a | |
example | name="Kamala Harris" |
|
tag | description | face with the specified tag(s) is on screen. Multiple tags can be specified with commas. |
values | (non_)presenter (See the tags page for a complete list of tags.) | |
default | n/a | |
example | tag="presenter" |
|
facecount | description | number of faces on screen |
values | 1 or more | |
default | n/a | |
example | facecount=2 |
|
Filters on Closed Caption Transcripts | ||
text | description | segments where the specified text pattern appears in the captions. |
values | keywords or phrases. Use | for "or". (See Text Filter Syntax for more details of valid text patterns.) | |
default | n/a | |
example | text="affordable care act", text="affordable care act | obamacare | obama care" |
textwindow | description | Specifies how much to dilate the time interval associated with text filter matches. If the text window is 0, then the text filter selects exactly the video segment when a word or phrase is being said. By increasing the window to a value larger than 0, it is possible to design queries where segments matching one filter need only be within a certain amount of time of a segment matching a text filter. For example, if the word "obamacare" is said and the text window is 1 second, then each instance of "obamacare" is converted to a 1 second interval centered around the time when "obamacare" is said. Note that for long windows, overlapping intervals are merged. |
values | time in seconds | |
default | 1 | |
example | textwindow=10 (treat each text match as 10 seconds) |
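The dilation and merging described for textwindow can be sketched in Python. This is only an illustrative sketch, not the analyzer's implementation: the function names `dilate_and_merge` and `total_time` are hypothetical, and it assumes each match is a (start, end) pair in seconds that is widened symmetrically around its midpoint and never shrunk below its spoken duration.

```python
def dilate_and_merge(matches, window):
    """Widen each (start, end) match to `window` seconds, then merge overlaps.

    Assumption: the window is centered on the midpoint of the utterance, and
    a match that is already longer than `window` keeps its original extent.
    """
    dilated = []
    for start, end in matches:
        center = (start + end) / 2
        half = max(window, end - start) / 2
        dilated.append((center - half, center + half))

    merged = []
    for start, end in sorted(dilated):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it rather than double count.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def total_time(intervals):
    """Sum the durations of a list of disjoint intervals."""
    return sum(end - start for start, end in intervals)


# Three one-second mentions; the first two are close together.
mentions = [(10.0, 11.0), (12.0, 13.0), (100.0, 101.0)]
exact = dilate_and_merge(mentions, 0)
widened = dilate_and_merge(mentions, 10)
```

With a window of 0 the three mentions are counted exactly as spoken. With a window of 10 the first two dilated intervals overlap and are merged, so the total is less than 30 seconds.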
Text Filter Syntax
Sometimes a topic can be defined with multiple related or synonymous words or phrases. For example, the "European Union" can also be referred to as the EU or E.U. in the captions. When this is the case, use the "|" character to delimit multiple words and phrases. For example, text="European Union | EU | E.U." will search for video segments where any of these three n-grams appear in the captions. This can be repeated for an arbitrary number of words and phrases.
You can also search for instances where words appear near each other using "&" (and). For example, to find instances of "United" near "Airlines", use text="United & Airlines". This can be chained; for example, text="United & Airlines & 737 MAX".
Not ("\") works similarly; for instance, text="United \ States \ Kingdom" finds instances of "United" that are near neither "States" nor "Kingdom".
By default, the threshold for nearness is 15 seconds. This can be modified using the "::" syntax: text="United \ States :: 60" sets the window for "\" to 60 seconds. Use "//" to change the window policy to tokens; for example, text="United \ States // 100" finds "United" with no instances of "States" within 100 tokens.
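The nearness semantics of "&" and "\" can be illustrated with a small Python sketch. It is a simplification under stated assumptions: each mention is reduced to a single timestamp in seconds, the threshold is treated as inclusive, and the helper names `near` and `not_near` are hypothetical rather than part of the analyzer.

```python
def near(a_times, b_times, threshold=15.0):
    """Mentions of either word that have a mention of the other within `threshold` seconds."""
    keep_a = [a for a in a_times if any(abs(a - b) <= threshold for b in b_times)]
    keep_b = [b for b in b_times if any(abs(a - b) <= threshold for a in a_times)]
    return sorted(keep_a + keep_b)


def not_near(a_times, b_times, threshold=15.0):
    """Mentions of the first word with no mention of the second within `threshold` seconds."""
    return [a for a in a_times if all(abs(a - b) > threshold for b in b_times)]


# Hypothetical caption timestamps, in seconds from the start of a video.
united = [5.0, 40.0, 200.0]
airlines = [12.0, 300.0]
states = [45.0]

both = near(united, airlines)              # like text="United & Airlines"
default_window = not_near(united, states)  # like text="United \ States"
wide_window = not_near(united, states, 60) # like text="United \ States :: 60"
```

Widening the threshold from 15 to 60 seconds removes the mention at 5.0, because it then falls within range of "States" at 45.0.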
Text query "&" and "|" operators behave differently from AND and OR. The latter operate on intervals, while the former produce the intervals on which the latter operate. The query text="United & Airlines" finds separate intervals of "United" and intervals of "Airlines" that are near each other. These intervals are of duration "textwindow" (by default, 1 second). In contrast, text="United" AND text="Airlines" finds intervals of "United" and intervals of "Airlines", and returns their exact time overlap.
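To make the interval view of AND and OR concrete, here is a minimal Python sketch that treats AND as intersection and OR as union of time intervals. The functions `intersect` and `union` and the sample data are hypothetical illustrations; the sketch assumes each filter yields a sorted list of disjoint (start, end) pairs in seconds.

```python
def intersect(xs, ys):
    """AND: the time covered by both sorted lists of disjoint intervals."""
    out = []
    i = j = 0
    while i < len(xs) and j < len(ys):
        start = max(xs[i][0], ys[j][0])
        end = min(xs[i][1], ys[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if xs[i][1] < ys[j][1]:
            i += 1
        else:
            j += 1
    return out


def union(xs, ys):
    """OR: the time covered by either list, with overlaps merged."""
    merged = []
    for start, end in sorted(xs + ys):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical intervals where each filter matches.
harris_on_screen = [(0, 10), (50, 60)]
on_cnn = [(5, 55)]
on_msnbc = [(58, 70)]

harris_on_cnn = intersect(harris_on_screen, on_cnn)
harris_on_either = intersect(harris_on_screen, union(on_cnn, on_msnbc))
```

The last line mirrors a query of the form name AND (channel OR channel): the union is evaluated first because of the parentheses, and the result is then intersected with the face intervals.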
The text grammar also supports basic composition of "&", "|", and "\". For example, text="United \ States \ Kingdom" is equivalent to text="United \ (States | Kingdom)", expressing instances of "United" that are not near either "States" or "Kingdom". Parentheses are necessary to separate clauses, and operators may not be mixed in a clause. The full details of the text query grammar can be found here.
Words can be used in many inflected forms. The simplest case is when words are singular or plural. To search for all inflected forms of a word without specifying them manually, surround the word with [...] brackets. For example, text="[truck]" will find instances of "truck", "trucks", and "trucking". If multiple words are surrounded by [...], then inflections will be found for any of the words in the brackets.
By default, text will precisely find the intervals of time during which a word or phrase is spoken. This means that each mention will likely contribute only a small fraction of a second of screen time to a query result. Sometimes it is useful for a caption-text query to match a wider region of time around the utterance of a word, for example if a query seeks examples where a person is on screen within a specific amount of time of a word being stated. The textwindow parameter defines how much the time interval around a caption-text match is dilated. See the "Supported Query Filters" section for details.
Instead of computing screen time estimates in absolute time units (e.g., in minutes or hours), it can be useful to present query results as a proportion of the screen time of another query. The query language supports normalization of one query's computed time by another using NORMALIZE:
For example, the following query computes the fraction of the overall dataset that is from CNN: The following query computes the fraction of time on CNN that a news presenter is on screen:
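Separately from the query syntax itself, the arithmetic behind normalization can be sketched in Python. This is an illustration only: the function `normalize` and the sample totals are hypothetical, and returning None for an empty denominator is an assumption, not documented behavior.

```python
def normalize(query_seconds, normalizer_seconds):
    """Screen time of one query as a fraction of another query's screen time."""
    if normalizer_seconds == 0:
        # Assumption: an empty normalizing query has no meaningful ratio.
        return None
    return query_seconds / normalizer_seconds


# Hypothetical totals: 30 minutes of presenter time within 2 hours of CNN video.
presenter_on_cnn = 30 * 60
all_cnn = 2 * 60 * 60
fraction = normalize(presenter_on_cnn, all_cnn)
```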