Note: Part II of this series, which goes into quantifying the fundamental shared elements of plot arcs, is now up here.
In this post, I'm going to combine those two projects. What can we see by looking at the different content of TV shows? Are there elements to the ways that TV shows are laid out--common plot structures--that repeat? How thematically different is the end of a show from its beginning? I want to take a first stab at those questions by looking at a couple hundred TV shows and their structure. To do that, I:
1. Divided a corpus of 80,000 movies and TV show episodes into 3-minute chunks, and then divided each show into 12 roughly equal parts.
2. Generated a 128-topic model where each document is one of those 3-minute chunks, which should help the topics be better geared to what's on screen at any given time.
3. For every TV show, plotted the distribution of the ten most common topics with the y-axis roughly representing percent of dialogue of the show in the topic, and the x-axis corresponding to the twelfth of the show it happened in. So dialogue in minute 55 of a 60-minute show will be in chunk 11.
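The binning in steps 1 and 3 can be sketched roughly as follows. This is a minimal illustration, not the code behind the charts: the function names are made up, and it assumes that a timestamp is assigned to a part by integer division and that parts are counted from 0 (a convention that happens to agree with the minute-55 example above).

```python
CHUNK_SECONDS = 180  # 3-minute chunks, as in step 1
N_PARTS = 12         # twelfths of a show, as in step 3


def chunk_start_times(runtime_seconds, chunk_seconds=CHUNK_SECONDS):
    """Start time (in seconds) of each fixed-length chunk of one episode."""
    return list(range(0, runtime_seconds, chunk_seconds))


def part_of_show(t_seconds, runtime_seconds, n_parts=N_PARTS):
    """0-based part of the show that a timestamp falls in (assumed rule).

    Integer arithmetic avoids floating-point surprises at part boundaries.
    """
    part = (t_seconds * n_parts) // runtime_seconds
    return min(part, n_parts - 1)  # clamp a timestamp at the very end


# Dialogue at minute 55 of a 60-minute show
print(part_of_show(55 * 60, 60 * 60))
```

A 60-minute episode yields twenty 3-minute chunks under this scheme, so each twelfth gets one or two of them.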
First a note: these images seem not to display in some browsers. If you want to zoom and can't read the legends, right click and select "view in a new tab."
Let's start by looking at a particularly formulaic show: Law & Order.
The two most common topics in Law & Order are "court case Mr. trial lawyer" and "murder body blood case". Murder is strongest in the first twelfth, when the body is discovered; "court case" doesn't appear in any strength until almost halfway through, after which it grows until it takes up more than half the space by the last twelfth.
That's pretty good straight off: the process accurately captures the central structuring element of the show, which is the handoff from cops to lawyers at the 30-minute mark. (Or really, this suggests, more like the 25-minute mark.) Most of the other topics are relatively constant. (It's interesting that the gun topic is constant, actually, but that's another matter.) But a few change--we also get a decrease in the topic "people kid kids talk," capturing some element of the interview process by the cops; a different conversation topic, "talk help take problem," is more associated with the lawyers. Also, the total curve is wider at the end than at the beginning; that's because we're not looking at all the words in Law & Order, just the top ten out of 127 topics, so a wider curve means those ten topics account for more of the dialogue. We could infer, preliminarily, that Law & Order is more thematically coherent in the last half hour than the first one: there's a lot of thematic diversity as the detectives roam around New York, but the courtroom half is always the same.
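To make the "wider curve" point concrete, here is a small sketch of how the share of dialogue covered by a show's top topics could be computed for each part. The function name, the input layout (word counts per topic per part), and the toy numbers are all invented for illustration; they are not taken from the actual model.

```python
def top_k_coverage(parts, k=10):
    """Return the k most common topics overall and, for each part of the
    show, the fraction of that part's words falling in those k topics.

    `parts` is a list of dicts mapping topic name -> word count
    (a hypothetical layout, assumed here for illustration).
    """
    totals = {}
    for part in parts:
        for topic, count in part.items():
            totals[topic] = totals.get(topic, 0) + count
    top = sorted(totals, key=totals.get, reverse=True)[:k]
    coverage = [
        sum(part.get(t, 0) for t in top) / sum(part.values())
        for part in parts
    ]
    return top, coverage


# Toy counts for two parts of one made-up show (not real data)
early = {"topic_a": 30, "topic_b": 0, "topic_c": 20,
         "topic_d": 10, "topic_e": 18, "topic_f": 22}
late = {"topic_a": 10, "topic_b": 55, "topic_c": 5,
        "topic_d": 10, "topic_e": 10, "topic_f": 10}

top, coverage = top_k_coverage([early, late], k=3)
print(top, coverage)
```

In the toy input, the late part is dominated by a single topic, so the top three topics cover more of it than they do of the early part. That is the same pattern as a stacked chart that is taller at the end.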
Compare the spinoffs: SVU is almost identical to the Law & Order mothership, but Criminal Intent gets to the courtroom much later and with less intensity.
See below the fold for more. Be warned: I've put a whole bunch of images into this one.
Some of the things revealed are interesting because they tell us when a show departs from its ostensible topic.
"Grey's Anatomy" (which I've never seen) appears to open as a fairly strong hospital drama, but by the end the medical content has dropped by half. It's not completely clear from the topics what's replaced it, but topics like "sorry feel really" and "remember wanted knew" grow in strength, suggesting the soapier elements get stronger through an episode.
"Sex and the City" is similar, though less marked:the light green sex topic gets less significant through the course of the episode, though the smaller light orange "New York City" topic doesn't change quite so much.
"Cheers" moves away from the bar through each episode, and into the language of apology: (this is broken into sixths rather than twelfths; see below for why).
Cop/lawyer shows often have the strongest signatures. Perry Mason, like Law & Order, doesn't get into the courtroom for quite a while; but unlike the more recent show, it also takes its time in getting to the murder (which usually isn't mentioned until almost a quarter of the way in).
"The Mentalist" moves from actually talking about the murder not into a court case, but into topics about truth and lying, and talking about "killing" and "death" (as distinguished, interestingly, from "murder" and "body"). But above all, the last half is concerned with mumbling: the topic dominated by "uh", "Uh," and "Okay" comes to dominate.
British mysteries have their own topical signature; neither cops nor lawyers, but "Inspector Professor sir Holmes." "Poirot" is typical; more about the detectives as the show proceeds, less of the upper-class "dear little course darling" chit-chat, and very little talk about how the murder actually happened until the last quarter of the episode.
Other types of dramas show fewer structural signatures, at least in their most common topics.
"The West Wing," slightly decreases the amount of time it spends talking about the presidency (at least until the last scene), and talks a bit more about "talking, helping, problems." But the signal is overall quite weak.
"The Wire" is distinguished by its slang and curses above all; and there's no strong sign of temporality in how they're used.
Comedies are less easily read in this version for two reasons. The first is that their topics frequently seem to be more conversational. (A better list of stop words might fix this.) For example, "The Office" does have a business topic that generally prevails, but most of the major topics are pure filler.
More problematic is the way that I've chunked up the shows: first into 3-minute chunks, and then into twelfths of the show. This helps to keep the total number of documents down. But for twenty-minute shows, it also means that the vagaries of rounding will make certain twelfths very rare, and the charts far too bumpy. The chart for "The Simpsons" is mostly destroyed by this: only a couple of episodes seem to have a chunk four out of twelve, so outer space and hospitals seem far more important to the show than they really are.
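The rounding problem is easy to reproduce. The sketch below assumes each 3-minute chunk is assigned to a part by its start time, with parts counted from 0; the actual assignment rule may well differ, so the specific parts that come up empty here shouldn't be read as matching the charts exactly. The helper name is made up.

```python
def parts_hit(runtime_seconds, n_parts, chunk_seconds=180):
    """Which parts (0-based) receive at least one chunk in a single episode,
    assuming a chunk belongs to the part containing its start time."""
    starts = range(0, runtime_seconds, chunk_seconds)
    return sorted({
        min((t * n_parts) // runtime_seconds, n_parts - 1) for t in starts
    })


runtime = 20 * 60  # an exactly twenty-minute episode

twelfths = parts_hit(runtime, 12)
sixths = parts_hit(runtime, 6)
missing = [p for p in range(12) if p not in twelfths]

print("twelfths with a chunk:", twelfths)
print("twelfths with none:   ", missing)
print("sixths with a chunk:  ", sixths)
```

A twenty-minute episode only has seven 3-minute chunks, so five of the twelve parts get nothing at all. Those parts are filled only by episodes whose runtimes happen to round differently, which is why they end up built from very few episodes. With six parts, every part gets at least one chunk.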
If I break it into 6 sections rather than 12, "The Simpsons" has a much clearer arc: mostly stable, with a decrease in most types of dialogue but particularly (as I noted before) in the language about "school", and an increase in the weighty topic "life death world fear heart God soul," something that's a little surprising to see in an animated comedy.
For this reason, the appendix below presents shows divided into sixths rather than twelfths.
And just to repeat at the end: Part II of this series, which goes into quantifying the fundamental shared elements of plot arcs, is now up here.
Here are 150 other shows. Let me know if there's an obvious show missing.
|A Touch of Frost|