Sapping Attention: The efficient plots hypothesis

Friday, September 9, 2016

The efficient plots hypothesis

I'm pulling this discussion out of the comments thread on Scott Enderle's blog, because it's fun. This is the formal statement of what will forever be known as the efficient plots hypothesis for plot arceology. Nobel prize in culturomics, here I come.

Brief background: Enderle shows pretty persuasively that all the fundamental plot arcs described in a paper by a math-based computational story lab can be ascribed to random (Brownian) noise. As I wrote earlier, and as Hannah Walser explored in more depth recently, it isn't so surprising that this happens with their data; the "stories" they are modeling are mostly random documents to begin with.

Still, there's some reason to think that maybe sentiment trajectories are random walks even in actual databases of stories like those Matt Jockers uses. Enderle finds that, well, weird: "Should we find that sentiment data from novels does indeed amount to “mere noise,” literary critics will have some very difficult questions to ask themselves about the conditions under which noise signifies." The idea that plots are random seems offensive to the idea of plot at all. Others in the field, like Jockers and Ted Underwood, have also expressed the idea that there should be some regularities to plot, particularly ones that map across genre.

I had earlier raised the idea that the null hypothesis for plot testing should be a random walk (Brownian noise, as Enderle calls it) but I thought of it as just that--a null hypothesis that indicates nothing interesting is going on.

But of course, it *would be interesting if nothing was going on.* It would demand explanation! And now I've got one: the efficient plots hypothesis, a corollary of the efficient markets hypothesis (EMH) for the literary world.

The EMH states that stock prices are efficient; you can't reliably know whether they're about to go up or down, because if you could, someone would already have traded on that knowledge. There's been a lot of research on whether stock prices move as Brownian noise; they don't, totally, but they come pretty close.

The EPH, as I imagine it, says that the ideal reader can't know if the mood of a book is about to get sunnier or darker at any given point in the plot. This is not because of market forces directly, but because the purpose of a narrative is to engross the reader. Engrossment proceeds through uncertainty. If you knew what was about to happen, you'd skim ahead or stop reading.

That is: at any moment in a story, the emotional trajectory is a random walk for the reader because anything else would be *boring.* And stories aren't boring.
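
To make "random walk" concrete, here is a minimal sketch in plain Python. The function names, the Gaussian step size, and the seed are all illustrative assumptions of mine, not anything taken from Enderle's or Jockers's analyses. It simulates a walk and then asks how often the naive guess "the mood will keep moving the way it just moved" turns out to be right; for a true random walk that rate should hover around one half.

```python
import random


def random_walk(n_steps, seed=0):
    """Simulate a 1-D random walk with independent Gaussian steps (assumed step sd = 1)."""
    rng = random.Random(seed)
    level = 0.0
    path = [level]
    for _ in range(n_steps):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def momentum_hit_rate(path):
    """Fraction of steps whose direction matches the direction of the step before it."""
    steps = [b - a for a, b in zip(path, path[1:])]
    hits = sum(1 for prev, nxt in zip(steps, steps[1:]) if (prev > 0) == (nxt > 0))
    return hits / (len(steps) - 1)


# Stand-in for a "sentiment trajectory"; a real test would use scored text instead.
path = random_walk(10_000)
print(round(momentum_hit_rate(path), 3))
```

Running the same `momentum_hit_rate` on a real sentiment series would be one crude check: a rate well away from one half would count against the random-walk picture, at least at whatever step size the series uses.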

This could be tested empirically by asking readers if a book will get more positive or more negative over the next five pages, and by how much. In a pure EPH world, they'll only be right about half the time. Enderle thinks the EPH is obviously wrong, particularly for genre fiction.
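
One way such an experiment might be scored, as a sketch only: count how often readers call the direction correctly, then ask how surprising that count would be if every guess were a coin flip. The helper names and the example tallies below are hypothetical, and the exact binomial test is my choice of method, not something specified above.

```python
from math import comb


def count_hits(predictions, outcomes):
    """Number of reader predictions ("up"/"down") that match what the book actually did."""
    return sum(1 for p, o in zip(predictions, outcomes) if p == o)


def binomial_two_sided_p(hits, trials, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probability of every outcome that is no more likely than the
    observed one under the null hypothesis (readers guess right with
    probability p). Intended for modest sample sizes; very large `trials`
    would need a log-space or library implementation.
    """
    pmf = [comb(trials, k) * p**k * (1 - p) ** (trials - k) for k in range(trials + 1)]
    observed = pmf[hits]
    # Small tolerance so ties with the observed probability are included.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


# Hypothetical tallies: readers asked 200 times "sunnier or darker over the next five pages?"
hits, trials = 112, 200
print(f"accuracy = {hits / trials:.2f}, p-value vs. coin flip = {binomial_two_sided_p(hits, trials):.3f}")
```

A small p-value would mean readers beat the coin flip, which is evidence against a pure EPH. It would say nothing about the "by how much" part of the question, which needs a different score.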

I'm not so sure. To take an example: I read some John le Carré novels over the summer. Periodically, a spy has to secretly pass from the East to the West without getting caught by the commies. (Through the Berlin Wall, over the Chinese border to Hong Kong, etc.) Do you know if they'll make it? The emotional sentiment of the next few pages depends on whether they get killed or not. I can see two models here:
1. Genre determines plot arceology: There are conventions to the spy novel that make it possible to tell in advance.
2. The EPH: The whole point of reading a spy novel is that you don't know what will happen; the job of a spy novelist is to make you unsure.

My reading experience is much closer to the latter: the conventions of genre fiction are *precisely* that you don't know what's going to happen next; otherwise no one would read it.
For most good genre fiction, I think this holds. Will Lockhart/Gardner win the case? Is Don Draper going to hit the bottle or stay sober? The rise of "anyone can die" as the predominant trope of 2010s TV suggests that the economics are forcing stronger and stronger forms of the EPH onto us every day.

The major objection to this would be: "but there *are* genres where you know the outcomes precisely!" In a Hardy Boys novel, they'll rebound from danger and catch the bad guy every time. One response to this is: sure, *you* know that; but you don't read Hardy Boys novels. The people who do are 10-year-olds who legitimately think that, just maybe, the killer's going to drown the brothers in the quarry and the next 20 books on the shelf will turn out to be prequels.

Even if you know how certain books will *end*, that doesn't mean that you'll ever be able to predict the next two pages, which is what this is about. I think this distinction is crucially important and maybe underestimated. Sure, a romantic comedy always has a temporary breakup in the middle; but whether that happens 40% of the way through or 70% of the way through makes all the difference; and if you've made it 90% of the way through without the breakup happening, you start to think "maybe this is one of those comedies without a breakup in it."

If the EPH holds, then, it doesn't suggest that fiction is truly arbitrary; rather, that it's an elaborately constructed game between reader and writer, socially conditioned and in no way permanent. It would suggest that there are enough fundamental plots that at any point in a book you are unsure what plot you are in; and that plots tend to wear themselves out over time.

It does completely put my analogy between musical tonality and emotional valence through the wringer. Key signatures in music are highly predictable. But I think that's OK: it's really clear that there aren't underlying structures quite so strong as sonata form under novels; this would explain why.

For a lunatic idea, the EPH is actually empirically kind of testable. Just ask people to predict the direction of books as they're reading them. Someone could totally do this. Maybe some movie studios even do.

For more details, see my forthcoming book with Stephen Dubner, Jane Austen was a Derivatives Trader (Harper Collins 2017).

18 comments:

kb, September 11, 2016 at 11:17 AM
This was really interesting to read as a fiction writer. I wonder if the tension between knowing how a book will end and not knowing what will be in the next two pages is actually crucial to some extent.

I recently finished my MFA and one technique that the writers there sometimes mention is the idea of pulling readers through a scene as opposed to pushing them through a scene. Basically, you say at the start of the story or scene how it ends or what the significance of it will be before going into the story in chronological order. The idea is if you say "That morning I sat down at my computer and started reading my emails" the reader isn't going to be as interested in that sentence as if you say "Let me tell you about the day I met Pablo Escobar. That morning I sat down at my computer and started reading my emails". You set an endpoint that is interesting enough to hook the reader and vague enough that the question of how to reach that endpoint is intriguing.

Uncertainty is crucial to an interesting plot, but the reader also wants to trust the author knows what they're doing and where they're going. If there's too much uncertainty, it might undermine the reader's trust. Having some degree of signposting along the way probably helps.
Jim H., September 11, 2016 at 3:05 PM
As a fiction writer, I find this way of looking at fictional texts fascinating and love following the discussions going back and forth with regard to analyses of this art form.

As a philosopher, I have a quick critique of the point of this post. There are two things at work here, and they seem to get conflated in your approach.

First thing: readers of fiction read fiction along a timeline normally beginning from the time they immerse in page one and lasting until they close the book after reading "The End." They are, let us say, "in time".

Second thing: textual analysis such as Jockers's takes a look at a completed text in an, let us call it, arcetextural manner. (Full disclosure: I have been in contact with Jockers, who performed a sentiment analysis on my first novel's manuscript after I self-taught R and attempted the same myself. But I digress.) The novel is complete. It has ended, and the analyst is no longer "in time" with the text. Rather, she is breaking down its linguistic or sentimental, etc., components and comparing them to others.

Here's an analogy: You are walking along a new road. You don't know what lies around the next bend—whether there's a hill or a river you will have to deal with. But I'm flying a drone or satellite above you and can see the length of road you are traversing and know what you're about to encounter. What's more, I've seen dozens of others taking the same route and witnessed how they dealt with these so-called unknowns.

So, yes, the RWH or EPH works for the individual reader the first time through the text. But on second read and for analytic purposes, the plot is less important. The suspense is removed, and the elements that go into the text's work as an artwork (or genre work) can be seen for what they are. We notice, for example, bits of foreshadowing. We find parallels in subplots that give things away. We observe characters' behaviors that show us how they tend to act and witness how they conform (or don't) in the crucial moment.
Ben, September 12, 2016 at 7:10 AM
Rereading is definitely a problem for this theory; the implication is that no one will re-read a book until they've forgotten most of the plot points. That seems pretty unlikely.

It's certainly true that something like Jockers's technique takes a high-level view of texts and compares them to one another. But this is an answer to what they're finding. It's somewhat like mapping roads; but what's striking so far is that the structures that have been uncovered are not what most people were expecting, at least on the page-to-page level. They look less like roads that all go to Rome, and more like stock prices that fluctuate randomly.

I don't think that plots are actually random; they're carefully constructed. (Like you say). But if they're carefully constructed to look random--so as to maintain reader interest--that would explain why the aggregate plots Jockers and others have been looking for show a less strong signal than some anticipated.
Jim H., September 12, 2016 at 4:18 PM
I love your term ur-plot. At my blog, I used my notion of Ur-Story—which I derived from The Epic of Gilgamesh—as the analytical framework to examine literary texts from ancients to moderns (Ivan Ilych, Hamlet, The Erasers [Robbe-Grillet], The Third Policeman, Pnin, Remainder, The Hour of the Star, The Metamorphosis, Malone Dies, Things Fall Apart, Henderson the Rain King, The Loser [Bernhard] and others). This was less a systematic academic exercise than a writer's reading—in order to inform my own work.

My own opinion is less the marriage plot and more Campbell's (who also derived much of his view from Gilgamesh & Carl Jung). Briefly, (great) literature (fiction, poetry, tragedy, comedy) arises as a direct response to becoming conscious of the scandal of mortality—the brute fact of realizing we are all of us going to die. The forms are myriad—laughing at the absurdity of it, it's unavoidably tragic, let's investigate and find someone to blame, surely there's a way out, but love is eternal isn't it?, make hay & entertain me while the sun shines, fuck it all, and on and on. (Even myth and religion grow out of this awareness, proposing consolation and a possible solution however misguided—and much of it makes for great literature.) It was Gilgamesh's great grief at the death of his boon wildman companion that set him off on his hero's journey to inquire of Utnapishtim the secret of immortality. That's the nutshell.

I would wager greater than 2-10% of great literature finds its motivation in this Ur-Story—possibly ALL great literature does; at least that's my feeling—my opinion. But great literature probably constitutes somewhat less than 2% of all stories ever told.

And, of course, crucial to this Comment is the distinction between Story (which I take to be the Substance of literature) and Plot (which I take to be the Form).
Ted Underwood, September 13, 2016 at 4:26 PM
Great conjecture. It's driving me crazy that I don't have time to test it. Totally doable.

For what it's worth, my money is against the EPH. But the testable hypothesis is the thing!
Bill Benzon, September 13, 2016 at 7:10 PM
1) "Even if you know how certain books will *end*, that doesn't mean that you'll ever be able to predict the next two pages, which is what this is about. I think this distinction is crucially important and maybe underestimated."

YES.

2) This kind of thing has come up in relation to music. As you know Leonard Meyer postulated expectation and surprise as the driving force behind emotion in music back in 1956. However, the "Surprise Symphony" will surprise you only once, or twice if you're a bit dense. How is it that music is pleasurable even once we know a piece quite well? One recent answer is that, while there's one mental module that does indeed know the musical future, there's one or more others that do not. And it's those modules that are being fooled. Alas, I don't have a citation to this idea.

3) Finally, this seems obliquely relevant:

Hays, D. G. (1973). "Language and Interpersonal Relationships." Daedalus 102(3): 203-216.

pp. 204-205:

The experiment strips conversation down to its barest essentials by depriving the subject of all language except for two pushbuttons and two lights, and by suggesting to him that he is attempting to reach an accord with a mere machine. We brought two students into our building through different doors and led them separately to adjoining rooms. We told each that he was working with a machine, and showed him lights and pushbuttons. Over and over again, at a signal, he would press one or the other of the two buttons, and then one of two lights would come on. If the light that appeared corresponded to the button he pressed, he was right; otherwise, wrong. The students faced identical displays, but their feedback was reversed: if student A pressed the red button, then a moment later student B would see the red light go on, and if student B pressed the red button, then student A would see the red light. On any trial, therefore, if the two students pressed matching buttons they would both be correct, and if they chose opposite buttons they would both be wrong.

We used a few pairs of RAND mathematicians; but they would quickly settle on one color, say red, and choose it every time. Always correct, they soon grew bored. The students began with difficulty, but after enough experience they would generally hit on something. . . . The students, although they were sometimes wrong, were rarely bored. They were busy figuring out the complex patterns of the machine.

But where did the patterns come from? Although neither student knew it, they arose out of the interaction of two students.
Unknown, September 16, 2016 at 11:38 AM
I love this additional turn of Scott E's null hypothesis. LeCarre makes a problematic example of the genre writer, though. In some work Scott and I have been doing we find the novels of LeC to be extreme outliers in a corpus of bestsellers; an algorithm that reliably distinguishes genre bestsellers from prizewinning literary fiction gets all the LeC novels wrong. -- Jim
Unknown, September 28, 2016 at 7:15 PM
Don't tell me the book: my guess for the story becoming more positive or negative in the next two pages (or five) is 'neither'. If the goal is 'what happens' as you discuss later, my guess is "same thing that is happening now."

However, I'm not convinced that an "efficient plot" is meaningful in the way you describe it. What would a perfect EPH book look like? If the criterion is that you can't predict what will happen in two or five pages, then that's a book that always zigs when others zag. That's exhausting and meandering, and I wouldn't expect that authors optimize for that in good fiction in any way comparable to how the stock market optimizes.

I also have a few clarifying questions.

How do we confirm the EPH? Does it assume the ideal reader or a 10-year-old reader? Where do algorithms factor in? If an algorithm can predict the next two pages but a person can't, does the EPH hold or not?

What about probabilistic predictions? How much certainty do you have to assign to your guess? If a person knows when to not be confident in their guess, does that speak to the efficiency or predictability of the plot?
Unknown, September 29, 2016 at 12:44 PM
Ah yes, I removed it earlier because I was getting long-winded, but my read was that you intended for half zigs and half zags: where you don't know if the typical or atypical will happen. I meant that the stated criteria might not necessarily reflect that intent, and might favour a path of all left-turns. Maybe it does, hard to unkink the various possible interpretations in my head.