Friday, September 9, 2016

The efficient plots hypothesis

I'm pulling this discussion out of the comments thread on Scott Enderle's blog, because it's fun. This is the formal statement of what will forever be known as the efficient plots hypothesis for plot arceology. Nobel Prize in culturomics, here I come.



Brief background: Enderle shows pretty persuasively that all the fundamental plot arcs described in a paper by a math-based computational story lab can be ascribed to random (Brownian) noise. As I wrote earlier, and as Hannah Walser explored in more depth recently, it isn't so surprising that this happens with their data; the "stories" they are modeling are mostly random documents to begin with.

Still, there's some reason to think that maybe sentiment trajectories are random walks even in actual databases of stories like those Matt Jockers uses. Enderle finds that, well, weird: "Should we find that sentiment data from novels does indeed amount to “mere noise,” literary critics will have some very difficult questions to ask themselves about the conditions under which noise signifies." The idea that plots are random seems offensive to the idea of plot at all. Others in the field, like Jockers and Ted Underwood, have also expressed the idea that there should be some regularities to plot, particularly ones that map onto genre.

I had earlier raised the idea that the null hypothesis for plot testing should be a random walk (Brownian noise, as Enderle calls it), but I thought of it as just that: a null hypothesis indicating that nothing interesting is going on.
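
A random walk in this sense is easy to simulate. The sketch below is illustrative, not Enderle's code: it assumes sentiment changes by independent Gaussian steps, and the helper names are hypothetical. A walk like this can drift into shapes that look like rise-and-fall arcs, yet at every point its next move is a coin flip.

```python
import random

def brownian_arc(n_steps, step_sd=1.0, seed=None):
    """Hypothetical helper: a 'plot arc' built as a random walk.

    Each step adds independent Gaussian noise to the running sentiment level,
    so nothing in the arc so far says whether the next step goes up or down.
    """
    rng = random.Random(seed)
    level = 0.0
    arc = []
    for _ in range(n_steps):
        level += rng.gauss(0.0, step_sd)
        arc.append(level)
    return arc

def share_of_rises(arc):
    """Fraction of consecutive steps on which sentiment went up."""
    rises = sum(1 for before, after in zip(arc, arc[1:]) if after > before)
    return rises / (len(arc) - 1)

arc = brownian_arc(10_000, seed=42)
print(share_of_rises(arc))  # should land close to 0.5
```

Plotting `arc` for a few different seeds is a quick way to see why random walks can pass for story shapes.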

But of course, it *would be interesting if nothing were going on.* It would demand explanation! And now I've got one: the efficient plots hypothesis (EPH), a literary-world counterpart to the efficient markets hypothesis (EMH).

The EMH states that stock prices are efficient: you can't reliably know whether they're about to go up or down, because if you could, someone would already have bought or sold on that knowledge. There's been a lot of research on whether stocks move as Brownian noise; they don't, totally, but they come pretty close.

The EPH, as I imagine it, says that the ideal reader can't know if the mood of a book is about to get sunnier or darker at any given point in the plot. This is not because of market forces directly, but because the purpose of a narrative is to engross the reader. Engrossment proceeds through uncertainty. If you knew what was about to happen, you'd skim ahead or stop reading.

That is: at any moment in a story, the emotional trajectory is a random walk for the reader because anything else would be *boring.* And stories aren't boring.

This could be tested empirically by asking readers if a book will get more positive or more negative over the next five pages, and by how much. In a pure EPH world, they'll only be right about half the time. Enderle thinks the EPH is obviously wrong, particularly for genre fiction.
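
Scoring such an experiment is straightforward. A sketch, assuming each guess is recorded as +1 (sunnier) or -1 (darker) and compared with what the next five pages actually did; the data below are made up purely for illustration. An exact one-sided binomial test asks how likely the readers' hit rate would be if every guess were a coin flip, which is what a pure EPH world predicts.

```python
from math import comb

def binomial_p_value(correct, total, p=0.5):
    """One-sided exact binomial test: the probability of getting at least
    `correct` right out of `total` guesses if each guess succeeds with prob p."""
    return sum(comb(total, k) * p**k * (1 - p)**(total - k)
               for k in range(correct, total + 1))

# Made-up illustrative data: (reader's guess, actual direction over the
# next five pages), with +1 meaning sunnier and -1 meaning darker.
trials = [(+1, +1), (-1, +1), (+1, +1), (-1, -1), (+1, -1), (+1, +1)]
correct = sum(1 for guess, actual in trials if guess == actual)
print(f"{correct}/{len(trials)} correct, p = {binomial_p_value(correct, len(trials)):.3f}")
```

With only six guesses, four right is entirely consistent with chance; a real study would need many readers and many books before a hit rate above 50% could count against the EPH.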

I'm not so sure. To take an example: I read some John le Carré novels over the summer. Periodically, a spy has to secretly pass from the East to the West without getting caught by the commies. (Through the Berlin Wall, over the Chinese border to Hong Kong, etc.) Do you know if they'll make it? The sentiment of the next few pages depends on whether they get killed or not. I can see two models here:
1. Genre determines plot arceology: There are conventions to the spy novel that make it possible to tell in advance.
2. The EPH: The whole point of reading a spy novel is that you don't know what will happen; the job of a spy novelist is to make you unsure.

My reading experience is much closer to the latter: the conventions of genre fiction are *precisely* that you don't know what's going to happen next; otherwise no one would read it.
For most good genre fiction, I think this holds. Will Lockhart/Gardner win the case? Is Don Draper going to hit the bottle or stay sober? The rise of "anyone can die" as the predominant trope of 2010s TV suggests that the economics are forcing stronger and stronger forms of the EPH onto us every day.

The major objection to this would be: "but there *are* genres where you know the outcomes precisely!" In a Hardy Boys novel, they'll rebound from danger and catch the bad guy every time. One response to this is: sure, *you* know that; but you don't read Hardy Boys novels. The people who do are 10-year-olds who legitimately think that, just maybe, the killer's going to drown the brothers in the quarry and the next 20 books on the shelf will turn out to be prequels.

Even if you know how certain books will *end*, that doesn't mean that you'll ever be able to predict the next two pages, which is what this is about. I think this distinction is crucially important and maybe underestimated. Sure, a romantic comedy always has a temporary breakup in the middle; but whether that happens 40% of the way through or 70% of the way through makes all the difference; and if you've made it 90% of the way through without the breakup happening, you start to think "maybe this is one of those comedies without a breakup in it."
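
The romantic-comedy case is a small Bayesian update. As an illustrative assumption, suppose a reader starts out believing with probability p that a given comedy has a breakup, and that in breakup comedies the breakup's position T (as a fraction of the book) has cumulative distribution F. Having reached position t with no breakup, the reader should believe:

```latex
P(\text{breakup plot} \mid \text{no breakup by } t)
  = \frac{p\,\bigl(1 - F(t)\bigr)}{p\,\bigl(1 - F(t)\bigr) + (1 - p)}
```

For example, with p = 0.9 and T uniform between 0.4 and 0.95 (numbers chosen only for illustration), 1 - F(0.9) = 0.05/0.55 = 1/11, so the posterior is (0.9/11)/(0.9/11 + 0.1) = 9/20 = 0.45. A reader who began 90% sure of a breakup is, at the 90% mark, slightly inclined to think this is one of the comedies without one.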

If the EPH holds, then, it doesn't suggest that fiction is truly arbitrary; rather, that it's an elaborately constructed game between reader and writer, socially conditioned and in no way permanent. It would suggest that there are enough fundamental plots that at any point in a book you are unsure what plot you are in; and that plots tend to wear themselves out over time.

It does put my analogy between musical tonality and emotional valence through the wringer. Key signatures in music are highly predictable. But I think that's OK: it's really clear that novels don't have underlying structures quite as strong as sonata form, and the EPH would explain why.

For a lunatic idea, the EPH is actually empirically kind of testable. Just ask people to predict the direction of books as they're reading them. Someone could totally do this. Maybe some movie studios even do.

For more details, see my forthcoming book with Stephen Dubner, Jane Austen was a Derivatives Trader (HarperCollins 2017).

18 comments:

  1. This was really interesting to read as a fiction writer. I wonder if the tension between knowing how a book will end and not knowing what will be in the next two pages is actually crucial to some extent.

    I recently finished my MFA, and one technique that the writers there sometimes mention is the idea of pulling readers through a scene as opposed to pushing them through a scene. Basically, you say at the start of the story or scene how it ends or what the significance of it will be before going into the story in chronological order. The idea is that if you say "That morning I sat down at my computer and started reading my emails," the reader isn't going to be as interested in that sentence as if you say "Let me tell you about the day I met Pablo Escobar. That morning I sat down at my computer and started reading my emails." You set an endpoint that is interesting enough to hook the reader but vague enough that the question of how to reach it is intriguing.

    Uncertainty is crucial to an interesting plot, but the reader also wants to trust the author knows what they're doing and where they're going. If there's too much uncertainty, it might undermine the reader's trust. Having some degree of signposting along the way probably helps.

    1. This is also a problem for the EPH, yeah. Because if you say "Let me tell you about the day I met Pablo Escobar" and then start talking about e-mails, it's pretty obvious that things are going to get emotionally negative. I can think of two answers, neither of which I find especially convincing:

      1. A sufficiently advanced sentiment analysis will tag "Pablo Escobar" as negative, and so the sentence itself knocks the sentiment location down; the question is now just whether you have a positive or negative encounter with him, relative to the encounters with Pablo Escobar people are likely to have.

      2. Although you know that the meeting with Escobar is coming, you don't know *when*. So although I can predict that "eventually something bad will happen," that's a statement so vague that it can apply to anything. And just because you say you'll meet Pablo Escobar, there's nothing to say that you (the author) won't go all Tristram Shandy and just never quite get to the point in the day when Escobar shows up, or that your main character won't turn out to be a twenty-five-year-old comic character whose parents named him after a drug lord.

  2. As a fiction writer, I find this way of looking at fictional texts fascinating and love following the discussions going back and forth with regard to analyses of this art form.

    As a philosopher, I have a quick critique of the point of this post. There are two things at work here, and they seem to get conflated in your approach.

    First thing: readers of fiction read fiction along a timeline normally beginning from the time they immerse in page one and lasting until they close the book after reading "The End." They are, let us say, "in time".

    Second thing: textual analysis (such as Jockers's, with whom, full disclosure, I have been in contact and who has performed a Sentiment Analysis on my first novel's manuscript after I taught myself R and attempted the same. But I digress.) takes a look at a completed text in a, let us call it, "arcetextural" manner. The novel is complete. It has ended, and the analyst is no longer "in time" with the text. Rather, she is breaking down its linguistic or sentimental, etc., components and comparing them to others.

    Here's an analogy: You are walking along a new road. You don't know what lies around the next bend—whether there's a hill or a river you will have to deal with. But I'm flying a drone or satellite above you and can see the length of road you are traversing and know what you're about to encounter. What's more, I've seen dozens of others taking the same route and witnessed how they dealt with these so-called unknowns.

    So, yes, the RWH (random walk hypothesis) or EPH works for the individual reader the first time through the text. But on a second reading and for analytic purposes, the plot is less important. The suspense is removed, and the elements that go into the text's work as an artwork (or genre work) can be seen for what they are. We notice, for example, bits of foreshadowing. We find parallels in subplots that give things away. We observe characters' behaviors that show us how they tend to act and witness how they conform (or don't) in the crucial moment.

  3. Rereading is definitely a problem for this theory; the implication is that no one will re-read a book until they've forgotten most of the plot points. That seems pretty unlikely.

    It's certainly true that something like Jockers's technique takes a high-level view and compares them to others. But this is an answer to what they're finding. It's somewhat like mapping roads; but what's striking so far is that the structures that have been uncovered are not what most people were expecting, at least on the page to page level. They look less like roads that all go to Rome, and more like stock prices that fluctuate randomly.

    I don't think that plots are actually random; they're carefully constructed (as you say). But if they're carefully constructed to look random, so as to maintain reader interest, that would explain why the aggregate plots Jockers and others have been looking for show a weaker signal than some anticipated.

    1. So, how many basic plots are there? Two? Six? Thirteen? Twenty? 25? ... As many as there are works of fiction? Folk have been asking and trying to answer this question since at least Aristotle's time. It's a fun game, but I don't believe it can ever really be more than that. It's never a hard science, though data aggregators can, indeed, show us some relevant & interesting trends & tendencies.

      A key assumption here, and one I raised with MJ, is the equating of sentiment analysis with plot. He notes a correlation. He must. I find it difficult to make that link, and I raised the question of dramatic irony. For example, in my ms. I used one sentence (the last) to throw the entire denouement (the protagonist's hard-won epiphany and decision to make positive changes to his life) of my novel into question. Its valence cannot be equated with everything it seeks to undermine, and sentiment analysis is ill-equipped to bring the entire upward-sweeping arc down. It treats that sentence just like every other sentence with a similar number of words that the analysis rates.

    2. I agree with most of what you say here. (Although most of human knowledge is somewhere between 'hard science' and a 'fun game'.) I've raised similar questions about whether sentiment analysis is the right tool here; in my attempts at this I've used topic models to ask what the closest thing to a single overarching plot arc in a corpus is, which is not a method that lends itself well to declaring any particular number of basic clusters.

      I'm also very sympathetic to the idea that the last sentence of a book potentially bears much more weight than any other; I think treating all sentences as equal bears little relation to how readers actually read plot. I don't know if there's a good mathematical way to account for this.

      If I can toss out yet another oddball thought in the comments, because it's not worthy of living anywhere else: My gut instinct about distributions is that there are infinitely many possible plots, but that some are much more common. Distributionally I'd bet that usage follows Zipf's law; so the number of possible plots is proportional to the logarithm of all the books ever written, and the most common single ur-plot (which I'd bet is the marriage plot, in some form, or else something like Campbell's hero's journey, but who knows…) is something like 2-10% of all the stories ever told. I am suspicious of methods that too eagerly lump oddball plots into larger categories, which I think the sentiment ones probably do.
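
The 2-10% guess can be sanity-checked against an idealized Zipf distribution. The sketch below assumes exponent s = 1 and a fixed, finite number of plot types (the comment imagines infinitely many possible plots, so the truncation is an assumption); under Zipf's law the most common type's share is 1 divided by the sum of 1/r^s over all ranks r. The function name is hypothetical.

```python
def zipf_top_share(n_types, s=1.0):
    """Share of all stories taken by the rank-1 plot when the frequency of
    the rank-r plot is proportional to 1 / r**s (Zipf's law), truncated
    at n_types distinct plots."""
    normalizer = sum(1 / r**s for r in range(1, n_types + 1))
    return 1 / normalizer

for n_types in (100, 10_000, 1_000_000):
    print(n_types, round(zipf_top_share(n_types), 3))
```

With s = 1 the top share falls only slowly as the inventory grows: roughly 19% with a hundred plot types, 10% with ten thousand, and 7% with a million. So a top share of 10% or less implies on the order of ten thousand or more distinct plots.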

  4. I love your term ur-plot. At my blog, I used my notion of Ur-Story—which I derived from The Epic of Gilgamesh—as the analytical framework to examine literary texts from ancients to moderns (Ivan Ilych, Hamlet, The Erasers [Robbe-Grillet], The Third Policeman, Pnin, Remainder, The Hour of the Star, The Metamorphosis, Malone Dies, Things Fall Apart, Henderson the Rain King, The Loser [Bernhard] and others). This was less a systematic academic exercise than a writer's reading—in order to inform my own work.

    My own opinion is less the marriage plot and more Campbell's (who also derived much of his view from Gilgamesh & Carl Jung). Briefly, (great) literature (fiction, poetry, tragedy, comedy) arises as a direct response to becoming conscious of the scandal of mortality—the brute fact of realizing we are all of us going to die. The forms are myriad—laughing at the absurdity of it, it's unavoidably tragic, let's investigate and find someone to blame, surely there's a way out, but love is eternal isn't it?, make hay & entertain me while the sun shines, fuck it all, and on and on. (Even myth and religion grow out of this awareness, proposing consolation and a possible solution however misguided—and much of it makes for great literature.) It was Gilgamesh's great grief at the death of his boon wildman companion that set him off on his hero's journey to inquire of Utnapishtim the secret of immortality. That's the nutshell.

    I would wager greater than 2-10% of great literature finds its motivation in this Ur-Story—possibly ALL great literature does; at least that's my feeling—my opinion. But great literature probably constitutes somewhat less than 2% of all stories ever told.

    And, of course, crucial to this Comment is the distinction between Story (which I take to be the Substance of literature) and Plot (which I take to be the Form).

  5. Great conjecture. It's driving me crazy that I don't have time to test it. Totally doable.

    For what it's worth, my money is against the EPH. But the testable hypothesis is the thing!

    1. There are a few other intermediate tests someone should run here too.

      1. Do the patterns look the same if you take random subsets of stories as if you take full stories? (If so, stories are either random noise or some sort of fractal structure; the latter would be weirdly interesting too.)

      2. How accurately can a machine learning algorithm predict the next sentiment in a story? What information (i.e., what time horizon of previous sentiments) makes it the best at predicting?

      These two don't even need experiments.
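
The second test can be sketched without any machine learning library. The version below swaps in the simplest possible predictor, a one-lag linear autoregression on the sentiment changes, and reports how much it reduces squared error relative to the random-walk forecast of "no change." The arcs are simulated, not real novels; `simulate_arc` and its `momentum` parameter are hypothetical. Adding more lags would be one way to probe the time-horizon question.

```python
import random

def simulate_arc(n_steps, momentum=0.0, seed=None):
    """Hypothetical sentiment arc whose step-to-step changes follow an AR(1)
    process. momentum=0 is a pure random walk (the EPH null); momentum > 0
    means a rise tends to be followed by another rise, a predictable plot."""
    rng = random.Random(seed)
    level, change, arc = 0.0, 0.0, []
    for _ in range(n_steps):
        change = momentum * change + rng.gauss(0.0, 1.0)
        level += change
        arc.append(level)
    return arc

def forecast_gain(arc):
    """Fractional reduction in squared error from predicting each sentiment
    change as phi * (previous change), with phi fit by least squares, versus
    predicting no change at all. Near 0 means the past does not help."""
    d = [after - before for before, after in zip(arc, arc[1:])]
    prev, nxt = d[:-1], d[1:]
    phi = sum(a * b for a, b in zip(prev, nxt)) / sum(a * a for a in prev)
    naive_error = sum(b * b for b in nxt)
    ar_error = sum((b - phi * a) ** 2 for a, b in zip(prev, nxt))
    return 1 - ar_error / naive_error

print(forecast_gain(simulate_arc(10_000, momentum=0.0, seed=1)))  # near 0
print(forecast_gain(simulate_arc(10_000, momentum=0.8, seed=1)))  # well above 0
```

This is an in-sample fit; a real test would hold out later stretches of each book, or whole books, so that overfitting isn't mistaken for predictability.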

  6. 1) "Even if you know how certain books will *end*, that doesn't mean that you'll ever be able to predict the next two pages, which is what this is about. I think this distinction is crucially important and maybe underestimated."

    YES.

    2) This kind of thing has come up in relation to music. As you know, Leonard Meyer postulated expectation and surprise as the driving force behind emotion in music back in 1956. However, the "Surprise Symphony" will surprise you only once, or twice if you're a bit dense. How is it that music is pleasurable even once we know a piece quite well? One recent answer is that, while there's one mental module that does indeed know the musical future, there are one or more others that do not. And it's those modules that are being fooled. Alas, I don't have a citation for this idea.

    3) Finally, this seems obliquely relevant:

    Hays, D. G. (1973). "Language and Interpersonal Relationships." Daedalus 102(3): 203-216.

    pp. 204-205:

    The experiment strips conversation down to its barest essentials by depriving the subject of all language except for two pushbuttons and two lights, and by suggesting to him that he is attempting to reach an accord with a mere machine. We brought two students into our building through different doors and led them separately to adjoining rooms. We told each that he was working with a machine, and showed him lights and pushbuttons. Over and over again, at a signal, he would press one or the other of the two buttons, and then one of two lights would come on. If the light that appeared corresponded to the button he pressed, he was right; otherwise, wrong. The students faced identical displays, but their feedback was reversed: if student A pressed the red button, then a moment later student B would see the red light go on, and if student B pressed the red button, then student A would see the red light. On any trial, therefore, if the two students pressed matching buttons they would both be correct, and if they chose opposite buttons they would both be wrong.

    We used a few pairs of RAND mathematicians; but they would quickly settle on one color, say red, and choose it every time. Always correct, they soon grew bored. The students began with difficulty, but after enough experience they would generally hit on something. . . . The students, although they were sometimes wrong, were rarely bored. They were busy figuring out the complex patterns of the machine.

    But where did the patterns come from? Although neither student knew it, they arose out of the interaction of two students.

    1. The music analogy remains really interesting. Of course in music you do know most of what's going to happen, but some individual part is usually elusive. Maybe.

      That brings up another similar experiment: Claude Shannon's experiments in predicting the next letter of English text. He, of course, found heavy redundancy in English-language text, not a random walk. This question about emotional valence is several levels out from the resolution Shannon was looking at, and I can't quite wrap my head around the implications. But one possibility is that even some predictability of emotions in the short term (contrary to my comment to Ted above) doesn't necessarily imply that the long term isn't at least somewhat pseudo-random.
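
Shannon's guessing experiments used human subjects. A cruder, machine-only cousin of the idea, sketched here as an illustration rather than his method, compares the entropy of the next symbol with its entropy once the previous symbol is known; the gap is the kind of redundancy he found. The same comparison could in principle be run on binned sentiment steps instead of letters, where a random walk would show no gap. Plug-in estimates from short texts understate the conditional entropy, so treat the numbers as rough; the function names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(counts):
    """Shannon entropy in bits of the distribution given by a Counter."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

def next_symbol_entropies(seq):
    """Return (H(next), H(next | previous)) in bits, estimated from counts
    of adjacent pairs. Works on strings or on any sequence of symbols."""
    pairs = list(zip(seq, seq[1:]))
    h_next = entropy(Counter(b for _, b in pairs))
    h_prev = entropy(Counter(a for a, _ in pairs))
    h_joint = entropy(Counter(pairs))
    return h_next, h_joint - h_prev  # chain rule: H(X, Y) - H(X) = H(Y | X)

text = "the quick brown fox jumps over the lazy dog and then the dog sleeps"
print(next_symbol_entropies(text))
```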

    2. Concerning music, I'm a jazz musician, and a pretty good one, too, having opened for Dizzy Gillespie on one occasion and BB King on another. That is to say, my experience is not that of a semi-competent dabbler.

      When I'm soloing, then, I am just making it up, but I don't necessarily know what's coming next despite the fact that I'm the one who determines it. Sometimes it just gets away from you; what comes out isn't what you'd been intending. That's when things get really interesting. Now you've got to get back on track so you can pretend that the 'mistake' wasn't a mistake at all, just a particularly clever feint. Hence the saying, "there are no bad notes, only bad resolutions."

      Note however that if you listen to the older guys (Louis Armstrong for example) play the same tune on different recordings, you realize that they aren't making their solo up from scratch each time. Rather, on a given tune they've got an approach worked out and they follow it each time they play the tune, with only minor variations. It's not improvisation so much as it is un-notated composition. The idea of improvising a new solo each and every time seems to have evolved with the bebop era in the early to mid-1940s.

  7. I love this additional turn of Scott E's null hypothesis. LeCarre makes a problematic example of the genre writer, though. In some work Scott and I have been doing we find the novels of LeC to be extreme outliers in a corpus of bestsellers; an algorithm that reliably distinguishes genre bestsellers from prizewinning literary fiction gets all the LeC novels wrong. -- Jim

    1. Interesting. I assume that you're classifying on vocabulary, so it's *possible* that LeCarre books have a literary prizewinner's diction but a thriller's structure. (That sounds like back-jacket copy, doesn't it?) But for me to just assert that is kind of begging the question here.

      I guess another way of saying it is: do plots become more predictable in genres where they're of primary importance? I haven't really read enough true pulps to say.

  8. This comment has been removed by a blog administrator.

  9. Don't tell me the book: my guess for the story becoming more positive or negative in the next two pages (or five) is 'neither'. If the goal is 'what happens' as you discuss later, my guess is "same thing that is happening now."

    However, I'm not convinced that an "efficient plot" is meaningful in the way you describe it. What would a perfect EPH book look like? If the criteria is that you can't predict what will happen in two or five pages, then that's a book that always zigs when others zag. That's exhausting and meandering, and I wouldn't expect that authors optimize for that in good fiction in any way comparable to how the stock market optimizes.

    I also have a few clarifying questions.

    How do we confirm the EPH? Does it assume the ideal reader or a 10-year old reader? Where do algorithms factor in? If an algorithm can predict the next two pages but a person can't, does the EPH hold or not?

    What about probabilistic predictions? How much certainty do you have to assign to your guess? If a person knows when to not be confident in their guess, does that speak to the efficiency or predictability of the plot?

    1. > my guess for the story becoming more positive or negative in the next two pages (or five) is 'neither'.

      Sure, but that's a guess consistent with the plot being Brownian noise. The intuition literary scholars seem to have about plot direction is that it's determined by genre, and that you could beat Brownian noise at the margin by knowing your plot.

      > What would a perfect EPH book look like? If the criteria is that you can't predict what will happen in two or five pages, then that's a book that always zigs when others zag.

      If you take the EPH seriously (which, yes, is difficult to do with a straight face) there is no such thing as a single EPH book; it's a general equilibrium where, from any given point, half of all plots zig and half zag. Neither of those moves is necessarily exhausting. It's just that half the time the butler did it, and half the time someone else did it.

      > How do we confirm the EPH? Does it assume the ideal reader or a 10-year old reader? Where do algorithms factor in? If an algorithm can predict the next two pages but a person can't, does the EPH hold or not?

      As a good hypothesis, I don't think it could be "proven"; but it should be easy to falsify across any given metric. (I hedge on metric because although the original domain is "sentiment," I don't really believe that's a super proxy for "plot.") To falsify in any sense, show that plot arcs on that metric *are* predictable in the short term. This should be easy; but the nutty thing that Enderle is finding is that it isn't as easy as it seems. (I suspect, though, that Matt Jockers has some data sitting around that could kill this thing dead. It may even be in the new book, which I haven't read yet.)

      To extinguish the EPH as a live concern, you could also come up with a more plausible causal explanation than the EPH of why plots aren't as predictable as you'd think.

      The reader it assumes is definitely a weak point. "The intended audience of the book," maybe? In theory, an algorithm might be able to exceed human performance. But since this is the sort of task that semantic algorithms are pretty bad at, I'd probably take it. Probabilistic prediction is fine; in fact, I suspect the best way to score this would be using logits.
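
One reading of scoring "using logits" is log loss (cross-entropy), the standard scoring rule for probabilistic yes/no forecasts, whether stated as probabilities or log-odds. A sketch, assuming each prediction is a probability that sentiment rises over the next stretch and each outcome records whether it did: a reader who always answers 0.5 scores exactly 1 bit per prediction, so in a pure EPH world no reader should beat 1 bit on average. The function name and inputs are illustrative.

```python
from math import log2

def log_loss_bits(probs_up, rose):
    """Average log loss in bits. probs_up[i] is the predicted probability
    (strictly between 0 and 1) that sentiment rises at step i; rose[i] is
    True if it actually did. Lower is better; 1.0 is coin-flip guessing."""
    if len(probs_up) != len(rose) or not probs_up:
        raise ValueError("need equal-length, non-empty inputs")
    total = 0.0
    for p, up in zip(probs_up, rose):
        total -= log2(p if up else 1 - p)
    return total / len(probs_up)

# Illustrative inputs only: a hedging reader versus a confident one.
print(log_loss_bits([0.5, 0.5, 0.5], [True, False, True]))  # 1.0
print(log_loss_bits([0.8, 0.3, 0.9], [True, False, False]))
```

The second call shows the asymmetry of the rule: one confident miss (0.9 on a step that fell) costs more than three bits, enough to push the average above coin-flipping even though the other two guesses were good.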

  10. Ah yes, I removed it earlier because I was getting long-winded, but my read was that you intended for half zigs and half zags: where you don't know if the typical or atypical will happen. I meant that the stated criteria might not necessarily reflect that intent, and might favour a path of all left-turns. Maybe it does, hard to unkink the various possible interpretations in my head.
