Thursday, May 9, 2013

What years do historians write about?

Here's some inside baseball: the trends in periodization in history dissertations since the beginning of the American historical profession. A few months ago, Rob Townsend, who until recently kept everyone extremely well informed about professional trends at the American Historical Association,* sent me the list of all dissertation titles in history the American Historical Association knows about from the last 120 years. (It's incomplete in some interesting ways, but that's a topic for another day.) It's textual data. But sometimes the most interesting textual data to analyze quantitatively are the numbers that show up. Using a Bookworm database, I just pulled out from the titles any years mentioned: that lets us see what periods of the past historians have been the most interested in, and what sort of periods they've described.

*Townsend is now moving on to the American Academy of Arts and Sciences, where I'm excited to see that he'll manage the Humanities Indicators—my first real programming/data project was putting together the first version of them together with Malcolm Richardson immediately after college.

Numbers between 500 and 2000 are almost always years. You can see here that the vast bulk of historical study has been in the period since 1750: the three spikes out of the landscape correspond to the Civil War and the two world wars. Output decreases in the late 20th century in large part because the data set goes back to about 1850; but as we'll see in the next chart, not entirely.
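For readers who want to try this at home, here is a minimal sketch of the year-extraction step in Python. It is not the Bookworm query used for the charts; the function names and the sample titles are made up for illustration, and the only rule taken from the post is that standalone numbers between 500 and 2000 are treated as years.

```python
import re
from collections import Counter

# Standalone runs of three or four digits; longer numbers are ignored.
YEAR_RE = re.compile(r"\b\d{3,4}\b")


def extract_years(title, low=500, high=2000):
    """Return every standalone number in [low, high] found in a title."""
    return [int(m) for m in YEAR_RE.findall(title) if low <= int(m) <= high]


def year_counts(titles):
    """Count how often each year is mentioned across a list of titles."""
    counts = Counter()
    for title in titles:
        counts.update(extract_years(title))
    return counts


# Hypothetical titles, not taken from the AHA list.
SAMPLE_TITLES = [
    "Reconstruction in Georgia, 1865-1877",
    "The 300 Spartans in Modern Memory",
    "Labor and Politics in Lyon, 1865-1914",
]

if __name__ == "__main__":
    print(year_counts(SAMPLE_TITLES).most_common(3))
```

Plotting the resulting counts by year would give the kind of landscape described above. Note that this simple pattern silently skips two-digit end years such as the "96" in "1848-96".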

*This exaggerates the case somewhat in favor of post-1800 years, because before the modern period historians are more likely to talk about longer periods: "Late Antiquity," "The Ming Dynasty," "Elizabethan England." I find it impossible to believe they're doing so in a way that fundamentally changes the distribution above, though. As always, you should feel free to believe anything you find possible.

More surprising is how the spread of years changes over time. Below, each of the dots represents the use of a year in a dissertation title: the blue and red lines are the mean and median years written about, respectively, in a moving 20-year window. (I've excluded years before 1500 for legibility, and to antagonize medievalists).
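The moving-window lines can be sketched roughly as follows. This assumes each dot is a (dissertation year, mentioned year) pair and that the 20-year window is centered on each dissertation year; the post does not say whether the window is centered or trailing, so that choice, like the function name, is an assumption.

```python
from statistics import mean, median


def windowed_stats(points, window=20, min_year=1500):
    """Mean and median mentioned year in a moving window of dissertation years.

    points: iterable of (dissertation_year, mentioned_year) pairs.
    Mentioned years before min_year are dropped, as in the chart.
    Returns {center_year: (mean, median)} for every center with data.
    """
    kept = [(d, m) for d, m in points if m >= min_year]
    if not kept:
        return {}
    half = window // 2
    first = min(d for d, _ in kept)
    last = max(d for d, _ in kept)
    stats = {}
    for center in range(first, last + 1):
        # Assumed convention: window covers [center - half, center + half).
        values = [m for d, m in kept if center - half <= d < center + half]
        if values:
            stats[center] = (mean(values), median(values))
    return stats


if __name__ == "__main__":
    # Made-up points for illustration only.
    demo = [(1950, 1800), (1955, 1900), (1960, 1610), (1990, 1400)]
    for year, (avg, med) in sorted(windowed_stats(demo).items()):
        print(year, avg, med)
```

The mean is pulled around by a few very early or very late years, while the median is not, which is one reason to plot both.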

For most of the 20th century, the typical dissertation became later and later in time. In both the mean and the median, the early 1980s saw a steep rise--I suspect that has to do with changes in the schools the AHA has been tracking rather than a more interesting explanation (the rise of cultural history, say). But recently, the forward march has stalled. The median year mentioned in titles has been stuck at 1900 since the mid-1980s; the mean year has actually decreased over the last 30 years, despite the additional 30 years of history to write about. [Edit--I'm going to provisionally retract this statement--if I expand the sample to correctly parse years like "1848-96" to include 1896, the mean merely stalls out.] This may reflect some conservative choices by dissertators: nowadays, a dissertation about the 1970s--i.e., 40 years ago--is seen as pushing the modern boundary, while 40 years ago, they might have written about something only 20 years past. (My crazy pet theory--dissertators see "history" as the period before they were born, and as age at degree has gone up, that pushes back the time periods we'll write about. I have zero evidence for this.)
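The "1848-96" problem mentioned in the edit note can be handled with a second pattern. The sketch below is one possible approach, not the parsing actually used for the revised numbers: it borrows the century from the start year, and if that would put the end before the start (as in "1898-02") it assumes the range crosses into the next century. The names are hypothetical.

```python
import re

FULL_YEAR_RE = re.compile(r"\b\d{3,4}\b")
# A full year, a hyphen or en dash, then exactly two digits ("1848-96").
ABBREV_RANGE_RE = re.compile(r"\b(\d{3,4})\s*[-–]\s*(\d{2})\b")


def years_with_abbreviations(title, low=500, high=2000):
    """Return years in a title, expanding two-digit end years."""
    years = [int(y) for y in FULL_YEAR_RE.findall(title)]
    for start_text, suffix_text in ABBREV_RANGE_RE.findall(title):
        start = int(start_text)
        end = (start // 100) * 100 + int(suffix_text)
        if end < start:
            # Assumption: "1898-02" means 1898-1902, not 1898-1802.
            end += 100
        years.append(end)
    return sorted(y for y in years if low <= y <= high)


if __name__ == "__main__":
    print(years_with_abbreviations("The French Left, 1848-96"))
```

A fully written range such as "1848-1896" is untouched by the second pattern, because the two digits after the dash must be followed by a word boundary.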

Although the line of history has stopped moving, the time periods taken in by historians have gotten longer. There are 8,000 dissertations out of the 30,000 in the set that have two years in the title: usually, those are start and end dates. (If there are three or more, I take the outer limits). We can see that the average time span covered by a dissertation has shifted from 20 to 30 years at midcentury to 75 to 100 today. (Take care--log scale on the left here).
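The span calculation is simple enough to sketch. This version assumes each record is a dissertation year plus the list of years found in its title, takes the outer limits as described above, and summarizes by decade with a median; the decade grouping is a simplification chosen for the example, not the moving average used in the chart.

```python
from collections import defaultdict
from statistics import median


def period_span(years):
    """Years between the earliest and latest year in a title.

    Returns None when fewer than two years are present, since a single
    year gives no start and end date.
    """
    if len(years) < 2:
        return None
    return max(years) - min(years)


def median_span_by_decade(records):
    """records: iterable of (dissertation_year, [years in title])."""
    spans = defaultdict(list)
    for dissertation_year, years in records:
        span = period_span(years)
        if span is not None:
            spans[dissertation_year // 10 * 10].append(span)
    return {decade: median(values) for decade, values in sorted(spans.items())}


if __name__ == "__main__":
    # Made-up records for illustration only.
    demo = [
        (1952, [1865, 1877]),
        (1958, [1919, 1939]),
        (2005, [1800, 1850, 1900]),
        (2007, [1945]),
        (2009, [1900, 1980]),
    ]
    print(median_span_by_decade(demo))
```

Because spans range from a few years to several centuries, a log scale or a median is usually a safer summary than a plain mean.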

So since about 1965, dissertations have covered longer and longer periods. (The data is sparse, but there's some reason to think there might even be a trend toward more focused dissertations until the 1970s.) [Edit--with parsing of decades, this trend is less dramatic but still present. Graphs later.]

We can also look at this by the period of the dissertation, not the time it was written. The pattern you'd expect is longer dissertations about periods longer ago: 1933-1939 still seems like a useful period of time to study in a ramp-up to the war, but 1765-1775 is probably too short. That's indeed what we see, but it's not a steady decline: (I've pushed the window for the moving averages here out to a full century.)

Instead, the differences we have seem to match up against well-understood periods. Dissertations about the late Middle Ages typically cover a century (although particularly in that period, they tend to explicitly use phrases like "Fourteenth century" as well); the lengths are possibly a bit longer around 1000, and they drop dramatically immediately after the Renaissance, where they plateau at about 60 years for most of the early modern period. Those ending around 1900 are the shortest, about 30 years (1865-1900 is a classic American periodization, and 1870-1900 makes sense for French and German historians), and then they creep up through the twentieth century.

You see a funny curved line toward the right--that's the moving frontier of "1900-1945," "1900-1989," and so forth. The bump up at the end may be an artifact of some sort, but I think it's partly real--the median length for dissertations ending in 2000 is 55 years, which means that dissertators tend to take on the full post-war period at once. (Plus, remember, dissertations ending in the 1990s were by definition written after 1990, when coverage had gotten broader).

So that's the general outline. The next interesting thing to do would be to try to recover the most interesting years and the default periodizations from the set: maybe I'll get a chance soon. I also have a much prettier, wackier visualization of this same data sitting around somewhere. But as I worry more about my own dissertation, I felt like tossing something like this up on the blog.


  1. Thanks for this, Ben. Two quick questions:

    1. I was surprised you didn't invoke 1923 as a possible explanation for the stalling out around 1900. Especially as more and more sources are accessed online, it seems like the copyright cutoff will be increasingly important.

    Side-note: My own pet theory is that this (a) has/will increase the popularity of the GAPE relative to later stuff and (b) will also have a big effect on how we periodize that notoriously sketchy period.

    2. Might dissertations that designate periods *with numbers* skew short? i.e. if you're "going big," might you be more apt to bury (or highlight?) that ambition/hubris by using phrases like "Restoration," "Victorian," &c.? Any way to check that?

  2. 1923 is not yet a problem. It will be a mess, though, because really what we should be doing (IMHO) for general cultural history work is dropping the GAPE altogether and calling the period 1876-1945, which makes much more sense. But we won't for copyright reasons.

    Months ago I did a more complicated version that included things like "19th century" as 1800-1899, "the 1950s" as 1950-1959. I should check it matches up. But while I think those effects exist, I don't think they're going to change the direction on the curve. I suppose I could check for adjectives or something, but a comprehensive list is pretty hard.

  3. Suggestion, you could test your own pet theory, by deducting say the midpoint of the period covered by the thesis from its publication date. I'd be curious to see if there were periods when people were looking into relatively more recent events, or when they were writing about the more remote stuff...

  4. Just curious, there appears to be a step increase in number of publications at 1990. Does this reflect online availability of some journals? If a non random set of journals don't go back earlier than 1990 and if those journals specialize in older time periods, might the negative shift in the last 30 years be an artifact? Just curious, it may not be.

    1. It reflects a major change in the departments that the AHA has dissertation titles from, so there are definitely quite a few artifacts possible in the set. I think the negative shift over the last 30 years is not one of them, though, because the same trend occurs after the break (e.g., 1991-2009 disses cover a longer period than 2003-2012 dissertations, but the sampled institutions are roughly comparable). That list should be posted, though.

  5. Hypothesis about the increasing length of periods studied: a shift from political/military history to social history. Easy enough to check: you could mine the titles for words that correlate positively with period-length.

    1. Social-cultural history is definitely a lot of it. I have a very cool but slightly incomprehensible visualization that charts length and period against title keyword, and the "War" ones are definitely the shortest of all. I hadn't thought to just run a straightforward model on the keywords; I'll do that.

  6. Terrific--he says with some narcissism--as you know I've tried something similar with dates in subject headings for books and articles. I'm interested in the y = x line on the "years mentioned vs. diss. year" chart. Points on that line give us a sense of when dissertations reach up to the contemporary (and it would be interesting to consider the chronological reach of just this subset of diss.s). It looks like there are some points even as early as the interwar period, but it clearly gets more opaque over time, even though "the forward march has stalled."

    Disappointing not to see a single dissertation whose period ends in its own future.

    1. There was one that ended in 2030 or something like that. (And another that ended in the 500,000s sometime.) But convenient cropping can change that.

      Interpreting "20th century" to mean "1900-1999" does produce a lot of future-ending dates, which is worth doing.
