Wednesday, July 11, 2012

Do revolutionaries really read history?

A quick post about other people's data, when I should be getting mine in order:

[Edit--I have a new post here with some concrete examples from the US Civil War of the pattern described in this post]

Michael Witmore and Robin Valenza have a post up on the Wine Dark Sea about how the kinds of books that are published can give us fascinating windows on the intellectual climate in moments of historical change. I (of course) agree strongly with this. But I want to offer an alternative, and somewhat deflating, interpretation of the central evidence they use.

Their post uses the following plot (presented by Google's Jon Orwant at a meeting with humanists) as evidence that more books about history are published (and therefore read--a difficult but not completely unwarranted leap) in periods of great revolutionary change. The effect jumps out particularly at the English and French revolutions. The chart shows this for "general and old world history":

Joe Adelman suggests a number of problems with using book publication as a metric: several are accurate. I could offer a few more questions (e.g., where's 1848?), but none would unsettle the central point. It would be, as Witmore and Valenza say, very interesting if "publishers are offering more history for readers who, perhaps, think of themselves as living through important historical changes." Even if only in those two periods.

My guess, though, is that we're seeing an artifact of data here, and not history. Here's why:

Although the full chart here is for 'old-world history', there are several subgenres making up that chart. Different sub-genres drive the spikes in the 1640s and the 1770s in the second chart. Unfortunately the labels overlap so that they're barely readable. (Why, oh Google, make a graph that places aesthetics so far above readability?)

I'm going to go out on a limb, though, and say I think I know what those lines are. I see an 'ain' at the end of the trend line for the first spike, and "France" is close to the second. I'll bet dollars to donuts that what's going on is that more books about the "history of France" are published in the 1770s and 1780s, and more books about the "history of Britain" in the 1640s.

Now, perhaps that's because the big events were in those places. That would suggest a twist on 'we need history to understand history'--not that we turn to the classics, but rather that there's a resurgence of interest in hot spots at troubled times.

But--and again, I'm just speculating off graphs alone here--I think the explanation for that is much more mundane. I bet this can be explained by issues of cataloging that the humanists at Google probably didn't think about.

Many libraries--from whom Google is getting the data, AFAIK--file or label documents as 'history' not because they tell stories about the past but because they are themselves historically significant. Things like these acts of the Virginia legislature get the subject heading "Virginia -- History -- Civil War, 1861-1865 -- Sources"; this Union general's report is filed with "American History". Similar documents from the 1870s, on the other hand, might not be kept at all, or might be kept but filed under 'law' rather than 'history.' (That these are extremely short documents, not what we'd think of as 'books,' is not at all atypical.) I think this is particularly frequent when documents are acquired well after they were first published, and are seen as historically valuable.

This means that works which are historically important, but which are not works of history by historians, will get filed as "History". If one university had an excellent pamphlet collection filed as "history", that might be enough to drive everything here. I suspect that this problem, or one much like it, is why English and French history spike during their revolutions.

That is to say: periods from which later generations save more material will show up as having "more history published" merely because of librarianship conventions around whether ephemera are saved, and how they are labeled. That won't correspond to reading; nor will it tell us anything major about production. But it will produce major outliers in time series that appear real enough to make professors gasp.
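To make the worry concrete, here is a minimal sketch in Python of how such an artifact could arise. Everything in it is an assumption for illustration: the records are invented rather than taken from any real catalog, and treating a trailing "Sources" subdivision as the marker of a primary document is my own rule of thumb, not anything I know about how Google's chart was built.

```python
from collections import Counter

# Hypothetical catalog records: (publication year, subject heading).
# These rows are invented for illustration only.
records = [
    (1642, "Great Britain -- History -- Civil War, 1642-1649 -- Sources"),
    (1643, "Great Britain -- History -- Civil War, 1642-1649 -- Sources"),
    (1644, "Great Britain -- History -- Civil War, 1642-1649 -- Sources"),
    (1645, "Rome -- History"),
    (1672, "Great Britain -- History"),
    (1675, "Law -- Great Britain"),
    (1678, "Rome -- History"),
]


def decade(year):
    """Round a year down to the start of its decade."""
    return year - year % 10


def is_history(heading):
    """True if any subdivision of the heading is exactly 'History'."""
    return "History" in heading.split(" -- ")


def is_primary_source(heading):
    """Assumed rule of thumb: a final 'Sources' subdivision marks a
    document kept for its own significance, not a work of history."""
    return heading.split(" -- ")[-1] == "Sources"


# Naive count: anything cataloged under 'History' counts as history.
naive = Counter(decade(y) for y, h in records if is_history(h))

# Filtered count: drop the documents that are themselves the sources.
narrative = Counter(
    decade(y) for y, h in records
    if is_history(h) and not is_primary_source(h)
)

for d in sorted(naive):
    print(d, "naive:", naive[d], "without sources:", narrative[d])
```

In this toy data the 1640s look like a boom in history publishing under the naive count, and the boom disappears once the "Sources" records are set aside. Whether anything like that is happening in the real chart is exactly what can't be checked without the underlying metadata.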

If the idea that we read more history in troubled times is wrong: so what? There's nothing wrong with being wrong: to tap into all that knowledge out there, we need to be wrong in public, quite frequently. (I'm taking that risk right now.)

The real shame is that we can't do more than speculate. It's impossible to say without access to the data, and because of the disaster that passes for copyright law, even this metadata is too legally sensitive for Google to be able to share. It's likely that only exposure to a broader public can turn up alternative explanations like the one I'm giving here. Patterns like this might be explained by algorithmic firepower, but just as often they'll be explained by arcane knowledge from history, literature, or library science.

But we need to do it with open metadata. Open metadata, even more than open texts, is the sine qua non of digital research. It will frequently be possible to directly falsify conclusions like this if we know which texts are driving the change. (I only know about this filing pattern because I work with a collection of open metadata--I have no idea if it continues as far back as the 1640s.) It's fine if you have to go to the library or the archive to find the actual works without having free access--that's always what historians have done, and it works relatively well for reproducible research. But so much of what we know is miscellany about those individual works; if they're not present in the research, we're abandoning almost all our expertise about this topic. We need to abandon the idea that one can usefully present big picture conclusions without some avenue into the texts undergirding them.
