Thursday, June 16, 2011

What's new?

Let me get back into the blogging swing with a (too long—this is why I can't handle Twitter, folks) reflection on an offhand comment. Don't worry, there's some data stuff in the pipe, maybe including some long-delayed playing with topic models.

Even at the NEH's Digging into Data conference last weekend, one commenter brought out one of the standard criticisms of digital work—that it doesn't tell us anything we didn't know before. The context was some of Gregory Crane's work in describing shifting word use patterns in Latin over very long time spans (2000 years) at the Perseus Project: Cynthia Damon, from Penn, worried that "being able to represent this as a graph instead by traditional reading is not necessarily a major gain." That is to say, we already know this; having a chart restate the things any classicist could tell you is less than useful. I might have written down the quote wrong; it doesn't really matter, because this is a pretty standard response from humanists to computational work, and Damon didn't press the point as forcefully as others do. Outside the friendly confines of the digital humanities community, we have to deal with it all the time.

Now, there are a bunch of responses to this question on the level of pure research. Just a few in passing: knowledge of overall trends from arbitrary sampling can suffer major confirmation biases; by definition, only the rarest research is truly groundbreaking, and confirmatory research is important and underprivileged across all fields in the academy; there's a difference between knowing the existence of a trend and knowing the magnitude and contours of that trend. It's easy to go on.

But this time, the question that jumps out for me is Tonto's: What do you mean "we," kemosabe? Just who is it that already knows about these trends? The obvious answer, presumably, is that it's some academic field or subfield. Our expert speaks from authority to say that the research doesn't contribute to the field. (Note that the statement can be exclusionary: an implication is that if 'you' the researcher find this discovery interesting, you must not really be in the field, even if you're a professor in it.) But though the field is important, it's more complicated. I've read a lot of pieces in the ever-lively crisis-of-the-humanities/defense-of-the-humanities genre, and pretty much all of them would agree that "we" also means the culture as a whole: scholars know it in their capacity as the keepers of the flame of knowledge. And for that 'we', different ways of reshaping knowledge do actually contribute to what 'we' know.

This struck me in Damon's commentary because she mentioned elsewhere that she was working on a translation of Tacitus. I'm outside the field, obviously, but I still feel pretty confident in saying that putting Tacitus into modern English contributes very little to the body of scholarly knowledge. Jack Gladney notwithstanding, scholars speak the language of their field. If we think that kind of work broadens knowledge, it's because it makes Tacitus available to the much larger group of people who can't read Latin. If translations are a worthy activity for senior scholars, why aren't data representations?

I can think of a couple potentially concerning reasons that humanists don't think this work increases what 'we' know. The first is that while humanists care about non-Latin readers knowing things about Tacitus, we don't care about people who are more persuaded by quantitative data than by anecdotal impressions. Requiring numbers for proof is naive empiricism, blind to the complexities of human experience, etc. While there's certainly an undercurrent of this thinking, I don't think it's insuperable or always present in these critiques: at least in history, I've long been struck by how often maps and stats get used in lecture courses by faculty who would never use them in their published work.

Moreover, plenty of humanists themselves are interested in this type of knowledge--that's what's driving much of the interest among humanists in reaching out to new methods now that the data is available. Thus even within the field, there's been an undercurrent of people who don't find our conclusions from traditional reading completely persuasive: I, for one, would love to see more solid evidence on a few canards of historical interpretation. (The Culturomics keynote, for example, has this slide, which helps answer some live questions about things many historians claim to simply 'know' about the transition of "the United States" from a plural to a singular subject around the Civil War.)

But if we do accept that new representations help persuade different groups of people, including some who aren't obviously outside the scholarly field, why don't they expand what 'we' know in a real way? I think it has to do with what one of the Digging into Data speakers (can't remember who… ) talked about as the privileging of method over questions in the humanities. Learning that language changed by looking at a chart isn't real knowledge, the argument would go, not like knowledge gained by reading lots of books. Even if I read a result off a Google ngram, the only way to confirm its truth is to ask someone who's actually read all the books. It's easy to make a mistake off a graph, so the only real knowledge is rooted in reading techniques anyway. Humanists would fail in their obligations to students if they let them reach conclusions through charts rather than through extended reading.

Now, a methodological fight may be coming, and it might be fun. I think a lot of participants would like this to be a purely epistemological issue. JB Michel briefly mentioned Viennese logical positivism in the keynote while suggesting that now we can speak quantitatively about culture, although way back when it wasn't possible. Many more humanists, I suspect, would think that we still can't—that humanistic questions are by definition not tractable to purely quantitative analysis.

So far as possible, I want to sidestep those issues to point out some more pragmatic problems with defending the scholarly status quo. Although for many humanists defending reading seems like a warmly resounding defense of human practices in texts, for students and outsiders it can seem much closer to an unquestionable assertion of authority. To assert a trend on the basis of experience that is not open to critical interrogation tends to tighten enormously the circle of 'we' for whom humanities knowledge is accessible. That's a mistake; trust in authority is not a core humanistic value. Neither is dismissing the relevance of particular types of learning. By making it easier to draw conclusions about the past, quantification allows us to enormously broaden the circle of people who can know things about the past.

I say 'know' a bit incautiously, but I do think there is an enormous difference between someone told by an authority figure that language use changed, and someone given a graph to figure it out. Even on a printed page, a chart invites an engaged reading in a way that a simple pronouncement does not. It's hard to overstate how important that is. Now, a chart doesn't need to be quantitative--I actually think dynamically generated concordances might allow people to reach conclusions in the same way without statistics, although with a bit more effort and a bit more computing power. But if it changes the number of people to whom basic knowledge about culture is available, even if it doesn't change the type of knowledge, that serves the purpose of the humanities better than anything.

A concluding parable: Imagine a world with no maps. Most people know only their immediate neighborhoods, but a few social misfits spent their twenties driving the long hauls between cities, sacrificing fame, fortune, and family to learn the lay of the land. If you want to get from New York to midcoast Maine, they tell you about how to take the Hutchinson to the Merritt, about the I-84 turnoff from 91 just before Hartford, about the merits of taking the coastal route north from Portland or following the interstate to Augusta. They tell you that they recall a friend who wrote a book mentioning a cutoff from Route 1 that saves a few miles by skipping Rockland (Maine route 90, route 95, something like that), which you might want to look into further. This is a useful service. Some people make careers out of it.

If someone walks into this world with a stack of Hagstrom atlases, what happens? Those people go on about how those maps can't capture the rush hour traffic in Hartford or the backups near Wiscasset on summer weekends, about how you'd never know from a map to buy your gas in Massachusetts and your alcohol in New Hampshire. They say they've already driven all these roads; the maps don't tell us anything that we don't know already. Real knowledge of the terrain can only be gained by driving it.

In a way, they'd be right. The routes that a map reader gets may be more interesting at times, but they will also be shallower. If they rely on a completely algorithmic solution (Google Maps!) they will frequently get terrible results. But there's no surer way to keep more people from learning the landscape than making it as inaccessible as possible. It might validate the choices and expertise of a few, but it certainly does far less for the knowledge of the land itself than opening it up to new audiences.


  1. I think you're very much on the right track when you suggest that this is an issue of social or institutional organization, rather than a strictly epistemological one.

    Part of the problem, I think, is that the payoff for digital projects (especially ones involving big data) often does not *fit* neatly into a single field, as fields are presently defined. This is particularly an issue in literary studies, I think, because we periodize ourselves very tightly. If you draw a time-series graph with an x-axis longer than about 100 years, it can start to be hard to say who, exactly, would be the audience for such a thesis.

    Or, to pick up the way you're describing it here, it may be the case that Romanticists "already know" one implication of the graph, and Modernists "already know" another implication, and no one thinks it's particularly important that those two insights can be fused in a single trend line.

  2. It's weird: this kind of personal protectiveness over knowledge is exactly the kind of thing that will crash and burn a job talk ("I can just 'tell'"). It also wouldn't work in a book, at least not for a new scholar. Why is it okay in other contexts? Maybe resisting quantitative analysis is a knee-jerk defense against a public that's already skeptical about the existence and value of "expert knowledge" in the humanities. Fortunately, your take on this is much more productive.

  3. I think you're absolutely right - that quantitative "proofs" or demonstrations can serve a different purpose - perhaps of making the work reach a different kind of disciplinary audience. I'm going to use this argument (with due credit!) the next time I get into a conversation with anyone about the utility of quantitative analysis. :)

    I suspect though that when humanists claim that pattern recognition by programs is not a major gain, they're also trying to raise an epistemological point. That is: if we construct programs to look for statistical regularities in texts, the programmer (or researcher) already has some idea of these regularities and the program only finds them for her (or verifies for her that they exist). Your rejoinder is that it is indeed valuable for the program to be able to find them for her, even if that's all it does.

    I agree. But it's worth mentioning too that it is now possible in machine learning - through a technique called "boosting" - to construct weak pattern recognizers and then build them up together into a very strong one (such that the strong classifier is more than a sum of its weak parts). So there does exist a possibility of being able to go beyond things "we already know" in the computational analysis of texts.

    Hopefully we'll get there soon.
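
For readers who haven't met boosting, here is a minimal sketch of the idea in the last comment, assuming AdaBoost (one common boosting algorithm; the comment doesn't say which variant is meant). The ten numbered points, their labels, and every function name are made up for illustration; nothing here comes from the Perseus or Culturomics data. Each "weak recognizer" is just a single cutoff on one number, and no single cutoff can separate these labels.

```python
import math

def stump_predict(x, threshold, polarity):
    """A weak recognizer: says `polarity` below the cutoff, the opposite above it."""
    return polarity if x < threshold else -polarity

def best_stump(xs, ys, weights):
    """Try every midpoint cutoff in both directions; keep the lowest weighted error."""
    points = sorted(set(xs))
    thresholds = [(a + b) / 2 for a, b in zip(points, points[1:])]
    best = None
    for threshold in thresholds:
        for polarity in (1, -1):
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if stump_predict(x, threshold, polarity) != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    """Return a list of (vote weight, threshold, polarity) triples."""
    n = len(xs)
    weights = [1.0 / n] * n  # start by caring about every example equally
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = best_stump(xs, ys, weights)
        error = max(error, 1e-10)  # guard against log of infinity on a perfect stump
        alpha = 0.5 * math.log((1 - error) / error)  # better stumps get a bigger vote
        ensemble.append((alpha, threshold, polarity))
        # Raise the weight of examples this stump got wrong, lower the rest.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def ensemble_predict(ensemble, x):
    """Weighted vote of all the stumps."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

def mistakes(ensemble, xs, ys):
    """How many training examples the weighted vote gets wrong."""
    return sum(1 for x, y in zip(xs, ys) if ensemble_predict(ensemble, x) != y)

# Hypothetical toy data: positions 0..9 with labels that flip three times.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

for rounds in (1, 2, 3):
    print(rounds, "round(s):", mistakes(adaboost(xs, ys, rounds), xs, ys), "mistakes")
```

On this toy data the best single cutoff gets three of the ten points wrong, while three rounds combined get all ten right. That is only a demonstration that the combination can outperform any one of its parts on the examples it was trained on; whether the same holds for patterns in real texts is a separate question.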