Tuesday, December 14, 2010

Avoidance tactics

Can historical events suppress the use of words? Usage of the word 'panic' seems to dip around the bank panics of 1873 and 1893, and maybe 1837 too. I'm pretty confident this is just an artifact: I was plugging in a lot of words to test how fast the new database is, and probably turned up some random noise. There are plenty of reasons to doubt it: 1857 and 1907 don't show the pattern, the rebound in 1894 is too fast, and so on. Only 1873 really looks abnormal. What do you think?
But it would be really interesting if true: in my database of mostly non-newsy texts, do authors shy away from words that carry too specific a meaning at the moment they are writing? Lack of use might be interesting in all sorts of other ways, even if this particular case is probably just a random artifact.
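One way to make the "does 1873 really look abnormal?" check less impressionistic is to compare each year with its neighbors. The sketch below is a hypothetical illustration, not how this database works: it assumes a dict mapping years to a word's relative frequency (say, per million words) and flags years that fall below some fraction of the median of the surrounding years. The function name `find_dips`, the `window` and `threshold` defaults, and the synthetic series are all made up for the example.

```python
from statistics import median

def find_dips(freqs, window=5, threshold=0.7):
    """Flag years whose frequency falls well below that of nearby years.

    freqs: dict mapping year -> relative frequency of one word
    (hypothetical input format). A year is flagged when its value is
    below `threshold` times the median of the years within `window`
    on either side, excluding the year itself.
    """
    dips = []
    for year in sorted(freqs):
        neighbors = [freqs[n] for n in range(year - window, year + window + 1)
                     if n != year and n in freqs]
        if len(neighbors) < 2:
            continue  # too little context to judge this year
        baseline = median(neighbors)
        if baseline > 0 and freqs[year] < threshold * baseline:
            # Record the year and how far below its baseline it sits.
            dips.append((year, freqs[year] / baseline))
    return dips

# Synthetic series (not real data): flat usage with one dip in 1873.
freqs = {year: 10.0 for year in range(1860, 1886)}
freqs[1873] = 5.0
print(find_dips(freqs))  # → [(1873, 0.5)]
```

On real data one would want to run the same check at 1837, 1857, 1893 and 1907, and across many other words, before reading anything into a single dip: with enough words tried, some dips will turn up by chance alone.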

More generally, is there some way of finding books that use a word much less than context would suggest, and could we then ask why? That's nearly impossible with existing electronic resources (I've been working on a long post about that), but it might give us interesting ways of thinking about single texts we know well. This is one way massive textual statistics could help in reading a single book. We could find out which words Thorndike avoids in his psychological texts that we'd expect to see a lot of, given the broader context of early-20th-century psychology (or proto-behaviorism, or name your area). Most of these absences would be expected, but some might not.
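A minimal sketch of that kind of comparison, assuming you already have word counts for one text and for a reference corpus standing in for its context. It uses Dunning's log-likelihood (G²), a standard keyness statistic swapped in here for illustration, and keeps only the words the text uses less often than the reference proportions predict. The function names, the `min_expected` cutoff, and the toy counts are hypothetical, not drawn from Thorndike or any real corpus.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Dunning's G^2 for a word seen a times in a text of c tokens
    and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected count in the text
    e2 = d * (a + b) / (c + d)  # expected count in the reference
    g2 = 0.0
    if a > 0:  # treat 0 * log(0) as 0
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2

def underused_words(text_counts, reference_counts, min_expected=5.0):
    """Return (word, observed, expected, G^2) for words the text uses
    less than expected, strongest evidence first."""
    c = sum(text_counts.values())
    d = sum(reference_counts.values())
    results = []
    for word, b in reference_counts.items():
        a = text_counts.get(word, 0)
        expected = c * (a + b) / (c + d)
        # Skip words too rare to expect many uses, and words used at or above expectation.
        if a < expected and expected >= min_expected:
            results.append((word, a, expected, log_likelihood(a, b, c, d)))
    results.sort(key=lambda r: r[3], reverse=True)
    return results

# Toy counts, invented purely for illustration.
text = Counter({"stimulus": 40, "response": 35, "habit": 30, "mind": 2, "the": 500})
reference = Counter({"stimulus": 200, "response": 180, "habit": 100,
                     "mind": 400, "consciousness": 300, "the": 5000})
for word, observed, expected, g2 in underused_words(text, reference):
    print(f"{word}: observed {observed}, expected {expected:.1f}, G2 {g2:.1f}")
```

Ranking by G² rather than by the raw observed/expected ratio keeps rare words, which can look wildly under-used by chance, from dominating the list; the `min_expected` floor does similar work. Ideally the reference corpus would not include the text being examined.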

That's the single-text use. On massive corpuses: if word use reveals intellectual context, lack of use might do so too. It's sort of the textual equivalent of listening for 'the notes they don't play.' The most obvious example might be to figure out how to grab a body of late-19th-century American history writing and look for the elision of certain slavery-related words. That's an area where existing historiography has already made a lot of hay out of missing terms. Could we find areas it hasn't?

In either case, though, we need large text sources to build models of the words we'd expect to see. Even more than positive counts, negative counts are really hard to produce through traditional close reading unless you have a very clear idea of what you're looking for.

1 comment:

  1. Interesting post, Ben.

    It seems to me that negative counts (not to mention positive ones, though you do) require a lot of advance knowledge, with or without digital means.

    That is to say, while you initially started out focusing on how these tools might help at the *beginning* of research projects, before you know a lot, it seems to me that this post moves us in a slightly different direction. While it might be an interesting wrinkle in "keyword history" to look for the absence of keywords (and I agree, it is), the leap from "lack of use" to a language of purpose ("suppress," "shy away from," "avoids") is a tenuous one, and one you'll have to think a lot about as you move forward with both positive and negative (present or absent) keyword searches.

    In general, drawing conclusions in the language of authorial intent is a problem no matter what tools we use, and so I'd like to see you meditate a bit on how these approaches might fit into that ongoing methodological conversation.

    Is it possible that the statistical (and visual) representations enabled by these new tools don't actually help us "get up over" the material, no matter what we'd like to think? Or, if they do help us do so, where do we go from there?

    Aren't there still problems with that? Discuss.