Let's start with two self-evident facts about how print culture changes over time:
- The words that writers use change. Some words flare into usage and then back out; others steadily grow in popularity; others slowly fade out of the language.
- The writers using words change. Some writers retire or die, some hit mid-career spurts of productivity, and every year hundreds of new writers burst onto the scene. In the 19th-century US, median author age stays within a few years of 49: that constancy, year after year, means the supply of writers is constantly being replenished from the next generation.
How do (1) and (2) relate to each other? To what extent do the shifting group of authors create the changes in language, and how much do changes happen in a culture that authors all draw from?
This might be a historical question, but it also might be a linguistics/sociology/culturomics one. Say there are two different models of language use: type A and type B.
- Type A means a speaker drifts on the cultural winds: the language shifts and everyone changes their vocabulary every year.
- Type B, on the other hand, assumes that vocabulary is largely fixed at a certain age: a speaker will be largely consistent in her word choice from age 30 to 70, say, and new terms will not impinge on her vocabulary.
Both of these models are extremes, and we can assume that hardly any words are pure A or pure B. To firm this up, let me concretize this with two nicely alphabetical examples of fictional characters to warm up the subject for all you humanists out there:
- Type A: John Updike's Rabbit Angstrom. Rabbit doesn't know what he wants to say. Every decade, his vocabulary changes; he talks like a ennui-ed salaryman in the 50s, flirts with hippiedom and Nixonian silent-majorityism in the 60s, spends the late 70s hoarding gold and muttering about Consumer Reports and the Japanese. For Updike, part of Rabbit being an everyman is the shifts he undergoes from book to book: there's a sort of implicit type-A model underlying his transformations. He's a different person at every age because America is different in every year.
- Type B: Richard Ford's Frank Bascombe. Frank Bascombe, on the other hand, has his own voice. It shifts from decade to decade, to be sure, but 80s Bascombe sounds more like 2000s Bascombe than he sounds like 80s Angstrom. What does change is internal to his own life: he's in the Existence period in the 90s and worries about careers, and the 00s he's in the Permanent Period and worried about death. Bascombe is a dreamy outsider everywhere he goes: the Mississippian who went to Ann Arbor, always perplexed by the present.*
Anyhow: I don't have good enough author metadata right now to check this on authors (which would be really interesting), but I
can do it a bit on words. An Angstrom word would be one that pops up across all age cohorts in society simultaneously; a Bascombe word is one that creeps in more with each succeeding generation, but that doesn't change much over time within an age cohort.
This is getting into some pretty multi-dimensional data, so we need something a little more complicated than line graphs. The solution I like right now is heat maps.
An example: I know that "outside" is a word that shows a
steady, upward trend from 1830 to 1922; in fact,
I found that it was so steady that it was among the best words at helping to date books based on their vocabulary usage. So how did "outside" become more popular? Was it the Angstrom model, where everyone just started using it more? Or was it the Bascombe model, where each succeeding generation used it more and more? To answer that, we need to combine author birth year with year of publication: