Any word has a given prominence in any book. Basically, that's the number of times it appears. (The numbers I give here are my TF-IDF scores, but for practical purposes they're equivalent to the rate of incidence per book when we look at single words. Things only get tricky when looking at multiple-word correlations, which I'm not going to use in this post.) To explain graphically: here's a chart. Each dot is a book, the x axis is the book's score for "evolution", and the y axis is the book's score for "society."
They look pretty unrelated, but there are a few places where they get used together. We can put a number on that--the correlation is .13, where 1 is complete correlation, -1 complete negative correlation, and 0 no relationship at all. I could get it significantly higher by transforming the numbers to account for the distribution--it's .21 if you just take the square root, e.g.--but I'm just playing around here, for now. That would have to be fixed for more serious research, though.
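For anyone who wants to try the same thing, here is a minimal sketch of that calculation, assuming the number above is an ordinary Pearson correlation over per-book scores. The scores below are invented for illustration (they are not the data behind the chart), and `pearson` is just a hand-rolled helper, not anything from a particular library.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)

# Invented per-book scores: one entry per book, same book order in both lists.
evolution = [0.0, 0.0, 1.0, 4.0, 0.0, 9.0, 0.0, 16.0]
society = [0.0, 4.0, 1.0, 1.0, 9.0, 4.0, 0.0, 9.0]

# Correlation on the raw scores.
raw = pearson(evolution, society)

# Same thing after a square-root transform, which pulls in the long tail
# of books that use a word very heavily.
rooted = pearson([math.sqrt(x) for x in evolution],
                 [math.sqrt(y) for y in society])

print(f"raw: {raw:.2f}  square-rooted: {rooted:.2f}")
```

The square root is only one possible transform; whether it raises or lowers the number depends on how skewed the scores are.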
How high is a .13 correlation? It's probably not nothing. For more highly linked word-pairs like "Lincoln–Springfield," "Kansas–Nebraska," or "Red–Green," the correlations are between .45 and .5. The highest score I could get is the one in this graph, with a correlation of .91, for "united" and "states".
All that is kind of uninteresting, I'm sure. What isn't: these correlations can change over time. Here's a chart showing the relations between two words--"evolution" and "society"--from 1830 to 1922.
The zero line is incredibly important on these charts--I should probably be highlighting it somehow. The correlation is below zero before the Origin in 1859 because, I think, evolution was a word about the course of diseases and such while society was a word about the Astors. And the Astors rarely got diseased, at least on the printed page. It goes neutral right about 1860, and then makes it into positive territory around 1880. I'm actually quite surprised to see how strongly evolution and society are linked after 1910--I would have bet the strongest correlation of the two terms would have come much later.
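A line like that can be sketched roughly as follows: group the books by publication year, then compute the correlation among the books in a small window around each year. Everything here is an assumption for illustration, including the window size, the minimum number of books, the `(year, scores)` data layout, and the function names; the post doesn't say how its yearly figures were actually computed.

```python
import math
from collections import defaultdict

def pearson(xs, ys):
    """Pearson correlation, or None when it is undefined."""
    n = len(xs)
    if n != len(ys) or n < 2:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)

def correlation_by_year(books, word_a, word_b, window=2, min_books=3):
    """books: iterable of (year, {word: score}) pairs.

    Returns {year: correlation} using all books published within
    `window` years of each year; years with too few books are skipped.
    """
    by_year = defaultdict(list)
    for year, scores in books:
        by_year[year].append((scores.get(word_a, 0.0), scores.get(word_b, 0.0)))

    result = {}
    for year in sorted(by_year):
        pairs = [pair
                 for y in range(year - window, year + window + 1)
                 for pair in by_year.get(y, [])]
        if len(pairs) < min_books:
            continue
        xs, ys = zip(*pairs)
        r = pearson(xs, ys)
        if r is not None:
            result[year] = r
    return result

# Tiny invented corpus: the two words move in opposite directions in the
# 1850 books and together in the 1900 books.
toy_books = [
    (1850, {"evolution": 1.0, "society": 3.0}),
    (1850, {"evolution": 2.0, "society": 2.0}),
    (1850, {"evolution": 3.0, "society": 1.0}),
    (1900, {"evolution": 1.0, "society": 1.0}),
    (1900, {"evolution": 2.0, "society": 2.0}),
    (1900, {"evolution": 3.0, "society": 3.0}),
]
trend = correlation_by_year(toy_books, "evolution", "society", window=0)
```

A wider window smooths the line at the cost of blurring exactly when a shift happens, which matters if you care about a date like 1859.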
That's a fuzzy line, but how important is it? Well, let's check some more basic things. First, how do events change the correlations of people?
That's pretty striking right in 1865. I think there are some Harper's or Putnam's magazines keeping the current events high. The secondary spikes are a little more mysterious to me, though; this may be quite a noisy event.
How about pairs more conceptually linked?
Darwin and evolution become two closely linked words from about 1865, which is right at the beginning of the takeoff for Darwinist discourse. Let me show my chart of overall occurrences of "evolution" for comparison:
Do you see how evolution rises just as it becomes closely correlated with Darwin? That's nice. Also interesting is that the local peak in the early 1890s lines up nicely with a lull in the strength of the Darwin-Evolution connection. That might be noise, or it might be a nice illustration of the "eclipse of Darwinism" Hank was talking about earlier. Let's dig into that a little more, starting in 1865:
I was hoping to find a spike around the search for alternate mechanisms of heredity, but that's not at all what we get. The huge initial peak, I think, reflects how evolution initially was just about biological heredity. As it diffused into things like studies of society (which the chart above showed), that correlation weakened. The rebound--if that's what it is--in the 1880s might show something of a renewed interest in hereditary mechanisms despite all the other people talking about evolution. I think there are ways to start to address this problem by being more sophisticated in both our search terms (not just "evolution" and "heredity", but a bunch of things designed to ferret out specifically biological evolution) and in our correlation methods. That's what the next post in the hopper should do a little more of. Any requests for pairs of words or pairs of groups to put in it?
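One simple way to treat a group of words as a single search term is to add up each book's scores across the group and correlate the totals instead of the single-word scores. This is only a sketch of that idea: the word list, the scores, and the `group_scores` helper are all hypothetical, and plain summing is an assumption, not necessarily the more sophisticated method the next post would use.

```python
def group_scores(books, words):
    """Sum each book's scores over a group of words; missing words count as 0."""
    return [sum(scores.get(word, 0.0) for word in words) for scores in books]

# Hypothetical word group and invented per-book scores, for illustration only.
biological = ["evolution", "heredity", "species"]
books = [
    {"evolution": 2.0, "species": 1.0},
    {"society": 3.0},
    {"heredity": 0.5, "evolution": 0.5, "society": 1.0},
]

# One combined score per book, in the same order as `books`.
bio_scores = group_scores(books, biological)
```

The resulting list has one number per book, so it can be dropped into any correlation calculation in place of a single word's scores.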
This isn't perfect, I should say. I had one case that appeared not to work at all: Taylorism and efficiency. We get a couple high years in the teens, but also several in the 1840s:
On the other hand, that failure might be suggestive: people writing about efficiency didn't really cite Taylor that much, perhaps? Efficiency was such a widely spread watchword? Taylor's just too common a name? I need to load some more books into the system now that Google's got me feeling like 30,000 isn't enough? Could be any or all. But where this does work, I think it can be interesting.