So I just looked at patterns of commemoration for a few famous anniversaries. This is, for some people, kind of interesting--how does the publishing industry focus in on certain figures to create news or resurgences of interest in them? I love the way we get excited about the civil war sesquicentennial now, or the Darwin/Lincoln year last year.
I was asking if this spike in mentions of Thoreau in 1917, is extraordinary or merely high.
Emerson (1903) doesn't seem to have much a spike--he's up in 1904 with everyone, although Hawthorne, whose centenary is 1904, isn't up very much.
Can we look at the centennial spikes for a lot of authors? Yes. The best way would be to use a biographical dictionary or wikipedia or something, but I can also just use the years built into some of my author metadata to get a rough list of authors born between 1730 and 1822, so they can have a centenary during my sample. A little grepping gets us down to thousand or so authors. Here are the ten with the most books, to check for reliability:
1 Thackeray 1811 86
4 Holmes 1809 52
9 Darwin 1809 30
10 Dickens 1812 30
14 Whittier 1807 29
15 Hawthorne 1804 28
21 Spencer 1820 25
22 Tyndall 1820 25
23 Holland 1819 24
26 Longfellow 1807 22
So who has the biggest centenaries? The 'Percent' column is the percentage of all mentions of an author that occur in his or her (hello, Harriet Beecher Stowe!) centenary. I'm only applying this to the 200 most represented authors. So Thoreau is indeed remarkable in having 11% of his mentions in his centennial year. There are a lot of other big centennials in the 90s and aughts, and only one from the first half of our sample.
Percent Author Birth Year
1 12.305728 Gieseler 1792
2 11.053941 Thoreau 1817
3 7.580752 Thackeray 1811
4 5.753147 Colburn 1793
5 5.594953 Stowe 1811
6 4.320671 Woodhouselee 1747
7 4.209995 Lewes 1817
8 3.640463 Dickens 1812
9 3.634805 Emerson 1803
10 3.533058 Haswell 1809
Do the centenary scores increase as time goes on?
> cor(centenary.scores[,3],centenary.scores[,1])
[1] 0.02625526
Emphatically they do not. It's unusual to get a correlation score so close to zero on this kind of data. But that may just be because most authors don't get centenary celebrations, in which case most of their mentions were probably closer to when they were alive. Or for some, like Darwin, the centennial just doesn't matter compared to the other controversies that get kicked up around the name at other points in time.
What if we did a different set of centennials: say, presidents? I can just pull their dates off wikipedia and sort the percentage of their mentions that come in the centenary year:
So any number below 1 means their centenary year had a below average number of mentions. I knew the Lincoln centenary was a big deal, but Fillmore? Who knew? A graphic like this could be good for teaching if we want to talk about, say, the eclipse of Grant, who gets a smaller boost than Hayes in 1922. I could go on, but I think it's clear that there's some interesting stuff about not just publishing practices, but maybe larger questions of reputation.
For the record, there is a positive correlation here:
> cor(as.numeric(presidents[,2]),result)
[1] 0.4922704
But it's mostly driven by the lack of centenaries for the founding fathers, which I think is something else entirely.
But enough of this stuff--after 19 hours of processing, I've finally got my database running in a new form, which should open up some new possibilities for comparing across discursive spheres.
This, from wikipedia, is one of the dumbest sentences I've read: "The record for the fewest Presidential birthdays is one, shared by June and September."
ReplyDeleteTwo questions:
ReplyDelete1. Was that Bronson or Louisa May in that graph?
2. Could you do a similar graph with some writers who weren't transcendentalists or Am. Renaissance stars? I'm interested in knowing if that big jump in the 1880s and beyond has to do with subject matter, reputation, or is partly an artifact of a large general jump in US publishing.
Dan,
ReplyDeleteIt's just "Alcott", unfortunately--my system only works quickly on single words. I'm curious about all those things, too, but I'm not sure I'm going to chase it down right away. Most of the jump is generic, I think--Longfellow gets it too, and more importantly, Shakespeare gets an abbreviated version. Even John Bunyan, who pretty clearly peaks earlier, has a secondary peak around 1885. I said earlier I was working on a post about the pitfalls of loess and the transcendentalists--I'm just not sure how deep I want to go into differences between publishing houses, etc., particularly before I have genre information.
As for whether it's a 'large general jump', the numbers are normed against all words published that year, so it's not just that there are more books--but there's probably some sort of change, maybe involving publishers printing books about writers more, and not just the 'primary sources.'
A less interesting possibility is that it has to do with publishers putting the names of other books in their catalogs on the endpapers, which my program would parse. I know you see that a lot in the late 19C, but I don't know when it starts. That's a whole problem I've hardly thought about at all.
Ben,
ReplyDeleteIf it's the last alternative (which is, I agree, less interesting, at least for content analysis), is there some way to spin out of it to claims about different efforts by different houses to push their wares in different books?
It's possible thoughts on that would have to wait for genre tags from you, but I just wondered if there were, in theory, some way to zoom out from an admission about a seemingly technical snafu to some analysis about commercial patterns in the publishing industry a la the last piece of Dan's second question...