Here's some inside baseball: the trends in periodization in history dissertations since the beginning of the American historical profession. A few months ago, Rob Townsend, who until recently kept everyone extremely well informed about professional trends at the American Historical Association,* sent me the list of all the history dissertation titles the American Historical Association knows about from the last 120 years. (It's incomplete in some interesting ways, but that's a topic for another day.) It's textual data. But sometimes the most interesting textual data to analyze quantitatively are the numbers that show up.
Using a Bookworm database, I pulled out any years mentioned in the titles. That lets us see what periods of the past historians have been most interested in, and what sorts of periods they've described.
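The post doesn't say how the years were extracted beyond using Bookworm. As a rough illustration only, here is a minimal Python sketch of one piece of the job. It assumes the titles are plain strings, finds explicit year ranges, and measures how long a span each dissertation covers. The `year_ranges` helper and the sample titles are hypothetical. They are not Bookworm's API and not the actual AHA data.

```python
import re

# Hypothetical pattern for ranges like "1865-1877" or "1300–1450" (hyphen or
# en dash). Abbreviated ranges such as "1861-65" are deliberately not matched.
RANGE = re.compile(r"\b(\d{3,4})\s*[-\u2013]\s*(\d{3,4})\b")


def year_ranges(title):
    """Return (start, end) pairs whose endpoints both look like years (500-2000)."""
    pairs = []
    for a, b in RANGE.findall(title):
        start, end = int(a), int(b)
        # Keep only plausible, correctly ordered year spans.
        if 500 <= start <= end <= 2000:
            pairs.append((start, end))
    return pairs


# Invented sample titles, for illustration only.
titles = [
    "The Politics of Reconstruction in Georgia, 1865-1877",
    "Merchants and Markets in Florence, 1300\u20131450",
    "The Boston Police Strike of 1919",
]
for t in titles:
    for start, end in year_ranges(t):
        print(f"{start}-{end}: a {end - start}-year span")
```

Single years such as "1919" produce no range here. Counting them is a separate step.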
*Townsend is now moving on to the American Academy of Arts and Sciences, where I'm excited to see that he'll manage the Humanities Indicators. My first real programming/data project, immediately after college, was putting together the first version of them with Malcolm Richardson.
Numbers between 500 and 2000 are almost always years. You can see here that the vast bulk of historical study has been in the period since 1750: the three spikes out of the landscape correspond to the Civil War and the two world wars. Output decreases in the late 20th century in large part because the data set goes back to about 1850; but as we'll see in the next chart, not entirely.
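The 500-to-2000 rule of thumb is easy to apply directly. This Python sketch tallies every such number by decade, which is one way a distribution like the one described above could be built. It is not necessarily how the Bookworm query works. The `decade_counts` helper and the titles are hypothetical. Both endpoints of a range count as mentions.

```python
import re
from collections import Counter

# Any standalone three- or four-digit number is a candidate year.
NUMBER = re.compile(r"\b\d{3,4}\b")


def years_in(title):
    """Numbers between 500 and 2000 in a title, treated as years."""
    return [n for n in map(int, NUMBER.findall(title)) if 500 <= n <= 2000]


def decade_counts(titles):
    """Tally every year mention by decade (both ends of a range count)."""
    counts = Counter()
    for title in titles:
        for year in years_in(title):
            counts[year // 10 * 10] += 1
    return counts


# Invented sample titles, for illustration only.
titles = [
    "The Politics of Reconstruction in Georgia, 1865-1877",
    "Soldiers and Civilians, 1861-1865",
    "The Boston Police Strike of 1919",
]
for decade, n in sorted(decade_counts(titles).items()):
    print(decade, n)
```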
Digital Humanities: Using tools from the 1990s to answer questions from the 1960s about 19th century America.
Showing posts with label Historical memory.
Thursday, May 9, 2013
Monday, January 10, 2011
Searching for Correlations
More access to the connections between words makes it possible to separate word use from language. This is one of the reasons we need access to analyzed texts to do any real digital history. I'm thinking through ways to use patterns of correlation across books to study how connections between words and concepts change over time, just as word count data can tell us something (fuzzy, but something) about the general prominence of a term. This post is about how the search algorithm I've been working with can help improve this sort of search. I'll get back to evolution (which I talked about in my post introducing these correlation charts) in a day or two. First, though, an even more basic question that illustrates some of the possibilities and limitations of this analysis: What was the Civil War fought about?
I've always liked this one, since it's one of those historiographical questions that still rattles through politics. The literature, if I remember my generals properly (the big work is David Blight, but in the broad outline it comes out of the self-situations of Foner and McPherson, and originally really out of Du Bois), says that the war was viewed as deeply tied to slavery at the time, certainly by emancipation in 1863, and even before. But sectional reconciliation after Reconstruction (ending in 1877), and even more the beginning of Jim Crow (1890s-ish), brought a gradual suppression of that truth. In its place came a narrative of the war as a great national tragedy in which the North was an aggressor and the South was defending states' rights, not necessarily slavery. The mainstream historiography has since swung back to slavery as the heart of the matter, but there are obviously plenty of people interested in defending the Lost Cause. Anyhow: let's try to get a demonstration of that. Here's a first chart:
How should we read this kind of chart? It's not as definitive as I'd like, but there's a big peak the year after the war breaks out in 1861, and a massive plunge right after the disputed Hayes–Tilden election of 1876. But the correlation around 1900 is perhaps higher than the literature would suggest. And both ends are suspicious. In the 1830s, what is a search for "civil war" picking up? And why is that dip in the 1910s so suspiciously aligned with the Great War? Luckily, we can do better than this.
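The post doesn't spell out how the chart is computed. One plausible reading, offered here only as an assumption, is a per-year Pearson correlation across books. It compares each book's relative frequency of "civil war" with its frequency of a second term such as "slavery". The Python sketch below is not the search algorithm the post refers to. The `yearly_correlation` helper, the `min_books` cutoff, and the toy numbers are all hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation; None if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None
    return cov / sqrt(vx * vy)


def yearly_correlation(books, min_books=3):
    """books: (year, freq_a, freq_b) per book, e.g. relative frequencies of
    "civil war" and "slavery". Returns {year: r} for years with enough books."""
    by_year = defaultdict(lambda: ([], []))
    for year, a, b in books:
        by_year[year][0].append(a)
        by_year[year][1].append(b)
    result = {}
    for year in sorted(by_year):
        xs, ys = by_year[year]
        if len(xs) >= min_books:
            r = pearson(xs, ys)
            if r is not None:
                result[year] = r
    return result


# Toy numbers, not real data.
books = [
    (1862, 0.0, 1.0), (1862, 2.0, 3.0), (1862, 4.0, 5.0),
    (1880, 1.0, 2.0), (1880, 2.0, 1.0), (1880, 3.0, 0.0),
    (1900, 1.0, 1.0),  # only one book: skipped
]
print(yearly_correlation(books))
```

A per-year correlation like this says nothing about which civil war a book means. The 1830s question above would still need the search itself to be narrowed.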
Friday, December 3, 2010
Centennials, part II
So I just looked at patterns of commemoration for a few famous anniversaries. This is, for some people, kind of interesting: how does the publishing industry focus on certain figures to create news or resurgences of interest in them? I love the way we get excited about the Civil War sesquicentennial now, or the Darwin/Lincoln year last year.
I was asking whether this spike in mentions of Thoreau in 1917 is extraordinary or merely high.
Emerson (1903) doesn't seem to have much of a spike: he's up in 1904 with everyone else, although Hawthorne, whose centenary is 1904, isn't up very much.
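One way to make "extraordinary or merely high" concrete is to compare the centennial year's count with the years around it. The sketch below assumes yearly mention counts per author are available as a dictionary. The `centennial_z` helper, the five-year window, and the counts are hypothetical choices. Any cutoff for "extraordinary" (say, a z-score above 3) would also be arbitrary. The birth years are real. Restricting to authors born between 1730 and 1822 keeps their centenaries between 1830 and 1922.

```python
from statistics import mean, stdev


def centennial_z(counts, birth_year, window=5):
    """z-score of mentions in the centennial year against the `window` years on
    either side of it (the centennial year itself is excluded from the baseline).
    counts maps year -> number of mentions; missing years count as zero."""
    target = birth_year + 100
    baseline = [counts.get(y, 0) for y in range(target - window, target + window + 1)
                if y != target]
    sd = stdev(baseline)
    if sd == 0:
        return None  # a perfectly flat baseline gives no scale to compare against
    return (counts.get(target, 0) - mean(baseline)) / sd


# Keep only authors whose centenaries fall between 1830 and 1922.
births = {"Thoreau": 1817, "Emerson": 1803, "Hawthorne": 1804, "Franklin": 1706}
eligible = {name: b for name, b in births.items() if 1730 <= b <= 1822}

# Made-up mention counts for Thoreau, 1912-1922, for illustration only.
thoreau = dict(zip(range(1912, 1923), [9, 11, 10, 9, 11, 25, 10, 9, 11, 10, 10]))
z = centennial_z(thoreau, eligible["Thoreau"])
print(f"Thoreau centennial z-score: {z:.1f}")
```

Using neighboring years as the baseline, rather than the whole series, keeps a long-term rise or fall in an author's reputation from being mistaken for a centennial effect.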
Can we look at the centennial spikes for a lot of authors? Yes. The best way would be to use a biographical dictionary or Wikipedia. But I can also just use the years built into some of my author metadata to get a rough list of authors born between 1730 and 1822, so that their centenaries fall within my sample. A little grepping gets us down to a thousand or so authors. Here are the ten with the most books, to check for reliability:
Centennials, part I
I was starting to write about the implicit model of historical change behind loess curves, which I'll probably post soon, when I started to think more about a great counterexample to the gradual change I'm looking for: the patterns of commemoration for anniversaries. At anniversaries, as with news events, I often see big spikes in word counts for an event or person.
I've always been interested in tracking changes in historical memory, and this is a good place to do it. I talked about the Gettysburg sesquicentennial earlier. I think all the talk about the Civil War sesquicentennial (a word that doesn't show up in my top 200,000, by the way) prompted me to wonder whether the commemorations a hundred years ago pushed the publishing industry toward reflecting more actively on anniversaries. Are there patterns in the celebration of anniversaries? For once my graphs will look at the spikes, not the general trends, with two exceptions to start: the words themselves:
So that's a start: the word centennial was hardly an American word at all before 1876, and it didn't peak until 1879. The loess trend puts the peak around 1887. So the American centennial seems not only to have put the word into circulation. It also either remained a topic of discussion or spurred a continuing interest in centennials of Founding-era events for over a decade.
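A raw peak and a smoothed peak can land in different years. To show how, here is a simplified loess-style smoother in plain Python. It does local linear fits with tricube weights in a single pass, with no robustness iterations. It is an illustrative assumption, not the software behind the charts in this post. statsmodels ships a fuller version as `statsmodels.nonparametric.smoothers_lowess.lowess`. The `frac` value and the series below are invented.

```python
def lowess(xs, ys, frac=0.5):
    """Single-pass local linear smoother with tricube weights (no robustness
    iterations). xs should be sorted years; frac is the share of points used
    in each local fit."""
    n = len(xs)
    k = min(n, max(3, round(frac * n)))  # neighbourhood size
    fitted = []
    for x0 in xs:
        # Bandwidth: distance to the k-th nearest point.
        h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
        ws = [(1 - (abs(x - x0) / h) ** 3) ** 3 if abs(x - x0) < h else 0.0
              for x in xs]
        sw = sum(ws)
        mx = sum(w * x for w, x in zip(ws, xs)) / sw
        my = sum(w * y for w, y in zip(ws, ys)) / sw
        sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
        sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
        slope = sxy / sxx if sxx else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def peak_years(xs, ys, frac=0.5):
    """Year of the raw maximum and year of the smoothed maximum."""
    smooth = lowess(xs, ys, frac)
    raw_peak = xs[max(range(len(ys)), key=ys.__getitem__)]
    smooth_peak = xs[max(range(len(smooth)), key=smooth.__getitem__)]
    return raw_peak, smooth_peak


# Invented yearly frequencies: a sharp spike followed by a long elevated tail.
xs = list(range(1870, 1890))
ys = [0, 0, 0, 0, 0, 1, 2, 4, 6, 12, 8, 8, 9, 9, 9, 8, 8, 7, 6, 5]
print(peak_years(xs, ys))
```

The larger `frac` is, the more a sustained tail can pull the smoothed peak away from a one-year spike. That is one reason a trend line and the raw counts can disagree about when interest peaked.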
Monday, November 8, 2010
Diffusion patterns for news and technological events
An anonymous correspondent says:
You mention in the post about evolution & efficiency that "Offhand, the evolution curve looks more like the ones I see for technologies, while the efficiency curve resembles news events."
That's a very interesting observation, and possibly a very important one if it's original to you and can be substantiated. Do you have an example of a tech vs. news event graph? Something like lightbulbs or batteries vs. the Spanish-American War might provide a good test case.
Also, do you think there might be changes in how these graphs play out over a century? That is, do news events remain separate from tech stuff? Tech changes these days are often news events themselves, and distributed similarly across media.
I think another way to put the tech vs. news distinction could be in terms of the kind of event it is: structural vs. superficial change, mid-range vs. short-term event.
Anyhow, it's a very interesting idea, using the visual pattern to recognize and characterize a change. While I think your emphasis on the teaching angle (rather than research) is spot on, this could be one application of these techniques that would be more useful in research.
He or she is right that technology vs. news isn't quite the right way to describe it. Even in the 19th century, some technological changes are news events, while others aren't. But let's look at some examples here.
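As a first pass, the distinction between short-term events and mid-range structural changes could be reduced to two numbers about a curve's shape. One is how fast it rises to its peak. The other is how much of the peak survives a decade later. The sketch below is a hypothetical illustration, not a method used for the charts here. The helper names, series, and thresholds are all invented.

```python
def shape_summary(series, after=10):
    """series: (year, frequency) pairs sorted by year. Reports the peak year,
    how many years the rise took (from first reaching 10% of the peak), and what
    share of the peak remains on average `after` or more years later."""
    years = [y for y, _ in series]
    freqs = [f for _, f in series]
    i_peak = max(range(len(freqs)), key=freqs.__getitem__)
    peak = freqs[i_peak]
    i_start = next(i for i, f in enumerate(freqs) if f >= 0.1 * peak)
    later = [f for y, f in series if y >= years[i_peak] + after]
    retained = sum(later) / len(later) / peak if later and peak else None
    return {"peak_year": years[i_peak],
            "rise_years": years[i_peak] - years[i_start],
            "retained": retained}


def classify(summary, fast_rise=3, low=0.3, high=0.5):
    """Hypothetical thresholds; they would need tuning against real curves."""
    rise, kept = summary["rise_years"], summary["retained"]
    if kept is None:
        return "unclear"
    if rise <= fast_rise and kept < low:
        return "short-term event"
    if rise > fast_rise and kept >= high:
        return "structural change"
    return "unclear"


# Invented curves: a sudden spike that fades, and a gradual rise that plateaus.
news = list(zip(range(1890, 1912), [1] * 8 + [20, 8, 4, 3, 2, 2] + [1] * 8))
tech = list(zip(range(1880, 1900), [0, 0, 1, 1, 2, 3, 5, 7, 9] + [10] * 11))
for name, series in [("news", news), ("tech", tech)]:
    print(name, classify(shape_summary(series)))
```

Curves that fit neither pattern come back as "unclear". The correspondent's point about technologies that are also news events suggests that category would not be empty.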