Digital Humanities: Using tools from the 1990s to answer questions from the 1960s about 19th century America.
Friday, July 15, 2011
Starting this month, I’m moving from New Jersey to do a fellowship at the Harvard Cultural Observatory. This should be a very interesting place to spend the next year, and I’m very grateful to JB Michel and Erez Lieberman Aiden for the opportunity to work on an ongoing and obviously ambitious digital humanities project. A few thoughts on the shift from Princeton to Cambridge:
Friday, January 21, 2011
Digital history and the copyright black hole
In writing about openness and the ngrams database, I found it hard not to reflect a little bit about the role of copyright in all this. I've previously called 1922 the year digital history ends; for the kind of work I want to see, it's a nearly insuperable barrier, and one that I think too few non-tech-savvy humanists think about. So let me dig in a little.
The Sonny Bono Copyright Term Extension Act is a black hole. It has trapped 95% of the books ever written, and 1922 lies just outside its event horizon. Small amounts of energy can leak out past that barrier, but the information they convey (or don't) is minuscule compared to what's locked away inside. We can dive headlong inside the horizon and risk our work never getting out; we can play with the scraps of radiation that seep out and hope they adequately characterize what's been lost inside; or we can figure out how to work with the material that isn't trapped to see just what we want. I'm in favor of the last option: let me give a bit of my reasoning why.
My favorite individual ngram is for the zip code 02138. It is steadily persistent from 1800 to 1922, and then disappears completely until the invention of the zip code in the 1960s. Can you tell what's going on?
Saturday, December 4, 2010
Today's Times Article
Patricia Cohen's new article about the digital humanities doesn't come with the rafts of crotchety comments the first one did, so unlike last time I'm not in a defensive crouch. To the contrary: I'm thrilled and grateful that Dan Cohen, the main subject of the article, took the time in his moment in the sun to link to me. The article itself is really good, not just because the Cohen-Gibbs Victorian project is so exciting, but because P. Cohen gets some thoughtful comments and the NYT graphic designers, as always, do a great job. So I just want to focus on the Google connection for now, and then I'll post my versions of the charts the Times published.
Thursday, November 25, 2010
Back from Moscow--where to now?
I’m back from Moscow, and with a lot of blog content from my 23-hour itinerary. I’m going to try to dole it out slowly, though, because a lot of it is dull and somewhat technical, and I think it’s best to intermix it with other types of content. I think there are four things I can do here.
1. Document my process of building up a specific system and set of techniques for analyzing texts from the Internet Archive, and publish an account of my tentative explorations into the structure of my system.
2. Produce some chunks of writing that I could integrate into presentations (we’re talking about one in Princeton in February) and other non-blog writing.
3. Dig into some major questions in American intellectual history with the data, to see whether we can get anything useful out of it.
4. Reflect on the state of textual analysis within the digital humanities, talk about how it can be done outside of my Perl-SQL-R framework, and think about how to overcome some of the more gratuitous obstacles in its way.
I’m interested in all of these, but find myself most naturally writing the first two (aside from a few manifestos of type 4 written in a haze of Russia and midnight flights that will likely never see the light of day). I think my two commenters may like the latter two more.
So I think I’ll try to intersperse the large amount of type 1 that I have now with some other sorts of analysis over the next week or so. That includes a remake of the isms chart, a further look at loess curves, etc.
Saturday, November 6, 2010
Intro
I'm going to start using this blog to work through some issues in finding useful applications for digital history. (Interesting applications? Applications at all?)
Right now, that means trying to figure out how to use large amounts of textual data to draw conclusions or refine questions. I currently have the Internet Archive's OCRed text files for about 30,000 books by large American publishers from 1830 to 1920. I've done this partly to help with my own research, and partly to try a different way of thinking about history and the texts we read.
I'm putting it online to help convince one or two people (I'm looking at you, Henry) that this sort of exploration is important for research and teaching. Not necessarily that it's research itself; I'm still unimpressed by the conclusions I'm getting out of it. But at least that any historian looking at the meanings of words (which is most of us, at least around here) should make some stab at using the texts of books we haven't read. And if I can get some good graphics out of it, maybe we can start to think about how this might be useful in teaching, particularly for students who respond better to data than to stories.
Anyhow, on with it.
