Patricia Cohen's new article about the digital humanities doesn't come with the rafts of crotchety comments the first one did, so unlike last time I'm not in a defensive crouch. To the contrary: I'm thrilled and grateful that Dan Cohen, the main subject of the article, took the time in his moment in the sun to link to me. The article itself is really good, not just because the Cohen-Gibbs Victorian project is so exciting, but because P. Cohen gets some thoughtful comments and the NYT graphic designers, as always, do a great job. So I just want to focus on the Google connection for now, and then I'll post my versions of the charts the Times published.
There's one strange subtext that I can't quite figure out: the secret Google metadata. Cohen says Google has substantially better metadata than they put on their site, which makes me somewhat doubtful of just how open they can be with all their resources. If Google can get a full API with access to texts and good metadata, which seems like it's a year or two off, that will obviate any need for databases like the one I've built. But if it's hampered by restrictions put on by content providers, that could cripple its ability to give the full access scholars need to engage in real dialogue with the data. Google Trends for historical terms might be worse than nothing, because it would only allow the facile sort of thinking Cohen's discouraging.* It was a big, messy production for Google to wean itself from outside providers so it could do more interesting things with Maps: is metadata for historians and literature scholars going to be worth that effort for them, particularly when errors could result in copyright infringement? The Google employee in the article has a long comment about metadata that makes it sound like they currently have some obligations to providers, which is a bad sign. On the other hand, Cohen seems to trust them, which is something, and it's their book-scanning and free circulation of PDFs (though not OCR) that make all of this possible.
But given the lack of clarity on a) what Google will offer, and b) when it will offer it, I'm happy for now to be working with Internet Archive OCR on Google scans, even though their metadata is quite a headache. The completeness of the Google stuff is appealing, but for most of the actual, historical questions I can think of dealing with books (not serials, which is a whole other mess) a combination of Internet Archive sources and Library of Congress catalog information should be fine. (Not that I've made any progress towards getting them to play together since the last time I said that.)
*BTW, Trends says the Sox just got Adrian Gonzalez? It's taking all my self-control not to dump these posts and just read baseball sites for the next two hours. Forget about more database results for today.