We just launched a new website, Bookworm, from the Cultural Observatory. I might have a lot to say about it from different perspectives; but since it was submitted to the DPLA beta sprint, let's start with the way it helps you find library books.
Google Ngrams, which Bookworm in many ways resembles, was fundamentally about words and their histories; Bookworm tries to place texts much closer to the center instead. At heart, Ngrams uses a large collection of texts to reveal trends in the history of words; Bookworm lets you use words to discover the history of different groups of books--and by extension, their authors and readers.
Digital Humanities: Using tools from the 1990s to answer questions from the 1960s about 19th century America.
Friday, September 30, 2011
Monday, September 5, 2011
Is catalog information really metadata?
We've been working on making a different type of browser using the Open Library books I've been working with to date, and it's raised an interesting question I want to think through here.
I think many people looking at word counts on a large scale right now (myself included) have tended to make a distinction between word-count data on the one hand, and catalog metadata on the other. (I know I have the phrase "catalog metadata" burned into my reflex vocabulary at this point--I've had to edit it out of this very post several times.) The idea is that we're looking at the history of words or phrases, and the information from library catalogs can help to split or supplement that. So for example, my big concern about the ngrams viewer when it came out was that it included only one form of metadata (publication year) to supplement the word-count data, when it should really have titles, subjects, and so on. But that still assumes that "word data vs. catalog metadata" is a useful binary.
I'm starting to think that it could instead be a fairly pernicious misunderstanding.