Tuesday, July 10, 2018

Google Books and the open web.

Historians generally acknowledge that both undergraduate and graduate methods training need to teach students how to navigate and understand online searches. See, for example, this recent article in Perspectives.  Google Books is the most important online resource for full-text search; we should have some idea what's in it.

A few years ago, I felt I had some general sense of what was in the Books search engine and how it works. That sense is diminishing as things change more and more. I used to think I had a sense of how search engines work: you put in some words or phrases, and a computer traverses a sorted index to find instances of the word or phrase you entered; it then returns the documents with the highest share of those words, possibly weighted by something like TF-IDF.

Nowadays it's far more complicated than that. This post is just some notes on my trying to figure out one strange Google result, and what it says about how things get returned.