Monday, September 15, 2014

Screen time!

Here's a very fun, and for some purposes, perhaps, a very useful thing: a Bookworm browser that lets you investigate onscreen language in about 87,000 movies and TV shows, encompassing together over 600 million words. (Go follow that link if you want to investigate yourself).

I've been thinking about doing this for years, but some of the interest in my recent Simpsons browser and some leaps and bounds in the Bookworm platform have spurred me to finally lay it out. This comes from a very large collection of closed captions/subtitles from the website opensubtitles.org; thanks very much to them for providing a bulk download.

Just as a set of line charts, this provides a nice window into changing language. I've been interested in the "need to"/"ought to" shift since I wrote about it in Mad Men: it's quite clear in the subtitle corpus, and the ratio is much higher as of 2014 than anything Ngrams can show.

Add caption

Thursday, September 11, 2014

Some links to myself

An FYI, mostly for people following this feed on RSS: I just put up on my home web site a post about applications for the Simpsons Bookworm browser I made. It touches on a bunch of stuff that would usually lead me to post it here. (Really, it hits the Sapping Attention trifecta: a discussion of the best ways of visualizing Dunning Log-Likelihood, cryptic allusions to critical theory; and overly serious discussions of popular TV shows.). But it's even less proofread and edited than what I usually put here, and I've lately been more and more reluctant to post things on a Google site like this, particularly as blogger gets folded more and more into Google Plus. That's one of the big reasons I don't post here as much as I used to, honestly. (Another is that I don't want to worry about embedded javascript). So, head over there if you want to read it.

While I'm at it, I made a few data visualizations last year that I only shared on Twitter, but meant to link to from here: Those are linked from a single place on my web site. My favorite is the baseball leaderboard, the most popular was either the distorted subway maps or the career charts, and the most useful, I think, is the browser of college degrees by school and institution type. There are a couple others as well. (And there are a few not there that I'll add at some point.)