Tuesday, May 8, 2012

Women in the libraries

It's pretty obvious that one of the many problems in studying history by relying on the print record is that writers of books are disproportionately male.

Data can give some structure to this view. Not in the complicated, archival-silence-filling way--that's important, but hard--but just in the most basic sense. How many women were writing books? Do projects on big digital archives only answer, as Katherine Harris asks, "how do men write?" Where were gender barriers strongest, and where weakest? Once we know these sorts of things, it's easier to do what historians do: read against the grain of archives. It doesn't matter if they're digital or not.

One of the nice things about having author gender in Bookworm is that it opens a new way to give rough answers to these questions. Gendered patterns of authorship vary according to social spaces, according to time, according to geography: a lot of the time, the most interesting distinctions are comparative, not absolute. Anecdotal data is a terrible way to understand comparative levels of exclusion; being able to see rates across different types of books adds a lot to the picture.

In this post, I'm going to run through a lot of basic metadata about the gender composition of libraries very quickly, because I need to know it to work with this data. Although this is the Bookworm database, the rules for inclusion in Bookworm are so simple (Open Library page, Internet Archive downloadable file) that at least up to 1922, the results here should be broadly similar to any large selection of texts that draws heavily from the Google library-scanning project (most notably, HathiTrust and Google Books). And those are so similar to the composition of the university libraries that humanists have been using for decades that even non-digital researchers should have some use for similar statistics.

More interesting findings might come out of more complicated questions about interrelations among all these patterns: lots of questions are relatively easy to answer with the data at hand. (If you want to download it, it's temporarily here. For entertainment purposes only, etc., etc.)

The most basic question is: what percentage of books are by women? How did that change? (Of course, we could flip this and ask it about men, but this data analysis is going to be clearer if we treat women as the exceptional group.) Here's a basic estimate: as the chart says, post-1922 results are unreliable. The takeaway: something like 5% in the mid-nineteenth century, up to about 15% by the 1920s.
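The arithmetic behind an estimate like this is simple enough to sketch. What follows is a minimal illustration, not the actual Bookworm code: it assumes each book has already been reduced to a (year, gender) pair, and the function name, the gender labels, and the toy records are all made up for the example. Books whose author gender couldn't be assigned are left out of the denominator, which is itself an assumption worth keeping in mind.

```python
from collections import defaultdict


def female_share_by_decade(records):
    """Share of female-authored books per decade.

    `records` is an iterable of (year, gender) pairs; the labels
    "female", "male", and "unknown" are hypothetical, chosen for this sketch.
    Only books with an assigned gender count toward the denominator.
    """
    female = defaultdict(int)
    known = defaultdict(int)
    for year, gender in records:
        if gender not in ("female", "male"):
            continue  # skip books with no assigned author gender
        decade = (year // 10) * 10
        known[decade] += 1
        if gender == "female":
            female[decade] += 1
    # Return decades in chronological order
    return {d: female[d] / known[d] for d in sorted(known)}


# Toy records invented for illustration; these are not real counts.
sample = [
    (1851, "male"), (1853, "male"), (1855, "female"), (1858, "male"),
    (1920, "male"), (1921, "female"), (1922, "male"), (1922, "unknown"),
]
shares = female_share_by_decade(sample)
for decade, share in shares.items():
    print(decade, round(share, 3))
```

Grouping by decade rather than by year smooths out the noise in thin years; the same function would work on any other bucket (library, subject, country) by swapping what goes into the key.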

Monday, May 7, 2012

Author Genders: methodology

We just rolled out a new version of Bookworm (now going under the name "Bookworm Open Library") that works on the same codebase as the ArXiv Bookworm released last month. The most noticeable changes are a cleaner and more flexible UI (mostly put together for the ArXiv by Neva Cherniavksy and Martin Camacho, and revamped by Neva to work on the OL version), coupled with some behind-the-scenes tweaks that should make it easy to add new Bookworms on other sets of texts in the future. But as a little bonus, there's an additional metadata category in the Open Library Bookworm we're calling "author gender."

I don't suppose I need to tell anyone that gender has been an important category to the humanities over the last few decades. But it's been important in a way that makes lump categories like this highly fraught, so I want to be slightly careful about this. I'll do that in two posts: this one, explaining the possibilities and limits of the methodology; and a follow-up that actually crunches the data to look at how library holdings, and some word usages, break down by gender.