In my last post, I looked at first names as a rough gauge of author gender to see who is missing from libraries. This method has two obvious failings as a way of finding gender:
1) People use pseudonyms that can be of the opposite gender. (More often women writing as men, but sometimes men writing as women as well.)
2) People publish using initials. It's pretty widely known that women sometimes publish under their initials to avoid making their gender obvious.
The first problem is basically intractable without specific knowledge. (I can fix George Eliot by hand, but no other way.) The second we can actually get some data on, though. Authors are identified by their first initial alone in about 10% of the books I'm using (1905-1922, Open Library texts). It turns out we can figure out a little bit about what gender they are: if women hiding behind initials is a really important phenomenon in the data, then it should show up in other ways.
Here's one way to look at that. For every letter, we can find the percentage of books that give only an initial, out of all books by authors whose first names start with that letter. So for example, 11% of all books by people whose names start with "J" (Jameses, Jessicas, etc.) are just by "J." Only 6% of those by people whose names start with "D" are.
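As a rough sketch of that per-letter calculation (not the code actually used for this post): assume we have one first-name string per book, and count a book as "initial only" when that string is a single letter with an optional period. The function names and the toy list of names are made up for illustration.

```python
from collections import Counter


def is_initial_only(first_name):
    """True when the first-name field is a single letter, e.g. 'J' or 'J.'."""
    stripped = first_name.strip().rstrip(".")
    return len(stripped) == 1 and stripped.isalpha()


def initial_only_share(first_names):
    """Map each starting letter to the share of its books that give only an initial."""
    totals = Counter()
    initials = Counter()
    for name in first_names:
        name = name.strip()
        if not name or not name[0].isalpha():
            continue  # skip empty or odd entries
        letter = name[0].upper()
        totals[letter] += 1
        if is_initial_only(name):
            initials[letter] += 1
    return {letter: initials[letter] / totals[letter] for letter in totals}


# Toy input, invented for illustration: one first-name string per book.
books = ["James", "J.", "Jessica", "John", "D.", "David", "Dorothy", "Daniel"]
shares = initial_only_share(books)
print(shares)
```

On real catalog data the fiddly part would be deciding what counts as "initial only" (for instance, whether "J. R." should count), which this sketch does not try to settle.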
Moreover, for every letter we know from the census the actual gender split of people whose first names start with it. 90% of all M's are female; 85% of all T's are male.
We can combine those two, and see whether women's letters are used as initials more than men's letters are. I was hoping this might provide evidence for a whole raft of female authors in the library hiding behind their initials. But that turns out, as far as I can tell, not to be the case. In fact, majority-female letters are probably less likely to be used instead of full names than are majority-male letters.
[Edit--Note: size is the frequency of that letter beginning first names in the census.]
So the intuition here is that if someone's bookplate says "W. Brown," their name is most likely William or Willard; if "M. Black", it's probably Mary or Marian. If women use initials, the ratio of just "M." to Mary/Marian/Michael should be higher than that of just "W." to Willard/William/Willa. And that looks untrue.
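To make that comparison concrete, here is a minimal sketch, assuming we already have for each letter an initial-only rate, a census female share, and a census frequency to use as a weight. It splits letters into majority-female and majority-male and takes a weighted average of the initial-only rate in each group. All the numbers below are invented for illustration, apart from the M and T gender shares quoted above; the function name is hypothetical too.

```python
def mean_initial_rate_by_majority(initial_share, female_share, weight):
    """Weighted mean initial-only rate for majority-female vs. majority-male letters."""
    sums = {"female": 0.0, "male": 0.0}
    weights = {"female": 0.0, "male": 0.0}
    for letter, rate in initial_share.items():
        if letter not in female_share:
            continue  # no census information for this letter
        group = "female" if female_share[letter] > 0.5 else "male"
        w = weight.get(letter, 1.0)
        sums[group] += w * rate
        weights[group] += w
    return {g: sums[g] / weights[g] for g in sums if weights[g] > 0}


# Illustrative numbers only; not the real library or census figures.
initial_share = {"M": 0.05, "W": 0.10, "T": 0.08, "E": 0.04}
female_share = {"M": 0.90, "W": 0.10, "T": 0.15, "E": 0.70}
census_weight = {"M": 2.0, "W": 1.0, "T": 1.0, "E": 2.0}

rates = mean_initial_rate_by_majority(initial_share, female_share, census_weight)
print(rates)
```

If women were hiding behind initials in large numbers, the "female" figure should come out higher than the "male" one; with the toy numbers here it comes out lower, which is the direction the real data seems to point.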
Personally, I find this a disappointing result. I went into this looking for some nice evidence that women were publishing under their initials at significant rates: enough to make me think twice about gender when pulling an initialed book off the shelves. That seems not to be the case. (Although it's worth remembering that in many cases the title on the bookplate is shorter than that in the library catalog, so it's still possible.) It's possible to come up with scenarios where it's still important--maybe Marys don't even bother using initials since their gender is obvious, and Juliets almost always do, since they know they'll be mistaken for James?--but I can't think of any really plausible ones. Can you?