Tuesday, March 6, 2012

Do women hide their gender by publishing under their initials?

A quick follow-up on this issue of author gender.

In my last post, I looked at first names as a rough gauge of author gender to see who is missing from libraries. This method has two obvious failings as a way of finding gender:

1) People use pseudonyms that can be of the opposite gender. (More often women writing as men, but sometimes men writing as women as well.)

2) People publish using initials. It's pretty widely known that women sometimes publish under their initials to avoid making their gender obvious.

The first problem is basically intractable without specific knowledge. (I can fix George Eliot by hand, but in no other way.) The second we can actually get some data on, though. Authors are identified by their first initial alone in about 10% of the books I'm using (1905-1922, Open Library texts). It turns out we can actually figure out a little bit about what gender they are. If this is a really important phenomenon in the data, then it should show up in other ways.

Here's one way to look at that. For every letter, we can find the percentage of books using only the initial for authors with that letter. So for example, 11% of all books by people whose names start with "J" (Jameses, Jessicas, etc.) are just by "J." Only 6% of those by people whose names start with "D" are.

Moreover, for every letter we know from the census the real gender distribution in the population of names beginning with that letter. 90% of all M's are female; 85% of all T's are male.

We can combine those two, and see whether women's letters are used as initials more than men's letters are. I was hoping this might provide evidence for a whole raft of female authors in the library hiding behind their initials. But that turns out, as far as I can tell, not to be the case. In fact, majority-female letters are probably less likely to be used instead of full names than are majority-male letters.
[Edit--Note: size is the frequency of that letter beginning first names in the census.]

So the intuition here is that if someone's bookplate says "W. Brown," their name is most likely William or Willard; if "M. Black", it's probably Mary or Marian. If women use initials, the ratio of just "M." to Mary/Marian/Michael should be higher than that of just "W." to Willard/William/Willa. And that looks untrue.
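
For anyone who wants to try the same comparison, here is a minimal sketch of the idea in Python. It is not the code behind the chart: the function names, the toy list of catalogued first names, and most of the census shares are made up for illustration (only the 90% figure for "M" comes from the post), and it treats a name as "initial only" when it is a single letter with an optional period.

```python
from collections import defaultdict


def initial_only_rate_by_letter(first_names):
    """For each starting letter, return the share of books whose author
    is catalogued by that bare initial rather than a full first name."""
    totals = defaultdict(int)
    initial_only = defaultdict(int)
    for raw in first_names:
        name = raw.strip().rstrip(".")
        if not name:
            continue
        letter = name[0].upper()
        totals[letter] += 1
        if len(name) == 1:  # "J." or "J" counts as initial-only
            initial_only[letter] += 1
    return {letter: initial_only[letter] / totals[letter] for letter in totals}


def compare_by_gender(rates, female_share):
    """Average the initial-only rate over majority-female letters and
    over majority-male letters (unweighted, one vote per letter)."""
    groups = {"female": [], "male": []}
    for letter, rate in rates.items():
        if letter not in female_share:
            continue
        key = "female" if female_share[letter] > 0.5 else "male"
        groups[key].append(rate)
    return {key: sum(vals) / len(vals) for key, vals in groups.items() if vals}


# Toy data, one entry per book (hypothetical, not from the real corpus).
books = ["J.", "James", "Jessica", "John",
         "M.", "Mary", "Marian", "Mary", "Michael",
         "W.", "W.", "William", "Willard"]

# Share of census names starting with each letter that are female.
# "M" follows the post; "J" and "W" are invented placeholders.
female_share = {"J": 0.3, "M": 0.9, "W": 0.05}

rates = initial_only_rate_by_letter(books)
summary = compare_by_gender(rates, female_share)
print(rates)
print(summary)
```

If women hid behind initials at high rates, the "female" average should come out above the "male" one. A more careful version would weight each letter by how common it is and handle double initials like "J. R.", which this sketch ignores.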

This might partly be a genre thing--the sciences use lots of initials, maybe, and women don't write for them. Restricting to just fiction (LC Classification PZ) reduces the effect from a strong one to a non-existent one: but there's still no evidence that women are more likely to use initials than are men. (Maybe "Elizabeths" do it a lot more than "Marys", indicating something about Catholics vs. WASPs?)

Personally, I find this a disappointing result. I went into this looking for some nice evidence that women were publishing under their initials at significant rates: enough to make me think twice about gender when pulling an initialed book off the shelves. That seems not to be the case. (Although it's worth remembering that in many cases, the title on the bookplate is shorter than that in the library catalog, so it's still possible.) It's possible to come up with scenarios where it's still important--maybe Marys don't even bother using initials since their gender is obvious, and Juliets almost always do, since they know they'll be mistaken for James?--but I can't think of any really plausible ones. Can you?

But for looking at author gender of big corpuses using just names, this is a somewhat positive result, in a way; we do have to worry about the pseudonym effect, but initials seem not to particularly cloud the gender breakdown of the library in the aggregate.


  1. Hi Ben,

    A question about fiction that I've wondered about in the past. How well does PZ work for this purpose? PR and PS are where I'd go for British and American lit, though of course then you also get criticism, etc. But isn't PZ mostly juvenile lit and assorted hard-to-classify stuff? Have you looked at the composition of your PZ holdings? Does that skew things much?

    Just curious,

  2. Hi Matt!

    PR/PS vs. PZ is a tough call. I like PZ because it was the biggest category I had when using popular presses (although it's not now that I'm including all presses), it's almost entirely novels rather than poetry, and because I believe (purely anecdotally) that it has fewer reprints. (In this corpus, there's a great danger of drowning in a sea of Wordsworth editions.) I suspect that either California or Michigan, the two biggest libraries in my sample, like to shelve a lot of stuff in PZ? It's probably library dependent, which is another difficult question. In any case, I know that a lot of what we'd consider literature (not just Twain, but James, Hardy, and Wharton) shows up in PZ pretty regularly along with the children's books and genre fiction. But I like having those around because I think they're probably as close to spoken language as anything else I can get.

    I posted some lists a little while ago of what the books are, including PR, PS, and PZ.

    It's currently hard to see all of the ones in the set I'm using now, but you can page through them in Bookworm.

    There's also the LCSH heading for fiction, which you'd think would be best: but it tends to be applied only rarely by the libraries that the Open Library draws from, and so isn't so useful.

  3. Hello Ben,

    This is a very interesting subject.
    I am a female author and I use my initials for all of my published work.
    But it is not to hide my gender. My desire was to have readers more interested in the content and the words that I write, rather than who wrote it.


  4. So, does S.E. Gregg worry that she will undermine her work by making her gender obvious? Will the readers be more or less judgmental if they were to know the gender of the author? We know why George Eliot did it.

  5. I'm interested to hear SE Gregg's point here--it's definitely right that there are a whole range of justifications that authors will give. What I think is interesting right now is that when a female author wants the slight anonymity that initials provide, we immediately make it gender related; when a male author does, we come up with other reasons.

  6. I just recently came across this post again and did not realize that there were comments. Actually, many artists do not use their real names. They would rather brand another name, a concept, etc., rather than their own name. My writing is spiritual and I want people to see God's message in my writing and not me.

  7. I was just researching why authors use their initials and came across this site. Very interesting! I decided to go by my initials because I write in a genre whose audience is predominately male, though I make no other attempts to hide the fact that I'm female. The initials also hide my age to a certain extent, as Jennifer is a very 80s name.