On the one hand, I don't want to claim too much for this: anyone can go to a library and see that Washington Irving doesn't write female characters. But as one of many possible exercises in reducing the size of the library to rethink the broad aspects of the literary canon, c. 1910, I do think it's suggestive; and, as I'll argue towards the end, knowing these practical details can help us explore the instability of 'subject' or 'genre' as expressed by the librarians who choose where to put these books on the shelves.
To start: here's a chart that shows, for each author, the ratio of male pronouns (he/his/him/himself) to female ones (she/her/hers/herself). The colors show the machine-derived author gender based on first names.
*Except for George Eliot. I changed her by hand to female a while ago. That was, though, a poor choice I'm going to rectify eventually. I'm eager to hear if there are more male pseudonyms for female authors shown here, but I'm not going to change any more of them because the presumptive gender of the pseudonym is every bit as interesting as the physical sex of the author. Ideally, of course, I'd have both.
These are the 100 authors with the highest number of personal pronouns in the Bookworm database. The circle sizes show the number of pronouns used. Right in the middle is William Shakespeare, for example: his dot is the largest because there are multiple copies of most of his plays, and he's in the middle because he has approximately the same overall gender balance as the whole corpus (over three male pronouns for every female one).
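For anyone who wants to try the ratio on their own texts, here is a minimal sketch of the count described above. It is not the Bookworm code: the function names and the sample sentence are made up for illustration, and the tokenizer is a crude regular expression that ignores ambiguities (possessive versus object "her", for instance).

```python
import re
from collections import Counter

# The two pronoun sets named in the post.
MALE = {"he", "his", "him", "himself"}
FEMALE = {"she", "her", "hers", "herself"}


def pronoun_counts(text):
    """Return (male, female) pronoun counts for a text.

    Hypothetical helper: lowercases the text and splits on runs of letters.
    """
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    male = sum(words[w] for w in MALE)
    female = sum(words[w] for w in FEMALE)
    return male, female


def male_to_female_ratio(text):
    """Male pronouns divided by female pronouns.

    Returns infinity when there are male pronouns but no female ones,
    and NaN when the text has no gendered pronouns at all.
    """
    male, female = pronoun_counts(text)
    if female == 0:
        return float("inf") if male else float("nan")
    return male / female


# Made-up sample sentence, not from the corpus.
sample = "He gave her his book, and she thanked him for it."
print(pronoun_counts(sample))        # (3, 2)
print(male_to_female_ratio(sample))  # 1.5
```

On a log scale a ratio of 3 and a ratio of 1/3 sit the same distance from parity, which is presumably why a chart like this is easier to read with a logarithmic axis.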
This is interesting for two reasons. First, it gives you some of the most common authors in 19C libraries:* that's a list which has some interesting contrasts with who we think of as the canon. Washington Irving is quite a bit better represented than I would have thought, for example. Second, it lets us compare those authors to each other in a relatively straightforward way. Henry James writes a lot of female characters, but he still has about 20% more male pronouns. Emerson really never mentions women.
*[Edit--most common just by the number of pronouns they use, which, as John Theibault points out, isn't the same as the number of words they use. For the top 200 authors by overall word count, see the bottom of this post].
Genre is playing a big role here. I can group each author by the LC classification under which they published the most books: it doesn't always work (Grant Allen was more a science writer who penned novels than the other way around, Samuel Smiles is there for self-help), but it helps put some order on all the names you might not recognize.
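The grouping rule is simple enough to sketch. This assumes one (author, classification) pair per book; the author names and counts below are placeholders, not figures from the library data.

```python
from collections import Counter, defaultdict

# Placeholder records, one per book: (author, LC classification).
books = [
    ("Author A", "PZ"),
    ("Author A", "PZ"),
    ("Author A", "PS"),
    ("Author B", "PS"),
    ("Author B", "PS"),
    ("Author B", "Q"),
]


def primary_class(books):
    """Map each author to the classification holding most of their books.

    Ties go to whichever class was seen first for that author, which is
    one arbitrary way to break them.
    """
    per_author = defaultdict(Counter)
    for author, lc_class in books:
        per_author[author][lc_class] += 1
    return {
        author: counts.most_common(1)[0][0]
        for author, counts in per_author.items()
    }


print(primary_class(books))
```

A winner-take-all rule like this is exactly what misplaces someone like Grant Allen: an author split between two classes ends up wholly in whichever one has a book or two more.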
The thing that really comes out here is the distinction between PZ and PS: there are far more women writing in PZ (Fiction and juvenilia) than in PS (American Literature). And there's not really a difference between the two: that Louisa May Alcott is a juvenile writer and James Fenimore Cooper a literary writer presumably has as much to do with gender as with content. The center of the curve skews more female in PZ as well.
That highlights one of the most important things to keep in mind when doing this sort of work: that it's a convenient abstraction that gender and classification might be two different things, but in fact gender is so important that we have to assume it bleeds over into other forms of classification. This is one of the great problems with using established libraries (because they are "biased") but also one of the opportunities, because we should be able to see some of the ways bias is instantiated into categories.
Two last charts, and then I'll move on to other things. First, I was showing only the top 84 out of 1,000 authors: there are actually 25 who use more female pronouns than male, so I'll pull them out so we can see just who they are.
Again, it's mostly women: there are a few men in there, too. (Although some of them are strange: Jean-Henri Fabre makes the cut because he refers to spiders as "she").
One last question might be: given that we know there's a baseline for pronoun use by gender and subject, can we tease out which authors are most remarkable in their context? Just a linear model across the five most common classifications allows this: in the following chart, Hardy and James occupy the far left as the authors who use far more female pronouns than their genre (literature, PR or PS) and gender (male) would have us expect.
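To make the idea of "remarkable in context" concrete, here is a rough sketch. It is not the model behind the chart: instead of an additive linear model it uses the mean log ratio within each (classification, author gender) cell as the baseline, which amounts to a linear model with a full interaction between the two predictors. All rows are placeholder data, and every author is assumed to have at least one pronoun of each gender.

```python
import math
from collections import defaultdict
from statistics import mean

# Placeholder rows: (author, LC class, author gender, male count, female count).
rows = [
    ("Author A", "PR", "male", 800, 200),
    ("Author B", "PR", "male", 400, 400),
    ("Author C", "PR", "female", 300, 600),
    ("Author D", "PZ", "female", 500, 500),
    ("Author E", "PZ", "female", 250, 1000),
]


def log_ratio(male, female):
    """Log2 of male over female pronouns; 0 means parity."""
    return math.log2(male / female)


def residuals(rows):
    """Each author's log ratio minus the mean for their (class, gender) cell.

    Negative values mean more female pronouns than the cell would predict.
    """
    cells = defaultdict(list)
    for _, lc_class, gender, male, female in rows:
        cells[(lc_class, gender)].append(log_ratio(male, female))
    baseline = {cell: mean(values) for cell, values in cells.items()}
    return {
        author: log_ratio(male, female) - baseline[(lc_class, gender)]
        for author, lc_class, gender, male, female in rows
    }


# Most unexpectedly "female" authors first, like the left side of the chart.
for author, resid in sorted(residuals(rows).items(), key=lambda kv: kv[1]):
    print(f"{author}: {resid:+.2f}")
```

With real data the cells would be uneven in size, and an author alone in a cell always gets a residual of zero (as the third placeholder author does here), which is one reason to prefer an additive model once the number of classifications grows.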
It seems to me that it might be generally interesting to be able to compare authors across a wide range of word metrics like this, so we could see if statements about an author's linguistic peculiarities are actually true. This is something we could currently support in some form with the Bookworm API, and it's possible this would be useful for something out there that isn't the problem of author gender or pronouns. It's worth thinking about.
[Edit]: Here's the top chart done for the top 200 authors overall, not restricted by pronoun count.