It's pretty obvious that one of the many problems in studying history by relying on the print record is that writers of books are disproportionately male.
Data can give some structure to this view. Not in the complicated, archival-silences-filling way (that's important, but hard), but in the most basic sense. How many women were writing books? Do projects on big digital archives only answer, as Katherine Harris asks, "how do men write?" Where were gender barriers strongest, and where weakest? Once we know these sorts of things, it's easy to do what historians do: read against the grain of archives, whether they're digital or not.
One of the nice things about having author gender in Bookworm is that it opens a new way to give rough answers to these questions. Gendered patterns of authorship vary according to social spaces, according to time, according to geography: a lot of the time, the most interesting distinctions are comparative, not absolute. Anecdotal data is a terrible way to understand comparative levels of exclusion; being able to see rates across different types of books adds a lot to the picture.
More interesting findings might come from more complicated questions about how all these patterns interrelate, but plenty of simpler questions are easy to answer with the data at hand. (If you want to download it, it's temporarily here. For entertainment purposes only, etc., etc.)
The most basic question is: what percentage of books are by women, and how did that change? (Of course, we could flip this and ask it about men; the analysis is just clearer if we treat women as the exceptional group.) Here's a basic estimate; as the chart says, post-1922 results are unreliable. The takeaway: something like 5% at midcentury, rising to about 15% by the 1920s.
From now on, I'm removing post-1922 data from the analysis.
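The yearly estimate and the 1922 cutoff amount to a simple tally. Here is a minimal sketch, assuming (hypothetically) that each catalog record has already been reduced to a `(year, gender)` pair, where gender is `"female"`, `"male"`, or `None` for an author who couldn't be classified. The sample records are invented for illustration and are not drawn from the Open Library data.

```python
from collections import defaultdict

# Hypothetical records: (publication year, inferred author gender or None).
records = [
    (1850, "male"), (1850, "male"), (1850, "female"), (1850, None),
    (1921, "female"), (1921, "male"), (1921, "male"), (1921, "male"),
    (1930, "female"),  # published after the cutoff, so it is dropped
]

CUTOFF = 1922  # keep nothing published after this year


def female_share_by_year(records, cutoff=CUTOFF):
    """Return {year: female / (female + male)}, skipping unclassified authors."""
    counts = defaultdict(lambda: {"female": 0, "male": 0})
    for year, gender in records:
        if year > cutoff or gender is None:
            continue
        counts[year][gender] += 1
    # A year only appears once something has been counted, so the sum is never zero.
    return {
        year: c["female"] / (c["female"] + c["male"])
        for year, c in sorted(counts.items())
    }


print(female_share_by_year(records))
```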
Next: Library of Congress classifications, my favorite proxy for genre. The labels won't fit on this chart, but you can read them here. Most genres fall between 10 and 20% female (roughly comparable to the arXiv data nowadays, I think), with some notable exceptions.
- The Ps (language and literature, including fiction) are far and away the most frequently female fields. There's really no question about it: PZ ("fiction and juvenile belles-lettres") in particular, but also PS (American literature), are more female than almost any other field.
- DD, German history, is _far_ more male-dominated than any other field in history, except perhaps E, one of the two US history classes. Does this reflect greater constraints on access to print in the heavily university-dominated German system of the 19th century? (That would hold for American authors as well as German ones, since the Ph.D.s are probably all going through Berlin anyway.) Are there other places where institutional discrimination might be evident?
- Genealogy and particularly biography (CT) are a really striking area of female authorship. Might be worth looking into.
- HQ, "The Family. Marriage. Women," is about 45% female. Much of this is probably settlement-house material that is well covered in the historiography, but the share is still a little higher than I would have expected.
- K, the law, has fewer women than any other class. As with German history, that may reflect the role of higher education in enforcing discriminatory practices.
- The religion section of the B's, BL-BX, is particularly male-dominated, with the exception of practical theology. The really strikingly low bar, BM, is "Judaism."
- Judging from the authors I've worked with myself, I think of the Ls (education) as having a very high female percentage, although more in the 1930s than the 1900s. But though they're a little higher than average, it's not that notable.
- The Ns, visual art, are a little more female than most other fields.
- The low numbers in the sciences and technology are not very surprising; the spikes in the Ts are for handicrafts and home economics, and the latter is the only field to break 50% female.
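Grouping by LC class means reducing each call number to its leading letters (PZ, DD, BM, and so on). A minimal sketch, assuming hypothetical `(call_number, gender)` pairs with call numbers stored as plain strings like `"PZ3.A1"`; the regular expression is an illustrative simplification of LC call-number syntax, not a full parser, and the sample data is invented.

```python
import re
from collections import Counter
from typing import Optional

# Leading class letters of an LC call number: "PZ3.A1" -> "PZ", "E175" -> "E".
# Simplification: one to three capital letters followed by a digit.
LC_CLASS = re.compile(r"^\s*([A-Z]{1,3})\s*\d")


def lc_subclass(call_number: str) -> Optional[str]:
    m = LC_CLASS.match(call_number.upper())
    return m.group(1) if m else None


def female_share_by_class(records):
    """records: iterable of (call_number, gender), gender in {"female", "male", None}."""
    female, classified = Counter(), Counter()
    for call_number, gender in records:
        cls = lc_subclass(call_number)
        if cls is None or gender not in ("female", "male"):
            continue  # unparseable call number or unclassified author
        classified[cls] += 1
        if gender == "female":
            female[cls] += 1
    return {cls: female[cls] / n for cls, n in sorted(classified.items())}


sample = [
    ("PZ3.A1", "female"), ("PZ7.B2", "male"), ("DD89.C3", "male"),
    ("K100", None), ("BM45", "male"),
]
print(female_share_by_class(sample))
```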
What about geography?
By state: Massachusetts does extremely well; of the states with a publishing industry to speak of, only California does better. New York is OK, but in the middle of the pack. A lot of this probably has to do with the individual presses in each state (see the publishers list below for more on that).
A question emerges: Montana and Nevada both seem to have high female percentages. We know that western states granted women's suffrage early; is the same true of female authorship? A map loses the information about which states actually have significant numbers of books published in them, but it makes regional comparisons easier. My reading is that the map puts to rest any idea of a particularly progressive West, but I could be dissuaded from that.
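One way to keep the information a map throws away is to report each state's book count alongside its share, and to drop states below some minimum before comparing them. The sketch below assumes hypothetical `(state, gender)` pairs and an arbitrary threshold `MIN_BOOKS`; both are illustrations, not the data or cutoffs behind these charts.

```python
from collections import Counter

# Hypothetical (state, gender) pairs; None marks an unclassified author.
books = (
    [("MA", "female")] * 30 + [("MA", "male")] * 120 + [("MA", None)] * 10
    + [("NV", "female")] * 2 + [("NV", "male")] * 3
)

MIN_BOOKS = 50  # arbitrary stand-in for "a publishing industry to speak of"


def state_shares(books, min_books=MIN_BOOKS):
    """Return {state: (classified_books, female_share)} for states above the cutoff."""
    female = Counter(s for s, g in books if g == "female")
    male = Counter(s for s, g in books if g == "male")
    result = {}
    for state in female.keys() | male.keys():
        n = female[state] + male[state]  # Counter returns 0 for missing keys
        if n >= min_books:
            result[state] = (n, female[state] / n)
    return result


print(state_shares(books))
```

With only five classified books, a state like the hypothetical `NV` here can swing to any percentage, which is why the count matters as much as the share.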
International comparisons are interesting as well; we can look at country of publication. The result is a really striking win for the United States, with almost 18% of books written by women. The Swedes are next, followed by the Australians. Once again, the Germans are shockingly bad. This seems too strong to be merely a genre effect: Germany's overall percentage is lower than that of a lot of the science fields. What's different about 19th-century Germany compared to these other countries? And what does America not have? I'm strongly inclined to blame the developed system of universities.
Publishers exist in the data, although they're a little harder to pull out. After a little text scrubbing (to make "Little,Brown" the same as "Little Brown" and "Little, Brown & co."), these are the largest publishers:
- It's nearly 50% for Roberts Brothers. That might even be low, since they seem (from Wikipedia, I'm ashamed to admit) to have built their success on Little Women and generally capitalized on the market it opened up.
- I thought Dodd Mead was largely in the education market, but Wikipedia shows no sign of that. So why would one mass-market publisher publish about 1/3 women, while Putnam or Macmillan publish only about 1/8?
- Houghton Mifflin and Little Brown both get above 20%: this probably has to do largely with the prominence of fiction on their lists (remember the PZs above), but there might be other differences as well.
- Grosset Dunlap is largely the children's market: that's clearly a confounding factor on a lot of these statistics.
- The Government Printing Office's figure is not surprising, but worth remembering.
- T. & T. Clark publishes largely religious material, I believe.
- The university presses (U of Chicago, the Clarendon press at Oxford) are among the lowest. Yet another strike against the universities.
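The text scrubbing that merges publisher variants can be approximated with a couple of regular expressions. This is a hypothetical sketch, not the cleanup actually used for these charts: it lowercases the name, strips a trailing corporate suffix (the `SUFFIX` list is an assumption to extend as needed), turns remaining punctuation into spaces, and collapses whitespace.

```python
import re

# Hypothetical trailing corporate suffixes ("& co.", "and Company", "Inc.", "Ltd.").
# The \b anchors keep words like "Francisco" or "Brand" from being clipped.
SUFFIX = re.compile(r"\s*(?:&|\band\b)?\s*\b(?:co|company|inc|ltd)\b\.?\s*$")


def normalize_publisher(name: str) -> str:
    """Collapse variant spellings of a publisher name to a single key."""
    key = name.lower().strip()
    key = SUFFIX.sub("", key)            # "little, brown & co." -> "little, brown"
    key = re.sub(r"[^\w\s]", " ", key)   # commas, periods, "&" -> spaces
    return re.sub(r"\s+", " ", key).strip()


for raw in ["Little,Brown", "Little Brown", "Little, Brown & co.", "Grosset & Dunlap"]:
    print(f"{raw!r} -> {normalize_publisher(raw)!r}")
```

Stripping the suffix before removing punctuation matters: once "&" and "." become spaces, the suffix pattern has less to anchor on.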
If I were to draw a preliminary conclusion, it might be this: established institutions (the state, the universities) seem to suppress women most strongly, presumably because there are more hurdles to jump. In certain areas, things have changed. In others, they haven't: I ran some of this on the arXiv author lists, and the 10-15% figures hold in the sciences. There's no reason to think that the same massively distortionary effects aren't still at work in academia, along lines of social structure other than gender as well.
Keep in mind: women are the only discriminated-against group that we can pull out of library catalogs, but hardly the only one in the 19th century. Surnames might identify ethnicities (I haven't had much luck with that), but race and class are virtually impenetrable. I suspect that access to print is at least as strongly skewed by income and race as it is by gender. I don't think (I have to write this up at greater length) that it makes any sense to avoid using libraries because they aren't "representative." They are what they are, and libraries are interesting. A record of everything that anyone ever said would be interesting, too. We have the first; we'll never have the second.
A few disclaimers: all this data is restricted to 1,000,000 library books from the Open Library; I see no reason to think they aren't basically representative of the books that make it into university libraries (except that all but one or two of them had considerably fewer books around 1910-1920). The basic gender categorization scheme is here. For "percentage of books," I calculate categorized female authors divided by the sum of categorized female and categorized male authors, throwing out books I can't classify. Those numbers will be off if unclassifiable authorship skews heavily in one direction or the other, but I don't see substantial reasons to think that's happening.
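The percentage described above, and the worst case for the skew caveat, fit in a few lines. This sketch uses invented counts. `share_bounds` is an illustrative addition, not part of the original method: it asks what the true share would be if every unclassified book were by a man (lower bound) or by a woman (upper bound).

```python
def female_share(female: int, male: int) -> float:
    """Classified female / (classified female + classified male)."""
    return female / (female + male)


def share_bounds(female: int, male: int, unknown: int) -> tuple:
    """Extreme true shares if all unclassified books were male (low) or female (high)."""
    total = female + male + unknown
    return female / total, (female + unknown) / total


# Invented counts for illustration only.
print(female_share(150, 850))
print(share_bounds(150, 850, 100))
```

If the unclassified authors split in the same ratio as the classified ones, the estimate doesn't move at all; the bounds show how far it could move if they don't, which shrinks as the unclassified fraction gets smaller.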