Comments on Sapping Attention: Author Genders: methodology
Blog author: Ben (http://www.blogger.com/profile/04856020368342677253)
Updated: 2024-03-11T02:10:31.396-04:00

2012-05-29T10:09:56.889-04:00
I'm a very non-technical person who is trying to find out (even just roughly) the percentage of female vs. male authors (globally or nationally). Can you give any indication of this?
Posted by Anonymous

2012-05-08T18:18:50.170-04:00
(this is a follow-up to my post below, which I put in the wrong place);<br /><br />I just logged onto Freebase and classified a few genders myself to see what it was like. I see I may be wrong that machine tagging is predominant; a lot of the tagging is likely people scanning the Wikipedia article for gender clues and then entering the result, which is pretty valuable. That should be better in a lot of cases than just automatically classifying. So the question is just<br /><br />1) Whether enough OL authors are in Freebase to make it worth the time entering them.<br />2) Whether a good gender classifier based on census data including location, etc., would be better than a good one based on Freebase.<br />3) Whether I should just keep Freebase independent as a check on the methodology.
Posted by Ben (https://www.blogger.com/profile/04856020368342677253)

2012-05-08T17:47:29.810-04:00
Yeah, this is a good question.
Open Library links only a few (very few--less than 1%, if I recall correctly) authors to Wikipedia pages; those are the only ones that seem like they'd be easy to link to an entity defined in some linked open data repository. With the rest, moving from name to shared entity is going to be pretty imprecise.<br /><br />But more to the point: there are about 450,000 unique author IDs in the OL, most of whom wrote a single book and died before 1930. A little random checking shows that the majority of these don't show up at all in Freebase, and most of those that do (like <a href="http://www.freebase.com/view/en/francois_lenormant" rel="nofollow">this one, that I pulled randomly from the catalog</a>) are almost certainly <a href="http://blog.freebase.com/2009/09/09/gender-and-names-in-freebase/" rel="nofollow">just machine categorized by the genderednames app themselves</a>. So it's basically a question of whether I want to use my own guessing system for everything, or use someone else's for some fraction (20-30%, I'd hazard?) of uncertain composition. Seems cleaner just to use the one, particularly because I want to have the 97% threshold, which Freebase doesn't. Plus, for my sample, 1910 names are a better classifying set than all the names in Freebase. I don't think there's anything more complicated going on in the Freebase classifier, but I might be wrong about that.<br /><br />What we really need is a good multidimensional classifier that would work off whatever available data you had about birth year, nationality, region, and so on. That would probably be easy enough to build off Freebase, although I'd want to check it against census data since just using famous people is problematic (and probably skews everything male).<br /><br />Another thing Freebase would be good for, potentially, is getting a list of female authors with male names: I'm really curious about the degree to which they write differently than women writing as women.
I'll have to think about that some more.
Posted by Ben (https://www.blogger.com/profile/04856020368342677253)

2012-05-08T16:55:10.587-04:00
Instead of guessing at author gender, why not look them up in Freebase for those which are available and only fall back to guessing if you don't get a hit?<br /><br />You can look up by LC NAF, OpenLibrary and a bunch of other IDs. There's also http://genderednames.freebaseapps.com/index?name=Sue
Posted by Tom Morris (https://www.blogger.com/profile/05658717311518859311)

2012-05-07T14:21:05.274-04:00
You're right: it's as clear as day on the IPUMS web site. I have no idea why I thought they weren't.
Posted by Ben (https://www.blogger.com/profile/04856020368342677253)

2012-05-07T14:14:07.725-04:00
First and last names are available in all the historical samples (1930 and earlier), and will be available in 1940 soon.
Posted by Evan Roberts (https://www.blogger.com/profile/11535512581510397532)
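
The thread above weighs two ideas: looking authors up in a curated source first and falling back to a name-based guess, and accepting a name-based guess only when the name is at least 97% one gender. A minimal Python sketch of how the two could combine follows. It is not the post author's code: the counts are invented toy numbers, `CURATED` and the ID `OL123A` are hypothetical stand-ins for an external lookup, and the function names are made up for illustration.

```python
from collections import Counter

# Hypothetical toy counts standing in for a census-derived table of
# first names by sex. These numbers are invented, not real data.
NAME_COUNTS = {
    "mary": Counter(female=990, male=10),
    "john": Counter(female=5, male=995),
    "leslie": Counter(female=400, male=600),
}

# Hypothetical curated lookup keyed by author ID, standing in for an
# external source. "OL123A" is a made-up identifier.
CURATED = {"OL123A": "female"}


def guess_from_name(first_name, threshold=0.97):
    """Return a gender only if the name's majority share meets the threshold."""
    counts = NAME_COUNTS.get(first_name.lower())
    if not counts:
        return None  # name not in the table
    total = sum(counts.values())
    gender, n = counts.most_common(1)[0]
    return gender if n / total >= threshold else None


def classify(author_id, first_name, threshold=0.97):
    """Prefer the curated lookup; otherwise fall back to the name-based guess."""
    if author_id in CURATED:
        return CURATED[author_id], "curated"
    return guess_from_name(first_name, threshold), "name"
```

With these toy counts, an ambiguous name such as "leslie" is left unclassified by the name-based path, while a curated hit overrides the guess entirely. Recording which path produced each label also keeps the curated source usable as an independent check on the name-based method.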