Comments on Sapping Attention: PCA on years
Ben (http://www.blogger.com/profile/04856020368342677253), feed updated 2024-03-11T02:10:31.396-04:00

Ben (https://www.blogger.com/profile/04856020368342677253), 2011-02-22T10:27:28.323-05:00
@Ted: You almost make it sound like a hallucinogenic. Go for it, man: expand your mind, see the higher dimensions.

Anonymous, 2011-02-22T09:00:38.192-05:00
You've convinced me that I need to try PCA, at least as an exploratory technique.

Ben (https://www.blogger.com/profile/04856020368342677253), 2011-02-21T10:53:53.804-05:00
Allen,<br /><br />I definitely don't want to dig myself in behind PCA, because I certainly <i>haven't</i> found it that easy to explain, particularly past the first component. (That one is pretty easy, actually, which is why I like having this year rotation data on hand). And once you do explain it, you have to spend a lot of time issuing caveats, because the intuitive explanations of the axes are always going to be correlated with each other. I do think spatial transformations stay a little closer to the original data in some ways (although that's more true for a technique like cosine similarity than what I'm doing, which requires a lot of scaling and normalizing).<br /><br />I guess my intuition is that topic modeling is, as it says, modeling, while PCA is just rearranging. When I say it like that, I think that's probably an incorrect view; my only question is whether it's an idiosyncratically incorrect view, or one that other humanists might share. In which case, PCA might be able to find a niche.

Allen Riddell (http://ariddell.org), 2011-02-21T09:40:57.314-05:00
Ben,<br /><br />I've never looked at the Ulrich book. I'm going to go request it now. Thanks for the reminder.<br /><br />I think I get your point about being comfortable with the vector model. What I'm a little confused about is why PCA is any less removed from the word frequencies than the topic model is. Personally, I think it's easier to explain Bayes' rule and probability distributions than to explain an orthogonal projection matrix. But maybe you've had some experience explaining PCA to folks...

Ben (https://www.blogger.com/profile/04856020368342677253), 2011-02-20T15:59:11.841-05:00
Allen,<br /><br />Thanks, that Hall paper is great and I wish I'd seen it earlier. And I'd seen Blevins' stuff in passing, but he goes more in depth than I thought at first. I just taught the Ulrich book last week, and it's neat how some of the charts she spent a while creating (the decline of Ballard's midwifery practice, say) just pop out.<br /><br />I know topic modeling is the future, but I've just got an affection for vector models that I can actually intuit--topic modeling seems a bit too much like magic, and I'd have to write an implementation to dispel that. Also, as you say, it takes more power: I suspect that some of the established models might choke my computer against 20GB of data. Have you used the MALLET package? I've heard that's better than the various R packages for some reason.

Allen Riddell (http://ariddell.org), 2011-02-19T21:17:10.421-05:00
I feel like PCA is pretty opaque and the results are hard to interpret for text. Topic modeling seems a step better and you can target it to specific hypotheses easier, IMHO. It does require a bunch more computing power. There's a nice R package, topicmodels, that you might want to check out.<br /><br />Here are three examples:<br /><br />1. Blei's topic modeling of Science magazine<br />2. David Hall's intellectual history/topic modeling of the linguistics journals "Studying the History of Ideas Using Topic Models"<br />3. Cameron Blevins' topic modeling of 27 years of 18th century diary entries http://historying.org/2010/04/01/topic-modeling-martha-ballards-diary/

Ben (https://www.blogger.com/profile/04856020368342677253), 2011-02-19T18:40:36.545-05:00
@Hank<br /><br />It's probably not worth concretizing, but I sort of want methods to be approaches, big-picture stuff, which tools can help instantiate. Close to a strategy/tactics distinction, I guess. So yeah: I think that certain digital tools can be, if well controlled, relevant to wider developments in the field as a whole.<br /><br />@Allen<br /><br />Yeah, I've been wondering about topic modelling for a while, but just holding back from the overhead of actually learning/doing it in depth. Partly because there seem to be all these subtly different approaches (CTM, tLDA) that I don't feel qualified to discriminate among, and partly because I know it will be a while before I could get as good an understanding of any probabilistic model as I get intuitively of the vector-space models I'm using here, even though I do believe they're probably better. It's a good point that this might be the place to dive in: these genre groupings would be a lot smaller to start, which is probably a good thing. I've been wondering a lot about the interpretability of topic models for exploration, not just classification, which is what it seems to be built for… there's definitely something there.<br /><br />Somehow David Blei, who seems to be Mr. Topic Models, has been enticed to come by the history department this week to talk about electronic archives; depending on what he says that might really start me off in a new direction.<br /><br />Have you used them yourself at all? I haven't seen any good examples of digital humanists using topic models for anything but smaller, <a href="historying.org/2010/04/01/topic-modeling-martha-ballards-diary/" rel="nofollow">fun</a> <a href="www.stanford.edu/~mjockers/cgi-bin/drupal/node/39" rel="nofollow">projects</a>.

Allen Riddell (http://ariddell.org), 2011-02-19T17:27:07.756-05:00
Possibly something useful?<br /><br />Topics over Time http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.152.2460<br /><br />Dynamic Topic Models http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.62.2783

Hank (https://www.blogger.com/profile/02841787256060612291), 2011-02-19T15:59:01.299-05:00
Ben: Last comment for today, and now I think I'm up to date on what you've been thinking. I like the distinction between method and tools - it's an important one for a whole host of reasons. One:<br /><br />We readily sense that tools get <i>used</i>, but, personally, I think we too readily do the same for methods. These latter seem somehow larger to me - do they to you? - and so coming up with a better way of talking about how they're developed and why might:<br /><br />(A) help us scramble out of a vocabulary of man-the-tool-user in the history of ideas and (B) situate our own use of digital "tools" in a wider context of developments in the discipline or in the humanities as a whole. Make any sense?