Tuesday, October 30, 2012

Data narratives and structural histories: Melville, Maury, and American whaling

Note: this post is part I of my series on whaling logs and digital history. For the full overview, click here.

Data visualizations are like narratives: they suggest interpretations, but don't require them. A good data visualization, in fact, lets you see things the interpreter might have missed. This should make data visualization especially appealing to historians. Much of the historian's art is turning dull information into compelling narrative; visualization is useful for us because it suggests new ways of making interesting the stories we've been telling all along. In particular: data visualization lets us make historical structures immediately accessible in the same way that narratives have let us do so for stories about individual agents.

I've been looking at the ships' logs that climatologists digitize because they're a perfect case of forlorn data that might tell a more interesting story. My post on European shipping gives more of the details about how to make movies from ships' logs, but this time I want to talk about why, using a new set with about a half-century of American vessels sailing around the world. It looks like this:

I'll repost this below the break with a bit more of an explanation. First I want to ask some basic questions: If this is a narrative, what kind of story does it tell? And how compelling can a story from data alone be: is there anything left from a view so high that no individuals are present?

To make matters worse: narrative is not the whole game in history writing. Academic historians tend to be suspicious of too good a story, particularly when it draws heavily on the standard tropes of Barnes and Noble historiography: the tragically flawed leader, the Good War, the innovator who changed the world. But while data visualizations may not always pack the argumentative punch that real academic history prizes, there's reason to be hopeful since they don't tell those old stories, either. The appeal of this sort of visualization, in fact, is compelling evidence that there are substantial publics out there that want to see historical data tell stories about the way that systems operate rather than individuals. Stanford's ORBIS project, for example, hit a viral wave describing the Roman world without a shred of swords-and-sandals.

One of the problems with individual narrative is that it's not very good at telling systemic stories; historians are reduced to arcane feats of construction where they try to assemble 'networks of actors' who can stand for the whole system, and interlock their paths like a Victorian novelist. Using data, where available, to make the systemic side of things accessible can be good for historians if it lets them tell stories that aren't fundamentally about individual agents. Visualizing individual experience as data may usually be crudely reductive; but visualizing group dynamics through data is often profoundly revealing.

But unlike prose narratives, visualization is much harder to interweave with an argument. When I made a video of European shipping like the one of the US here, it made its rounds on the Internet, usually clipped out from my explanation, and the most frequent interpretation was that it showed, as Zero Hedge put it, "the origins of globalization." In other words, the dominant story was the refinement and perfection of a series of commercial interactions that brought the world closer and closer together. As a master narrative, it's compelling; but compelling in a way that satisfies present prejudices about trade, and might be applied to just about any data set at any time. There's certainly some truth to it (although globalization certainly existed before 1750), but the data is too sparse to confirm it, and I'm not sure it's a story which will help us better understand the past in any case.

The Maury Logs

To find something I might more usefully be able to discuss, I went through the biggest source of historical shipping records, the ICOADS collection, and pulled out the very first systematic collection of logbooks ever assembled: Matthew Maury's collection of American shipping from about 1785 to 1860, assembled mostly before the Civil War. (For my longer post on the history of their digitization, see here.)

Using basically the same methodology as last time, I made an image that tells a story of sailing centered on the United States instead of the European colonial powers. (This is the same video as at the top of the post):

Once again, I'm not going to be able to catch every interesting trend here, and those who know more shipping history than I will surely notice things I'm missing. But a few highlights pop out:
  • It's worth clicking through to 1849 to watch the gradual buildup of shipping to San Francisco as the Gold Rush turns us into a bicoastal nation without a transcontinental railroad.
  • You can also see the first American expeditions near Japan, including one of the ships in Commodore Perry's second fleet in 1854.
  • The trade winds remain very visible through the end in the Atlantic, where all the ships take a similar looping path out and back. (It's possible they get more stereotyped over time, thanks to better charts from Matthew Maury himself). We'd have to push forward in time a bit more to see the steamers have any effect.
  • The very first ship, which I suspect Maury selected for symbolic reasons, is the Empress of China. As you know if you've read Dael Norwood's dissertation, this was the first American trader to head to Canton. That reflects a feature of the data: the Pacific is obviously a major area of American interest from quite early on, far more than it was in the European sets. (Unfortunately, the track is lost somewhere in the Pacific before it actually reaches Asia).
  • While the colors are a bit hard to distinguish at times (and sometimes they switch, since the data is less formulaic than was CLIWOC), the early years have many Salem ships on spice missions to Zanzibar or the East Indies; later years have less exotic routes, like an endless stream of New York to Liverpool. This is partly Maury's selection bias, but the global intentions of American trade from the beginning are striking.

Part of my interest in this data set is that there are enormous biases which should be immediately evident to anyone who knows the subject; our historical heritage was not preserved by statistically sampling the available resources. To forestall two possible misconceptions, I'd just note that certain types of voyages are over-represented, particularly those sailing to more exotic locales; and that you shouldn't conclude that having fewer ships on the seas in (say) 1858 as opposed to 1842 actually means the volume of shipping decreased. (For a lot more on this, once again, see that post on how Maury got this data.)

The bulk of the data is from the 1840s; and far and away the most compelling visual aspect here is the amazing seasonal migration of ships through Hawaii each year, following the warm weather north and south. The patterns are very clear in the 1840s period: but a seasonal map like I made last time, superimposing every ship onto a single calendar year, can very quickly show a far greater seasonal variation than anything in the European routes:

The greatest drivers of those seasonal patterns are whaling ships. And that suggests a narrative and an argument quite different from the one of commercial expansion that my European visualization seems to show, or that you might draw from the commercial voyages in the Maury set alone.

In short: we're tempted to look at maps like this and view the big systemic action we can see in shipping as being about connecting. But whaling suggests a wholly different story, one where the action involves resource depletion, tragedies of the commons, and industrial labor. And that, as well, is a story that's hard to tell effectively with narrative alone.
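
The seasonal map described above, which superimposes every voyage onto a single calendar year, comes down to throwing away the year and grouping positions by month and day. A minimal sketch in Python, assuming readings arrive as (ship, date, latitude, longitude) tuples; the function name and the sample readings are invented for illustration and are not the code or data behind the maps here:

```python
from collections import defaultdict
from datetime import date


def fold_to_calendar_year(observations):
    """Group (ship, date, lat, lon) readings by (month, day), ignoring the year."""
    frames = defaultdict(list)
    for ship, when, lat, lon in observations:
        # Keying on (month, day) puts the same season of every year in one bucket.
        frames[(when.month, when.day)].append((ship, lat, lon))
    return dict(frames)


# Made-up readings for illustration only; not taken from the Maury logs.
sample = [
    ("Ship A", date(1842, 3, 15), 21.3, -157.9),
    ("Ship B", date(1847, 3, 15), 20.9, -156.7),
    ("Ship A", date(1842, 8, 1), 52.0, 160.0),
]
frames = fold_to_calendar_year(sample)
```

Each bucket would then become one frame of a seasonal animation, so a ship logged in March 1842 and another logged in March 1847 show up together.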

Floating Factories

Whaling wasn't about connecting trade ports. Focus, for example, on Japan from about 1845 to 1855 in the video above. Matthew Perry arrives with his gunboats in 1853; but American incursion into the Sea of Japan begins years earlier, and is largely about the whales. There's a real argument to be made that the whale ships, not the black ships, were the real force opening Japan.

That roving character is intimately tied in to the character of whaling itself. To understand that, we might turn back to traditional narrative for a second; everything we'd like to know about this period of whaling is pretty well summed up in Moby-Dick and the scholarship around it. After all, Melville wrote about whaling from personal experience. (As I pointed out earlier, both the whaler Melville sailed on, the Acushnet, and the one on which he based the Pequod's sinking, the Essex, have some of their tracks in the chart above.) Melville can help us here.

One of the funny things about claiming Moby-Dick as the great American novel is that nearly the entire story takes place in international waters. No one claims The Ambassadors captures the totality of American life from France: so why should Moby-Dick from the Pacific? I always liked the argument I took from Charles Olson on this: The whaling ship may be at sea, but that ocean represents the whole American experience far more than any individual place could. The wide open spaces (Melville refers to the Pacific's 'watery prairies') stand in for the frontier; while the labor on the ships foreshadows the industrial era. Whaling ships, Olson points out, were factories, full of working class men engaged not just in hunting but in slaughtering, processing, and packaging valuable industrial goods.

Networking and pillaging

It's easy to think of ships, particularly American ones, as bearing either commerce or war. But the reminder that whaling is industrial leads us to think of shipping, and the behavior driving it, in a different way. The idea of these maps as showing global commerce is appealing because it leans so heavily on one of the most well-trod paths in contemporary thought: the idea of networks. The land is filled with nodes, which are real; the sea is the space between, just waiting to be traversed and made as minimal as possible.

But seeing the ocean as a site of industrial production in itself reminds us that connecting only exists in the context of production; and production requires very different forms of behavior. Maury certainly over-collected whaling voyages because he knew they went to exotic places which would help fill in the empty squares in his sea charts. But even if whaling is an aberration from most ocean traffic (although fishing, which Maury doesn't seem to collect, has always been a major reason people go to the sea in ships), this suggests we should pay special attention to the whaling voyages we have: because they let us see real actions, not just their ghostly interconnections.

In a network, the goal is to travel from one point to another as quickly as possible; the changes will be improvements as tracks grow straighter (thanks to things like Maury's charts or steam engines) or previously closed routes become navigable (the Suez Canal in the 19th century, the St. Lawrence Seaway in the 20th, the Northwest Passage in the 21st). But in industry, the real action takes place at sea: the goal is to chase and kill these animals beneath the sea. The route from Boston to London in the age of sail varies little: but the tracks to hunt whales out of New Bedford change dramatically from year to year, because the whales go different places.

But while the destinations are incredibly different—near Japan one year, off the coast of British Columbia another—the type of behavior stays the same. And that's immediately evident in the images: the repetitive behavior, writing over the same area again and again, is obviously separate from the basic network-act of 'connecting.' (In the yearly visualization it reminds me, if anything, of those old "scrubbing bubbles" commercials from Dow Chemical.)

Matthew Maury used those ships' logs to make a map of whale locations:

Maury's 1851 map of whale populations: from the Smithsonian Institution

But a static map implies a permanence to whaling grounds. That doesn't match the extreme variations in where the American whaling fleet sailed. To make Maury's data tell the whaling story in a different way, I've selected just the whaling routes (a longer description of using machine learning to supplement historical ship metadata to pull out just the whaling routes is here) and changed the plotting parameters so that each voyage leaves a permanent mark in black in addition to its temporary trail in red.

Here you can see the story in a different light:
  • The various whaling grounds and the routes to them becoming gradually overwritten. 
  • Far from being fixed (which is how we sometimes talk about whaling or fishing grounds), they shift over time as the whale population disappears.
  • Ships move in clusters each year; a few try out new or old grounds, but large numbers follow the same pattern: Kamchatka one year, the Sea of Japan the next, etc.
  • Those clusters are probably sharing information somehow: the semiannual convergence on Hawaii, in particular, probably means that any ship knows which of the last season's grounds was full and which empty.
  • By the early 1850s, the whalers are regularly pushing up through the Bering Strait. This is literally the end of the earth: before the Panama Canal, there was no sailing destination in the world farther from New Bedford or Sag Harbor.
  • By the end of the period, the number of ships in action declines.

And that suggests a big story about the data: although a few points (the origins on the east coast, the refueling at the islands) show networking, we're overwhelmingly seeing pillaging behavior. In short, the narrative goes something like: There's only so much spermaceti to be collected in the world; whalers chase it to the gates of what Melville called the whale's "polar citadels"; and once it's exhausted, the industry disappears.
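
The change in plotting parameters described above (a permanent black mark plus a temporary red trail) amounts to drawing two layers for every frame. A rough Python sketch, assuming each voyage is a list of dated positions; `split_layers`, the 30-day trail length, and the sample track are hypothetical choices for illustration, not the parameters actually used for the video:

```python
from datetime import date, timedelta


def split_layers(track, frame_date, trail_days=30):
    """Split one voyage's readings into the two layers drawn at frame_date.

    track: list of (date, lat, lon) readings.
    Returns (permanent, trail): every position reached so far (the black
    layer), and the subset recent enough to still show as the red trail.
    """
    cutoff = frame_date - timedelta(days=trail_days)
    permanent = [(lat, lon) for when, lat, lon in track if when <= frame_date]
    trail = [(lat, lon) for when, lat, lon in track if cutoff < when <= frame_date]
    return permanent, trail


# Invented track for illustration only; not a real voyage.
track = [
    (date(1845, 1, 1), 41.6, -70.9),
    (date(1845, 2, 1), 20.0, -40.0),
    (date(1845, 3, 1), -30.0, -35.0),
    (date(1845, 4, 1), -55.0, -65.0),
]
permanent, trail = split_layers(track, date(1845, 3, 10))
```

Because the black layer only ever grows, grounds that were worked and then abandoned stay visible, which is what makes the overwriting in the list above readable at a glance.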

Does the Whale's Magnitude Diminish? Will he Perish?

Like a lot of good stories, this can't stand up entirely under close inspection. The evidence for American whaling severely depleting the whale population is thin; more likely, the declining numbers after 1850 are from a combination of economic sources (after 1849, Californian gold looked more attractive to risk-takers than Alaskan whales) and data problems (perhaps Maury didn't collect whaling logs from the 1850s because he turned his attention to more standardized sources from various merchant marines).

Melville himself addresses just this question in Chapter 105 of Moby-Dick, and comes to the same conclusion as more recent literature: American whaling wasn't going to permanently imperil the cetacean populations. (Melville's always worth quoting at length, but the Pacific Ocean–Great Plains comparison definitely makes it worth it here.)
Whether owing to the almost omniscient look-outs at the mast-heads of the whaleships, now penetrating even through Behring’s straits, and into the remotest secret drawers and lockers of the world; and the thousand harpoons and lances darted along all continental coasts; the moot point is, whether Leviathan can long endure so wide a chase, and so remorseless a havoc; whether he must not at last be exterminated from the waters, and the last whale, like the last man, smoke his last pipe, and then himself evaporate in the final puff.

Comparing the humped herds of whales with the humped herds of buffalo, which, not forty years ago, overspread by tens of thousands the prairies of Illinois and Missouri, and shook their iron manes and scowled with their thunder-clotted brows upon the sites of populous river-capitals, where now the polite broker sells you land at a dollar an inch; in such a comparison an irresistible argument would seem furnished, to show that the hunted whale cannot now escape speedy extinction.

But you must look at this matter in every light.  Though so short a period ago—not a good lifetime—the census of the buffalo in Illinois exceeded the census of men now in London, and though at the present day not one horn or hoof of them remains in all that region; and though the cause of this wondrous extermination was the spear of man; yet the far different nature of the whale-hunt peremptorily forbids so inglorious an end to the Leviathan.  Forty men in one ship hunting the Sperm Whales for forty-eight months think they have done extremely well, and thank God, if at last they carry home the oil of forty fish.  Whereas, in the days of the old Canadian and Indian hunters and trappers of the West, when the far west (in whose sunset suns still rise) was a wilderness and a virgin, the same number of moccasined men, for the same number of months, mounted on horse instead of sailing in ships, would have slain not forty, but forty thousand and more buffaloes; a fact that, if need were, could be statistically stated.
But though Melville was right about the particular technology not being sufficient to extinguish the whales, there's still reason to think that the behavior we see here does tell us about systemic actions. The real threat of massive whale extinction came in the 20th century as 'factory ships' (there's that word again) began to take to the sea. They would have been largely unrecognizable to Melville: steam engines, onboard harpoons, and massive sluices to winch whales onto the ship while still moving.

The Southern Harvester (whales dragged up the chute)
Unknown date: probably 1950s.

A visualization of this period of whaling is going to be harder to find. The ICOADS collection does have three decks exclusively of whalers, but it's difficult to pull ship tracks out of them since the data was punch-carded earlier and is not as nicely formatted as the US Maury collection. All we can see are the points, not the paths (and each ship appears once for each logbook reading it takes, up to 24 a day).

(South Africa in orange, Japan in red, Norway in blue.)

As visualization, I don't find this nearly so compelling. But although the appearance is different, the behavior is the same: the ships come from away, do their work of scrubbing in the obscure places of the ocean, and continue on until the ground is completely exhausted.

Histories of depletion through data

So even if the Maury tracks didn't extinguish the whales, the sort of behavior they show came extremely close before international compacts mostly ended international whaling. At some point I'm going to post a chaotic video with all the voyages in the ICOADS database, through the year 2000: the same scrubbing pattern is visible in several other decks besides the four discussed here. Humans use the ocean to network, but they also and always use it to pillage. And the basic behavior is probably mappable to all sorts of behavior. If there were data that showed the extermination of the passenger pigeons, or the desertification of the dust bowl, or any of the other rational behaviors with systemic consequences, I suspect we'd be able to see that the behaviors were, in some fundamental way, the same, but with significant variations.

That's something all humanists should be interested in. For scientists, this sort of information might be useful because it can allow reconstructions of population dynamics: there's been some interesting work done with logbooks including sighting counts on reconstructing whale populations, for example. Only at the limits does that sort of approach to data begin to resemble something that historians will recognize as their stock in trade.

But since the swarming behavior we see here is a fundamental part of human behavior, humanists need the vocabulary to describe and communicate it. While we want to tell only narratives in prose with human characters, this will be hard: the swarm is about aggregate behavior, not about individuals. If we try to tell the story of whaling as Melville did, from the ground up, we'll always be awed at the power of nature and the insignificance of man; it will always seem like a few Ahabs, however crazed, will only imperil their own ships.

But when we use data to write history, it's much easier to talk about human actions that are aggregate but very real. An insistence that humanism must always hold true to individual experience can be emancipatory, but it can also be morally blind. I had a friend whose final workshop at Princeton, before he left the ivory tower, damned the whole profession for failing to confront the challenge of climate change in every work it produces. That may be a bit much; but there's no question that the insistence, both conservative and liberal, that history should be rooted in the archival experiences of individuals makes it much harder for us to explain some of the most important consequences of human actions in the world.

This is why historians need to work with data. Data lets historians tell engaging stories that aren't narratives, and that tap into a source of explanations slightly removed from the actions of individuals or networks. The experience of Ishmael on the seas, and the individual logbooks pushing farther north in search of prey, won't tell us the story of where the whaling industry went as fully as we can when we supplement them with even scant data. And those dynamics of swarming and exploitation play into too many of the crises that historians need to contextualize, right up to the present.

To end by going back to the sources: It's no accident that the data set I use to get the whaling ship paths was created to understand the dynamics of climate change. Nor should it be surprising that the US government's effort to make that data available is facing crippling spending cuts. It's not just historians, after all, who are struggling to give adequate structural explanations of how humans do and should act on complex systems. The reason our history should include narratives of data like this is that a humanistic approach to data can help us better appreciate the ways those systems work and fail. It's something we learn from reading Melville. Whales, and the ways we kill them, can be a metaphor for something bigger.


  1. Magnifique! It's very interesting. One of the best applications of data mining of historical documents I know. Thanks for sharing how you do it.

    It would be fascinating if we could build a project like yours about terrestrial transport: trains, post, roads for cars, public transportation, etc.
    You show how "time data", chronological data, are as necessary as space and geographical data are.

    There are many tools to geo-code, for geolocalisation, but chrono-coding and chrono-localisation are essential too. In our "data in real-time" era, your work shows how historical data, time and chronology are useful and essential. (Luc Gauvreau, Montréal, Canada)

  2. Interesting visualization. It is good to see the focus on whaling as a distinct type of voyages where there is seldom an announced port destination, but rather a general region. Much different dynamics than shipping logbooks.

    In the visualization, I did note the traffic across the Indian Ocean and through the South China Sea to Hong Kong, as well as that in and out of the Bangladesh region. That traffic, especially the latter, did not show up in our work as clearly. Very interesting indeed. Are you sure that all of the vessels were whalers, and do you know if any were not American whalers?

    I've worked with the Maury extractions of American whaleship logs, as well as making additional extractions from such logbooks. Some of this work has recently been published in Public Library of Science One (PLoS One) as: Smith TD, Reeves RR, Josephson EA, Lund JN (2012) Spatial and Seasonal Distribution of American Whaling and Whales in the Age of Sail. PLoS ONE 7(4): e34905. doi:10.1371/journal.pone.0034905

    As you mention, there is a problem in such vessel log data in that the population of ships is not always clear. In the case of the American whaling effort in the 19th century, such a list has been compiled (almost complete we believe), and this has allowed us to interpret these data somewhat better:

    Lund JN, Josephson EA, Reeves RR, Smith TD (2010) American offshore whaling voyages 1667 to 1927. New Bedford, MA: Old Dartmouth Hist. Soc. and New Bedford Whaling Museum/Vol. I: Voyages by vessel, 670 p.; vol. II: Voyages by master. 349 p.

    also see http://nmdl.org/aowv/whindex.cfm

    Also, we did work with just Maury's data on the progression of whaling grounds in the North Pacific area earlier in Josephson EA, Smith TD, Reeves RR (2008) Historical distribution of right whales in the North Pacific. Fish and Fisheries 9: 1–14.

    Interested in the R code that you used; have you made that available?

    Thanks, Tim Smith

    1. Thanks Tim, especially for the link to your PLoS paper; very sorry to have missed it the first time, but it seems invaluable for thinking about whaling voyages particularly in the later period. I did see the earlier article on right whale distribution, which fortunately saved me from thinking that I should go out and try to worry about the whale populations. Is the raw data for those available anywhere?

      I'm sure I've mis-categorized some of the whaling voyages here, since it's just using a machine-learning algorithm to pull certain voyages out based on similarity to New Bedford, Sag Harbor, and a few other ports. I had noticed some of the Canton routes, as you say. (Fuller description here). When I come back to this, I'll check those Bangladesh voyages to see why they're showing up in the set. I suspect with better/larger ground truth sets, the machine learning would work better. The problem I've encountered is that the ICOADS digitization of the Maury logs can be difficult to effectively crosswalk against other data. Since doing this I've heard that there are separate digitization projects that may be better done--(ICOADS US Maury also has some obviously incorrect year data for a few ships)--was yours independent?

      For nationality: the descriptions of the file indicate there are only American voyages in the American Maury set. One thing that is potentially interesting about this sort of data is being able to use similar classification procedures to search for whaling voyages in other places, though. The major sources out there from this period are British and German merchant marine, though, and I'd be surprised were there much whaling in there.

      Most of the code is available here, though it's still quite slapdash: hopefully I'll get a chance to clean it up some later.

    2. Thanks for the reply. We did digitize the whale information manually as an addition to the ICOADS files. The whale data recorded by Maury has some gaps in it, and also we've found that some but not all voyages have long gaps (1 to 3 months) where the vessels appear to have been whaling in enclosed waters (e.g. the Sea of Okhotsk). That's giving us a little distortion in our picture.

      Thanks for the R code.... I am interested in your approach to the geographic plotting in particular. I'll see if I can get my head around how you are doing that.

      On the Canton routes and the Bangladesh voyages, it would be helpful to be able to identify the specific voyages and the full tracks to see if they seem to be whaling. We have little information on whaling in that region (which may have been British vessels), so being sure becomes important to us.
      Thanks, Tim

    3. I should probably be upfront that I bet the Canton routes are a machine learning artifact, not unknown whaling, probably showing up because they passed through Fiji or Zanzibar or some other area that whaleships tended to frequent more than traders. Since this is Maury's data, you've probably already rejected those tracks. Whaling is the easiest to tag, but I may go through and seed a few other types of trips (the China trade, spice traders) with representative examples that could help highlight other things going on in the data.

      For people like you who have better versions of the Maury data than ICOADS, one thing that might be interesting would be to extend something like this to the 99% of ICOADS that isn't American whaling. It might be possible to do some sort of analysis to identify likely whaling clusters in later periods of the dataset (1880-1960). Most likely they'd have to be registered as part of a national merchant marine to be in it in any large numbers--do you think any particular countries would be a good place to start?

      That's very helpful to know about the Sea of Okhotsk; because of other data continuity problems I haven't been checking the time off on these, but a next step might also be to start checking where voyages break off and start to see if there are other regions that seem to be omitted. (It seems in some cases (like the Acushnet, Melville's trip) someone didn't bother to save the North Atlantic parts of the journeys, and in other ICOADS sets whole decades will be missing data from (say) south of the equator.)

  3. Hey Ben -

    I think you might find this visualization interesting - http://jcachat.com/zeitgeist.html