Suppose I have a business making clothing. I want to know how many items of what size to manufacture. If I can know the distribution of men's and women's adult heights in the U.S., I can plan how many shirts or pants to make of each size. Assume that the height data are not readily available. So I draw a sample of people, measure their heights, and calculate means, standard deviations, quartiles, and probably other measures of the distribution. I think that an anthropologist down at the U might be interested in my height data, so I head downtown.
The first anthropologist I run into is a cultural anthropologist. When I show him my data, he chides me for being simplistic. How can I possibly think I have described my population of people when I have only looked at their height? We want to know so much more about people, he says. My little study is ridiculously limited and it can't help him understand people at all. It is reductionist. It is useless. Why did I bother?
Then I run into an evolutionary anthropologist. She likes the data I gathered. She can compare these results with her own measurements of height in Lower Slobovia, and learn something about human height variation. To her, these are interesting and important data.
For my own purposes, and for the evolutionary anthropologist, my little study of height provides important data. It helps each of us answer a question of importance about height. Is this study rigorous and useful? Yes. Is it reductionist? Yes, again. Is that bad? Only for the cultural anthropologist, who wants more information and more nuance.
You can probably see where I am going here. Over the past couple of years, I have encountered considerable opposition to our work in settlement scaling from archaeologists, historians, and others.
(On the scaling work, see this post from 2014, or a bunch of posts in Wide Urban World; this is the latest post there.)
These people complain that this research is reductionist. How can we possibly understand ancient settlements by just comparing the population to one other variable using a graph and an equation? Cities and settlements are far too complex to be explained by two variables. But we have never claimed to explain ancient cities or settlements on the basis of a scaling regression. Instead, we claim to produce a better understanding of a particular limited domain of ancient settlements. If you want a comprehensive analysis of individual ancient cities, then be my guest. I have done that kind of thing (Smith 2008), and it is a useful approach. But now, when I am addressing a limited domain using a few variables, please don't accuse me of reductionism, as if that charge invalidates the research.
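For readers who wonder what "one other variable, a graph and an equation" amounts to in practice, here is a minimal sketch. It assumes the usual power-law form, Y = Y0 * N^beta, fitted by ordinary least squares on log-transformed values. The function name `fit_scaling`, the settlement populations, and the areas are all invented for illustration; this is not our project code, and the numbers are not real data.

```python
import math

def fit_scaling(populations, outputs):
    """Least-squares fit of log(Y) = log(Y0) + beta * log(N).

    Returns (y0, beta) for the assumed power law Y = y0 * N**beta.
    """
    xs = [math.log(n) for n in populations]
    ys = [math.log(y) for y in outputs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                       # slope of the log-log line
    y0 = math.exp(mean_y - beta * mean_x)  # intercept, back-transformed
    return y0, beta

# Hypothetical settlements: population and settled area in hectares.
# The areas are generated from an exact power law (exponent 0.8), so the
# fit simply recovers the values used to build them.
populations = [200, 500, 1200, 3000, 8000]
areas = [0.05 * n ** 0.8 for n in populations]

y0, beta = fit_scaling(populations, areas)
print(f"prefactor = {y0:.3f}, exponent = {beta:.3f}")
```

The whole exercise yields two numbers, a prefactor and an exponent. That is the point: they describe one limited relationship and make no claim about everything else that went on in those settlements.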
This is not just me feeling oppressed by clueless reviewers, colleagues, audience members, and such. The roster of the reductionism naysayers I have encountered includes some good, smart scholars. In fact, even very well-known and respected scholars fall victim to this malady of pooh-poohing single indices or variables for not explaining everything one might want to know about a phenomenon. For example, here is what Thomas Piketty, in Capital in the Twenty-First Century, says about the Gini index: "Indeed, it is impossible to summarize a multidimensional reality with a unidimensional index without unduly simplifying matters and mixing up things that should not be treated together" (Piketty 2014:266). As pointed out by Branko Milanovic (2014), Piketty dismisses the Gini index as an "aseptic" measure of inequality. But who has claimed that the Gini index will tell us everything we want to know about inequality? It tells us one kind of thing, and it allows us to compare separate contexts.
The Gini index, and my hypothetical measure of height, are intentionally reductionist. Their goal is NOT to document or explain everything about some domain. Rather, their goal is to abstract a key dimension from a complex reality, to reduce the messy details to a single measure so that comparisons can be made among domains. Comparative analysis is impossible without simplification, without ignoring a lot of details. If you want to say, "I'd rather do a detailed comprehensive analysis of one case," that is fine. If you want to say "I don't like statistical studies or regression analysis," that is fine (well, maybe it's not really fine, but it is not too uncommon). But please do not say "Because I happen to like details, then your reductionist measure is worthless."
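To show how little machinery such a single measure needs, here is a small sketch of the Gini index computed with the standard rank-based formula on sorted values. The function name `gini` and the house sizes for the two settlements are made up for illustration; they are not measurements from any real site.

```python
def gini(values):
    """Gini index of non-negative values (0 means perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    # Rank-weighted sum over the sorted values, with ranks starting at 1.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

# Hypothetical house sizes (square meters) at two invented settlements.
site_a = [40, 42, 45, 47, 50, 52]
site_b = [20, 22, 25, 30, 60, 240]

for name, houses in (("Site A", site_a), ("Site B", site_b)):
    print(f"{name}: Gini = {gini(houses):.2f}")
```

Each settlement is reduced to one number. Everything else about the houses is thrown away, which is exactly what makes the two places comparable.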
I gave a talk in Europe recently promoting comparative approaches to past urbanism. I made the point that in order to compare cities, one had to abstract some key aspects and ignore many details. This allows one to generate useful and interesting conclusions. When I was done, the first question (from an urban historian) was, "Isn't all this quite reductionistic?" My answer was "Yes! And that is precisely why I do it!"
Take a look at my post from last year, "Against nuance," for some related ideas.
Milanovic, Branko
2014 The Return of 'Patrimonial Capitalism': A Review of Thomas Piketty's Capital in the Twenty-First Century. Journal of Economic Literature 52 (2): 519-534.
Piketty, Thomas
2014 Capital in the Twenty-First Century. Belknap Press, Cambridge, MA.
Smith, Michael E.
2008 Aztec City-State Capitals. University Press of Florida, Gainesville.
Monday, August 15, 2016
Thursday, June 9, 2016
Why would a journal called "Scientific Data" publish bad data? The Chandler/Modelski city-size problem
Scholars interested in changes in city size over long periods of time often turn to one or both of two encyclopedic compilations of data: Tertius Chandler’s Four Thousand Years of Urban Growth (Chandler 1987), and George Modelski’s World Cities: -3000 to 2000 (Modelski 2003). Chandler’s book is an update of an earlier version (Chandler and Fox 1978). The data in both Chandler and Modelski are a mess, routinely dismissed by urban demographic historians as worthless for serious scholarship. Yet a growing number of scholars, particularly economic historians, mine those sources for city-size data in order to investigate various questions. This situation brings up a number of thorny professional and ethical questions.
In this post I describe the situation and point out some of the troubling questions that come to mind. This is not a thorough exploration of either realm. I only have two more days at the lab in Teotihuacan, Mexico, and I have about four days worth of tasks to complete.
I am moved to write this because a new paper relying on Chandler and Modelski’s bad data was just published in a new journal called Scientific Data (Reba et al. 2016).
What is wrong with the data of Chandler and Modelski?
The new paper (Reba et al. 2016) describes these sources, how they were compiled, and various problems and difficulties with the data. The latter are mostly limited to context, measurement, and presentation issues. The basic question of whether the data are accurate is barely considered. I am not a demographic historian, so I cannot provide a detailed critique. But I do use systematic archaeological and historical data on city size in some of my research, so I have looked into this question. I will limit myself to quoting experts (that is, urban demographic historians, scholars with experience working with primary sources). Some of these refer to the earlier version of Chandler (Chandler and Fox 1978), but the methods and reliability are not very different between the two editions.
(de Vries 1984)
· “Three Thousand Years of Urban Growth by Tertius Chandler and Gerald Fox, is a massive collection of information about the size of cities. Its unsystematic character, and, even worse, the authors’ reliance on suspect sources and their completely uncritical use of such sources renders the volume all but unusable.” (p. 18)
· “the compendious, useful but not obviously reliable T. Chandler and G. Fox,” (p. 2)
· Chandler and Fox: “But it is only after the year 800 that this survey may be regarded as truly systematic. This monumental and extremely useful work nevertheless leaves a number of important gaps unfilled. These gaps result from the omission of a fairly sizeable fraction of the cities of the world (probably 20% of the larger cities and 60% of the smaller ones) and especially from the failure to undertake a systematic review of recent studies on the history of various individual cities.” (p. 116-117)
· “Chandler and Fox fail to use sources in languages essential for their task” (p.66)
· There are “serious flaws in the work” (p.23)
· [Chandler and Fox employ] “dubious rules of thumb”
· “Statements and methods like these undermine confidence in the work as a whole, because they suggest a fundamental naiveté … The authors overestimate the importance of statistics in the thinking of our ancestors, and assume too much uniformity in pre-industrial population trends. For all the cautious skepticism proclaimed in their introduction, they still seem too willing to make census taker out of medieval travelers, and to assume that, in the absence of factories and international capitalism, one place was more or less like another. This is not a work informed by the research of other scholars on the various reasons for collecting data in the past, or in the complexity of population trends even in small, non-industrial cities. Without more attention to such matters, the reader can only guess at the margins of error he ought to attach to the Chandler-Fox statistics, and therefore cannot use them with assurance in further calculations (for example, in computing rates or indices with gross population as a base.” (p. 23-24)
· “One of the weaknesses of the Chandler-Fox collection is that the authors are sometimes too gullible in dealing with their informants. Similar errors may be introduced by mis-enumeration on the part of those who first gathered the evidence.” (26-27)
· “Issues of completeness, scale, and time-depth make some studies (e.g.. Chandler 1987) less appropriate for comparative purposes.” (p.40)
· Critical of Chandler, but uses the data
· Chandler’s data are “more doubtful, at least for some regions.” (p. 219)
(Chase-Dunn et al. 2005:97)
· Chandler's “estimates are obviously error-prone.”
In order to make sure my view of this topic was up-to-date, I emailed several urban demographic historians (about a year ago). I did not ask for their permission to quote their remarks, so I include the following quotations as anonymous. These two are well-respected historians, each of whom has published books and articles on population reconstruction in the past.
· “Many years ago I thought about writing a paper on this/these books (There are two editions, I think) but never did get around to writing it. But as far as I'm concerned it's all but worthless. Chandler was assiduous, without a doubt, but just compiled (or piled) one estimate after another without any indications of the merits or demerits of each. It might be useful for tracking down estimates but not for the estimates themselves. Modelski I don't know--yet.”
· “I do not have a high opinion of Chandler’s data. In those cases where I was able to check his Greek, Roman, medieval or early-modern data, they turn out to be seriously wrong. Do not know about the more modern data, though.”
A note on these scholars
Most historians are notoriously picky about their sources; they like to stick close to their textual sources and hesitate to compare or generalize beyond one or two cases (Grew 1990; Kocka 2003). Thus one might be tempted to reject the above critiques as reflecting this bias. Some historians simply do not accept the validity of broad comparative analyses that require simplification and standardization of diverse local datasets. I discuss this and related issues of comparative analysis elsewhere (Smith and Peregrine 2012). But most or all of the individuals quoted above actively pursue comparative analysis. They have all assembled regional or temporal databases of city-size data, although not on the scale of Chandler and Modelski. While they are well aware of the historiographic issues of data sources, they are also willing and able to systematize, simplify, and push forward with comparative data. In other words, their critiques do not reflect the cranky complaints of particularistic historians who are anti-comparativist. Instead, their critiques reflect real historiographical issues in the origins, quality, and relevance of primary data on city sizes in the past.
Some positive comments
· “This book will be useful to students of population and urban development because it is the only worldwide compilation done by one person using consistent criteria.” (p. 22)
Christopher Chase-Dunn and Daniel Pasciuti reviewed Modelski very positively in the Journal of World-Systems Research, 2004. They work with Modelski and have co-published with him. They evidently liked the book so much that they later published the IDENTICAL book review, word-for-word, in the journal Globalizations, in 2006. Hmmmmmmm……..
Uses of these data by scholars seemingly oblivious to potential historiographical problems:
· (Reba et al. 2016)
· (Manning 2005)
· (Jedwab and Vollrath 2015)
· (Jedwab and Vollrath 2016)
· (Nunn and Qian 2011)
Uses of the data by scholars who attempt to verify or adjust the data in relation to other scholarship
· Ian Morris uses some of the data cautiously, in the context of a discussion of its validity:
· “in my opinion some of Chandler and Fox’s estimates are not supported well by the data.”
· “While there would be some advantages to taking a single source like Chandler and Fox’s Three Thousand Years of Urban Growth and then relying on it consistently, the drawbacks seem to outweigh them.” (p. 146)
(Acemoglu et al. 2002)
· These scholars analyze Chandler’s data, compare them to Bairoch’s data, and attempt to evaluate their usefulness.
My gut reaction to works like the new paper by Reba et al. is “Garbage in, garbage out.” The data cannot be trusted, so why should we be expected to trust the results of the analysis? Here are some troubling questions that arise out of this situation.
(1) Why do otherwise rigorous scholars feel free to use bad data?
· Because it is there. If data exist in some usable format, and they seem relevant to a research question, someone will use the data, even if the data are terrible, unreliable, or inaccurate. I have considered posting a bogus dataset online to see if people will analyze it. The risks seem to outweigh the benefits, though.
· Because scholars lack training in other disciplines. The paper by Reba et al. is curious in that it has a very rigorous discussion of geospatial methods, but virtually no discussion of the historiography of the city size data. See some of the sources quoted above for historiographical discussions of historical city-size data.
· Because scholars are not critical of sources. Perhaps there is an assumption that all data in other disciplines are valid. Perhaps scholars believe that if something is published it is true. Do the authors think that data published by non-experts in another discipline (neither Chandler nor Modelski are/were urban historians or demographic historians) are valid just because they are published? Reba et al. justify their use of Chandler’s data based on the fact that it is published and other scholars have used it. They provide citation data for Chandler and Modelski. The message is that if it is published and cited, it must be valid. Hmmm, I haven’t encountered that principle in my readings on methodology in the social sciences or the historical sciences.
(2) Why don’t scholars who know the data well care about this?
Scholars like Bairoch, de Vries, and others quoted above DO care about data quality, and hence their negative remarks on Chandler and Modelski’s data. But many of these quotations come before the time when economic historians and others started mining data sources for city size data. Perhaps these and other historians simply don’t care that historical data are being used badly. With the advent of the Internet, there are all sorts of bad data readily available, and all sorts of bad analyses of bad (and good) data. Or perhaps disciplinary myopia is the cause. Urban historians are not reading journals like Scientific Data or Explorations in Economic History, so perhaps they are unaware of the uses of Chandler and Modelski’s data. Perhaps they have trouble believing that such terrible data would be taken seriously by scholars.
This latter is a very real factor in work that crosses disciplinary boundaries. For years I didn’t believe that Jane Jacobs’s silly and ridiculously inaccurate model that cities preceded agriculture could be taken seriously by anyone at all. Then I found that geographers were citing and praising the model. I got cranky and did a blog post or two on this. Then when a paper in a major journal promoted this idea, I finally got motivated to publish a critique, so I rounded up a few colleagues and we published a paper (Smith et al. 2014). Maybe the urban historians have not yet been provoked sufficiently to mount a proper attack on Chandler and Modelski.
(3) What can be done about this situation?
I have to admit that I really despair of this situation. I am very upset that such obviously poor data are being used by otherwise rigorous scholars, and I am upset that I don’t have better data. I have talked to quite a few colleagues—archaeologists and ancient historians—about this situation. I have asked if any of them were involved in assembling reliable and accurate data on ancient city sizes in their region of specialty, and the answer has been negative. I have asked if they knew of anyone doing systematic urban demographic history in their region, and again the answer is no. In my own region, Mesoamerica, there was a flurry of demographic work on city size in the 1980s, but then scholars lost interest. I have asked if anyone might be interested in mounting such a systematic comparative project, again with a negative answer. This may not arise entirely from a lack of interest – if someone asked me to assemble demographic data from all Mesoamerican cities at, say, 50- or 100-year intervals, I might claim that I am too busy for such a task. It would take a lot of time, and it is hard to see how a granting agency would get excited about such a project.
I don’t have any grand conclusions here. The situation is bad--journals and scholars are merrily using bad data--and I don’t have any good solutions, beyond the suggestion that bad data should be avoided. The journal Scientific Data should be ashamed for using such non-scientific data. We can all do better than this.
Acemoglu, Daron, Simon Johnson, and James Robinson
2002 Reversal of Fortune: Geography and Institutions in the Making of the Modern World. Quarterly Journal of Economics 117 (4): 1231-1294.
Bairoch, Paul
1988 Cities and Economic Development: From the Dawn of History to the Present. Translated by Christopher Braider. University of Chicago Press, Chicago.
Binford, Henry C.
1975 Never Trust the Census Taker, Even When He's Dead. Urban History 2: 22-28.
Chandler, Tertius
1987 Four Thousand Years of Urban Growth: An Historical Census. St. David's University Press, Lewiston, NY.
Chandler, Tertius and Gerald Fox
1978 Three Thousand Years of Urban Growth: An Historical Census. Academic Press, New York.
Chase-Dunn, Christopher, Alexis Álvarez, and Daniel Pasciuti
2005 Power and Size: Urbanization and Empire Formation in World-Systems Since the Bronze Age. In The Historical Evolution of World-Systems, edited by Christopher Chase-Dunn and E. N. Anderson, pp. 92-112. Palgrave Macmillan, London.
de Vries, Jan
1984 European Urbanization 1500-1800. Harvard University Press, Cambridge.
Grew, Raymond
1990 On the Current State of Comparative Studies. In Marc Bloch aujourd'hui: Histoire comparée et sciences sociales, pp. 323-334. Éditions de l'École des Hautes Études en Sciences Sociales, Paris.
Hopkins, Keith
1978 Conquerors and Slaves. Cambridge University Press, New York.
Jedwab, Remi and Dietrich Vollrath
2015 Urbanization without Growth in Historical Perspective. Explorations in Economic History 58: 1-21.
2016 The Urban Mortality Transition and the Rise of Poor Mega-Cities. Unpublished paper posted online.
Kocka, Jürgen
2003 Comparison and Beyond. History and Theory 42: 39-44.
Kowalewski, Stephen A.
1990 The Evolution of Complexity in the Valley of Oaxaca. Annual Review of Anthropology 19: 39-58.
Manning, Patrick
2005 Migration in World History. Routledge, New York.
Modelski, George
2003 World Cities: -3000 to 2000. Faros 2000, Washington, DC.
Morris, Ian
2013 The Measure of Civilization: How Social Development Decides the Fate of Nations. Princeton University Press, Princeton.
Nunn, Nathan and Nancy Qian
2011 The potato's contribution to population and urbanization: Evidence from an historical experiment. Quarterly Journal of Economics 126 (2): 593-650.
Reba, Meredith, Femke Reitsma, and Karen C. Seto
2016 Data Descriptor: Spatializing 6,000 years of global urbanization from 3700 BC to AD 2000. Scientific Data 3 (160034).
Rozman, Gilbert
1978 Urban Networks and Historical Stages. Journal of Interdisciplinary History 9 (1): 65-91.
Smith, Michael E. and Peter Peregrine
2012 Approaches to Comparative Analysis in Archaeology. In The Comparative Archaeology of Complex Societies, edited by Michael E. Smith, pp. 4-20. Cambridge University Press, New York.
Smith, Michael E., Jason Ur, and Gary M. Feinman
2014 Jane Jacobs’s 'Cities-First' Model and Archaeological Reality. International Journal of Urban and Regional Research 38 (4): 1525-1535.
Woods, Robert
2003 Urbanisation in Europe and China during the second millennium: a review of urbanism and demography. International Journal of Population Geography 9 (3): 215-227.
Thursday, April 21, 2016
Anarchist theory developed when non-anthropologists (like Peter Kropotkin), who knew little about small-scale nonwestern societies, discovered that not all social situations are hierarchical, that there are ways of organizing society without rulers or elites, and that cooperation among individuals has positive benefits. For people whose experience and knowledge is limited to modern nation-states, these may be real insights that describe attractive alternative social patterns. But anthropology developed as a discipline that studied small-scale nonwestern societies. Non-hierarchical social arrangements, lacking rulers and elites, where cooperation reigns, are not a big deal. The world is (was) full of such societies, and anthropologists long ago figured out what they were like and how they worked.
So why would archaeologists want to use anarchist theory -- developed by people without much knowledge of small-scale nonwestern societies, writing about alternatives within modern nation-states -- instead of the fruits of more than a century of ethnographic research and anthropological analysis? Yes, Kropotkin hung out with villagers in Siberia and learned something about their way of life. But ethnographers lived for decades in villages all over the world, and produced far better knowledge about small-scale society than Kropotkin or the other anarchists could ever produce. This is my puzzlement about the adoption of anarchist theory by archaeologists. Anthropology has better data and better theory about small-scale societies.
On the other hand, one of my favorite urban scholars is British anarchist Colin Ward, whose work I find insightful. I encountered his work when I was researching informal settlements and their urban attributes (Smith 2010; Smith et al. 2015). Ward worked with radical housing advocate John F.C. Turner in the 1970s, writing the preface of Turner's 1977 book, Housing by People. On shantytowns, Ward (1973a:70) states,
“The poor of the Third World shanty-towns, acting anarchically, because no authority is powerful enough to prevent them from doing so, have three freedoms which the poor of the rich world have lost. As John Turner puts it, they have the freedom of community self-selection, the freedom to budget one’s own resources and the freedom to shape one’s own environment. In the rich world, every bit of land belongs to someone, who has the law and the agents of law-enforcement firmly on his side.”
For Ward, shantytowns in the developing world exhibit the basic principles of his "anarchist theory of organization" (Ward 1966). The act of building in informal settlements is:
I enjoy teaching Ward's (1973a) chapter on shantytowns, "We house, you are housed, they are homeless." It challenges students' views that slums are terrible places of crime and social breakdown, and it also challenges their views of anarchism. Students often think anarchists are old guys holed up in a cabin in the woods with their guns and dogs. The notion that anarchism is a collective and communal way of life is a good discussion topic.
For me, the theoretical value of Ward's work is that it is based on the notion of the generative power of social collectivities:
· “An important component of the anarchist approach to organisation is what we might call the theory of spontaneous order: the theory that, given a common need, a collection of people will, by trial and error, by improvisation and experiment, evolve order out of the situation—this order being more durable and more closely related to their needs than any kind of externally imposed authority could provide.” (Ward 1973b:31).
While I like Ward's work and his perspective, I don't find much analytical power. That is, he has a nice descriptive account of generative processes, but without the causal mechanisms and theoretical power of many alternative social-science approaches to generative processes. Collective action theory (Levi 1988), cooperation research in economics (Bowles & Gintis 2011), neighborhood analysis (Sampson 2012), and Elinor Ostrom's (1990, 2005) institutional analysis are examples of theoretical approaches that have more power and (for me) more usefulness than Colin Ward's anarchist theory. And the urban scaling research I am involved with now is based on a generative theory that derives quantitative urban patterns from the social interactions among people within built environments (Bettencourt 2013). Colin Ward's anarchist theory is entirely consistent with the scaling model, but the latter is a far more powerful model.
So, is anarchist theory useful? I guess if it helps one think about important issues, then it is useful. In this sense, Colin Ward's anarchist theories of architecture and urbanism have been useful to me (for analyses of Ward's thought, see Honeywell 2011, or especially White 2007). But for more powerful explanatory models, I need to look elsewhere. As for more generalized anarchist theory, it is hard to understand why archaeologists would take the word of anthropologically-clueless anarchists over anthropologists who have been studying "anarchist" societies for more than a century.
Angelbeck, Bill and Colin Grier (2012) Anarchism and the Archaeology of Anarchic Societies: Resistance to Centralization in the Coast Salish Region of the Pacific Northwest Coast. Current Anthropology 53(5):547-587.
Bettencourt, Luís M. A. (2013) The Origins of Scaling in Cities. Science 340:1438-1441.
Bowles, Samuel and Herbert Gintis (2011) A Cooperative Species: Human Reciprocity and its Evolution. Princeton University Press, Princeton.
Dugatkin, Lee A. (2011) The Prince of Evolution: Peter Kropotkin's Adventures in Science and Politics. Createspace.
Honeywell, Carissa (2011) A British Anarchist Tradition: Herbert Read, Alex Comfort and Colin Ward. Continuum, New York.
Levi, Margaret (1988) Of Rule and Revenue. University of California Press, Berkeley.
Ostrom, Elinor (1990) Governing the Commons: The Evolution of Institutions for Collective Action. Cambridge University Press, New York.
Ostrom, Elinor (2005) Understanding Institutional Diversity. Princeton University Press, Princeton.
Sampson, Robert J. (2012) Great American City: Chicago and the Enduring Neighborhood Effect. University of Chicago Press, Chicago.
Smith, Michael E. (2010) Sprawl, Squatters, and Sustainable Cities: Can Archaeological Data Shed Light on Modern Urban Issues? Cambridge Archaeological Journal 20:229-253.
Smith, Michael E., Ashley Engquist, Cinthia Carvajal, Katrina Johnston, Amanda Young, Monica Algara, Yui Kuznetsov and Bridgette Gilliland (2015) Neighborhood Formation in Semi-Urban Settlements. Journal of Urbanism 8(2):173-198.
Ward, Colin (1966) Anarchism as a Theory of Organization. Anarchy 62:97-109. Reprinted at "The Anarchist Library, Anti-Copyright".
Ward, Colin (1973b) The Theory of Spontaneous Order. In Anarchy in Action, pp. 31-39. George Allen and Unwin, London.
Ward, Colin (1973a) We House, You are Housed, They are Homeless (chapter 6). In Anarchy in Action, pp. 67-73. George Allen and Unwin, London.
White, Stuart (2007) Making Anarchism Respectable?: The Social Philosophy of Colin Ward. Journal of Political Ideologies 12:11-28.