Change of plans

12/06/2013

 
Below is a letter I recently sent to my committee members and mentors. I share it here so that people know and understand where I am headed from here. I wish to thank everyone who has supported me during my graduate school career: family, friends, mentors, colleagues, and collaborators. I also want to let everyone know that my life as a researcher is far from over. For my collaborators, know that I'm still open to reading and working on manuscripts for projects we've done. And for those of you who have asked me for advice about graduate school, I hope it remains valuable to you. Thanks.


Dear Mentors, Committee Members, and GSR:

I write to inform you of my plans regarding my graduate school career. Depending on the availability of funding next quarter, I will go on official leave from the UW Graduate School beginning either Winter or Spring Quarter 2014. The length of my leave is as yet undetermined. I will maintain ties to the graduate school until I am no longer eligible to do so, have established myself outside academia, or have returned to finish my dissertation. This decision has the support of my committee co-chairs, Drs. Eric Smith and Donna Leonetti.

I make this decision in light of: (a) personal adversities that I have faced, which are related to my recent divorce from my wife and my efforts to rebuild my life as a single father; (b) changes in my career priorities from academia to the field of data science within the public or private sector; and (c) my plans to found a company within the next two years. The company will build on my efforts during 2012 to statistically analyze fact checkers' reports to measure the truthfulness of politicians, along with our uncertainty in those judgments. You can learn about the concept of the company, which will be called SoundCheks, and follow its early development at www.soundcheks.com.

My short-term plan is to finish the first chapter of my dissertation, which involves the use of a complex hierarchical Bayesian model to aggregate multiple informants' reports to infer the social bargaining power of household members. The model pools variance across multiple households to infer both the social bargaining structure within households and the reliability of informants' reports. The social bargaining information was to be used in my analysis of its interactive effects with kinship on remitters' decisions in the Commonwealth of Dominica. Now the model, along with other items in my portfolio, will showcase my statistical modeling prowess to prospective employers. While finishing this work, I will develop some additional skills before going on the job market. These skills include the use of Hadoop for manipulation of large data sets, specifically the RHadoop framework for doing so in R. I will also brush up on my Python and Django skills in preparation for developing SoundCheks.

My future is uncertain, but still promising. Your mentoring and my graduate school experience more generally have prepared me for what lies ahead. Thank you for your guidance, support, and teaching through the years. Please let me know if you have any questions.
 
 
Earlier this week, Dr. Kate Clancy published a refreshingly snarky review of a recent paper in PLOS Computational Biology about the evolution of menopause. (Or is it post-reproductive lifespan? Or is it both? The debate about menopause is sometimes very confused about this.) Though the article was snarky, its main point was subtle (though not as subtle as Christie Wilcox's fantastic response to Geoffrey North's editorial on science blogging in Current Biology). In one stroke, Dr. Clancy harped on the sexism that lingers in the field of physical anthropology, and is downright pervasive in media coverage of reproductive ecology. (I call her Dr. Clancy because she damn well deserves it, and she will kick my ass if I call her Kate, and perhaps murder me if I call her "Sugar".)

Anyway, hidden within the article's snark was an insightful scientific critique of the PLOS article's key assumptions. First, I want to discuss the snark and the response it generated from at least one frustrated reader who very likely has a penis. Second, I want to discuss Dr. Clancy's scientific critique of the article (which I hope she will expand in a subsequent post that I would totally read, hint hint!).

First, the snark. It follows a recent series of articles in Dr. Clancy's blog criticizing the Nature Research Center's reassignment of Canopy Meg (aka Dr. Meg Lowman) to what Dr. Clancy argues is a ceremonial position without any decision-making power. Before that was a study co-authored by Dr. Clancy on sexism and sexual assault during fieldwork. Dr. Clancy is on a mission to expose some serious gender issues in anthropology and other fields, and in media coverage thereof. I unequivocally support this mission, especially from a physical anthropologist, because such critiques often originate from cultural anthropologists and social theorists, whose research (thus reputation and persona) is neither as visible nor as scientifically credible as Dr. Clancy's. Moreover, I tire of the caveman-ish way that the media covers reproductive ecology.

At least one reader (probably male) was frustrated by Dr. Clancy's approach because he didn't see that two things were going on in the blog post. You can see the full comment here. The reader accuses Dr. Clancy of conflating social commentary with scientific research. On one hand, Dr. Clancy doesn't do that, and clearly understands the difference between the two. On the other, readers of Dr. Clancy's article might not, and the reader who wrote Comment #1133 (who is apparently intelligent if, as Dr. Clancy describes it, "willfully ignorant") is an example. It would also be easy for many readers to interpret the article as an excoriation of the authors (who are all men) for their sexism (perhaps subconscious). For example, here is a quote from near the beginning of the post:

"In order to understand this revelation, and the contribution Morton et al. are making to reinforce...the reality of the patriarchy,..."

Interpreting the article in that way, many humans with penises will get emotional. As for me, I'd like to know if there is any reason, aside from the authors' gender and the article's downplaying of female-centric explanations for post-reproductive lifespan, to believe that Morton et al. are trying to reinforce the reality of the patriarchy...or is that comment part of the joke? My sense is that it is part of the joke, but I wonder if I am wrong. Moreover, if there is reason to believe the authors are trying to reinforce patriarchy, I'd like to know about it. And that's all about the snark, really, except that I hope Dr. Clancy continues to publish thought-provoking articles like this so I can Retweet them with gusto.

Now for Dr. Clancy's scientific points about the article, which I will address in turn. I am not a reproductive ecologist, so by "addressing these points", I mean mainly that I'm asking questions about them.


Female reproduction is also functionally constrained. In order to select really nice eggs that make really nice babies in whom we will want to invest, we have to make all our eggs at once, at about five months gestation. This is a time that we’re protected from all sorts of environmental factors that could make our eggs wonky, and it gives us time to have a few massive culling events so we use our very best eggs for ovulation. The problem is, if you make all your eggs at once, then they are all going to expire at about the same time, putting a limit on how long women are fertile.
So there may be physiological constraints arising from follicular atresia. But that's a proximate cause, not an ultimate one. It is entirely possible that phylogenetic inertia or the genetic architecture underlying follicular atresia places a limit on the ability of the female reproductive system to push reproduction beyond a certain age. But it is also possible that phylogenetic inertia is not in effect, and that the genetic architecture of follicular atresia does not prevent mutations that would slow its rate, which would leave the long post-reproductive lifespan in need of an explanation. Is there evidence or logic that suggests which of these possibilities is more likely?


Subsequent work among other modern and historical populations have shown that grandmother presence can have a more variable influence on infant mortality and child health (Jamison et al. 2002; Ragsdale 2004; Sear et al. 2000). In a few studies, paternal grandmothers – so a mother’s mother-in-law – have a negative effect on their grandchildren (Strassmann et al. 2006; Voland and Beise 2002).
My suspicion is that a lot of the variation in grandmother effects is statistical noise, and that publication bias in human evolutionary biology journals strongly favors neat results, leading to the publication of artificially large and statistically significant positive or negative effects, which causes people to throw up their hands. Then again, my suspicion is an untested hypothesis.


Even if human longevity and ovarian expiration dates are the reason for post-reproductive life, still, what you do with that post-reproductive life can have consequences for reproductive success and thus the gene frequencies of future generations.
Yes! I think that this is a much under-studied possibility, although it looks like there has been a recent shift in the rhetoric. Are the effects strong enough to explain the post-reproductive lifespan? Some recent models derived by Hal Caswell (which I can't find anywhere for some reason, probably because I'm rushing to finish this and get back to work) suggest they are. Then again, those models (which I saw presented in a seminar hosted by the Center for Studies in Demography and Ecology) do not include the male sex (which is, by the way, ignored in most matrix population models) and thus did not include the effects that Tuljapurkar et al. and, yes, Morton et al. have modeled.

Lastly:


The model requires that fertility into old age is part of our ancestral history if menopause is to eventually evolve, yes? Then probably our closest living relatives, like say chimpanzees, don’t have menopause, unless it independently evolved more than once of course.
I need to read the paper in depth to see if this assumption is necessary to make the model work. But what I have read suggests that the model describes the evolution of fertility and survival rates, which suggests that what the model really requires is that fertility and mortality were more closely linked in our ancestors.


Wait, you mean there is controversy here, and some papers provide evidence to suggest chimpanzees have menopause, just not the long post-reproductive life spans (e.g., Hawkes et al. 2009)? Well, then surely chimpanzee males prefer young females, just like human males, which is why it evolved in them as well.
Some papers provide evidence that there is menopause among chimpanzees. Then again, others, like this one, suggest that while chimpanzees have menopause, and while the wall of death in many species is more permeable than Hamilton's model suggests (perhaps due to the mothering effect), the link between fertility and mortality is tighter in species like chimpanzees. Humans have a long post-reproductive lifespan even in difficult environments, and that needs explaining. Perhaps the explanation isn't adaptive. Then again, perhaps it is, and perhaps males had something to do with it (as Dr. Clancy mentions in the final quote before I leave):


Finally, it might not be that one of these hypotheses is right, but that all of them make a contribution. You need to combine the major hypotheses in order for menopause models to work, and this has been done with theoretical modeling as well as using Sear and Mace’s empirical data from the Gambia (Shanley et al. 2007).
 
 
The collaborative project I'm working on at the Northwest Center for Public Health Practice will use computer simulations to explore best practices for emergency public health alerts and advisories. What kinds of alerts and advisories should we send, who should we send them to, and how should we send them? The type of computer simulation we'll use is called an agent-based model, sometimes called an individual-based model. We simulate the behavior of individual agents by coding their actions and interactions.
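To make that concrete, here is a minimal sketch of an agent-based diffusion model. Everything in it — the random contact network, the transmission probability, the parameter values — is my own illustration, not the project's actual model:

```python
import random

random.seed(42)

class Agent:
    """One simulated person; 'informed' flags whether the alert has reached them."""
    def __init__(self, aid):
        self.id = aid
        self.informed = False
        self.contacts = []

def simulate(n_agents=100, n_contacts=5, p_transmit=0.3, n_steps=10, n_seeds=3):
    agents = [Agent(i) for i in range(n_agents)]
    # Wire a random contact network: each agent knows a handful of others.
    for a in agents:
        a.contacts = random.sample([b for b in agents if b is not a], n_contacts)
    # Seed the alert with a few initially informed agents.
    for a in random.sample(agents, n_seeds):
        a.informed = True
    history = []
    for _ in range(n_steps):
        newly = []
        for a in agents:
            if a.informed:
                for c in a.contacts:
                    if not c.informed and random.random() < p_transmit:
                        newly.append(c)
        for c in newly:
            c.informed = True
        # Record how many agents have received the alert so far.
        history.append(sum(a.informed for a in agents))
    return history

counts = simulate()
```

Comparing `counts` across different network structures or transmission probabilities is the basic experimental move: vary the communication strategy, watch how fast the alert saturates the population.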

Today I learned about Stormview, which takes a different approach to modeling the diffusion and actionability of information during emergencies. They use real, live individuals in a web-based sort of game. They track the individuals' decisions and compare those decisions across different communication methods. For example, do people prepare for a storm more often when they are shown the most likely path of a hurricane, or just uncertainty cones, and does that only apply to people who live near the most likely path?

Stormview reminds me of a lunch-time discussion session we had at the Northwest Center last week, led by Carl Osaki. Osaki is a veteran leader of what public health practitioners call table top exercises. These are basically role-playing exercises where public health officials are led through a series of emergency narratives, and have to discuss the limitations of existing protocols for dealing with such emergencies. Osaki introduced the concept of "WIIFM" (What's In It For Me), which hooks participants into the narrative. I was thinking about how, and whether, dynamic simulation software could be integrated into these table top exercises.
 
 
I'm writing an agent-based model of emergency public health communication networks for the Northwest Center for Public Health Practice. I'm using the RNetLogo package to control the experimentation workflow and pass probabilistic model results from R to NetLogo. Eventually, I'll use the parallel package to run the experiments on the multiple cores of a computing cluster. For this reason, I'm writing the code within an R Markdown document that I format into HTML using knitr. That brings me into the realm of dynamic report generation, which is becoming all the rage in statistical computing, and with good reason. I think agent-based modeling methods need to adapt to dynamic report generation methods. I'll give some examples below.

Which comes first, model initialization or input data?

To organize the design and documentation of the model and experiments, I'm using the ODD and TRACE protocols. The TRACE protocol is fairly new. The ODD protocol has been around for about seven years and has rapidly become a, if not the, standard documentation protocol for agent-based modelers.

The ODD protocol is broken into a series of sections that must be written in a set order. In the final section (the Details section), there are two sub-sections: Initialization and Input data. Initialization describes the starting state of the model. Input data describes any outside data that is input into the model. One of my input data sources is the age and sex distribution from the 2011 Current Population Survey by the U.S. Census Bureau, which I use to populate the model with a realistic age and sex distribution drawn from a Dirichlet distribution parameterized from the observed counts and assuming a flat prior. This input data must be coded in before the initialization of the model, and yet the set order of the ODD protocol puts Initialization before Input data.
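As a sketch of that sampling step (the counts below are made-up stand-ins for the CPS tabulation, not the real figures), a Dirichlet draw with a flat prior can be built from normalized Gamma variates using only the standard library:

```python
import random

random.seed(1)

# Hypothetical observed counts per age-sex cell (stand-ins for the CPS data).
observed = [120, 95, 80, 110, 105, 90]

# A Dirichlet(alpha) draw is a vector of Gamma(alpha_i, 1) variates divided by
# their sum; the flat prior adds 1 to each observed count.
gammas = [random.gammavariate(n + 1, 1.0) for n in observed]
total = sum(gammas)
proportions = [g / total for g in gammas]

# Assign each of 500 simulated agents to an age-sex cell using those proportions.
cells = random.choices(range(len(observed)), weights=proportions, k=500)
```

Each model run gets a fresh Dirichlet draw, so uncertainty in the census proportions propagates into the simulated population rather than being fixed at the point estimates.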

Suppose that I want my ODD protocol and code to live, as much as possible, in the same dynamic report. Then I need the input-data code to run before the initialization code, even though the ODD protocol's set order documents Initialization before Input data. Either the ODD protocol needs to be more flexible about its order, it needs to reverse these two sections (Input data makes sense before Initialization), or I need to grin and bear it. After an email conversation with one of the ODD protocol authors, I'm going to just grin and bear it for now. But hopefully future editions of the ODD protocol will address this issue. I'm sure similar issues will come up as more people go the dynamic report generation route.

NetLogo needs to combine the Info and Code tabs


The NetLogo GUI has three tabs: the Interface, Info, and Code tabs. The Info tab is where the model documentation goes. For example, you could put your ODD protocol there, using the tab's Edit feature to format it with a simple markup language. In the Code tab, you put...well...your code. You can also comment your code. But wait a second...why do my code and the information about the code need to be in two separate places? Why not have a dynamic report tab where I can use a language similar to R Markdown (call it NetLogo Markdown) that lets me insert inline and display code right into the report? Oh, if only I had time to develop that! Unfortunately, I only had time enough to write this crappy blog post. Hopefully Seth Tisue or someone will read it and go, "Hrm..."
 
 
I'm on my lunch break from my day job at the Northwest Center for Public Health Practice. What better thing to do on a lunch break than write another blog post about the origins of human violence? I recently wrote a couple of posts (this one and that one) that criticized archaeologist Brian Ferguson's reevaluation of the archaeological evidence that Steven Pinker uses to argue that pre-state societies were more violent than state societies. My conclusion: Ferguson raises the possibility that Pinker's list is biased and corrects a few double-counted sites, but provides no alternative data and uses faulty statistical reasoning that is guaranteed to introduce bias on his side (whereas there is only the possibility that Pinker's sample overestimates ancient homicide rates).

Today, I set my sights on a recent article in Foreign Policy magazine written by international relations expert John Arquilla. The article is called "The Big Kill." More importantly, its tagline is "Sorry, Steven Pinker, the world isn't getting less violent." You can guess what Arquilla's main argument is!

When I saw this article, I was excited. I thought, Somebody has better data that refutes Pinker's findings! I was sorely disappointed. Here is a paragraph that summarizes Arquilla's criticism of Pinker.


The problem with the conclusions reached in these studies is their reliance on "battle death" statistics. The pattern of the past century -- one recurring in history -- is that the deaths of noncombatants due to war has risen, steadily and very dramatically. In World War I, perhaps only 10 percent of the 10 million-plus who died were civilians. The number of noncombatant deaths jumped to as much as 50 percent of the 50 million-plus lives lost in World War II, and the sad toll has kept on rising ever since. Perhaps the worst, but hardly the only, terrible example of this trend can be seen in the Congo war -- flaring up again right now -- in which over 90 percent of the several million dead were noncombatants. As to Pinker's battle-death ratios, they are somewhat skewed by the fact that overall populations have exploded since 1940; so even a very deadly war can be masked by a "per 100,000 of population" stat.
Arquilla goes on to argue that the number of violent conflicts has increased in recent human history (in particular the 20th and early 21st centuries). He also rattles off a bunch of large figures for the numbers of deaths due to recent wars.

Where do I even begin? I guess I'll start by pointing out that Arquilla completely misunderstands the central point of Pinker's book, which is that an individual's risk of being killed, physically assaulted, or raped has been declining over the long term of human existence. In other words, Pinker is measuring exactly what he should be measuring to make the argument he wants to make. The number of "big kill" wars is likely to increase as your population increases! That is Pinker's point. Repeatedly, Pinker states that the number of homicides and the number of violent conflicts are largely irrelevant if you take population growth into account. That's where the real trends emerge.
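The per-capita point is easy to demonstrate with round, invented numbers (these are illustrations, not Pinker's or Arquilla's figures):

```python
# Absolute death tolls can rise while the per-capita rate falls.
deaths_then, pop_then = 50_000, 1_000_000       # hypothetical smaller, earlier society
deaths_now, pop_now = 200_000, 100_000_000      # hypothetical larger, later society

rate_then = deaths_then / pop_then * 100_000    # ~5000 deaths per 100,000 people
rate_now = deaths_now / pop_now * 100_000       # ~200 deaths per 100,000 people

# Four times as many deaths in absolute terms, yet a 25-fold drop in individual risk.
```

That is why "per 100,000 of population" is not a statistical trick that "masks" deadly wars; it is the only denominator that tracks an individual's risk of dying violently.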

Let's see, what else. Oh, you know those "battle death" statistics that Pinker's conclusions rely on? Yeah, Pinker's conclusions are not based solely on deaths during battle. In fact, Figure 2-2 in Better Angels, one of the book's most important graphs, shows figures for deaths from wars and genocides and man-made famines during the 20th century. In addition, here is a relevant excerpt from page 50 of Better Angels:


If we consider that a bit more than 6 billion people died during the 20th century, and put aside from demographic subtleties, we may estimate that around 0.7 percent of the world's population died in battles during that century. Even if we tripled or quadrupled the estimate to include indirect deaths from war-caused famine and disease, it would barely narrow the gap between state and nonstate societies.
If you're going to refute an argument, you need to understand it first, or at least read a press release of the book you're criticizing. Otherwise, you have nothing but colorful (in fact, downright sensationalist) language like the last few lines of Arquilla's article (which I do not think should have even been published, it is so far off the mark):


No, war is not on the wane. The second horseman of the Apocalypse remains with us. Indeed, it seems he may even have found a fresh mount. 
Language like that makes you wonder: what's this guy's motivation, and how did his perspective get so skewed that he misunderstands one of the first (and last) points Pinker makes in his book? To investigate possible answers, we should start by noting that he teaches in the special operations program at the United States Naval Postgraduate School and chairs its Defense Analysis department.
 
 
(0) cd "C:\Users\Benjamin\Documents\

(1) repeat until 7 pm:

(a) [a few lines of NetLogo code]

(b) [check syntax]

(c) git add NWCPHP-ABM

(d) git commit -m [character string]

(e) git push origin master
 
 
This summer, I'm working at the Northwest Center for Public Health Practice with Janet Baseman, Debra Revere, and Ian Painter. We will build a computer simulation that will help us understand the best practices for sending emergency public health alerts and advisories. What advisories should get priority? How can we avoid information overload? How do we get the message to hard-to-reach populations? How can we make the information system more robust? What methods should we use to send the messages? Ultimately, how do we maximize the rate at which a message diffuses through a population so that people will take actions that save lives?

 Before I joined the center, the research team did a controlled experiment to address some of these questions. They sent public health advisories to health care providers in three different formats (text message, fax, and email), and randomized the delivery method across providers. The team investigated whether health care providers suffer from "alert fatigue," where the volume of messages diminishes their recall of message content. They looked at the influence of age on recall, and on the acceptance of different message delivery types as credible. The trouble is that many unforeseen factors of public health information dissemination could have influenced their results. That's why they hired someone like me: to use quantitative models to explore the consequences of those unforeseen factors.

So let's restate the research problem. We want to understand the effect of public health communication strategy on the dissemination of public health information and the rate at which people take action on that information. But we want to understand this process under realistic assumptions about the way information spreads within realistic social networks. We want a realistic model for the way public health agencies, health care facilities, community based organizations, and the general public interact with one another, and a realistic model for the way people make decisions based on the public health information they receive.

Because we're modeling a complex system with high realism, it would be difficult to use a simple mathematical model to represent it. Instead, we're going with an agent-based model that can handle complicated social interactions and decision-making. Thankfully, there is a lot of preexisting theory about the diffusion of information that we can incorporate into this complex model. Some of it actually comes from my own discipline, evolutionary anthropology. For example, it will be exciting to apply some cultural transmission theory, hitherto a fairly abstract idea, to a practical problem.

We had our first major model development meeting yesterday. We discussed the overall purpose of the model, justified our use of an agent-based model, and identified the basic social and environmental building blocks of the model. Today, I'm excited to translate some of these decisions into a formal protocol and pseudo-code. I might even start coding within the software framework we're using. 
 
 
Someone named Roger Gathman responded to my recent post about the debate over the origins of human violence. I reproduce the insightful comment below:


Oddly, you don't include what to my mind is Pinker's most scandalous mistake - that in two cases, he evidently thinks that two names for one site means that there are two sites! One is the Brittany site which is counted twice, once as a Brittany site, once as a French site. The other is Boggebaken Denmark, also counted twice due to the fact that it is also named Vedmack, Denmark. That is a pretty high error rate. Ferguson, as he himself says, is concerned with the way Pinker's list skews the reader to think that these are proofs of a universally high violence prehistoric scene. I don't think he has to recalculate Pinker - the point is that the list doesn't take into account other sites (and thus gives us really zero information about the prevelance of prehistoric violence) and that even the list itself is full of errors. For instance, the SaraiNahar
Rai site in India.is falsely claimed to exhibit a 30 percent violent death rate. Ferguson pretty convincingly shows that this is a gross distortion of the evidence, which shows 1 killing. 
Your statistical point is a rather different point than Ferguson's, who is only trying to show how a small selection of evidence, a sample, has been manipulated - or, I would imagine, simply transferred from one secondary source to the other, as I can't imagine Pinker spent any time actually reading the literature about these sites, taking their descriptions on faith from his sources.
And now I will address each of Gathman's points.

Double-counted sites


Oddly, you don't include what to my mind is Pinker's most scandalous mistake - that in two cases, he evidently thinks that two names for one site means that there are two sites! One is the Brittany site which is counted twice, once as a Brittany site, once as a French site. The other is Boggebaken Denmark, also counted twice due to the fact that it is also named Vedmack, Denmark. That is a pretty high error rate.
Gathman uses the word "scandalous" to imply that Pinker purposefully misrepresented the evidence. It seems more likely that Keeley's and Bowles' descriptions conflict enough that Pinker honestly mistook them as two different sites. The cited dates are 2000 years apart! Anyway, let's go with Ferguson and discard one of these counts. For the Brittany/Ile Teviec site, let's "keep" the smaller figure, which is 8% death-from-war rate (Bowles). As for Bogebakken, let's again go with the smaller figure, which is 12%. Let's not "discard" any other sites yet for the reasons I describe in the previous post.

Recall that the average death-from-war rate from Pinker's original list was 15%. The contribution to this average from the double counts is only (13.6% + 12%)/21 = 1.21%. So the main effect of discarding these double counts is a slight widening of the confidence intervals and a slight shift in the average. So what's the big deal?

The big deal is that Pinker's list might be skewed toward sites with high death-by-war rates. Okay. So give us a better sample and use proper statistical methods to estimate the posterior distribution of the casualty rates. The rest is hand waving.
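For what it's worth, the arithmetic takes a few lines to check, using the figures from the post:

```python
n_sites = 21                    # sites on Pinker's original list
mean_rate = 0.15                # average death-from-war rate across the list
double_counts = [0.136, 0.12]   # rates attached to the two double-counted entries

# How much the two double-counted entries contribute to the 21-site average:
# about 1.2 percentage points of the 15% mean.
contribution = sum(double_counts) / n_sites
```

A shift of roughly one point in a 15% average, against a modern baseline an order of magnitude lower, is why discarding the duplicates doesn't change the qualitative comparison.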

"Zero information"?


...the point is that the list doesn't take into account other sites (and thus gives us really zero information about the prevelance [sic] of prehistoric violence)...
No, Pinker's list doesn't give us "zero information" about the prevalence of prehistoric violence. In fact, it is more and better information than Ferguson provides, whose paper includes not one graph or table to summarize the evidence that Pinker's list is biased. Ferguson's argument is almost entirely verbal and citation-based. Time to chase down those edited volumes and check if they were cited properly!

Listen, I am open to the possibility that hunter-gatherers were not that violent, perhaps even "peaceful". My concern is with the science in Ferguson's paper, which is not very good. I'm trying to counteract any hype from this paper that over-reaches the weight of its evidence. The message of the paper should be "we might need a better sample", not "we can effectively ignore Pinker's list." This subject deserves better science, and that's why I'm asking archaeologists to compile a more comprehensive assemblage of prehistoric remains and do a proper statistical analysis of the casualty rates. The "assemblage" could be a damned literature review where the articles are properly vetted! Yes, we can do better than Pinker's list, but Pinker's list does better than Ferguson's narrative. Ferguson's narrative rightly raises questions. It provides little competing quantitative evidence.

What about Sarai Nahir Rai?


For instance, the SaraiNahar Rai site in India.is [sic] falsely claimed to exhibit a 30 percent violent death rate. Ferguson pretty convincingly shows that this is a gross distortion of the evidence, which shows 1 killing.
Okay. There were eleven well-preserved skeletons. Pinker's figure is based on the assumption that three of them were homicides. Suppose just one of them was a homicide. So that doesn't matter? We should just discard all samples that have only one homicide? Ferguson discards this sample for this reason, and I've already argued that this reason is silly, not to mention guaranteed to underestimate the homicide rate. As for the new homicide rate for this site, accounting for the supposed mistake in homicide classification, we go from almost 30% to about 10%. The contribution to the average rate is what, a little more than 1%? Again, so what? The qualitative result remains the same, but there is marginally less certainty in the comparison.
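The arithmetic here is simple enough to verify (eleven skeletons, three versus one classified as homicides):

```python
skeletons = 11

rate_pinker = 3 / skeletons     # ~0.27, the "almost 30%" figure
rate_ferguson = 1 / skeletons   # ~0.09, the "about 10%" figure
```

Either way, a single small site moves the cross-site average by only about a percentage point, which is the point being made above.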

The fault isn't all Ferguson's. I finally got a hold of Better Angels. Indeed, Pinker calls these "war deaths". How the $#*& do we think we can reliably classify a homicide that occurred thousands of years ago and that is part of a small sample as a war death? Both sides are being silly on this matter. But recall that Pinker's argument isn't just about warfare. It's about all violence.


Your statistical point is a rather different point than Ferguson's, who is only trying to show how a small selection of evidence, a sample, has been manipulated - or, I would imagine, simply transferred from one secondary source to the other, as I can't imagine Pinker spent any time actually reading the literature about these sites, taking their descriptions on faith from his sources.
My statistical point is the point that matters. We are trying to estimate the homicide rate throughout human existence. We need to do that using a proper statistical model. And if you want to argue that someone's sample is possibly biased, great. But manipulated? That's a sharp accusation. In academia, accusations like that are investigated thoroughly. Know how they are investigated? By using proper statistical methods! No, I think Ferguson is just saying that Pinker is so convinced he's right that he and others have subconsciously cherry-picked the data. This is way different from cooking data to predetermine the result. Even so, if you want to argue that a sample is biased, the best way is to collect a better sample and compare the results.
 
 
Yesterday on my bus commute home, I had a great conversation with anthropologist Jason Antrosio of Living Anthropologically and Andrew Badenoch of Evolvify. (I apologize to Badenoch for not biking home, or to the southern coast of the Arctic Ocean, for that matter). We talked about the long-running debate about the origins of human violence, popularized most recently by Steven Pinker's book The Better Angels of Our Nature, Jared Diamond's The World until Yesterday, Napoleon Chagnon's Noble Savages, and a conversation among a bunch of old men hosted by John Brockman's Edge.

To summarize the debate, there are effectively two sides. One side is convinced that human violence has decreased relative to population size throughout our existence. The supposed downward trend began in the days when we were all hunter-gatherers and continues into the present, when almost no one is a hunter-gatherer. In other words, this side thinks Hobbes was right about the nastiness, brutality, and brevity of human life before our self-domestication. The other side believes one of two things: either we don't know how much violence there was among ancient hunter-gatherers, or there was less violence before we domesticated plants and animals and became what a 19th-century gentleman scholar might have called "civilized". Here's what I hope to convince you of: both sides are too confident in their ability to discern warfare from other causes of homicide in the archaeological record. That matters because the debate about the nature of human violence often digresses into a debate about warfare, which is only one manifestation of violence. What about homicide? Suicide? Assault and battery?

Back to my conversation with Antrosio and Badenoch. Antrosio recently reviewed a new book called War, Peace, & Human Nature. It's not about war, peace, and human nature so much as it is about how much war and peace factor into human nature, and whether there even is a singular "human nature". Unfortunately, the book costs a ludicrous $85. Also unfortunate is that it's checked out of my university library until the 7th of July. Thankfully, Antrosio put up PDFs of two of the book's chapters, both written by Brian Ferguson, a critic of the Hobbesian point of view. I'll focus on one of these articles, called "Pinker's list: exaggerating prehistoric war mortality," the title of which is pretty self-explanatory. Okay, maybe not.

So what is Pinker's list? In Better Angels, Steven Pinker put together two of the largest archaeological datasets on prehistoric homicide. The data span multiple temporal, geographic, and cultural settings. One of the two datasets comes from Lawrence Keeley's War Before Civilization; Sam Bowles assembled the other for an article in Science magazine. From these data, Pinker calculated that the average death-from-war rate is 15%. (Ferguson didn't report confidence intervals, and I don't know if Pinker did either, because all three copies of Better Angels are checked out of my university's library.) Ferguson's goal in this chapter is to show that, of the 21 cases in Pinker's list, six can be thrown out and the rest are biased samples.
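For what it's worth, putting an interval on a rate like that is cheap. Here's a sketch using a Wilson score interval for a binomial proportion; the counts are made up for illustration, since the chapter doesn't give me pooled totals to quote:

```python
import math

def wilson(k, n, z=1.96):
    """95% Wilson score interval for k deaths out of n skeletons."""
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# Hypothetical pooled counts giving a 15% rate:
lo, hi = wilson(45, 300)
print(f"15% with n=300: 95% CI ({lo:.1%}, {hi:.1%})")
```

Note that a single pooled interval like this ignores variation between sites, which is exactly the thing a proper cross-study model has to account for.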

The strength of the book chapter is Ferguson's summary of the data sources for each of the 21 cases. He also makes a good verbal case that the data might be biased. But the chapter offers neither graph, table, nor parameter estimate to show how biased the data might be, or how certain we can be about that bias. Ferguson also uses some weird logic to argue that some of the 21 cases should be thrown out. For example:


So let us look over Pinker's list. Of the original 21, Gobero, Niger is out because it has no war deaths.
First, um, okay, go ahead and throw out data that will potentially support your argument that hunter-gatherer "war" deaths were lower than what Pinker calculated. Second, no, don't do that! When doing this sort of cross-study analysis, you are not allowed to throw out data just because the sample size is small. Instead, you use a statistical method that lets data sets with small sample sizes borrow information from the variation across all the data sets. In this case, I'm pretty sure that borrowing information from the rest of the data would result in a non-zero death rate for the case Ferguson cites.
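To make "borrowing information" concrete, here's a minimal sketch of partial pooling with a Beta-Binomial model and a crude empirical-Bayes prior. The counts are invented for illustration, not Pinker's actual data; the point is that a site with zero observed homicides still gets a non-zero shrunken estimate:

```python
# (deaths, skeletons) for five hypothetical sites; site 2 has zero deaths
deaths =    [3,  0, 1, 5,  2]
skeletons = [11, 14, 9, 40, 21]

# Moment-match a Beta(a, b) prior to the spread of the raw site rates
rates = [d / n for d, n in zip(deaths, skeletons)]
mean = sum(rates) / len(rates)
var = sum((r - mean) ** 2 for r in rates) / (len(rates) - 1)
k = mean * (1 - mean) / var - 1   # prior "pseudo-sample size"
a, b = mean * k, (1 - mean) * k

# Posterior mean for each site shrinks the raw rate toward the prior
for d, n in zip(deaths, skeletons):
    print(f"raw {d/n:.2f} -> pooled {(a + d) / (a + b + n):.2f}")
```

A full hierarchical Bayesian model would also estimate the prior jointly with the site rates, but even this crude version shows why throwing out zero-death sites is unnecessary.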


Three cases...are all eliminated because they only have one instance of violent death.
Here, Ferguson neglects to mention that the premise of Pinker's book is that all forms of violence, including but not limited to war, have decreased over time. To be fair, you can hardly fault Ferguson for focusing on war so intently that he throws out data showing evidence of homicide. After all, the book has "war" in its title, and from the first chapter the editor argues that the central question is how and when war became part of human existence. The book is also clearly a response to people like Pinker. But a more basic question is, "What is the trend for any form of violence throughout human existence, and how much uncertainty do we have in the overall shape and scale of that trend?" Throwing out cases that are fairly good evidence of homicide is guaranteed to bias your sample with regard to this much better research question.
 
 
Last year, when I was living in the Commonwealth of Dominica, and experiencing some severe culture shock, I emailed independent scientist and theoretical physicist Julian Barbour.
       


Dear Dr. Barbour:

You envision a universe in which time is an illusion. I am not sure at what level you are familiar with modern evolutionary theory. But you are likely aware that the most basic definition of evolution is a change in allele frequency over -time- via selection, drift, gene flow, and mutation. So I have three questions for you:

(1) What would an evolutionary theory look like without time?
(2) Or do you believe that time is a useful tool that helps us understand certain things such as natural selection and genetic drift, even though time doesn't exist?
(3) Do you think it possible that the perception of time may be under selection? 
(4) If so, what adaptive benefit do you think a linear perception of time has?
(5) Can you envision an organism that need not perceive time as linear?

Thank you for your time.
Get it? "Thank you for your time"? I just thanked a man who argues that time doesn't exist for his time. How cool is that?

Anyway, Julian Barbour never answered me, probably because he is very busy doing Russian translations, answering emails from more important people, and avoiding people like me who might sound like quacks and say they have three questions when they actually have five. Okay, I might actually be a quack.

But dude! Those questions are deep! I mean, seriously, if time doesn't actually exist, what is the underlying biological explanation for why we experience one damn thing happening after another instead of experiencing time backwards like Merlin? That is, why do we sense the increase of entropy instead of its decrease?

Let me take a few steps backward. Okay. So there's this thing that physicists call the "arrow of time". The arrow of time refers to the fact that there is only one quantity in the physical sciences...seriously, only one...that requires time to have a direction. That quantity is entropy. 

Entropy is a measure of "disorder". Think of a bag of potato chips. That got your attention. Imagine walking around with that bag of potato chips in your backpack, and occasionally reaching in since you can't eat just one. At the beginning of the day, the potato chips are very orderly in space, with this well-defined, potato chip shape, separated in the bag by equally orderly pockets of air. By the end of the day, however, the potato chips have been ground into a chaotic crapstorm of salty, greasy crumbs, evenly distributed at the bottom of the bag below a not-very-interesting pocket of air. 
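The chip-bag picture translates loosely into Shannon entropy, which is a stand-in here, not thermodynamic entropy proper. Mass concentrated in a few well-defined positions is "orderly" (low entropy); crumbs spread evenly through the bag are "disordered" (high entropy):

```python
import math

def entropy(p):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

morning = [1.0, 0, 0, 0]            # all mass in one chip-shaped place
evening = [0.25, 0.25, 0.25, 0.25]  # crumbs evenly spread in the bag
print(entropy(morning), entropy(evening))  # 0.0 vs 2.0 bits
```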

That's kind of like how the universe works. In the beginning, everything was orderly and hot as a mother fucker. Billions of years down the road, all matter will be evenly distributed about the universe, uninteresting and cold until...well, I don't know, I'm not a physicist.

This whole analogy requires an arrow of time from order to chaos, from hot to cold. Or does it? I haven't studied up on his hypothesis much, and my brother is the astrophysicist, but I think that's the question Julian Barbour's asking. And since he asked it, I think that in the back of their minds, all evolutionary biologists should also ask it. After all! Biology is just applied chemistry. And chemistry? Just applied physics.

Although really, if time is just an illusion and we've been making such a big stink about it only because we're organisms who happen to experience entropy in this funky, linear way, then maybe, at least in this case, physics is just applied evolutionary biology!