Epistemic Spot Check: Fatigue and the Central Governor Module

Epistemic spot checks used to be a series in which I read papers/books and investigated their claims with an eye towards assessing the work’s credibility. I became unhappy with the limitations of this process and am working on creating something better. This post is about both the results of applying the in-development process to a particular work, and observations on the process. As is my new custom, this discussion of the paper will be mostly my conclusions. The actual research is available in my Roam database (a Workflowy/wiki hybrid), which I will link to as appropriate.

This post started off as an epistemic spot check of “Fatigue is a brain-derived emotion that regulates the exercise behavior to ensure the protection of whole body homeostasis”, a scientific article by Timothy David Noakes. I don’t trust myself to summarize it fairly (we’ll get to that in a minute), so here is the abstract:

An influential book written by A. Mosso in the late nineteenth century proposed that fatigue that “at first sight might appear an imperfection of our body, is on the contrary one of its most marvelous perfections. The fatigue increasing more rapidly than the amount of work done saves us from the injury which lesser sensibility would involve for the organism” so that “muscular fatigue also is at bottom an exhaustion of the nervous system.” It has taken more than a century to confirm Mosso’s idea that both the brain and the muscles alter their function during exercise and that fatigue is predominantly an emotion, part of a complex regulation, the goal of which is to protect the body from harm. Mosso’s ideas were supplanted in the English literature by those of A. V. Hill who believed that fatigue was the result of biochemical changes in the exercising limb muscles – “peripheral fatigue” – to which the central nervous system makes no contribution. The past decade has witnessed the growing realization that this brainless model cannot explain exercise performance. This article traces the evolution of our modern understanding of how the CNS regulates exercise specifically to insure that each exercise bout terminates whilst homeostasis is retained in all bodily systems. The brain uses the symptoms of fatigue as key regulators to insure that the exercise is completed before harm develops. These sensations of fatigue are unique to each individual and are illusionary since their generation is largely independent of the real biological state of the athlete at the time they develop. The model predicts that attempts to understand fatigue and to explain superior human athletic performance purely on the basis of the body’s known physiological and metabolic responses to exercise must fail since subconscious and conscious mental decisions made by winners and losers, in both training and competition, are the ultimate determinants of both fatigue and athletic performance.

The easily defensible version of this claim is that fatigue is a feeling in the brain. The most out-there version of the claim is that humans are capable of unlimited physical feats, held back only by their own minds, and that the results of sporting events are determined beforehand through psychic dominance competitions. That sounds like I’m being unfair, so let me quote the relevant portion:

[A]thletes who finish behind the winner may make the conscious decision not to win, perhaps even before the race begins. Their deceptive symptoms of “fatigue” may then be used to justify that decision. So the winner is the athlete for whom defeat is the least acceptable rationalization

(He doesn’t mention psychic dominance competitions explicitly, but it’s the only way I see to get exactly one person deciding to win each race).

This paper generated a lot of ESC-able claims, which you can see here. These were unusually crisp claims that he provided citations for: absolutely the easiest thing to ESC (having your own citations agree with your summary of them is not sufficient to prove correctness, but a lack of agreement saves a lot of work). But I found myself unenthused about doing so. I eventually realized that I wanted to read a competing explanation instead. Luckily Noakes provided a citation to one, and it was even more antagonistic to him than he claimed.

VO2,max: what do we know, and what do we still need to know?, by Benjamin D. Levine takes several direct shots at Noakes, including:

For the purposes of framing the debate, Dr Noakes frequently likes to place investigators into two camps: those who believe the brain plays a role in exercise performance, and those who do not (Noakes et al. 2004b). However this straw man is specious. No one disputes that ‘the brain’ is required to recruit motor units – for example, spinal cord-injured patients can’t run. There is no doubt that motivation is necessary to achieve VO2,max. A subject can elect to simply stop exercising on the treadmill while walking slowly because they don’t want to continue; no mystical ‘central governor’ is required to hypothesize or predict a VO2 below maximal achievable oxygen transport in this case.

Which I would summarize as “of course fatigue is a brain-mediated feeling: you feel it.” 

I stopped reading at this point, because I could no longer tell what the difference between the hypotheses was. What are the actual differences in predictions between “your muscles are physically unable to contract” and “your brain tells you your muscles are unable to contract”? After thinking about it for a while, I came up with a few:

  1. The former suggests that there’s no intermediate between “safely working” and “incapacitation”.
  2. The latter suggests that you can get physical gains through mental changes alone.
  3. And that this might lead to tissue damage as you push yourself beyond safe limits.

Without looking at any evidence, #1 seems unlikely to be true. Things rarely work that way in general, much less in bodies.

The strongest evidence for #2 and #3 isn’t addressed by either paper: cases where mental changes have caused or allowed people to seriously injure or even kill themselves.

  1. Hysterical strength (aka mom lifts car off baby)
  2. Involuntary muscle spasms (from e.g., seizures or old-school ECT)
  3. Stiff-man syndrome.

So I checked these out.

Hysterical strength has not been studied much, probably because IRBs are touchy about trapping babies under cars (with an option on “I was unable to find the medical term for it”). There are enough anecdotes that it seems likely to exist, although it may not be common. And it can cause muscle tears, according to several sourceless citations. This is suggestive, but if I were on Levine’s team I’d definitely find it insufficient.

Most injuries from seizures are from falling or hitting something, but it appears possible for injuries to result from overactive muscles themselves. This is complicated by the fact that anti-convulsant medications can cause bone thinning, and by the fact that some unknown percentage of all people are walking around with fractures they don’t know about.

Unmodified electro-convulsive therapy had a small but persistent risk of bone fractures, muscle tears, and joint dislocation. Newer forms of ECT use muscle relaxants specifically to prevent this.

Stiff-man Syndrome: Wikipedia says that 10% of stiff-man syndrome patients die from acidosis or autonomic dysfunction. Acidosis would be really exciting: evidence that overexertion of muscles will actually kill you. Unfortunately, when I tried to track down the citation, it went nowhere (with one paper inaccessible). Additionally, one can come up with explanations for the acidosis other than muscle exertion. So that’s not compelling.

Overall it does seem clear that (some) people’s muscles are strong enough to break their bones, but are stopped from doing so under normal circumstances. You could call this vindication for Noakes’s Central Governor Model, but I’m hesitant. It doesn’t prove you can safely get gains by changing your mindset alone. It doesn’t prove all races are determined by psychic dominance fights. Yes, Noakes was speculating when he postulated that, but without it his theory is something like “you notice when your muscles reach their limits”. Whether you can safely push past what feel like physical limits, at the margin, seems like a question whose answer will vary a lot by individual, and one that neither paper tried to answer.

Overall, “Fatigue is a brain-derived emotion that regulates the exercise behavior to ensure the protection of whole body homeostasis” neither passed nor failed epistemic spot checks as originally conceived, because I didn’t check its specific claims. Instead I thought through its implications and investigated those, which supported the weak but not the strong form of Noakes’s argument.

In terms of process, the key here was noticing the feeling that investigating forward (evaluating the implications of Noakes’s arguments) was more important than investigating backwards (the evidence Noakes provided for his hypothesis). I don’t have a good explanation for why that felt right at this time, but I want to track it.

Epistemic Spot Check: Unconditional Parenting

Epistemic spot checks started as a process in which I investigate a few of a book’s claims to see if it is trustworthy before continuing to read it. This had a number of problems, such as emphasizing a trust/don’t trust binary over model building, and emphasizing provability over importance. I’m in the middle of revamping ESCs to become something better. This post is both a ~ESC of a particular book and a reflection on the process of doing ESCs: what I have improved, and what I should improve.

As is my new custom, I took my notes in Roam, a workflowy/wiki hybrid. Roam is so magic that my raw notes are better formatted there than I could ever hope to make them in a linear document like this, so I’m just going to share my conclusions here, and if you’re interested in the process, follow the links to Roam. Notes are formatted as follows:

  • The target source gets its own page
  • On this page I list some details about the book and claims it makes. If the claim is citing another source, I may include a link to the source.
  • If I investigate a claim or have an opinion so strong it doesn’t seem worth verifying (“Parenting is hard”), I’ll mark it with a credence slider. The meaning of each credence will eventually be explained here, although I’m still working out the system.
    • Then I’ll hand-type a number for the credence in a bullet point, because sliders are changeable even by people who otherwise have only read privileges.
  • You can see my notes on the source for a claim by clicking on the source in the claim
  • You may see a number to the side of a claim. That means it’s been cited by another page. It is likely a synthesis page, where I have drawn a conclusion from a variety of sources.

This post’s topic is Unconditional Parenting (Alfie Kohn) (affiliate link), which has the thesis that even positive reinforcement is treating your kid like a dog and hinders their emotional and moral development.

Unconditional Parenting failed its spot check pretty hard. Of the three citations I actually researched (as opposed to agreed with without investigation, such as “Parenting is hard”), two mentioned the thing they were cited for only as an evidence-free aside, and one reported exactly what UP claimed but was too small and subdivided to prove anything.

Nonetheless, I thought UP might have good ideas and kept reading it. One of the things Epistemic Spot Checks were designed to detect was “science washing”: the process of taking the thing you already believe and hunting for things to cite that could plausibly support it, to make your process look more rigorous. And they do pretty well at that. The problem is that science washing doesn’t prove an idea is wrong, merely that it hasn’t presented a particular form of proof. It could still be true or useful; in fact, when I dug into a series of self-help books, rigor didn’t seem to have any correlation with how useful they were. And with something like child-rearing, where I dismiss almost all studies as “too small, too limited”, saying everything needs rigorous peer-reviewed backing is the same as refusing to learn. So I continued with Unconditional Parenting to absorb its models, with the understanding that I would be evaluating its models for myself.

Unconditional Parenting is a principle-based book, and its principles are:

  • It is not enough for you to love your children; they must feel loved unconditionally. 
  • Any punishment or conditionality of rewards endangers that feeling of being loved unconditionally.
  • Children should be respected as autonomous beings.
  • Obedience is often a sign of insecurity.
  • The way kids learn to make good decisions is by making decisions, not by following directions.

These seem like plausible principles to me, especially the first and last ones. They are, however, costly principles to implement. And I’m not even talking about things like vaccines, where you absolutely have to override their autonomy. I’m talking about when your two children’s autonomies lead them in opposite directions at the beach, or when you will lose your job if you don’t keep them on a certain schedule in the morning and their intrinsic desire is to watch the water drip from the faucet for 10 minutes.

What I would really have liked is for this book to spend less time on its principles and bullshit scientific citations, and more time going through concrete real world examples where multiple principles are competing. Kohn explicitly declines to do this, saying specifics are too hard and scripts embody the rigid, unresponsive parenting he’s railing against, but I think that’s a cop-out. Teaching principles in isolation is easy and pointless: the meaningful part is what you do when they’re difficult and in conflict with other things you value.

So overall, Unconditional Parenting:

  • Should be evaluated as one dude’s opinion, not the outcome of a scientific process
  • Is a useful set of opinions that I find plausible and intend to apply with modifications to my potential kids.
  • Failed to do the hard work of demonstrating implementation of its principles.
  • Is a very light read once you ignore all the science-washing.

 

 

As always, tremendous thanks to my Patreon patrons for their support.

 

What Comes After Epistemic Spot Checks?

When I read a non-fiction book, I want to know if it’s correct before I commit anything it says to memory. But if I already knew the truth status of all of its claims, I wouldn’t need to read it. Epistemic Spot Checks are an attempt to square that circle by sampling a book’s claims and determining their truth value, with the assumption that the sample is representative of the whole.

Some claims are easier to check than others. On one end are simple facts, e.g., “Emperor Valerian spent his final years as a Persian prisoner”. This was easy and quick to verify: googling “emperor valerian” was quite sufficient. “Roman ship sizes weren’t exceeded until the 15th century” looks similar, but it wasn’t. If you google the claim itself, it will look confirmed (evidence: me and 3 other people in the forecasting tournament did this). At the last second while researching this, I decided to check the size of Chinese ships, which surpassed Roman ships sooner than Europe did, by about a century.

At first blush this looks like a success story, but it’s not. I was only able to catch the mistake because I had a bunch of background knowledge about the state of the world. If I didn’t already know mid-millennium China was better than Europe at almost everything (and I remember a time when I didn’t), I could easily have drawn the wrong conclusion about that claim. And following a procedure that would catch issues like this every time would take much more time than ESCs currently get.

Then there are terminally vague questions, like “Did early modern Europe have more emphasis on rationality and less superstition than other parts of the world?” (As claimed by The Unbound Prometheus). It would be optimistic to say that question requires several books to answer, but even if that were true, each of them would need at least an ESC themselves to see if they’re trustworthy, which might involve checking other claims requiring several books to verify… pretty soon it’s a master’s thesis.

But I can’t get a master’s degree in everything interesting or relevant to my life. And that brings up another point: credentialism. A lot of ESC revolves around “Have people who have officially been Deemed Credible sanctioned this fact?” rather than “Have I seen evidence that I, personally, judge to be compelling?” 

The Fate of Rome (Kyle Harper) and The Fall of Rome (Bryan Ward-Perkins) are both about the collapse of the western Roman empire. They both did almost flawlessly on their respective epistemic spot checks. And yet, they attribute the fall of Rome to very different causes, and devote almost no time to each other’s explanations. If you drew a Venn diagram of the data they discuss, the circles would be almost but not quite entirely distinct. The word “plague” appears 651 times in Fate and 6 times in Fall, which introduces the topic mostly to dismiss the idea that it was causally related to the fall. That is how Fate treats all those border adjustments happening from 300 AD on. Fate is very into discussing climate, but Fall uses that space to talk about pottery.
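Word counts like these are easy to reproduce if you have plain-text copies of both books. The sketch below is one way to do it, assuming a whole-word, case-insensitive match; the function name `count_word` is hypothetical, and the counts quoted above may have been produced with a different tool and different matching rules (for example, one that also counts “plagues”).

```python
import re


def count_word(text: str, word: str) -> int:
    """Count whole-word, case-insensitive occurrences of `word` in `text`.

    Inflected forms such as "plagues" are NOT counted, because the pattern
    requires a word boundary immediately after the word.
    """
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return len(pattern.findall(text))


sample = "Plague struck again; the plague spread, and later plagues followed."
print(count_word(sample, "plague"))  # 2
```

To run it on a book, read a local copy with `open(path, encoding="utf-8").read()` and pass the string in; whether to count plurals is a judgment call that changes the numbers.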

This is why I called the process epistemic spot checking, not truth-checking. Determining if a book is true requires not only determining if each individual claim is true, but what other explanations exist and what has been left out. Depending on the specifics, ESCs as I do them now are perhaps half the work of reading the subsection of the book I verify. Really thoroughly checking a single paragraph in a published paper took me 10-20 hours. And even if I succeed at the ESC, all I have is a thumbs up/thumbs down on the book.

Around the same time I was doing ESCs on The Fall of Rome and The Fate of Rome (the ESCs were published far apart to get maximum publicity for the Amplification experiment, but I read and performed the ESCs very close together), I was commissioned to do a shallow review on the question of “How to get consistency in predictions or evaluations of questions?” I got excellent feedback from the person who commissioned it, but I felt like it said a lot more about the state of a field of literature than the state of the world, because I had to take authors’ words for their findings. It had all the problems ESCs were designed to prevent.

I’m in the early stages of trying to form something better: something that incorporates the depth of epistemic spot checks with the breadth of field reviews. It’s designed to bootstrap from knowing nothing in a field to being grounded and informed and having a firm base on which to evaluate new information. This is hard and I expect it to take a while. 

 

Epistemic Spot Checks: The Fall of Rome

Introduction

Epistemic spot checks are a series in which I select claims from the first few chapters of a book and investigate them for accuracy, to determine if a book is worth my time. This month’s subject is The Fall of Rome, by Bryan Ward-Perkins, which advocates for the view that Rome fell, and it was probably a military problem.

Like August’s The Fate of Rome, this spot check was done as part of a collaboration with Parallel Forecasting and Foretold, which means that instead of resolving a claim as true or false, I give a confidence distribution of what I think I would answer if I spent 10 hours on the question (in reality I spent 10-45 minutes per question). Sometimes the claim is a question with a numerical answer, sometimes it is just a statement and I state how likely I think the statement is to be true.
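Answers in this post are written as a range plus a distribution shape, such as “(64 to 93, normal distribution)”. One way to turn such a range into an actual distribution is to treat it as a central interval with some coverage level and solve for the mean and standard deviation. This is a minimal sketch, not taken from Foretold’s own code: the coverage level is an assumption (the default of 90% here is a hypothetical choice; the literacy answer below states 95% explicitly), and the function name is hypothetical.

```python
from statistics import NormalDist


def normal_from_interval(low: float, high: float, coverage: float = 0.90) -> NormalDist:
    """Return a normal distribution whose central `coverage` interval is [low, high].

    The default coverage of 0.90 is an assumption, not something stated in the post.
    """
    if not (low < high and 0 < coverage < 1):
        raise ValueError("need low < high and 0 < coverage < 1")
    # z-score bounding the central interval: about 1.645 for 90%, 1.960 for 95%
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    mean = (low + high) / 2
    stdev = (high - low) / (2 * z)
    return NormalDist(mean, stdev)


# The literacy answer: a 95% chance the difference lies between 0 and 60.
literacy = normal_from_interval(0, 60, coverage=0.95)
print(round(literacy.mean, 1), round(literacy.stdev, 1))  # 30.0 15.3
```

A normal fit like this puts some mass outside the stated range by construction (2.5% on each side for a 95% interval), so for answers that are probabilities it can spill slightly below 0% or above 100%.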

This spot check is subject to the same constraints as The Fate of Rome, including:

  1. Some of my answers include research from the forecasters, not just my own.
  2. Due to our procedure for choosing questions, I didn’t investigate all the claims I would have liked to.

Claims

Claim made by the text:  “[Emperor Valerian] spent the final years of his life as a captive at the Persian Court”
Question I answered: what is the chance that is true?
My answer: I estimate a chance of (99 - 3*lognormal(0,1)) that Emperor Valerian was captured by the Persians and spent multiple years as a prisoner before dying in captivity.

You don’t even have to click on the Wikipedia page to confirm this is the common story: it’s in the google preview for “emperor valerian”. So the only question is the chance that all of history got this wrong. Wikipedia lists five primary sources, of which I verified three.  https://www.ancient-origins.net/history/what-really-happened-valerian-was-roman-emperor-humiliated-and-skinned-hands-enemy-008598 raises questions about how badly Valerian was treated, but not that he was captive.

My only qualm is the chance that this could be a lie perpetrated at the time. Maybe Valerian died and the Persians used a double, maybe something weirder happened. System 2 says the chance of this is < 10% but gut says < 15%.
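The expression 99 - 3*lognormal(0,1) in the answer above is a percentage with a long tail toward lower values. A quick Monte Carlo sketch makes its shape concrete. It assumes lognormal(0,1) means the exponential of a standard normal draw, which is how Python’s `random.lognormvariate(0, 1)` defines it; since a very large draw would push the value below zero, the sketch clips at 0. The function name is hypothetical.

```python
import random
import statistics


def sample_valerian_chance(n: int = 100_000, seed: int = 0) -> list[float]:
    """Draw samples of 99 - 3*lognormal(0, 1), in percent, clipped at 0."""
    rng = random.Random(seed)
    return [max(0.0, 99 - 3 * rng.lognormvariate(0, 1)) for _ in range(n)]


samples = sample_valerian_chance()
cuts = statistics.quantiles(samples, n=20)  # 19 cut points: 5%, 10%, ..., 95%
print(f"median ~ {statistics.median(samples):.1f}%")
print(f"central 90% ~ {cuts[0]:.1f}% to {cuts[-1]:.1f}%")
```

Analytically, the median is exactly 96% (the median of lognormal(0,1) is 1), and the central 90% runs from about 83% to 98%, so the sampled numbers should land close to those.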

 

Claim made by the text: “What had totally disappeared, however, were the good-quality, low-value items, made in bulk, and available so widely in the Roman period”
Question I answered: What is the chance that mass-produced, low-value items, available so widely in the Roman period, disappeared from Britain by 600 AD?
My answer: I estimate a chance of (64 to 93, normal distribution) that mass-produced, low-value items were available in Britain during Roman rule and not after 600 AD.

This was one of the hardest claims to investigate, because it represents original research by Ward-Perkins. I had basically given up on answering this without a pottery PhD until google suggestions gave me the perfect article.

This is actually a compound claim by Ward-Perkins: 

  1. Roman coinage and mass-produced, low-cost, high-quality pottery disappeared from Britain and then the rest of post-Roman Europe.
  2. The state of pottery and coinage is a good proxy for the state of goods and trade as a whole, because they preserve so amazingly well and are relatively easy to date.

Data points:

    • Focuses on how amphorae were never really abundant in Britain
    • Chart stops at 400 AD
    • Graph showing large drops in amphorae distribution by 410 AD

If we believe Ward-Perkins and Brewminate, I estimate the chance that pottery massively declined at 95-99%, times 80-95% that other goods declined with it. There remain the chances that the historical record is massively misleading (very unlikely with pots, although I don’t know how likely it is to have missed sites entirely), and that W-P et al. are misinterpreting the record. I would be very surprised if so many sites had been missed as to invalidate this data; call it 5-15%. Gut feeling: a 5-20% chance the W-P crowd are exaggerating the data, but given the absence of challenges, not higher than that, and not a significant chance they’re just making shit up.

(95 to 99)*(85 to 95) * (80 to 95) = 64 to 93%
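The combined estimate multiplies three probability ranges. A minimal sketch of that arithmetic multiplies endpoint by endpoint, which is valid here because every factor lies between 0 and 1, so the product is smallest when every factor sits at its low end and largest when every factor sits at its high end. The function name is hypothetical. Multiplying the endpoints of the three ranges as written gives roughly 65% to 89%, so the 93% upper bound in the formula is a little higher than the straight product.

```python
from math import prod


def product_of_ranges(ranges: list[tuple[float, float]]) -> tuple[float, float]:
    """Multiply (low, high) probability ranges endpoint by endpoint.

    Valid only when every factor is non-negative, as probabilities are.
    """
    lows = [low for low, _ in ranges]
    highs = [high for _, high in ranges]
    return prod(lows), prod(highs)


# The three ranges from the formula, as fractions of 1.
ranges = [(0.95, 0.99), (0.85, 0.95), (0.80, 0.95)]
low, high = product_of_ranges(ranges)
print(f"{low:.1%} to {high:.1%}")  # 64.6% to 89.3%
```

Endpoint multiplication assumes every factor hits its worst (or best) case at once, so it gives a wider range than multiplying the underlying distributions; sampling each factor independently and multiplying the draws would give a tighter interval.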

 

Claim made by the text: The Romans had mass literacy, which declined during the Dark Ages.
Question I answered: “[% population able to read at American 1st grade level during Imperial Rome] - [% population able to do same in the same geographic area in 1000 AD] = N%. What is N?”
My answer: I estimate that there is a 95% chance [Roman literacy] - [Dark Ages literacy] = (0 to 60, normal distribution)

Data Points:

 

The highest estimate of literacy in the Roman Empire I found is 30%. Call it twice that for ability to read at a 1st grade level in cities. So the range is 5%-60%.

The absolute lowest the European 1000AD literacy rate could be is 0; the highest estimate is 5% (and that was in the 1300s, which were probably more literate).  From the absence of graffiti I infer that even minimal literacy achievement dropped a great deal. 

Maximum = 60% - 1% = 59%
Minimum = 5% - 5% = 0%
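The maximum and minimum above are ordinary interval subtraction: the smallest possible difference pairs the lowest Roman estimate with the highest 1000 AD estimate, and the largest pairs the highest Roman estimate with the lowest 1000 AD estimate. A small sketch (the function name is hypothetical), using the 1% floor that the maximum calculation uses:

```python
def interval_difference(a: tuple[float, float], b: tuple[float, float]) -> tuple[float, float]:
    """Range of x - y for x in [a_low, a_high] and y in [b_low, b_high]."""
    a_low, a_high = a
    b_low, b_high = b
    return a_low - b_high, a_high - b_low


roman = (5, 60)      # % able to read at a 1st grade level, Imperial Rome
year_1000 = (1, 5)   # % in the same area around 1000 AD
print(interval_difference(roman, year_1000))  # (0, 59)
```

Using the absolute floor of 0% for 1000 AD instead, the maximum becomes 60%, which matches the upper end of the stated (0 to 60) answer.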

 

Claim made by the text: “What some people describe as “the invasion of Rome by Germanic barbarians”, Walter Goffart describes as “the Romans incorporating the Germanic tribes into their citizenry and setting them up as rulers who reported to the empire.” and “Rome did fall, but only because it had voluntarily delegated its own power, not because it had been successfully invaded”.”
Question I answered: What is my confidence that this accurately represents historian Walter Goffart’s views?
My answer: I estimate that after 10 hours of research, I would be 68-92% confident this describes Goffart’s views accurately.

Data points:

  • https://blog.oup.com/2005/12/the_fall_of_rom/
    • Peter Heather: The most influential statement of this, perhaps, is Walter Goffart’s brilliant aphorism that the fall of the Western Empire was just ‘an imaginative experiment that got a little out of hand’. Goffart means that changes in Roman policy towards the barbarians led to the emergence of the successor states, dependant on barbarian military power and incorporating Roman institutions, and that the process which brought this out was not a particularly violent one.
  • https://www.goodreads.com/book/show/1680215.Barbarians_and_Romans_A_D_418_584?from_search=true 
    • “Despite intermittent turbulence and destruction, much of the Roman West came under barbarian control in an orderly fashion.”
  • https://press.princeton.edu/titles/1036.html
    • “Despite intermittent turbulence and destruction, much of the Roman West came under barbarian control in an orderly fashion. Goths, Burgundians, and other aliens were accommodated within the provinces without disrupting the settled population or overturning the patterns of landownership. Walter Goffart examines these arrangements and shows that they were based on the procedures of Roman taxation, rather than on those of military billeting (the so-called hospitalitas system), as has long been thought. Resident proprietors could be left in undisturbed possession of their lands because the proceeds of taxation, rather than land itself, were awarded to the barbarian troops and their leaders.”
  • https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1478-0542.2008.00523.x
    • “the barbarians and Rome, instead of being in constant conflict with each other, occupied a joined space, a single world in which both were entitled to share. What we call the barbarian invasions was primarily a drawing of foreigners into Roman service, a process sponsored, encouraged, and rewarded by Rome. Simultaneously, the Romans energetically upheld their supremacy. Many barbarian peoples were suppressed and vanished; the survivors were persuaded and learned to shoulder Roman tasks. Rome was never discredited or repudiated. The future endorsed and carried forward what the Empire stood for in religion, law, administration, literacy, and language.”
  • https://books.google.com/books/about/Rome_s_Fall_and_After.html?id=55pDIwvWnpoC “Rome’s Fall and After” indicates Goffart does believe Rome fell, but suggests its main problem was Constantinople, not interactions with barbarians at all. (So the top percentage correct = 90%.)

 

This seems pretty conclusive that Goffart thought the barbarians were accommodated rather than conquering the area (so my minimum estimate that the summary was correct must be greater than 50%). However, it’s not clear how much power he thought they took, or whether Rome fell at all. This could be a poor restatement, or it could be that if I read Goffart’s actual work and not just book jacket blurbs, I’d agree.

 

Question I answered: Chance Elizabeth would recommend this book as a reliable source on the topic to an interested friend, if they asked tomorrow (8/31/19)?
My answer: There is a (91-99%, normal distribution) chance I would recommend this to a friend.

99% is in range, because I definitely think it’s worth reading if they’re interested in the topic. I think I’d recommend it before Fate of Rome, because it establishes more concretely that Rome fell.

Is there a chance I wouldn’t recommend it?

  • They could have already read it
  • They could be more interested in disease and climate change (in which case I’d recommend Fate)
  • I could forget about it
  • I could not want to take responsibility for their reading.
  • I could be unconfident that Fall was better than what they’d find by chance.
    • This feels like the biggest one.
    • But the question doesn’t say “best book”, it just says “reliable source”
    • My only real qualms there are the normal history-book qualms

So the minimum is 91%

 

Bonus Claims

These are the claims I didn’t check, but on which other people predicted what my answer would be. Note that at this point the predictions haven’t been very accurate: whether they’re net positive depends on how you weight the questions. And Foretold is beta software that hasn’t prioritized export yet, so I’m using *shudder* screenshots. But for the sake of completeness:

Claim made by the text: The Fall of Rome: Roman Pottery pre-400AD was high quality and uniform.
Predicted answer: 29.9% to 63.5% chance this claim is correct

Claim made by the text: “In Britain new coins ceased to reach the island, except in tiny quantities, at the beginning of the fifth century”
Predicted answer: 31.6% to 94% chance this claim is correct

 

Claim made by the text: The Fall of Rome: [average German soldiers’ height] - [average Roman soldiers’ height] = N feet. What is N?
Predicted answer: -0.107 to 0.61 ft.

 

Claim made by the text: The Romans chose to cede local control of Gaul to the Germanic tribes in the 400s, as opposed to losing them in a military conquest.
Predicted answer: 28.5% to 85.6% chance this claim is correct

 

Claim made by the text: The Germanic tribes who took over local control of Gaul in the 400s reported to the Emperor.
Predicted answer: 4.77% to 50.9% chance this claim is correct

 

Conclusion

The Fall of Rome did very well on spot-checking- no outright disagreements at all, just some uncertainties. 

On the other hand, The Fall of Rome barely mentions disease and doesn’t mention climate change at all, which the previous book I checked, The Fate of Rome, claimed were the main causes of the fall. The Fate of Rome did almost as well in epistemic spot checking as Fall, yet they can’t both be correct. What’s going on? I’m going to address that in a separate post, because I want to be able to link to it without forcing people to read this entire spot check.

In terms of readability, Fall starts slowly but the second half is by far the most interested I have ever been in pottery or archeology.

[Many thanks to my Patreon patrons and Parallel Forecast for financial support for this post]

Does combining epistemic spot checks and prediction markets sound super fun to you? Good news: We’re launching round three of the experiment today, with prizes of up to $65/question. The focal book will be The Unbound Prometheus, by David S. Landes, on the Industrial Revolution. The market opens today and will remain open until 10/27 (inclusive).

 

Epistemic Spot Check: The Fate of Rome (Kyle Harper)

Introduction

Epistemic spot checks are a series in which I select claims from the first few chapters of a book and investigate them for accuracy, to determine if a book is worth my time. This month’s subject is The Fate of Rome, by Kyle Harper, which advocates for the view that Rome was done in by climate change and infectious diseases (which were exacerbated by climate change).

This check is a little different than the others, because it arose from a collaboration with some folks in the forecasting space. Instead of just reading and evaluating claims myself, I took claims from the book and made them into questions on a prediction market, for which several people made predictions of what my answer would be before I gave it. In some but not all cases I read their justifications (although not numeric estimates) before making my final judgement.

I expect we’ll publish a post-mortem on that entire process at some point, but for now I just want to publish the actual spot check. Because of the forecasting crossover, this spot check will differ from those that came before in the following ways:

  1. Claims are formatted as questions answerable with a probability. If a claim lacks a question mark, the implicit question is “what is the probability this is true?”.
  2. Questions have a range of specificity, to allow us to test what kind of ambiguities we can get away with (answer: less than I used).
  3. Some of my answers include research from the forecasters, not just my own.
  4. Due to timing issues, I finished the book and a second on the topic before I did the research for the spot check.
  5. Due to our procedure for choosing questions, I didn’t investigate all the claims I would have liked to.

 

Claims

Original Claim: “Very little of Roman wealth was due to new technological discoveries, as opposed to diffusion of existing tech to new places, capital accumulation, and trade.”
Question: What percentage of Rome’s gains came from technological gains, as opposed to diffusion of technical advantages, capital accumulation, and trade?

1%-30% log distribution

Data:

  • The Fall of Rome talks extensively about how trade degraded when the Romans left and how that lowered the standard of living.
  • https://brilliantmaps.com/roman-empire-gdp/ shows huge differences in GDP by region, implying there was a big opportunity to grow GDP through trade and diffusion of existing tech. That means potential growth from catch-up alone was >50%.
  • Wikipedia’s estimates (which have extremely wide error bars) don’t even show growth in GDP per capita from 14 AD to 150 AD.
  • Rome did have construction and military tech (https://en.wikipedia.org/wiki/Roman_technology)
  • It also seems likely that expansion created a kind of Dutch disease, in which capable, ambitious people were drawn to fighting and/or politics, and not discovering new tech.
  • One potential place where Roman technology could have contributed greatly to the economy was lowering disease via sanitation infrastructure. According to Fate of Rome and my own research, this didn’t happen; sanitation was not end-to-end, and therefore you still had all the problems inherent in city living.
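For readers unfamiliar with forecasting notation, here is one way to read the “1%-30% log distribution” answer to this question. This is my assumption about what the notation means, not a documented definition from the forecasting platform: a lognormal distribution whose central 90% interval runs from 1% to 30%. The helper name lognormal_from_interval and the 90% coverage are hypothetical choices made for this sketch.

```python
import math
from statistics import NormalDist


def lognormal_from_interval(low, high, coverage=0.90):
    """Hypothetical helper: return (mu, sigma) of a lognormal whose central
    `coverage` interval is [low, high]. Assumes 'log distribution' means lognormal."""
    if not 0 < low < high:
        raise ValueError("need 0 < low < high")
    if not 0 < coverage < 1:
        raise ValueError("coverage must be between 0 and 1")
    # z-score of the upper edge of the central interval (about 1.645 for 90%)
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    # In log space the interval is symmetric around mu.
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * z)
    return mu, sigma


mu, sigma = lognormal_from_interval(1, 30)  # bounds are percentages
median = math.exp(mu)  # geometric mean of the bounds, roughly 5.5%
upper = math.exp(NormalDist(mu, sigma).inv_cdf(0.95))  # recovers the 30% bound
print(f"median ~ {median:.1f}%, 95th percentile ~ {upper:.1f}%")
```

Under this reading, the log scale puts the median near 5.5%, much closer to the low end than the 15.5% midpoint a uniform 1%-30% range would imply.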

Original Claim: “The blunt force of infectious disease was, by far, the overwhelming determinant of a mortality regime that weighed heavily on Roman demography”
Question: Even during the Republic and successful periods of the empire, disease burden was very high in cities.

60%-90% normal distribution

The wide spread and lack of inclusion of 100% in the confidence interval stem from the lack of precision in the question. What distinguishes “high” from “very high”, and are we counting diseases of malnutrition or just infectious ones? I expected to knock this one out in two minutes, but ended up feeling the current estimates of disease mortality lack the necessary precision to answer it.

Data:

 

Original Claim: “The main source of population growth in the Roman Empire was not a decline in mortality but, rather, elevated levels of fertility”
Question: When Imperial Rome’s population was growing, it was due to a decline in death rates, rather than elevated fertility.

80-100%, c – log distribution

“Elizabeth, that rephrase doesn’t look much like that original claim,” you might be saying quietly to yourself. You are correct: I misread the claim in the book, at least twice, and didn’t catch it until this write-up. This isn’t as bad as it seems. The claims are not quite opposite, because my rephrase was trying to explain variation in growth within Rome, and the book was trying to explain absolute levels, or possibly the difference relative to today.

Back when he was doing biology, Richard Dawkins had a great answer to the common question “how much is X due to genetics, as opposed to environment?”. He said asking that is like asking how much of a rectangle’s area is due to its length, as opposed to its width. It’s a nonsensical question. But you can assign proportionate responsibility for the change in area between two rectangles.
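A small worked version of the rectangle point, using made-up dimensions (they are not from Dawkins or the book). Area is a product, so “how much of this area is due to length?” has no answer, but the difference between two rectangles can be decomposed:

```latex
\begin{aligned}
A &= L \cdot W \\
A_1 &= 2 \cdot 3 = 6, \qquad A_2 = 2 \cdot 5 = 10 \\
\Delta A &= A_2 - A_1 = L\,\Delta W = 2 \cdot 2 = 4 \quad \text{(length is shared, so all of it is due to width)} \\
\Delta A &= W_1\,\Delta L + L_1\,\Delta W + \Delta L\,\Delta W \quad \text{(general case)}
\end{aligned}
```

When both sides differ, the cross term $\Delta L\,\Delta W$ belongs to neither dimension alone, so the split is cleanest when one side is held fixed or the changes are small.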

Fate‘s original claim was much like asking how much of a trait is due to genetics. This is bad and it should feel bad, but it’s a very common mistake, and I give Fate a lot of credit for providing the underlying facts such that I could translate it into the “what causes differences between things” question without even noticing.

Since weak framing wasn’t a systemic problem in the book and it presented the underlying facts well enough for me to form my own, correct, model, I’m not docking Fate very harshly on this one.

Original Claim: “The size of Roman merchant ships was not exceeded until the 15th century, and the grain ships were not surpassed until the 19th.”
Question: “The size of Roman merchant ships was not exceeded until the 15th century, and the grain ships were not surpassed until the 19th.”

0-10% log distribution.

This is true within the Mediterranean, but if you check Chinese ships, it’s obvious the claim is off by at least 100 years, possibly more.

Original Claim: too diffuse to quote.
Question: The Roman Empire suffered greatly from intense epidemics, more so than did the Republic or 700-1000 AD Europe.

90-100% c – log distribution

https://en.wikipedia.org/wiki/List_of_epidemics shows a pretty clear presence of epidemics in the relevant period and absence in the others.

 

Original Claim: too diffuse to quote.
Question: Starvation was not a big concern in Imperial Rome’s prime.

80-100% c – log distribution

https://en.wikipedia.org/wiki/List_of_famines shows a Roman famine in 441 BC (during the Republic) and isolated famines from 370 on, but pretty much validates that during the empire’s prime, mass starvation was not a threat.

Conclusion:

My fact checking found two flaws:

  1. An inaccuracy in when ships that exceeded the size of Roman trade ships were built, and/or forgetting China was a thing. The inaccuracy does not invalidate the author’s point, which is that the Romans had better shipping technology than the cultures that followed them.
  2. Bad but extremely common framing for the relative effects of disease mortality vs. birth rates.

These are well within tolerances for things a book might get wrong. I’m happy I read this book, and would read another by the same author (with perhaps more care when it refers to happenings outside of Europe), but his books are not jumping to the top of my list.

Is The Fate of Rome correct in its thesis that Rome was brought down by climate change and disease? I don’t know. It certainly seems plausible, but the book is clearly advocating for a position rather than trying to present all the relevant facts. There are obvious political implications to Fate even if it doesn’t spell them out, so I would want to read at least one of the 80 million other books on the Fall of Rome before I developed an opinion. I’m told some people think it had to do with something military, which Fate barely deigns to mention. In the future I hope to be a good enough prediction-maker to put a range on this anyway, however wide it must be, but for now I’m succumbing to the siren song of “but you could just get more data”.

[Many thanks to my Patreon patrons and Parallel Forecast for financial support for this post]

PS. This book is the first step of an ongoing experiment with epistemic spot checks and prediction markets. If you would like to participate in or support these experiments, please e-mail me at elizabeth-at-this-domain-name. The next round is planned to start Saturday August 24th.

Power Buys You Distance From The Crime


Introduction

Taxes are typically meant to be proportional to money (or negative externalities, but that’s not what I’m focusing on). But one thing money buys you is flexibility, which can be used to avoid taxes. Because of this, taxes aimed at the wealthy tend to end up hitting the well-off-or-rich-but-not-truly-wealthy harder, and tax cuts aimed at the poor end up helping the middle class. Examples (feel free to stop reading these when you get the idea, this is just the analogy section of the essay):

  • Computer programmers typically have the option to work remotely in a low-tax state; teachers need to be where the classroom is. 
  • Estate taxes tend to hit families with single large assets (like a business) harder than those with diverse investments (who can simply sell assets to pay for taxes), who are hit harder than those with enough wealth to create trust funds.
  • Executives can choose to receive stock (which is taxed more favorably) instead of cash, in exactly the proportion they desire. Well-paid employees are offered stock, but the amount will not be tailored to their needs. Lower-level employees either are not offered this, or are not in a position to take advantage of it.
  • The legal distinction between a business (whose expenses are tax deductible) and a hobby (deductions not allowed) is based on whether the activity nets you income (there are complications and you can sometimes prove a money loser is a business, but this is a good rule of thumb). Small business owners (e.g. lawyers) can fold their occasionally-revenue-generating hobby (e.g. photography) into their real business, enabling tax deductions for their hobby.
  • IRAs, 401ks, HSAs, and FSAs all lock your money up for a time or purpose, in exchange for lower or delayed taxes. You can only take advantage of them if you’re sure you won’t need the money for another purpose sooner.
  • More examples here.

Note that most of these are perfectly legal and the rest are borderline. But we’re still not getting the result we want, of taxes being proportional to income.

When we assess moral blame for a situation, we typically want it to be roughly in proportion to how much power a person has to change said situation. But just like money can be used to evade taxes, power can be used to avoid blame. This results in a distorted blame-distribution apparatus which assigns the least blame to the person most able to change the situation. Allow me a few examples to demonstrate this.

 

Examples 1 + 2: Corporate Malfeasance

Amazon.com provides a valuable service by letting any idiot sell a book, with minimal overhead. One of the costs of this complete lack of verification is that people will sell things that wouldn’t pass verification, such as counterfeits, at great cost to publishers and authors. Amazon could never sell counterfeits directly: they’re a large company that’s easy to sue. But by setting themselves up as a platform on which other people sell, they enable themselves to profit from counterfeits.

Or take slavery. No company goes “I’m going to go out and enslave people today” (especially not publicly), but not paying people is sometimes cheaper than paying them, so financial pressure will push towards slavery. Public pressure pushes in the opposite direction, so companies try not to visibly use slave labor. But they can’t control what their subcontractors do, and especially not what their subcontractors’ subcontractors’ subcontractors do, and sometimes this results in workers being unpaid and physically blocked from leaving.

Who’s at fault for the subcontractor³’s slave labor? One obvious answer is “the person locking them in during the fire” or “the parent who gives their kid piecework”, and certainly it couldn’t happen without them. But if we say “Nike’s lack of knowledge makes them not responsible”, we give them an incentive to subcontract without asking follow-up questions. The executive is probably benefiting more from the system of slave labor than the factory owner is from his little domain, and has more power to change what is happening. If the small factory owner pays fair wages, he gets outcompeted by a factory that does use slave labor. If the Nike CEO decides to insource their manufacturing to ensure fair working conditions, something actually changes.

…Unless consumers switch to a cheaper, slavery-driven shoe brand.

Which is actually really hard to not do. You could choose more expensive shoes, but the profit margin is still bigger if you shrink expenses, so that doesn’t help (which is why Fairtrade was a failure from the workers’ perspective). You can’t investigate the manufacturing conditions of everything you buy: it’s just too time-consuming. But if you punish obvious enslavement and conduct no follow-up studies, what you get is obscured enslavement, not decent working conditions.

 

Moral Mazes describes the general phenomenon on page 21:

Moreover, pushing down details relieves superiors of the burden of too much knowledge, particularly guilty knowledge. A superior will say to a subordinate, for instance: “Give me your best thinking on the problem with [X].” When the subordinate makes his report, he is often told: “I think you can do better than that,” until the subordinate has worked out all the details of the boss’s predetermined solution, without the boss being specifically aware of “all the eggs that have to be broken.” It is also not at all uncommon for very bald and extremely general edicts to emerge from on high. For example, “Sell the plant in [St. Louis]; let me know when you’ve struck a deal,” or “We need to get higher prices for [fabric X]; see what you can work out,” or “Tom, I want you to go down there and meet with those guys and make a deal and I don’t want you to come back until you’ve got one.” This pushing down of details has important consequences.

First, because they are unfamiliar with—indeed deliberately distance themselves from—entangling details, corporate higher echelons tend to expect successful results without messy complications. This is central to top executives’ well-known aversion to bad news and to the resulting tendency to kill the messenger who bears the news.

Second, the pushing down of details creates great pressure on middle managers not only to transmit good news but, precisely because they know the details, to act to protect their corporations, their bosses, and themselves in the process. They become the “point men” of a given strategy and the potential “fall guys” when things go wrong. From an organizational standpoint, overly conscientious managers are particularly useful at the middle levels of the structure. Upwardly mobile men and women, especially those from working-class origins who find themselves in higher status milieux, seem to have the requisite level of anxiety, and perhaps tightly controlled anger and hostility, that fuels an obsession with detail. Of course, such conscientiousness is not necessarily, and is certainly not systematically, rewarded; the real organizational premiums are placed on other, more flexible, behavior.

These examples differ in an important way from tax structuring: structuring requires seeking out advice and acting on it to achieve the goal. It’s highly agentic. The Amazon and apparel-outsourcing cases required no such agency on the part of executives. They vaguely wished for something (more revenue, fewer expenses), and somehow it happened. An employee who tried to direct the executives’ attention to the fact that they were indirectly employing slaves would probably be fired before they ever reached the executives. Executives are not only outsourcing their dirty work; they’re outsourcing knowledge of their dirty work.

[Details of personal anecdotes changed both intentionally and by the vagaries of human memory]

Example/Exception 2.5: Corporate Malfeasance Gone Wrong

The Wells Fargo account fraud scandal: in order to meet quotas, entry-level Wells Fargo employees created millions of unauthorized accounts (typically extra services for existing customers). I originally included this as an example of “executives incentivizing entry-level employees to commit fraud on their behalf”, but it turns out Wells Fargo made almost no money off the fraud: $2m over five years, which hardly seems worth the employees’ time, much less the $185m fine. I’ve left this in as an example of how the incentives-not-orders system doesn’t always work in powerful people’s favor.

Thanks to Larks for pointing this out.

Example 3: Foreign Medical Care

My cousin Angela broke her leg while traveling in Thailand, and was delighted by the level of care she received at the Thai hospital, not just medically but socially. Nurses brought her flowers and were just generally nicer than their American counterparts. Her interpretation was that Thailand was a place motivated by love and kindness, not money, and Americans should aspire to this level of regard for their fellow human being. My interpretation was that she had enough money to buy the goodwill of everyone in the room without noticing, so what she should have learned is that being rich is awesome, and that being an American who travels internationally is enough to qualify you as rich.

This is mostly a success story for the free market: Angela got good medical care and the nurses got money (I’m assuming). Any crimes in this story were committed off-screen. But Angela was certainly benefiting from the nurses’ constrained choices in life. And had she had actual power to affect healthcare in the US, trying to fix it based on what she learned in Thailand would have done a lot of damage.

 

Example 4: My Dating an Artist Experience

My starving-artist ex-boyfriend, Connor, stayed with me for two months after a little bad luck and a lot of bad decisions cost him his job and then apartment (this was back when I had a two-bedroom apartment to myself; I miss Seattle). During this time we had one big fight. My view on the fight now is that I was locally in the right but globally the disagreement was indicative of irreconcilable differences that should have led us to break up. That was delayed by months when he capitulated.

One possibility is that he genuinely thought he could change and that I was worth the attempt. Another is that he saw the incompatibility, or knew things that should have led him to see it, but lied or blocked out the knowledge so that he could keep living with me. This would be a shitty, manipulative thing for him to do. On the other hand, what did I expect? If the punishment for breaking up with me was, best case scenario, moving into a homeless shelter, of course he felt pressure to appease me. 

It wasn’t my fault he felt that pressure, any more than it was Angela’s fault her nurses were born with fewer options than her. Time in my spare bedroom was a gift to him I had no obligation to keep giving. But if I’d really valued a coercion-free decision, I would have committed to housing him independent of our relationship. Although if that becomes common knowledge, it just means people can’t make an uncoerced decision to date me at all. And if helping Connor at all meant a commitment to do so forever, he would get a lot less help.

This case is more like the Wells Fargo case than Amazon or Nike. I was getting only the appearance of what I wanted (a genuine relationship with a compatible person), not the real thing. Nonetheless, the universe was contorting itself to give me the appearance of what I wanted.

Summary

What all of these stories have in common is that (relatively) powerful people’s desires were met by people less powerful than them, without them having to take responsibility for the action or sometimes even the desire. Society conspired to give them what they wanted (or in the case of Connor and Wells Fargo, a facsimile of what they wanted) without them having to articulate the want, even to themselves. That’s what power means: ability to make the game come out like you want. Disempowered people are forced to consciously notice things (e.g., this budget is unreachable) and make plans (e.g., slavery) where a powerful person wouldn’t. And it’s unfair to judge them for doing so while ignoring the morality of the powerful who never consider the system that brings them such nice things. 

Take home message:

  1. The most agentic person in a situation is not necessarily the most morally culpable. One of the things power buys you is distance from the crime.
  2. Power obscures information flow. If you are not proactively looking to see how your wants and needs are being met, you are probably benefiting from something immoral or being tricked.

 

This piece was inspired by a conversation with and benefited from comments by Ben Hoffman. I’d also like to thank several commenters on Facebook for comments on an earlier draft and Justis Mills for copyediting.

Epistemic Spot Check: The Role of Deliberate Practice in the Acquisition of Expert Performance

Epistemic spot checks typically consist of references from a book, selected by my interest level, checked against either the book’s source or my own research. This one is a little different in that I’m focusing on a single paragraph in a single paper. Specifically, as part of a larger review, I read Ericsson, Krampe, and Tesch-Römer’s 1993 paper, The Role of Deliberate Practice in the Acquisition of Expert Performance (PDF), in an attempt to learn how long human beings can productively do thought work in a given period.

This paper is important because if you ask people how much thought work can be done in a day, if they have an answer and a citation at all, it will be “4 hours a day” and “Cal Newport’s Deep Work“. The Ericsson paper is in turn Newport’s source. So to the extent people’s beliefs are based on anything, they’re based on this paper.

In fact I’m not even reviewing the whole paper, just this one relevant paragraph: 

When individuals, especially children, start practicing in a given domain, the amount of practice is an hour or less per day (Bloom, 1985b). Similarly, laboratory studies of extended practice limit practice to about 1 hr for 3-5 days a week (e.g., Chase & Ericsson, 1982; Schneider & Shiffrin, 1977; Seibel, 1963). A number of training studies in real life have compared the efficiency of practice durations ranging from 1-8 hr per day. These studies show essentially no benefit from durations exceeding 4 hr per day and reduced benefits from practice exceeding 2 hr (Welford, 1968; Woodworth & Schlosberg, 1954). Many studies of the acquisition of typing skill (Baddeley & Longman, 1978; Dvorak et al., 1936) and other perceptual motor skills (Henshaw & Holman, 1930) indicate that the effective duration of deliberate practice may be closer to 1 hr per day. Pirolli and J. R. Anderson (1985) found no increased learning from doubling the number of training trials per session in their extended training study. The findings of these studies can be generalized to situations in which training is extended over long periods of time such as weeks, months, and years.

Let’s go through each sentence in order. I’ve used each quote as a section header, with the citations underneath it in bold.

“When individuals, especially children, start practicing in a given domain, the amount of practice is an hour or less per day”

 Generalizations about talent development, Bloom (1985)

“Typically the initial lessons were given in swimming and piano for about an hour each week, while the mathematics was taught about four hours each week…In addition some learning tasks (or homework) were assigned to be practiced and perfected before the next lesson.” (p513)

“…[D]uring the week the [piano] teacher expected the child to practice about an hour a day.” with descriptions of practice but no quantification given for swimming and math (p515).

The quote seems to me to be a simplification. “Expected an hour a day” is not the same as “did practice an hour or less per day.”

“…laboratory studies of extended practice limit practice to about 1 hr for 3-5 days a week”

Skill and working memory, Chase & Ericsson (1982)

This study focused strictly on memorizing digits, which I don’t consider to be that close to thought work.

Controlled and automatic human information processing: I. Detection, search, and attention. Schneider, W., & Shiffrin, R. M. (1977)

This study had 8 people in it and was essentially an identification and reaction time trial.

Discrimination reaction time for a 1,023-alternative task, Seibel, R. (1963)

3 subjects. This was a reaction time test, not thought work. No mention of practice duration.

 

“These studies show essentially no benefit from durations exceeding 4 hr per day and reduced benefits from practice exceeding 2 hr”

Fundamentals of Skill, Welford (1968)

This is a book and no page number was given, so I skipped it.

Experimental Psychology, Woodworth & Schlosberg (1954)

This too is a book with no page number, but it was available online (thanks, archive.org) and I made an educated guess that the relevant chapter was “Economy in Learning and Performance”. Most of this chapter focused on recitation, which I don’t consider sufficiently relevant.

p800: “Almost any book on applied psychology will tell you that the hourly work output is higher in an eight-hour day than a ten-hour day.” (no source given)

The book offers this graph as a demonstration that only monotonous work has diminishing returns.


 

p812: An interesting army study showing that students given telegraphy training for 4 hours/day (and spending 4 on other topics) learned as much as students studying 7 hours/day. This one seems genuinely relevant, although not enough to tell us where peak performance lies, just that four hours are better than seven. Additionally, the students weren’t loafing around for the excess three hours: they were learning other things. So this is about how long you can study a particular subject, not total learning capacity in a day.

“Many studies of the acquisition of typing skill (Baddeley & Longman, 1978; Dvorak et al., 1936) and other perceptual motor skills (Henshaw & Holman, 1930) indicate that the effective duration of deliberate practice may be closer to 1 hr per day”

The Influence of Length and Frequency of Training Session on the Rate of Learning to Type, Baddeley & Longman (1978)

“Four groups of postmen were trained to type alpha-numeric code material using a conventional typewriter keyboard. Training was based on sessions lasting for one or two hours occurring once or twice per day. Learning was most efficient in the group given one session of one hour per day, and least efficient in the group trained for two 2-hour sessions. Retention was tested after one, three or nine months, and indicated a loss in speed of about 30%. Again the group trained for two daily sessions of two hours performed most poorly. It is suggested that where operationally feasible, keyboard training should be distributed over time rather than massed.”

 

Typewriting behavior; psychology applied to teaching and learning typewriting, Dvorak et al (1936)

Inaccessible book.

The Role of Practice in Fact Retrieval, Pirolli & Anderson (1985)

“We found that fact retrieval speeds up as a power function of days of practice but that the number of daily repetitions beyond four produced little or no impact on reaction time”
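Pirolli and Anderson’s “power function of days of practice” refers to the familiar power law of practice. The sketch below assumes the common three-parameter form RT = a + b·N^(-c); the parameter values (and the choice of seconds as units) are made up for illustration and are not fitted to their data. It only shows why each extra day of practice buys less speedup than the one before; it does not model their separate finding about repetitions within a day.

```python
def reaction_time(days, asymptote=0.5, gain=1.0, rate=0.5):
    """Power law of practice: RT(N) = asymptote + gain * N**(-rate).

    Parameter values are hypothetical, chosen only for illustration.
    """
    if days < 1:
        raise ValueError("days of practice must be at least 1")
    return asymptote + gain * days ** (-rate)


def marginal_speedup(days, **params):
    """Reduction in reaction time gained by practicing one more day."""
    return reaction_time(days, **params) - reaction_time(days + 1, **params)


for n in (1, 2, 4, 8, 16):
    print(f"day {n:2d}: RT = {reaction_time(n):.3f}s, "
          f"next-day gain = {marginal_speedup(n):.3f}s")
```

Note that a curve like this describes diminishing returns to practice over days; on its own it says nothing about how many hours per day of thought work are productive.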

Conclusion

Many of the studies were criminally small, and typically focused on singular, monotonous tasks like responding to patterns of light or memorizing digits. The precision attributed to these studies is greatly exaggerated. There’s no reason to believe Ericsson, Krampe, and Tesch-Römer’s conclusion that the correct number of hours for deliberate practice is 3.5, much less the commonly repeated factoid that humans can do good work for 4 hours/day.

 

[This post supported by Patreon].