Amazon would like me to inform you that sometimes I use affiliate links to them, where I earn trivial amounts of money. Humble Bundle hasn’t asked for this, but I do use affiliate links for them too sometimes.

What Comes After Epistemic Spot Checks?

When I read a non-fiction book, I want to know if it’s correct before I commit anything it says to memory. But if I already knew the truth status of all of its claims, I wouldn’t need to read it. Epistemic Spot Checks are an attempt to square that circle by sampling a book’s claims and determining their truth value, with the assumption that the sample is representative of the whole.

Some claims are easier to check than others. On one end are simple facts, e.g., “Emperor Valerian spent his final years as a Persian prisoner”. This was easy and quick to verify: googling “emperor valerian” was quite sufficient. “Roman ship sizes weren’t exceeded until the 15th century” looks similar, but isn’t. If you google the claim itself, it will look confirmed (evidence: three other people in the forecasting tournament and I did this). At the last second while researching this, I decided to check the size of Chinese ships, which surpassed Roman ships about a century before European ships did.

At first blush this looks like a success story, but it’s not. I was only able to catch the mistake because I had a bunch of background knowledge about the state of the world. If I didn’t already know mid-millennium China was better than Europe at almost everything (and I remember a time when I didn’t), I could easily have drawn the wrong conclusion about that claim. And following a procedure that would catch issues like this every time would take much more time than ESCs currently get.

Then there are terminally vague questions, like “Did early modern Europe have more emphasis on rationality and less superstition than other parts of the world?” (as claimed by The Unbound Prometheus). It would be optimistic to say that question requires several books to answer, but even if that were true, each of them would need at least an ESC of its own to see if it’s trustworthy, which might involve checking other claims requiring several books to verify… pretty soon it’s a master’s thesis.

But I can’t get a master’s degree in everything interesting or relevant to my life. And that brings up another point: credentialism. A lot of ESC revolves around “Have people who have officially been Deemed Credible sanctioned this fact?” rather than “Have I seen evidence that I, personally, judge to be compelling?” 

The Fate of Rome (Kyle Harper) and The Fall of Rome (Bryan Ward-Perkins) are both about the collapse of the western Roman empire. They both did almost flawlessly on their respective epistemic spot checks. And yet, they attribute the fall of Rome to very different causes, and devote almost no time to each other’s explanations. If you drew a Venn diagram of the data they discuss, the circles would be almost, but not quite, entirely distinct. The word “plague” appears 651 times in Fate and 6 times in Fall, which introduces the topic mostly to dismiss the idea that it was causally related to the fall. That is exactly how Fate treats all those border adjustments happening from 300 AD on. Fate is very into discussing climate, but Fall uses that space to talk about pottery.

This is why I called the process epistemic spot checking, not truth-checking. Determining if a book is true requires not only determining if each individual claim is true, but also determining what other explanations exist and what has been left out. Depending on the specifics, ESCs as I do them now are perhaps half the work of reading the subsection of the book I verify. Really thoroughly checking a single paragraph in a published paper took me 10-20 hours. And even if I succeed at the ESC, all I have is a thumbs up/thumbs down on the book.

Around the same time I was doing ESCs on The Fall of Rome and The Fate of Rome (the ESCs were published far apart to get maximum publicity for the Amplification experiment, but I read and performed the ESCs very close together), I was commissioned to do a shallow review on the question of “How to get consistency in predictions or evaluations of questions?” I got excellent feedback from the person who commissioned it, but I felt like it said a lot more about the state of a field of literature than the state of the world, because I had to take authors’ words for their findings. It had all the problems ESCs were designed to prevent.

I’m in the early stages of trying to form something better: something that incorporates the depth of epistemic spot checks with the breadth of field reviews. It’s designed to bootstrap from knowing nothing in a field to being grounded and informed and having a firm base on which to evaluate new information. This is hard and I expect it to take a while. 

 

Epistemic Spot Checks: The Fall of Rome

Introduction

Epistemic spot checks are a series in which I select claims from the first few chapters of a book and investigate them for accuracy, to determine if a book is worth my time. This month’s subject is The Fall of Rome, by Bryan Ward-Perkins, which argues that Rome really did fall, and that the cause was probably military.

Like August’s The Fate of Rome, this spot check was done as part of a collaboration with Parallel Forecasting and Foretold, which means that instead of resolving a claim as true or false, I give a confidence distribution of what I think I would answer if I spent 10 hours on the question (in reality I spent 10-45 minutes per question). Sometimes the claim is a question with a numerical answer, sometimes it is just a statement and I state how likely I think the statement is to be true.

This spot check is subject to the same constraints as The Fate of Rome, including:

  1. Some of my answers include research from the forecasters, not just my own.
  2. Due to our procedure for choosing questions, I didn’t investigate all the claims I would have liked to.

Claims

Claim made by the text:  “[Emperor Valerian] spent the final years of his life as a captive at the Persian Court”
Question I answered: what is the chance that is true?
My answer: I estimate a chance of (99 – 3*lognormal(0,1)) that Emperor Valerian was captured by the Persians and spent multiple years as a prisoner before dying in captivity.

You don’t even have to click on the Wikipedia page to confirm this is the common story: it’s in the google preview for “emperor valerian”. So the only question is the chance that all of history got this wrong. Wikipedia lists five primary sources, of which I verified three.  https://www.ancient-origins.net/history/what-really-happened-valerian-was-roman-emperor-humiliated-and-skinned-hands-enemy-008598 raises questions about how badly Valerian was treated, but not that he was captive.

My only qualm is the chance that this could be a lie perpetuated at the time. Maybe Valerian died and the Persians used a double, maybe something weirder happened. System 2 says the chance of this is < 10% but gut says < 15%.
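The answer above is written as a formula rather than a range. As a rough sketch of what it implies, assuming “lognormal(0,1)” means a lognormal whose underlying normal has mean 0 and standard deviation 1 (the same convention as Python’s random.lognormvariate(0, 1)), its quantiles can be computed directly instead of by sampling:

```python
import math
from statistics import NormalDist

def valerian_quantile(p: float) -> float:
    """p-quantile of 99 - 3 * X, where X ~ lognormal(mu=0, sigma=1).

    Assumption: lognormal(0, 1) means exp(Z) with Z a standard normal.
    The expression falls as X rises, so its p-quantile uses X's (1 - p)-quantile.
    """
    z = NormalDist().inv_cdf(1 - p)
    return 99 - 3 * math.exp(z)

for p in (0.05, 0.50, 0.95):
    print(f"{p:.0%} quantile: {valerian_quantile(p):.1f}%")
```

Under that reading the median is 96% and the central 90% runs from roughly 83.5% to 98.4%, so the doubt sits in a long tail on the low side rather than spread evenly around the median.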

 

Claim made by the text: “What had totally disappeared, however, were the good-quality, low-value items, made in bulk, and available so widely in the Roman period”
Question I answered: What is the chance that the mass-produced, low-value items available so widely in the Roman period had disappeared from Britain by 600 AD?
My answer: I estimate a chance of (64 to 93, normal distribution) that mass-produced, low-value items were available in Britain during Roman rule and not after 600 AD.

This was one of the hardest claims to investigate, because it represents original research by Ward-Perkins. I had basically given up on answering this without a pottery PhD until google suggestions gave me the perfect article.

This is actually a compound claim by Ward-Perkins: 

  1. Roman coinage and mass-produced, low-cost, high-quality pottery disappeared from Britain and then the rest of post-Roman Europe.
  2. The state of pottery and coinage is a good proxy for the state of goods and trades as a whole, because they preserve so amazingly well and are relatively easy to date.

Data points:

    • Focuses on how amphorae were never really abundant in Britain
    • Chart stops at 400 AD
    • Graph showing large drops in amphorae distribution by 410 AD

If we believe Ward-Perkins and Brewminate, I estimate the chance that pottery massively declined at 95-99%, times 80-95% that other goods declined with it. There remain the chances that the historical record is massively misleading (very unlikely with pots, although I don’t know how likely it is to have missed sites entirely), and that W-P et al. are misinterpreting the record. I would be very surprised if so many sites had been missed as to invalidate this data; call it 5-15%. Gut feeling: 5-20% chance the W-P crowd are exaggerating the data, but given the absence of challenges, not higher than that, and not a significant chance they’re just making shit up.

(95 to 99)*(85 to 95) * (80 to 95) = 64 to 93%
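The formula above treats the three estimates as independent and multiplies their ranges. A minimal sketch of that arithmetic, assuming the combined range is found by multiplying lows with lows and highs with highs:

```python
from math import prod

# The three (low, high) ranges from the formula above, in percent.
FACTORS = [(95, 99), (85, 95), (80, 95)]

def combine_ranges(ranges):
    """Multiply probability ranges endpoint by endpoint.

    Assumes the factors are independent, so the lowest combined value is the
    product of the lows and the highest is the product of the highs.
    """
    low = prod(lo / 100 for lo, _ in ranges)
    high = prod(hi / 100 for _, hi in ranges)
    return 100 * low, 100 * high

low, high = combine_ranges(FACTORS)
print(f"{low:.1f}% to {high:.1f}%")
```

Run as written, this prints 64.6% to 89.3%, so the straight product of the upper endpoints lands a few points below the 93% stated above.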

 

Claim made by the text: The Romans had mass literacy, which declined during the Dark Ages.
Question I answered: “[% population able to read at American 1st grade level during Imperial Rome] – [% population able to do same in the same geographic area in 1000 AD] = N%. What is N?”
My answer: I estimate that there is a 95% chance [Roman literacy] – [Dark Ages literacy] = (0 to 60, normal distribution) 

Data Points:

 

The highest estimate of literacy in the Roman Empire I found is 30%. Call it twice that for ability to read at a 1st-grade level in cities. So the range is 5%-60%.

The absolute lowest the European 1000 AD literacy rate could be is 0; the highest estimate is 5% (and that was in the 1300s, which were probably more literate). From the absence of graffiti I infer that even minimal literacy achievement dropped a great deal.

Maximum = 60% - 1% = 59%
Minimum = 5% - 5% = 0%
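The bounds above are ordinary interval subtraction: the largest possible gap pairs the highest Roman estimate with the lowest 1000 AD estimate, and the smallest gap pairs the lowest Roman estimate with the highest 1000 AD one. A small sketch using the numbers from the calculation (the 1% floor for 1000 AD is the one used in the Maximum line):

```python
# (low, high) literacy estimates in percent, as used in the calculation above.
ROMAN = (5, 60)      # Imperial Rome, reading at a 1st-grade level
DARK_AGES = (1, 5)   # same area, around 1000 AD

def interval_difference(a, b):
    """Smallest and largest possible x - y, with x in interval a and y in interval b."""
    return a[0] - b[1], a[1] - b[0]

print(interval_difference(ROMAN, DARK_AGES))  # (0, 59)
```

Using the absolute floor of 0 for 1000 AD instead of 1% moves the maximum to 60, the top of the stated answer.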

 

Claim made by the text: “What some people describe as “the invasion of Rome by Germanic barbarians”, Walter Goffart describes as “the Romans incorporating the Germanic tribes into their citizenry and setting them up as rulers who reported to the empire.” and “Rome did fall, but only because it had voluntarily delegated its own power, not because it had been successfully invaded”.”
Question I answered: What is my confidence that this accurately represents historian Walter Goffart’s views?
My answer: I estimate that after 10 hours of research, I would be 68-92% confident this describes Goffart’s views accurately.

Data points:

  • https://blog.oup.com/2005/12/the_fall_of_rom/
    • Peter Heather: The most influential statement of this, perhaps, is Walter Goffart’s brilliant aphorism that the fall of the Western Empire was just ‘an imaginative experiment that got a little out of hand’. Goffart means that changes in Roman policy towards the barbarians led to the emergence of the successor states, dependant on barbarian military power and incorporating Roman institutions, and that the process which brought this out was not a particularly violent one.
  • https://www.goodreads.com/book/show/1680215.Barbarians_and_Romans_A_D_418_584?from_search=true 
    • “Despite intermittent turbulence and destruction, much of the Roman West came under barbarian control in an orderly fashion.”
  • https://press.princeton.edu/titles/1036.html
    • “Despite intermittent turbulence and destruction, much of the Roman West came under barbarian control in an orderly fashion. Goths, Burgundians, and other aliens were accommodated within the provinces without disrupting the settled population or overturning the patterns of landownership. Walter Goffart examines these arrangements and shows that they were based on the procedures of Roman taxation, rather than on those of military billeting (the so-called hospitalitas system), as has long been thought. Resident proprietors could be left in undisturbed possession of their lands because the proceeds of taxation, rather than land itself, were awarded to the barbarian troops and their leaders.”
  • https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1478-0542.2008.00523.x
    • “the barbarians and Rome, instead of being in constant conflict with each other, occupied a joined space, a single world in which both were entitled to share. What we call the barbarian invasions was primarily a drawing of foreigners into Roman service, a process sponsored, encouraged, and rewarded by Rome. Simultaneously, the Romans energetically upheld their supremacy. Many barbarian peoples were suppressed and vanished; the survivors were persuaded and learned to shoulder Roman tasks. Rome was never discredited or repudiated. The future endorsed and carried forward what the Empire stood for in religion, law, administration, literacy, and language.”
  • https://books.google.com/books/about/Rome_s_Fall_and_After.html?id=55pDIwvWnpoC “Rome’s Fall and After” indicates Goffart does believe Rome fell, but suggests its main problem was Constantinople, not interactions with barbarians at all. (So the top percentage correct = 90%.)

 

This seems pretty conclusive that Goffart thought the barbarians were accommodated rather than conquering the area (so my minimum estimate that the summary was correct must be greater than 50%). However, it’s not clear how much power he thought they took, or whether Rome fell at all. This could be a poor restatement, or it could be that if I read Goffart’s actual work and not just book jacket blurbs, I’d agree.

 

Question I answered: Chance Elizabeth would recommend this book as a reliable source on the topic to an interested friend, if they asked tomorrow (8/31/19)?
My answer: There is a (91-99%, normal distribution) chance I would recommend this to a friend.

99% is in range, because I definitely think it’s worth reading if they’re interested in the topic. I think I’d recommend it before Fate of Rome, because it establishes more concretely that Rome fell.

Is there a chance I wouldn’t recommend it?

  • They could have already read it
  • They could be more interested in disease and climate change (in which case I’d recommend Fate)
  • I could forget about it
  • I could not want to take responsibility for their reading.
  • I could be unconfident that Fall was better than what they’d find by chance.
    • This feels like the biggest one.
    • But the question doesn’t say “best book”, it just says “reliable source”
    • My only real qualms there are normal history-book qualms

So the minimum is 91%

 

Bonus Claims

These are the claims I didn’t check, but on which other people predicted how I would answer. Note that at this point the predictions haven’t been very accurate; whether they’re net positive depends on how you weight the questions. And Foretold is beta software that hasn’t prioritized export yet, so I’m using *shudder* screenshots. But for the sake of completeness:

Claim made by the text: The Fall of Rome: Roman Pottery pre-400AD was high quality and uniform.
Predicted answer: 29.9% to 63.5% chance this claim is correct

Claim made by the text: “In Britain new coins ceased to reach the island, except in tiny quantities, at the beginning of the fifth century”
Predicted answer: 31.6% to 94% chance this claim is correct

 

Claim made by the text: The Fall of Rome: [average German soldiers’ height] – [average Roman soldiers’ height] = N feet. What is N?
Predicted answer: -0.107 to 0.61 ft.

 

Claim made by the text: The Romans chose to cede local control of Gaul to the Germanic tribes in the 400s, as opposed to losing them in a military conquest.
Predicted answer: 28.5% to 85.6% chance this claim is correct

 

Claim made by the text: The Germanic tribes who took over local control of Gaul in the 400s reported to the Emperor.
Predicted answer: 4.77% to 50.9% chance this claim is correct

 

Conclusion

The Fall of Rome did very well on spot-checking: no outright disagreements at all, just some uncertainties.

On the other hand, The Fall of Rome barely mentions disease and doesn’t mention climate change at all, which the book I checked previously, The Fate of Rome, claimed were the main causes of the fall. The Fate of Rome did almost as well in epistemic spot checking as Fall, yet they can’t both be correct. What’s going on? I’m going to address that in a separate post, because I want to be able to link to it without forcing people to read this entire spot check.

In terms of readability, Fall starts slowly but the second half is by far the most interested I have ever been in pottery or archeology.

[Many thanks to my Patreon patrons and Parallel Forecast for financial support for this post]

Does combining epistemic spot checks and prediction markets sound super fun to you? Good news: We’re launching round three of the experiment today, with prizes of up to $65/question. The focal book will be The Unbound Prometheus, by David S. Landes, on the Industrial Revolution. The market opens today and will remain open until 10/27 (inclusive).

 

5 years ago I wrote a glowing review of Thomas Was Alone (republished below). Today it went up as part of the $1 tier at the latest Humble Bundle (meaning $1 gets you 4 games). And if you use this link, I get some tiny amount for your purchase.

BONUS: I bought this bundle for a different tier and don’t need my new copy of Thomas Was Alone or Stanley Parable. I will give these away to the first person to ask for each game (one game per person). E-mail me at elizabeth-at-this-domain.

My Review

Humans have an amazing ability to ascribe intention and emotion when logic tells us there could not possibly be any, a fact demonstrated most succinctly by this clip from Community

 

but proven somewhat more rigorously by An experimental study of apparent behaviour (PDF), in which experimental subjects were asked to watch and describe a short film showing some shapes moving around. If you would like to play along at home, I’ve embedded the video below.

 

The first subject group (n=34 undergraduate women) was given no instruction beyond “describe what happened in the movie.”  Exactly one subject described it in purely geometric terms.  Two others described the shapes as birds, and the rest described them as humans.  19 gave a full story.  The stories people told  (in this treatment and another where subjects were primed to view the shapes as people) had a shocking amount in common, suggesting there was something innate in the interpretation.*

My point is, humans will bond with anything.  In many ways it’s easier to bond with/project onto simple objects than actual humans or almost humans.  This can be used to great effect in art, to evoke desired emotions without all the messiness of using real people.  A simple example is an extremely short, simple game whose name I’m not going to tell you, because it would bias your experience of it.

Did you play it?  The game’s name is Loneliness.  Can you guess why?

I like to think the shunned little square from Loneliness grew up to be Chris in Thomas Was Alone, a game about rectangles making friends. Thomas Was Alone’s premise sounds kind of dumb: it’s a puzzle platformer with some narration ascribing emotions to the rectangles you solve puzzles with. But it pulls this off so masterfully I actually bought branded merchandise of it, which is something I can’t say about a single other game. The story is genuinely sweet, but the real skill is in how the puzzles reinforce it. Each rectangle has slightly different skills, some more useful than others. Chris is a shitty jumper whose initial story revolved around resenting the better jumper, and he is nothing but dead weight in the first puzzles (the other rectangles could get through without him, but he could not get through without them). When he suddenly became indispensable, I felt pride and relief.

TWA starts out a little slow.  If you want to play, finish the first world before deciding whether to continue or quit.  But I highly recommend it both as an interesting example of human psychology, and as a piece of happy art, which I don’t think we see enough of.

Okay, fine, which I don’t see enough of, because I’m a severe subscriber to the dark and edgy trend. But that just makes Thomas Was Alone more impressive.

*Attenuated by the fact that women attending college during WW2 are a narrow subset of the population.

Update on my request for relationship failure data:

Based on not that many data points (12, across 9 people), plus years of reading r/relationships, I see three clusters of breakups among relationship-escalator couples who broke up after moving in together:

  1. Problems identified within a few months of living together (e.g., housekeeping standards)
  2. Slow drips of dissatisfaction (nothing is too bad, but at two years in you’re still not excited enough to marry them and so you break up)
  3. Surprise revelations that could have come at any time, perhaps made easier to catch by the day-to-day intimacy of sharing a home (e.g., lying about failing out of school).

I would like to be able to share more about my data, but due to privacy concerns cannot. This may also limit my ability to answer questions in the comments.

If you would like to add an example, especially a counter-example, to my data set, feel free to comment here or email me at elizabeth-at-this-domain.

When Did You Learn Your Relationship Dealbreaker?

People who have lived with and subsequently broken up with a partner you would have married had things gone well: what did you (or they) learn that changed your (their) mind, and how long had you lived together when it was discovered?

Common wisdom is to live together for a year before making future plans, but I’m trying to figure out where the inflection points actually are.

Feel free to e-mail (elizabeth-at-this-domain) or comment anonymously if you want to share but don’t want it to be public.

So far I have eight pieces of data from five people, and have an idea about a trend but need more data to confirm it.

Epistemic Spot Check: The Fate of Rome (Kyle Harper)

Introduction

Epistemic spot checks are a series in which I select claims from the first few chapters of a book and investigate them for accuracy, to determine if a book is worth my time. This month’s subject is The Fate of Rome, by Kyle Harper, which advocates for the view that Rome was done in by climate change and infectious diseases (which were exacerbated by climate change).

This check is a little different than the others, because it arose from a collaboration with some folks in the forecasting space. Instead of just reading and evaluating claims myself, I took claims from the book and made them into questions on a prediction market, for which several people made predictions of what my answer would be before I gave it. In some but not all cases I read their justifications (although not numeric estimates) before making my final judgement.

I expect we’ll publish a post-mortem on that entire process at some point, but for now I just want to publish the actual spot check. Because of the forecasting crossover, this spot check will differ from those that came before in the following ways:

  1. Claims are formatted as questions answerable with a probability. If a claim lacks a question mark, the implicit question is “what is the probability this is true?”.
  2. Questions have a range of specificity, to allow us to test what kind of ambiguities we can get away with (answer: less than I used).
  3. Some of my answers include research from the forecasters, not just my own.
  4. Due to timing issues, I finished the book and a second one on the topic before I did the research for the spot check.
  5. Due to our procedure for choosing questions, I didn’t investigate all the claims I would have liked to.

 

Claims

Original Claim: “Very little of Roman wealth was due to new technological discoveries, as opposed to diffusion of existing tech to new places, capital accumulation, and trade.”
Question: What percentage of Rome’s gains came from technological gains, as opposed to diffusion of technical advantages, capital accumulation, and trade?

1%-30% log distribution

Data:

  • The Fall of Rome talks extensively about how trade degraded when the Romans left and how that lowered the standard of living.
  • https://brilliantmaps.com/roman-empire-gdp/ shows huge differences in GDP by region, implying there was a big opportunity to grow GDP through trade and diffusion of existing tech. That means potential growth just from catch-up growth was > 50%.
  • Wikipedia doesn’t even show growth in GDP per capita (with extremely wide error bars) from 14AD to 150AD.
  • Rome did have construction and military tech (https://en.wikipedia.org/wiki/Roman_technology)
  • It also seems likely that expansion created a kind of Dutch disease, in which capable, ambitious people were drawn to fighting and/or politics, and not discovering new tech.
  • One potential place where Roman technology could have contributed greatly to the economy was lowering disease via sanitation infrastructure. According to Fate of Rome and my own research, this didn’t happen; sanitation was not end-to-end and therefore you had all the problems inherent in city living.
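The answer “1%-30% log distribution” is shorthand. One plausible reading, which is my assumption here rather than a definition taken from Foretold, is a lognormal whose central 90% interval runs from 1% to 30%. A short sketch recovering that distribution’s median and log-scale spread:

```python
import math
from statistics import NormalDist

def lognormal_from_interval(low, high, coverage=0.90):
    """Median and log-scale sigma of a lognormal whose central `coverage` interval is [low, high].

    Assumption: the "X-Y log distribution" shorthand means exactly this.
    """
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.645 for a 90% interval
    mu = (math.log(low) + math.log(high)) / 2     # log of the geometric mean
    sigma = (math.log(high) - math.log(low)) / (2 * z)
    return math.exp(mu), sigma

median, sigma = lognormal_from_interval(1, 30)
print(f"median = {median:.2f}%, sigma = {sigma:.2f}")
```

The median is the geometric mean of the endpoints, √30 ≈ 5.5%, which a log-uniform reading of 1%-30% would also give; either way, a log scale puts the middle of the estimate much closer to 1% than the arithmetic midpoint of 15.5% would.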

Original Claim: “The blunt force of infectious disease was, by far, the overwhelming determinant of a mortality regime that weighed heavily on Roman demography”
Question: Even during the Republic and successful periods of the empire, disease burden was very high in cities.

60%-90% normal distribution

The wide spread and lack of inclusion of 100% in the confidence interval stem from the lack of precision in the question. What distinguishes “high” from “very high”, and are we counting diseases of malnutrition or just infectious ones? I expected to knock this one out in two minutes, but ended up feeling the current estimates of disease mortality lack the necessary precision to answer it.

Data:

 

Original Claim: “The main source of population growth in the Roman Empire was not a decline in mortality but, rather, elevated levels of fertility”
Question: When Imperial Rome’s population was growing, it was due to a decline in death rates, rather than elevated fertility.

80-100%, c – log distribution

“Elizabeth, that rephrase doesn’t look much like the original claim,” you might be saying quietly to yourself. You are correct: I misread the claim in the book, at least twice, and didn’t catch it until this write-up. This isn’t as bad as it seems. The claims are not quite opposite, because my rephrase was trying to explain variation in growth within Rome, and the book was trying to explain absolute levels, or possibly the difference relative to today.

Back when he was doing biology, Richard Dawkins had a great answer to the common question “how much is X due to genetics, as opposed to environment?”. He said asking that is like asking how much of a rectangle’s area is due to its length, as opposed to its width. It’s a nonsensical question. But you can assign proportionate responsibility for the change in area between two rectangles.
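To make the rectangle point concrete, here is a small illustration of my own (not Dawkins’s), with made-up side lengths. The change in area between two rectangles splits into a part due to length, a part due to width, and an interaction term that belongs to neither:

```python
def area_change_parts(l1, w1, l2, w2):
    """Split the area change from rectangle (l1, w1) to (l2, w2) into three parts.

    Uses A2 - A1 = w1*dl + l1*dw + dl*dw, where dl = l2 - l1 and dw = w2 - w1.
    """
    dl, dw = l2 - l1, w2 - w1
    return {"length": w1 * dl, "width": l1 * dw, "interaction": dl * dw}

# Only the length differs, so the whole change in area is due to length.
print(area_change_parts(2, 3, 4, 3))  # {'length': 6, 'width': 0, 'interaction': 0}
# Both sides differ: 6 + 4 + 4 = 14, and the interaction term belongs to neither side.
print(area_change_parts(2, 3, 4, 5))  # {'length': 6, 'width': 4, 'interaction': 4}
```

When only one side varies, the attribution is clean; when both vary, some of the difference can only be assigned by convention.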

Fate‘s original claim was much like asking how much of a trait is due to genetics. This is bad and it should feel bad, but it’s a very common mistake, and I give Fate a lot of credit for providing the underlying facts such that I could translate it into the “what causes differences between things” question without even noticing.

Since weak framing wasn’t a systemic problem in the book and it presented the underlying facts well enough for me to form my own, correct, model, I’m not docking Fate very harshly on this one.

Original Claim: “The size of Roman merchant ships was not exceeded until the 15th century, and the grain ships were not surpassed until the 19th.”
Question: “The size of Roman merchant ships was not exceeded until the 15th century, and the grain ships were not surpassed until the 19th.”

0-10% log distribution.

This is true within the Mediterranean, but if you check Chinese ships, it’s obvious the claim is off by at least 100 years, possibly more.

Original Claim: too diffuse to quote.
Question: The Roman Empire suffered greatly from intense epidemics, more so than did the Republic or 700-1000 AD Europe.

90-100% c – log distribution

https://en.wikipedia.org/wiki/List_of_epidemics shows a pretty clear presence of epidemics in the relevant period and absence in the others.

 

Original Claim: too diffuse to quote.
Question: Starvation was not a big concern in Imperial Rome’s prime.

80-100% c – log distribution

https://en.wikipedia.org/wiki/List_of_famines shows Roman famine in 441 BC (the Republic) and isolated famines from 370 on, but pretty much validates that during the prime empire, mass starvation was not a threat.

Conclusion:

My fact checking found two flaws:

  1. An inaccuracy in when ships that exceeded the size of Roman trade ships were built, and/or forgetting China was a thing. The inaccuracy does not invalidate the author’s point, which is that the Romans had better shipping technology than the cultures that followed them.
  2. Bad but extremely common framing for the relative effects of disease mortality vs. birth rates.

These are well within tolerances for things a book might get wrong. I’m happy I read this book, and would read another by the same author (with perhaps more care when it refers to happenings outside of Europe), but they are not jumping to the top of my list.

Is The Fate of Rome correct in its thesis that Rome was brought down by climate change and disease? I don’t know. It certainly seems plausible, but is clearly advocating for a position rather than trying to present all the relevant facts. There are obvious political implications to Fate even if it doesn’t spell them out, so I would want to read at least one of the 80 million other books on the Fall of Rome before I developed an opinion. I’m told some people think it had to do with something military, which Fate barely deigns to mention. In the future I hope to be a good enough prediction-maker to put a range on this anyways, however wide it must be, but for now I’m succumbing to the siren song of “but you could just get more data”.

[Many thanks to my Patreon patrons and Parallel Forecast for financial support for this post]

PS. This book is the first step of an ongoing experiment with epistemic spot checks and prediction markets. If you would like to participate in or support these experiments, please e-mail me at elizabeth-at-this-domain-name. The next round is planned to start Saturday August 24th.