ESC Process Notes: Claim Evaluation vs. Syntheses

Forgive me if some of this is repetitive: I can’t remember what I’ve written in which draft and what’s actually been published, much less what’s actually novel. Eventually there will be a polished master post describing my overall note-taking method and leaving out most of how it was developed, but it also feels useful to discuss the journey.

When I started taking notes in Roam (a workflowy/wiki hybrid), I would:

  1. Create a page for the book (called a Source page), with some information like author and subject (example)
  2. Record every claim the book made on that Source page
  3. Tag each claim so it got its own page
  4. When I investigated a claim, gather evidence from various sources and list it on the claim page, grouped by source

This didn’t make sense, though: why did some sources get their own page while others got a bullet point on a claim’s page? Why did some claims get their own page and some not? What happened when a piece of evidence was useful for multiple claims?

Around this time I coincidentally had a call with Roam CEO Conor White-Sullivan to demo a bug I thought I had found. There was no bug (I had misremembered the intended behavior), but it meant he saw my system, and he couldn’t hide his flinch. Giving each claim its own page wrecked performance, and it was unnecessary: Roam has block references, so you can point to individual bullet points, not just pages.

When Conor said this, something clicked. I had already identified one of the problems with epistemic spot checks as being too binary, too focused on evaluating a particular claim or book rather than on building knowledge. My original way of note taking was a continuation of that. What I should have been doing was gathering multiple sources, taking notes on each on equal footing, and then combining them into an actual belief using references to the claims’ bullet points. I call that a Synthesis (example). Once I had an actual belief, I could assess the focal claim in context and give it a credence (a slider from 1 to 10), which could then inform my overall assessment of the book.

Sometimes there isn’t enough information to create a Synthesis, so something is left as a Question instead (example).
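For the programmers in the audience, the structure I’ve been describing can be sketched as a tiny data model. To be clear, all the names here are my own hypothetical gloss, not Roam’s actual schema; block references are approximated as plain object references.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Claim:
    """A self-contained assertion, recorded as a bullet on a Source page."""
    text: str
    source: str                      # title of the Source page it came from
    credence: Optional[int] = None   # 1-10 slider, set once it can be judged in context

@dataclass
class Synthesis:
    """An actual belief, combining claims from multiple sources on equal footing."""
    belief: str
    evidence: List[Claim] = field(default_factory=list)  # stand-ins for block references

@dataclass
class Question:
    """What a topic stays as when there isn't yet enough evidence to synthesize."""
    text: str
    evidence: List[Claim] = field(default_factory=list)

# Claims from several sources feed one Synthesis; the focal claim
# only gets a credence after the Synthesis exists.
c1 = Claim("Ice cores show favorable climate at Rome's peak", "The Fate of Rome")
c2 = Claim("Mediterranean climate proxies point the same way", "a second source")
synth = Synthesis("Rome's peak coincided with unusually good weather", [c1, c2])
c1.credence = 8  # assessed in the context of the synthesis, not in isolation

q = Question("What is ice core data capable of proving?")
```

The point of the sketch is that claims, not sources, are the atomic unit, and a Synthesis is just a new node that points back at them.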

Once I’d proceduralized this a bit, it felt so natural and informative that I assumed everyone else would find it similarly so. Finally you didn’t have to take my word for what was important: you could see all the evidence I’d gathered and then click through to see the context on anything you thought deserved a closer look. Surely everyone would be overjoyed that I was providing this.

The feedback was overwhelming: this was much worse, no one wanted to read my Roam DB, and I should keep presenting evidence linearly.

I refuse to accept that my old way is the best way of presenting evidence and conclusions about a book or a claim. It’s too linear and contextless. I do accept that “here’s my Roam, have fun” is worse. Part of my current project is to identify a third way that shares the information I want to share in a form that is actually readable.

How’s that Epistemic Spot Check Project Coming?

Quick context: Epistemic spot checks started as a process in which I did quick investigations of a few of a book’s early claims to see if it was trustworthy before continuing to read it, in order to avoid wasting time on books that would teach me wrong things. Epistemic spot checks worked well enough for catching obvious flaws (*cou*Carol Dweck*ugh*), but have a number of problems. They emphasize a trust/don’t-trust binary over model building, and provability over importance. They don’t handle “severely flawed but deeply insightful” well at all. So I started trying to create something better.

Below are some scattered ideas I’m playing with that relate to this project. They’re by no means fully baked, but it seemed like it might be helpful to share them. This kind of assumes you’ve been following my journey with epistemic spot checks at least a little. If you haven’t that’s fine, a more polished version of these ideas will come out eventually.

A Parable in Three Books

I’m currently attempting to write up an investigation of Children and Childhood in Roman Italy (Beryl Rawson) (affiliate link) (Roam notes). This is very slow going, because ChCiRI doesn’t seem to have a thesis. At least, I haven’t found one, and I’ve read almost half of the content. It’s just a bunch of facts. Often not even syntheses, just “Here is one particular statue and some things about it.” I recognize that this is important work, even the kind of work I’d use to verify another book’s claims. But as a focal source, it’s deadly boring to take notes on and very hard to write anything interesting about. What am I supposed to say? “Yes, that 11-year-old did do well (without winning) in a poetry competition and it was mentioned on his funeral altar, good job reporting that.” I want to label this sin “weed-based publishing” (as in, “lost in the weeds”, although the fact that I have to explain that is a terrible sign for it as a name).

One particularly bad sign for Children and Childhood in Roman Italy was that I found myself copying multiple sentences at once into my notes. Direct quoting can sometimes mean “there are only so many ways to arrange these words and the author did a perfectly good job, so why bother”, but when it’s frequent, and long, it often means “I can’t summarize or distill what the author is saying”, which can mean the author is being vague, eliding important points, or letting implications do work that should be made explicit. This was easier to notice when I was taking notes in Roam (a workflowy/wiki hybrid), because Roam pushes me to make my bullet points as self-contained as possible (so that nothing is lost when you refer to them in isolation), so it became obvious and unpleasant when I couldn’t split a paragraph into self-contained assertions. Obviously real life is context-dependent and you shouldn’t try to make things more self-contained than they are, but I’m comfortable saying frequent long quotes are a bad sign about a book.

On the other side you have The Unbound Prometheus (David S. Landes) (affiliate link) (Roam notes), which made several big, interesting, important, systemic claims (e.g., “Britain had a legal system more favorable to industrialization than continental Europe’s”, “Europe had a more favorable climate for science than Islamic regions”), none of which it provided support for (in the sections I read; a friend tells me he gets more specific later). I tried to investigate these myself and ended up even more confused: scholars can’t even agree on whether Britain’s patent protections were strong or weak. I want to label this sin “making me make your case for you”.

A Goldilocks book is The Fate of Rome (Kyle Harper) (affiliate link) (Roam notes). Fate of Rome’s thesis is that the peak of the Roman empire corresponds with unusually favorable weather conditions in the Mediterranean. It backs this up with claims about climate archeology, e.g., ice core data (claim 1, 2). This prompted natural and rewarding follow-up questions like “What is ice core data capable of proving?” and “What does it actually show?”. My note-taking system in Roam was superb at enabling investigations of questions like these (my answer).

Based on claims creation, Against the Grain (James Scott) (affiliate link) (Roam notes) is even better. It has both interesting high-level models (“settlement and states are different things that arose very far apart in time”, “states are entangled with grains in particular”) and very specific claims to back them up (“X was permanently settled in year Y but didn’t develop statehood hallmarks A, B, and C until year Z”). It is very easy to see how that claim supports that model, and the claim is about as easy to investigate as it can be. It is still quite possible that the claim is wrong or more controversial than the author is admitting, but it’s something I’ll be able to determine in a reasonable amount of time. As opposed to Unbound Prometheus, where I still worry there’s a trove of data somewhere that answers all of the questions conclusively and I just failed to find it.

[Against the Grain was started as part of the Forecasting project, which is currently being reworked. I can’t research its claims because that would ruin our ability to use it for the next round, should we choose to do so, so evaluation is on hold.]

If you asked me to rate these books purely on ease-of-reading, the ordering (starting with the easiest) would be:

  • Against the Grain
  • The Fate of Rome
  • Children and Childhood in Roman Italy
  • The Unbound Prometheus

Which is also very nearly the reverse of the order in which they were published (Against the Grain came out six weeks before Fate of Rome; the others are separated by decades). It’s possible that the two modern books were no better epistemically but felt so because they were easier to read. It’s also possible it’s a coincidence, or that epistemics have gotten better in the last 50 years.

Model Based Reading

As is kind of implied in the parable above, one shift in Epistemic Spot Checks is a new emphasis on extracting and evaluating the author’s models, which includes an emphasis on finding load-bearing facts. I feel dumb for not emphasizing this sooner, but better late than never. I think the real trick here is not recognizing that evaluating a book’s models matters, but creating techniques for actually doing so.

How do we Know This?

The other concept I’m playing with is that “what we know” is inextricable from “how we know it”. This is dangerously close to logical positivism, which I disagree with, as far as I understand it. And yet it’s really improved my thinking when doing historical research.

This is a pretty strong reversal for me. I remember strongly wanting to just be told what we knew in my science classes in college, not the experiments that revealed it. I’m now pretty sure that’s scientism, not science.

How’s it Going with Roam?

When I first started taking notes with Roam (note spelling), I was pretty high on it. Two months later, I’m predictably loving it less than I did (it no longer drives me to do real-life chores), but still find it indispensable. The big discovery is that the delight it brings me is somewhat book-dependent: it’s great for Against the Grain or The Fate of Rome, but didn’t help nearly so much with Children and Childhood in Roman Italy, because that book was mostly very on-the-ground facts that didn’t benefit from my verification system and long paragraphs that couldn’t be split into self-contained points.

I was running into a ton of problems with Roam’s search not handling non-sequential words, but they seem to have fixed that. Search is still not ideal, but it’s at least usable.

Roam is pretty slow. It’s currently a race between their performance improvements and my increasing hoard of #Claims.

What Comes After Epistemic Spot Checks?

When I read a non-fiction book, I want to know if it’s correct before I commit anything it says to memory. But if I already knew the truth status of all of its claims, I wouldn’t need to read it. Epistemic Spot Checks are an attempt to square that circle by sampling a book’s claims and determining their truth value, with the assumption that the sample is representative of the whole.

Some claims are easier to check than others. On one end are simple facts, e.g., “Emperor Valerian spent his final years as a Persian prisoner”. This was easy and quick to verify: googling “emperor valerian” was quite sufficient. “Roman ship sizes weren’t exceeded until the 15th century” looks similar, but wasn’t. If you google the claim itself, it will look confirmed (evidence: me and 3 other people in the forecasting tournament did this). At the last second while researching this, I decided to check the size of Chinese ships, which surpassed Roman ships about a century before European ships did.

At first blush this looks like a success story, but it’s not. I was only able to catch the mistake because I had a bunch of background knowledge about the state of the world. If I didn’t already know mid-millennium China was better than Europe at almost everything (and I remember a time when I didn’t), I could easily have drawn the wrong conclusion about that claim. And following a procedure that would catch issues like this every time would take much more time than ESCs currently get.

Then there are terminally vague questions, like “Did early modern Europe have more emphasis on rationality and less superstition than other parts of the world?” (as claimed by The Unbound Prometheus). It would be optimistic to say that question requires several books to answer, but even if that were true, each of them would need at least an ESC itself to see if it’s trustworthy, which might involve checking other claims requiring several books to verify… pretty soon it’s a master’s thesis.

But I can’t get a master’s degree in everything interesting or relevant to my life. And that brings up another point: credentialism. A lot of ESC revolves around “Have people who have officially been Deemed Credible sanctioned this fact?” rather than “Have I seen evidence that I, personally, judge to be compelling?” 

The Fate of Rome (Kyle Harper) and The Fall of Rome (Bryan Ward-Perkins) are both about the collapse of the western Roman empire. They both did almost flawlessly on their respective epistemic spot checks. And yet they attribute the fall of Rome to very different causes, and devote almost no time to the other’s explanation. If you drew a Venn diagram of the data they discuss, the circles would be almost but not quite entirely distinct. The word “plague” appears 651 times in Fate and 6 times in Fall, which introduces the topic mostly to dismiss the idea that it was causally related to the fall (which is how Fate treats all those border adjustments happening from 300 AD on). Fate is very into discussing climate, but Fall uses that space to talk about pottery.

This is why I called the process epistemic spot checking, not truth-checking. Determining if a book is true requires not only determining whether each individual claim is true, but what other explanations exist and what has been left out. Depending on the specifics, ESCs as I do them now are perhaps half the work of reading the subsection of the book I verify. Really thoroughly checking a single paragraph in a published paper took me 10-20 hours. And even if I succeed at the ESC, all I have is a thumbs up/thumbs down on the book.

Around the same time I was doing ESCs on The Fall of Rome and The Fate of Rome (the ESCs were published far apart to get maximum publicity for the Amplification experiment, but I read and performed the ESCs very close together), I was commissioned to do a shallow review on the question of “How to get consistency in predictions or evaluations of questions?” I got excellent feedback from the person who commissioned it, but I felt like it said a lot more about the state of a field of literature than the state of the world, because I had to take authors’ words for their findings. It had all the problems ESCs were designed to prevent.

I’m in the early stages of trying to form something better: something that combines the depth of epistemic spot checks with the breadth of field reviews. It’s designed to bootstrap from knowing nothing in a field to being grounded and informed and having a firm base on which to evaluate new information. This is hard and I expect it to take a while.