When I read a non-fiction book, I want to know if it’s correct before I commit anything it says to memory. But if I already knew the truth status of all of its claims, I wouldn’t need to read it. Epistemic Spot Checks are an attempt to square that circle by sampling a book’s claims and determining their truth value, with the assumption that the sample is representative of the whole.
Some claims are easier to check than others. On one end are simple facts, e.g., “Emperor Valerian spent his final years as a Persian prisoner”. This was easy and quick to verify: googling “emperor valerian” was quite sufficient. “Roman ship sizes weren’t exceeded until the 15th century” looks similar, but it wasn’t. If you google the claim itself, it will look confirmed (evidence: me and 3 other people in the forecasting tournament did this). At the last second while researching this, I decided to check the size of Chinese ships, which surpassed Roman ships sooner than Europe did, by about a century.
At first blush this looks like a success story, but it’s not. I was only able to catch the mistake because I had a bunch of background knowledge about the state of the world. If I didn’t already know mid-millennium China was better than Europe at almost everything (and I remember a time when I didn’t), I could easily have drawn the wrong conclusion about that claim. And following a procedure that would catch issues like this every time would take much more time than ESCs currently get.
Then there are terminally vague questions, like “Did early modern Europe have more emphasis on rationality and less superstition than other parts of the world?” (as claimed by The Unbound Prometheus). It would be optimistic to say that question requires several books to answer, but even if that were true, each of those books would need at least an ESC of its own to see if it’s trustworthy, which might involve checking other claims requiring several books to verify… pretty soon it’s a master’s thesis.
But I can’t get a master’s degree in everything interesting or relevant to my life. And that brings up another point: credentialism. A lot of ESC revolves around “Have people who have officially been Deemed Credible sanctioned this fact?” rather than “Have I seen evidence that I, personally, judge to be compelling?”
The Fate of Rome (Kyle Harper) and The Fall of Rome (Bryan Ward-Perkins) are both about the collapse of the western Roman empire. They both did almost flawlessly on their respective epistemic spot checks. And yet they attribute the fall of Rome to very different causes, and devote almost no time to the other’s explanation. If you drew a Venn diagram of the data they discuss, the circles would be almost but not quite entirely distinct. The word “plague” appears 651 times in Fate and 6 times in Fall, which introduces the topic mostly to dismiss the idea that it was causally related to the fall, which is how Fate treats all those border adjustments happening from 300 AD on. Fate is very into discussing climate, but Fall uses that space to talk about pottery.
This is why I called the process epistemic spot checking, not truth-checking. Determining if a book is true requires not only determining if each individual claim is true, but what other explanations exist and what has been left out. Depending on the specifics, ESCs as I do them now are perhaps half the work of reading the subsection of the book I verify. Really thoroughly checking a single paragraph in a published paper took me 10-20 hours. And even if I succeed at the ESC, all I have is a thumbs up/thumbs down on the book.
Around the same time I was doing ESCs on The Fall of Rome and The Fate of Rome (the ESCs were published far apart to get maximum publicity for the Amplification experiment, but I read and performed the ESCs very close together), I was commissioned to do a shallow review on the question of “How to get consistency in predictions or evaluations of questions?” I got excellent feedback from the person who commissioned it, but I felt like it said a lot more about the state of a field of literature than the state of the world, because I had to take authors’ words for their findings. It had all the problems ESCs were designed to prevent.
I’m in the early stages of trying to form something better: something that incorporates the depth of epistemic spot checks with the breadth of field reviews. It’s designed to bootstrap from knowing nothing in a field to being grounded and informed and having a firm base on which to evaluate new information. This is hard and I expect it to take a while.
Epistemic spot checks are a series in which I select claims from the first few chapters of a book and investigate them for accuracy, to determine if a book is worth my time. This month’s subject is The Fall of Rome, by Bryan Ward-Perkins, which advocates for the view that Rome fell, and it was probably a military problem.
Like August’s The Fate of Rome, this spot check was done as part of a collaboration with Parallel Forecasting and Foretold, which means that instead of resolving a claim as true or false, I give a confidence distribution of what I think I would answer if I spent 10 hours on the question (in reality I spent 10-45 minutes per question). Sometimes the claim is a question with a numerical answer, sometimes it is just a statement and I state how likely I think the statement is to be true.
This spot check is subject to the same constraints as The Fate of Rome, including:
Some of my answers include research from the forecasters, not just my own.
Due to our procedure for choosing questions, I didn’t investigate all the claims I would have liked to.
Claim made by the text: “[Emperor Valerian] spent the final years of his life as a captive at the Persian Court” Question I answered: what is the chance that is true? My answer: I estimate a chance of (99 – 3*lognormal(0,1)) that Emperor Valerian was captured by the Persians and spent multiple years as a prisoner before dying in captivity.
My only qualm is the chance that this could be a lie perpetuated at the time. Maybe Valerian died and the Persians used a double, maybe something weirder happened. System 2 says the chance of this is < 10% but gut says < 15%.
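For concreteness, a credence expression like the one above can be sampled. This is a minimal sketch, assuming “lognormal(0,1)” means exponentiating a standard normal draw (my reading of the notation, not something stated in the post):

```python
import math
import random
import statistics

random.seed(0)

# Sample "99 - 3*lognormal(0,1)": draw Z ~ Normal(0, 1), exponentiate to get a
# lognormal variate, scale by 3, and subtract from 99. The lognormal's median
# is exp(0) = 1, so the median credence lands near 99 - 3 = 96, with a long
# left tail covering the unlikely-but-weird scenarios (doubles, period lies).
samples = [99 - 3 * math.exp(random.gauss(0, 1)) for _ in range(100_000)]
print(round(statistics.median(samples), 1))
```

The point of the long left tail is exactly the qualm above: most of the probability mass sits near 96-99, but the distribution reserves real mass for surprises.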
Claim made by the text: “What had totally disappeared, however, were the good-quality, low-value items, made in bulk, and available so widely in the Roman period” Question I answered: What is the chance that mass-produced, low-value items, available so widely in the Roman period, had disappeared from Britain by 600 AD? My answer: I estimate a chance of (64 to 93, normal distribution) that mass-produced, low-value items were available in Britain during Roman rule and not after 600 AD.
This was one of the hardest claims to investigate, because it represents original research by Ward-Perkins. I had basically given up on answering this without a pottery PhD until google suggestions gave me the perfect article.
This is actually a compound claim by Ward-Perkins:
Roman coinage and mass-produced, low-cost, high-quality pottery disappeared from Britain and then the rest of post-Roman Europe.
The state of pottery and coinage is a good proxy for the state of goods and trades as a whole, because they preserve so amazingly well and are relatively easy to date.
https://brewminate.com/the-perils-of-periodization-roman-ceramics-in-britain-after-400-ce/ Cites exactly the pattern Ward-Perkins describes in pottery and coins, citing Ward-Perkins and 4 other sources I couldn’t verify. This seems like very strong confirmation of W-P’s hypothesis. However it leaves open the chance that this is an area of contention, of which W-P and Brewminate happen to be on the same side. Brewminate’s main focus is not on Britain as a whole, but on pottery in the gap between the Romans and the Anglo-Saxons. I estimate that would bias them towards increasing the size of the discontinuity between Roman and post-Roman Britain.
Searching for “the fall of rome ward-perkins criticism” turns up nothing interesting in the first two pages.
Searching for “late antiquity British pottery” found nothing. “Late antiquity” is the term for the narrative that Rome didn’t fall, it just transformed, and it’s what Ward-Perkins is directly arguing against, so I’d expect criticism of him to be coded with that term.
If we believe Ward-Perkins and Brewminate, I estimate the chances that pottery massively declined at 95-99, times 80-95 that other goods declined with them. There remains the chance that the historical record is massively misleading (very unlikely with pots, although I don’t know how likely it is to have missed sites entirely), and that W-P et al are misinterpreting the record. I would be very surprised if so many sites had been missed as to invalidate this data; call it 5-15%. Gut feeling, 5-20% chance the W-P crowd are exaggerating the data, but given the absence of challenges, not higher than that, and not a significant chance they’re just making shit up.
(95 to 99)*(85 to 95) * (80 to 95) = 64 to 93%
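That endpoint multiplication can also be checked by simulation. A hedged sketch, treating each estimate as an independent uniform draw over its range (my modeling assumption, not the post’s stated method):

```python
import random

random.seed(1)

def product_range(intervals, n=100_000):
    """Draw each factor uniformly from its (low, high) interval, multiply
    them together, and return the 5th and 95th percentiles of the product."""
    products = []
    for _ in range(n):
        p = 1.0
        for low, high in intervals:
            p *= random.uniform(low, high)
        products.append(p)
    products.sort()
    return products[n // 20], products[n - n // 20]

# pottery declined * record not misleading * other goods declined too
low, high = product_range([(0.95, 0.99), (0.85, 0.95), (0.80, 0.95)])
print(f"{low:.0%} to {high:.0%}")
```

Sampling gives a slightly tighter band than multiplying the raw endpoints, since all three factors rarely hit their extremes at once.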
Claim made by the text: The Romans had mass literacy, which declined during the Dark Ages. Question I answered: “[% population able to read at American 1st grade level during Imperial Rome] – [% population able to do same in the same geographic area in 1000 AD] = N%. What is N?” My answer: I estimate that there is a 95% chance [Roman literacy] – [Dark Ages literacy] = (0 to 60, normal distribution)
“Estimates of the average literacy rate in the Empire range from 5 to 30% or higher, depending in part on the definition of ‘literacy’.” I’ll use the high end, since I’m using a pretty minimal definition of literacy.
The highest estimate of literacy in the Roman Empire I found is 30%. Call it twice that for ability to read at a 1st grade level in cities. So the range is 5%-60%.
The absolute lowest the European 1000 AD literacy rate could be is 0; the highest estimate is 5% (and that was from the 1300s, which were probably more literate). From the absence of graffiti I infer that even minimal literacy achievement dropped a great deal.
Maximum = 60% - 1% = 59%
Minimum = 5% - 5% = 0%
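That endpoint arithmetic is interval subtraction: the difference is smallest when Roman literacy sits at its low end and Dark Ages literacy at its high end, and largest in the opposite case. A minimal sketch, using 0 as the Dark Ages floor (which is how the 0-to-60 answer above comes out):

```python
def interval_diff(a, b):
    """[a_lo, a_hi] - [b_lo, b_hi]: minimized when a is low and b is high,
    maximized when a is high and b is low."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return a_lo - b_hi, a_hi - b_lo

roman = (5, 60)      # % with minimal literacy under the Empire
dark_ages = (0, 5)   # % with minimal literacy circa 1000 AD
print(interval_diff(roman, dark_ages))  # → (0, 60)
```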
Claim made by the text: “What some people describe as “the invasion of Rome by Germanic barbarians”, Walter Goffart describes as “the Romans incorporating the Germanic tribes into their citizenry and setting them up as rulers who reported to the empire.” and “Rome did fall, but only because it had voluntarily delegated its own power, not because it had been successfully invaded”.” Question I answered: What is my confidence that this accurately represents historian Walter Goffart’s views? My answer: I estimate that after 10 hours of research, I would be 68-92% confident this describes Goffart’s views accurately.
Peter Heather: The most influential statement of this, perhaps, is Walter Goffart’s brilliant aphorism that the fall of the Western Empire was just ‘an imaginative experiment that got a little out of hand’. Goffart means that changes in Roman policy towards the barbarians led to the emergence of the successor states, dependant on barbarian military power and incorporating Roman institutions, and that the process which brought this out was not a particularly violent one.
“Despite intermittent turbulence and destruction, much of the Roman West came under barbarian control in an orderly fashion. Goths, Burgundians, and other aliens were accommodated within the provinces without disrupting the settled population or overturning the patterns of landownership. Walter Goffart examines these arrangements and shows that they were based on the procedures of Roman taxation, rather than on those of military billeting (the so-called hospitalitas system), as has long been thought. Resident proprietors could be left in undisturbed possession of their lands because the proceeds of taxation, rather than land itself, were awarded to the barbarian troops and their leaders.”
“the barbarians and Rome, instead of being in constant conflict with each other, occupied a joined space, a single world in which both were entitled to share. What we call the barbarian invasions was primarily a drawing of foreigners into Roman service, a process sponsored, encouraged, and rewarded by Rome. Simultaneously, the Romans energetically upheld their supremacy. Many barbarian peoples were suppressed and vanished; the survivors were persuaded and learned to shoulder Roman tasks. Rome was never discredited or repudiated. The future endorsed and carried forward what the Empire stood for in religion, law, administration, literacy, and language.”
This seems pretty conclusive that Goffart thought the barbarians were accommodated rather than conquering the area (so my minimum estimate that the summary was correct must be greater than 50%). However it’s not clear how much power he thought they took, or whether Rome fell at all. This could be a poor restatement, or it could be that if I read Goffart’s actual work and not just book jacket blurbs I’d agree.
Question I answered: Chance Elizabeth would recommend this book as a reliable source on the topic to an interested friend, if they asked tomorrow (8/31/19)? My answer: There is a (91-99%, normal distribution) chance I would recommend this to a friend.
99% is in range, because I definitely think it’s worth reading if they’re interested in the topic. I think I’d recommend it before Fate of Rome, because it establishes that Rome fell more concretely.
Is there a chance I wouldn’t recommend it?
They could have already read it
They could be more interested in disease and climate change (in which case I’d recommend Fate)
I could forget about it
I could not want to take responsibility for their reading.
I could be unconfident that Fall was better than what they’d find by chance.
This feels like the biggest one.
But the question doesn’t say “best book”, it just says “reliable source”
The only real qualm on that front is normal history-book qualms
So the minimum is 91%
These are the claims I didn’t check, but other people made predictions on how I would guess. Note that at this point the predictions haven’t been very accurate; whether they’re net positive depends on how you weight the questions. And Foretold is beta software that hasn’t prioritized export yet, so I’m using *shudder* screenshots. But for the sake of completeness:
Claim made by the text: The Fall of Rome: Roman Pottery pre-400AD was high quality and uniform. Predicted answer: 29.9% to 63.5% chance this claim is correct
Claim made by the text: “In Britain new coins ceased to reach the island, except in tiny quantities, at the beginning of the fifth century” Predicted answer: 31.6% to 94% chance this claim is correct
Claim made by the text: The Fall of Rome: [average German soldiers’ height] – [average Roman soldiers’ height] = N feet. What is N? Predicted answer: -0.107 to 0.61 ft.
Claim made by the text: The Romans chose to cede local control of Gaul to the Germanic tribes in the 400s, as opposed to losing them in a military conquest. Predicted answer: 28.5% to 85.6% chance this claim is correct
Claim made by the text: The Germanic tribes who took over local control of Gaul in the 400s reported to the Emperor. Predicted answer: 4.77% to 50.9% chance this claim is correct
The Fall of Rome did very well on spot-checking: no outright disagreements at all, just some uncertainties.
On the other hand, The Fall of Rome barely mentions disease and doesn’t mention climate change at all, which my previous book, The Fate of Rome, claimed to be the main causes of the fall. The Fate of Rome did almost as well in epistemic spot checking as Fall, yet they can’t both be correct. What’s going on? I’m going to address that in a separate post, because I want to be able to link to it without forcing people to read this entire spot check.
In terms of readability, Fall starts slowly but the second half is by far the most interested I have ever been in pottery or archeology.
Does combining epistemic spot checks and prediction markets sound super fun to you? Good news: We’re launching round three of the experiment today, with prizes of up to $65/question. The focal book will be The Unbound Prometheus, by David S. Landes, on the Industrial Revolution. The market opens today and will remain open until 10/27 (inclusive).
Epistemic spot checks are a series in which I select claims from the first few chapters of a book and investigate them for accuracy, to determine if a book is worth my time. This month’s subject is The Fate of Rome, by Kyle Harper, which advocates for the view that Rome was done in by climate change and infectious diseases (which were exacerbated by climate change).
This check is a little different than the others, because it arose from a collaboration with some folks in the forecasting space. Instead of just reading and evaluating claims myself, I took claims from the book and made them into questions on a prediction market, for which several people made predictions of what my answer would be before I gave it. In some but not all cases I read their justifications (although not numeric estimates) before making my final judgement.
I expect we’ll publish a post-mortem on that entire process at some point, but for now I just want to publish the actual spot check. Because of the forecasting crossover, this spot check will differ from those that came before in the following ways:
Claims are formatted as questions answerable with a probability. If a claim lacks a question mark, the implicit question is “what is the probability this is true?”.
Questions have a range of specificity, to allow us to test what kind of ambiguities we can get away with (answer: less than I used).
Some of my answers include research from the forecasters, not just my own.
Due to timing issues, I finished the book and a second on the topic before I did the research for the spot check.
Due to our procedure for choosing questions, I didn’t investigate all the claims I would have liked to.
Original Claim: “Very little of Roman wealth was due to new technological discoveries, as opposed to diffusion of existing tech to new places, capital accumulation, and trade.” Question: What percentage of Rome’s gains came from technological gains, as opposed to diffusion of technical advantages, capital accumulation, and trade?
1%-30% log distribution
The Fall of Rome talks extensively about how trade degraded when the Romans left and how that lowered the standard of living.
https://brilliantmaps.com/roman-empire-gdp/ shows huge differences in GDP by region, implying there was a big opportunity to grow GDP through trade and diffusion of existing tech. That means potential growth just from catch up growth was > 50%.
Wikipedia doesn’t even show growth in GDP per capita (with extremely wide error bars) from 14AD to 150AD.
It also seems likely that expansion created a kind of Dutch disease, in which capable, ambitious people were drawn to fighting and/or politics, and not discovering new tech.
One potential place where Roman technology could have contributed greatly to the economy was lowering disease via sanitation infrastructure. According to Fate of Rome and my own research, this didn’t happen; sanitation was not end-to-end and therefore you had all the problems inherent in city living.
Original Claim: “The blunt force of infectious disease was, by far, the overwhelming determinant of a mortality regime that weighed heavily on Roman demography”
Question: Even during the Republic and successful periods of the empire, disease burden was very high in cities.
60%-90% normal distribution
The wide spread and lack of inclusion of 100% in the confidence interval stem from the lack of precision in the question. What distinguishes “high” from “very high”, and are we counting diseases of malnutrition or just infectious ones? I expected to knock this one out in two minutes, but ended up feeling the current estimates of disease mortality lack the necessary precision to answer it.
Original Claim: “The main source of population growth in the Roman Empire was not a decline in mortality but, rather, elevated levels of fertility”
Question: When Imperial Rome’s population was growing, it was due to a decline in death rates, rather than elevated fertility.
80-100%, c – log distribution
“Elizabeth, that rephrase doesn’t look much like that original claim” you might be saying quietly to yourself. You are correct- I misread the claim in the book, at least twice, and didn’t catch it until this write-up. This isn’t as bad as it seems. The claims are not quite opposite, because my rephrase was trying to explain variation in growth within Rome, and the book was trying to explain absolute levels, or possibly the difference relative to today.
Back when he was doing biology, Richard Dawkins had a great answer to the common question “how much is X due to genetics, as opposed to environment?”. He said asking that is like asking how much of a rectangle’s area is due to its length, as opposed to its width. It’s a nonsensical question. But you can assign proportionate responsibility for the change in area between two rectangles.
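The rectangle version can be made concrete. A minimal sketch, using a log decomposition to make the split exact (my formalization of the analogy, not Dawkins’s):

```python
import math

def area_change_shares(l1, w1, l2, w2):
    """Split the multiplicative change in a rectangle's area between length
    and width. Logs make the attribution exact and additive:
    log(A2/A1) = log(l2/l1) + log(w2/w1)."""
    total = math.log((l2 * w2) / (l1 * w1))
    return math.log(l2 / l1) / total, math.log(w2 / w1) / total

# Length doubles while width grows only 10%: length gets ~88% of the credit.
length_share, width_share = area_change_shares(2, 3, 4, 3.3)
print(round(length_share, 2), round(width_share, 2))
```

The shares always sum to 1, which is exactly the property the single-rectangle question lacks.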
Fate‘s original claim was much like asking how much of a trait is due to genetics. This is bad and it should feel bad, but it’s a very common mistake, and I give Fate a lot of credit for providing the underlying facts such that I could translate it into the “what causes differences between things” question without even noticing.
Since weak framing wasn’t a systemic problem in the book and it presented the underlying facts well enough for me to form my own, correct, model, I’m not docking Fate very harshly on this one.
Original Claim: “The size of Roman merchant ships was not exceeded until the 15th century, and the grain ships were not surpassed until the 19th.”
Question: “The size of Roman merchant ships was not exceeded until the 15th century, and the grain ships were not surpassed until the 19th.”
An inaccuracy in when ships that exceeded the size of Roman trade ships were built, and/or forgetting China was a thing. The inaccuracy does not invalidate the author’s point, which is that the Romans had better shipping technology than the cultures that followed them.
Bad but extremely common framing for the relative effects of disease mortality vs. birth rates.
This is well within tolerances for things a book might get wrong. I’m happy I read this book, and would read another by the same author (with perhaps more care when it refers to happenings outside of Europe), but they are not jumping to the top of my list.
Is The Fate of Rome correct in its thesis that Rome was brought down by climate change and disease? I don’t know. It certainly seems plausible, but is clearly advocating for a position rather than trying to present all the relevant facts. There are obvious political implications to Fate even if it doesn’t spell them out, so I would want to read at least one of the 80 million other books on the Fall of Rome before I developed an opinion. I’m told some people think it had to do with something military, which Fate barely deigns to mention. In the future I hope to be a good enough prediction-maker to put a range on this anyways, however wide it must be, but for now I’m succumbing to the siren song of “but you could just get more data”.
PS. This book is the first step of an ongoing experiment with epistemic spot checks and prediction markets. If you would like to participate in or support these experiments, please e-mail me at elizabeth-at-this-domain-name. The next round is planned to start Saturday August 24th.
Tl;dr: would you like to exchange money for an extra-rigorous epistemic spot check or automating me out of a hobby? I have an opportunity for you.
Most of you reading this probably know the epistemic spot check series on this blog, in which I somewhat-arbitrarily check claims early in a book to calibrate my trust level in said book.
I’ve been approached by a pre-public prediction market org to see if we can scale ESCs using a forecasting tournament. As conceived of right now, I would extract claims from a book and put them in the tournament, where anyone could bet on how I would eventually rule on the claim. I then check a subset of the claims (the others resolve as “ambiguous”) and money is distributed to the winners. In order to standardize things, this will be done with more rigor and consistency than is usually seen in epistemic spot checks.
We currently have prize money to distribute to the winners, but not to cover my time. We’re looking for $1,000-$2,000 depending on the book and any particular requests you have. If you’re feeling generous, more prize money would not hurt either.
If you’re at all interested, e-mail me at elizabeth – at – this- domain and we can chat.
Epistemic spot checks typically consist of references from a book, selected by my interest level, checked against either the book’s source or my own research. This one is a little different in that I’m focusing on a single paragraph in a single paper. Specifically, as part of a larger review I read Ericsson, Krampe, and Tesch-Römer’s 1993 paper, The Role of Deliberate Practice in the Acquisition of Expert Performance (PDF), in an attempt to gain information about how long human beings can productively do thought work over a time period.
This paper is important because if you ask people how much thought work can be done in a day, if they have an answer and a citation at all, it will be “4 hours a day” and “Cal Newport’s Deep Work”. The Ericsson paper is in turn Newport’s source. So to the extent people’s beliefs are based on anything, they’re based on this paper.
In fact I’m not even reviewing the whole paper, just this one relevant paragraph:
When individuals, especially children, start practicing in a given domain, the amount of practice is an hour or less per day (Bloom, 1985b). Similarly, laboratory studies of extended practice limit practice to about 1 hr for 3-5 days a week (e.g., Chase & Ericsson, 1982; Schneider & Shiffrin, 1977; Seibel, 1963). A number of training studies in real life have compared the efficiency of practice durations ranging from 1 -8 hr per day. These studies show essentially no benefit from durations exceeding 4 hr per day and reduced benefits from practice exceeding 2 hr (Welford, 1968; Woodworth & Schlosberg, 1954). Many studies of the acquisition of typing skill (Baddeley & Longman, 1978; Dvorak et al.. 1936) and other perceptual motor skills (Henshaw & Holman, 1930) indicate that the effective duration of deliberate practice may be closer to 1 hr per day. Pirolli and J. R. Anderson (1985) found no increased learning from doubling the number of training trials per session in their extended training study. The findings of these studies can be generalized to situations in which training is extended over long periods of time such as weeks, months, and years
Let’s go through each sentence in order. I’ve used each quote as a section header, with the citations underneath it in bold.
“When individuals, especially children, start practicing in a given domain, the amount of practice is an hour or less per day”
Generalizations about talent development, Bloom (1985)
“Typically the initial lessons were given in swimming and piano for about an hour each week, while the mathematics was taught about four hours each week…In addition some learning tasks (or homework) were assigned to be practiced and perfected before the next lesson.” (p513)
“…[D]uring the week the [piano] teacher expected the child to practice about an hour a day.” with descriptions of practice but no quantification given for swimming and math (p515).
The quote seems to me to be a simplification. “Expected an hour a day” is not the same as “did practice an hour or less per day.”
“…laboratory studies of extended practice limit practice to about 1 hr for 3-5 days a week”
This too is a book with no page number, but it was available online (thanks, archive.org) and I made an educated guess that the relevant chapter was “Economy in Learning and Performance”. Most of this chapter focused on recitation, which I don’t consider sufficiently relevant.
p800: “Almost any book on applied psychology will tell you that the hourly work output is higher in an eight-hour day than a ten-hour day.”(no source)
Offers a graph as demonstration that only monotonous work has diminishing returns.
p812: An interesting army study showing that students given telegraphy training for 4 hours/day (and spending 4 on other topics) learned as much as students studying 7 hours/day. This one seems genuinely relevant, although not enough to tell us where peak performance lies, just that four hours are better than seven. Additionally, the students weren’t loafing around for the excess three hours: they were learning other things. So this is about how long you can study a particular subject, not total learning capacity in a day.
Many studies of the acquisition of typing skill (Baddeley & Longman, 1978; Dvorak et al.. 1936) and other perceptual motor skills (Henshaw & Holman, 1930) indicate that the effective duration of deliberate practice may be closer to 1 hr per day
“Four groups of postmen were trained to type alpha-numeric code material using a conventional typewriter keyboard. Training was based on sessions lasting for one or two hours occurring once or twice per day. Learning was most efficient in the group given one session of one hour per day, and least efficient in the group trained for two 2-hour sessions. Retention was tested after one, three or nine months, and indicated a loss in speed of about 30%. Again the group trained for two daily sessions of two hours performed most poorly. It is suggested that where operationally feasible, keyboard training should be distributed over time rather than massed”
“We found that fact retrieval speeds up as a power function of days of practice but that the number of daily repetitions beyond four produced little or no impact on reaction time”
Many of the studies were criminally small, and typically focused on singular, monotonous tasks like responding to patterns of light or memorizing digits. The precision of these studies is greatly exaggerated. There’s no reason to believe Ericsson, Krampe, and Tesch-Römer’s conclusion that the correct number of hours for deliberate practice is 3.5, much less the commonly repeated factoid that humans can do good work for 4 hours/day.
Epistemic Spot Checks is a series in which I fact check claims a book makes, to determine its trustworthiness. It is not a book review or a check on every claim the book makes, merely a spot check of what I find particularly interesting or important (or already know).
Today’s subject is The Dorito Effect, which claims that Americans are getting fat because food is simultaneously getting blander and less nutritious, and then more intensely flavored through artificial means. This is leaving people fat and yet malnourished.
Claim: Humans did not get fatter over the last 100 years due to changes in genetics. True. People are fatter than their ancestors, indicating it’s not a change in genetics (although genetics still plays a role in an individual’s weight).
Claim: Casimir Funk discovered that an extract of brown rice could cure beriberi in chickens. True.
Claim: In 1932, the average farm produced 63 sacks of potatoes/acre. By the mid 1960s, it was 200 sacks/acre. True.
Claim: Everything is getting blander and more seasoned. True, per two sources: one showing food is getting more seasoned, one showing it is getting blander.
Note that both sources were provided by the book itself.
Claim: “We eat for one reason: because we love the way food tastes. Flavor is the original craving”. This doesn’t jibe with my personal experience. I definitely crave nutrients and am satisfied by them even without tasting them.
Claim: “In 1946 and 1947, regional Chicken Of Tomorrow contests were held.” True.
Claim: Over time the Chicken Of Tomorrow winners consistently weighed more, with less feed and less time to maturity. True.
Claim: Produce is getting less nutritious over time. True(source provided by author).
Extremely trustworthy, and therefore worrisome, given the implication that food is becoming inexorably worse. The Dorito Effect is unfortunately light on solutions, so you might just freak yourself out to no purpose. On the other hand, if you’re looking for a kick to start eating better, this could easily be it.
As a reminder, epistemic spot checks are checking a book’s early claims for truth/scientific validity/coherent modeling, to determine whether it’s worth continuing. After a few books I concluded that scientific backing didn’t seem that predictive of a book’s helpfulness, and started focusing on modeling. But that wasn’t predictive either.
I never officially decided to quit this project, but I can no longer get excited about checking out a new book, because nothing short of trying it seems to have any predictive ability of whether or not it is helpful. This leads me to believe that most of the effects are placebo effect, not in the sense of “imagined” as people usually use the word, but in the sense that it’s your own brain doing most of the work, and people just have to try things until something clicks for them, starting with the cheapest. I find this answer deeply unsatisfying, but what are you gonna do?
I read part of the book The Polyvagal Theory and went to a two-day seminar by the author, Stephen Porges. I went because I thought there was a strong possibility EFT worked by affecting the vagus nerve, and thought maybe polyvagal theory could explain how. I ended up pretty disappointed.
Once I was at the seminar I was very interested in a protocol Porges developed called Safe and Sound, which purports to cure a number of things, including many symptoms of autism plus misophonia (which I have), by playing songs with certain frequencies filtered out. Porges showed very impressive videos of autistic children going from non-functional to neurotypical-passing. He bragged about a 50% improvement rate. He played a sound sample, and even on hotel sound system speakers it had a very definite effect on me, relaxing many muscles. So of course I ordered it.
In a failure of order of operations I didn’t look up the results until after I’d ordered it (I really wanted my misophonia fixed, plus the demo had been so impressive). The paper tries very hard to hide this, but what actually happened was not an average 50% improvement in some patient metric, but that 50% of patients showed any improvement. Given that autism is a high variance disease and children are often receiving multiple interventions, this basically means “didn’t make anything worse, probably”.
But I’d already ordered the thing, so I decided to try it. This was kind of an ordeal, btw. Safe and Sound is available only through “trained professionals”, even though the protocol consists in its entirety of listening to some songs on an MP3 player. And I checked, there’s nothing magic about the MP3 player or headphones they send you, you could do it with any reasonably good pair you had lying around. Based on this, I have to assume the 3-digit price tag and gatekeeping are entirely about prestige, because they’re certainly not about helping people or making money (I’m sure he could make more selling the CDs without the gatekeeping).
The protocol did have an effect, in that it consistently made me very sad. It didn’t have any effect on my misophonia, even though I tried it twice. The occupational therapist tried to insist it had worked because I was blunter and more confident in my last conversation with her, but no, sweetie, that was because I was more sure your system was bullshit. Then she recommended I give them more money to do other protocols, which I inexplicably declined.
I am fighting the urge to get into the science of polyvagal theory, because it is really really interesting and has a lot of explanatory power. I put off writing this for five months because I wanted to do a more scientific review. But the empirical results are not just bad, they’re bad while proponents are claiming they are good. I can’t trust someone who does that.
For bonus points, when I asked some pointed questions during the seminar, Porges blew me off. So I’m not going to give polyvagal theory any more brain space, even though it would be so cool if it was true.
The scientific claims were far less supported than the author implied. The best-case scenario was “as terrible as your average therapy research.”
The book’s prescriptions work for me anyway, in the sense that they make me calmer and happier and enable me to take better actions.
This book is about EFT, which stands for emotional freedom technique. I write that in a very small font in the hopes you won’t notice how stupid it sounds. EFT is also known as tapping, because the primary action is tapping your fingers against your face.
I originally learned about EFT in a book that went full blown magic about it: you tap your fingers on your face, it changes energy currents in your body, and the universe magically gives you what you want. There’s no point evaluating the science in books like that; they are what they are. The Tapping Solution markets itself as the more studious cousin of that book. It keeps the energy channels but backs off the magic gifts claim, offering the much more defensible explanation that tapping changes something in you that lets you create better outcomes.
The basic idea of EFT is you tap out a pattern on your body, mostly your face, while repeating a statement about something with a lot of negative emotional affect for you, especially ones that activate the sympathetic nervous system (fight/flight/freeze). Repeat until you feel better.
[There’s a lot of different techniques claiming to be The Best EFT Script and, while I suspect there are individual variations in what works best for each person, I can’t possibly care about the intra-EFT wars. Any script you use should just be a starting point for making your own anyway.]
Why would tapping improve your mood? I have some guesses:
It makes anxiety et al. boring. There are a lot of activities where people deliberately activate their SNS (sky diving, horror movies, drugs), so there must be something rewarding about being activated. Plus, lots of the things that happen to you in response to anxiety are quite pleasant. People cuddle you and bring you ice cream. You put off doing the stressful thing. I don’t think many people deliberately push themselves into hysterics for the attention, but I do think these benefits bias how people handle their stress. Tapping does not offer those kinds of rewards; after two or three rounds of tapping, you are bored. There are times I have gone and done the stressful thing because I would rather deal with it than have to do another round of tapping. It’s nice to have my intolerance for boredom harnessed for good.
I suspect this is some of how cognitive behavioral therapy works as well. Having taught myself both, EFT is less work and yet harder to develop an immunity to, although hybrid systems do better still.
A sense of control lowers stress. Having A Thing You Can Do While Stressed that you think lowers your stress level is already lowering your stress level. You can dismiss this as a self-fulfilling prophecy, but that only matters if you’re actually evaluating the concept of energy meridians. If what you want is to calm down so you can respond to comments on your code review, it doesn’t matter if it’s a placebo.
Something something vagus nerve. The vagus nerve is this weird nerve that skips the spinal cord and runs all over your body, including most major organs and a lot of your face.
Its tasks include:
Parasympathetic (relaxing) stimulation of all major organs except the adrenal glands.
Parasympathetic stimulation of muscles around the mouth and larynx.
Possibly reduces systemic inflammation.
Sympathetic (fight/flight/freeze) stimulation of blood vessels.
A bunch of sensory stuff around the face.
Activity on your face is already known to affect your body via the vagus nerve.
Cold water on the face slows down your heart, and this is attributed mainly to the vagus nerve.
Direct electrical stimulation of the nerve is touted as a cure for all kinds of stuff. My sense is the science on that is… optimistic, but there is a reason it is being done to the vagus nerve and not something else.
There’s an alternate EFT script that involves tapping only on the hands. I have found this to be a calming distraction at best. Hands are also pretty innervated, so this points to the effects being due to something specifically in the face, as opposed to sensitivity in general.
So I don’t know what’s going on, but I suspect the effect of tapping is mediated via the vagus nerve.
It’s a framework for breaking your problem into bite-sized chunks, which is the ideal size for problems to be. EFT practices vary in how much you work off a verbal script you’re given vs. introspect on your own issues and tap on what comes up. I predict script-style work is at best competitive with relaxation exercises, and that only introspective EFT leads to actual improvement.
Who knows, maybe energetic meridians are a real thing, or at least a workable metaphor for a real thing. Lots of things sound stupid until you know how they work.
In particular, if you mixed up the explanations for EFT and the much more legitimate EMDR (deliberate eye movements rewiring your brain), I’m not convinced anyone could tell which one was the Officially Sanctioned Therapy and which was the crackpot treatment.
How I evaluated this book: usually when doing these checks I evaluate any statement I find interesting. In this case, I’m sticking to the ones for which the author explicitly claims scientific backing. For stuff that is essentially running on placebos and metaphors, I find a calm, confident, made up explanation is better than a hedged, hesitant, literally true one, so I’m not going to investigate the obviously exaggerated claims. But if you’re going to claim scientific validity, I am going to check.
Claim: “The amygdala is the source of emotions and long term memories, and it’s where negative experiences are encoded (p4)”.
True. Simplified, but obviously trying to explain how the amygdala was relevant to a particular concept, not give a comprehensive overview of our friend the amygdala. The amygdala is in fact so good at emotional memory that it can be invoked by visual cues even in people blinded by brain damage. This confused me at first, so let me note that the amygdala is not involved in fight/flight/freeze, but the longer, cortisol-driven chronic kind of stress.
Claim: Stimulating acupoints calms down the amygdala, and this is observable in fMRI and PET machines (p5).
Misleading, either bad faith or credulous. Both studies cited were done with acupuncture, not acupressure or tapping. I consider that relevant evidence for EFT, but dislike that he tried to make it look like even stronger evidence by hiding that both studies involved needles. The effectiveness of acupuncture appears to have broad but weak support; I very quickly pulled up many more studies demonstrating the exact same thing, all of which were tiny (the largest was 18), and used fMRIs, which are suspect.
In general, studies of acupuncture have shown that it kind of works, but Official Legitimate Chinese Medicine Points don’t do any better than a random spot, so this adds more legitimacy to randomly stabbing yourself than it does to meridian points.
Claim: Other studies show that pressure works just as well as stabbing, maybe even better for anxiety (p5).
Seems legit. I didn’t find any citation for this but I’m willing to spot him that touching works better than stabbing for anxiety.
Claim: A study demonstrated that EFT reduces cortisol levels in the saliva (p5).
True, evidence weak but better than I guessed. The study cited is real, and with some effort I even found a full PDF. EFT did better than both a support group and no treatment on both a symptoms assessment and cortisol levels (24% decrease vs 14%). The differences in symptoms between EFT and the other groups are small, and some were not statistically significant. OTOH, every one of them goes in the same direction. I find this pretty compelling, assuming they published every trait they recorded. As usual, small study, vulnerable to p-hacking, etc.
Claim: This Johns Hopkins-approved doctor agrees with us (p7).
Misleading, possibly very. The named person (David Friedman) does exist, but he’s a doctor of psychology, not psychiatry. The degree to which JHU approves of him is unclear. On his CV (PDF) he lists himself as “research associate”, “instructor”, and “faculty.” None of these words are “professor”, which makes me think he was an adjunct and certainly didn’t have tenure.
Claim: Competing systems telling you to never think about the negative are idiotic. True things are true (p8). In particular The Secret is bullshit.
Seems legit. “Make bad things approachable” just seems like a better tactic than wishing really hard. I also enjoy watching different alt modalities fight with each other.
Claim: Meridians have been scientifically validated, they’re called Bonghan channels (p10).
False. The official name for Bonghan channels is the primo-vascular system, and there’s minimal evidence it exists. Given that, it’s pretty hard to prove any link between them and meridians in a scientific sense. But it’s established fact within the meridian community, so it’s at least well-sourced bullshit.
A few more notes on The Tapping Solution.
As expected, Tapping Solution has failed the RCT test. What about the model test?
Well, it’s a fairly vague model, and energy meridians can be used to power anything. On the other hand it avoids my biggest complaint about heal-yourself-with-the-placebo-effect books, and also religion, certain parts of medicine, and psychology, which is that the solution to failure is often to do the same thing, harder. Tapping by and large avoids that trap. For actual physical problems you’re encouraged to see a doctor first, then tap, and if that doesn’t work see a doctor again. If a particular tap isn’t working you’re given alternate prompts to try. Additionally, tapping claims that often it will work so well you’ll forget you were ever upset about something, and the solution is not to hand over money to the nice man to keep the good vibes flowing, it’s to keep track of how upset you are at the beginning of the session. That level of empiricism shouldn’t make a book stand out, but it does. Tapping Solution, although not every book on EFT, is also pretty clear that you’re not imposing your will on the universe, you’re calming down so you can take better actions.
I don’t want to write out instructions for tapping because I believe the process of reading a book adds a lot of value over a quick run through (the same way doing yoga is better for you than waving a magic wand and becoming more flexible). But to help you decide if even starting the book is worth your time, here are some genres of problems I think tapping is most appropriate for:
Somaticizations, especially back pain.
Emotions you find too overwhelming to deal with, especially anxiety.
Legit life problems that are just too big to deal with all at once and need to be broken into bite-sized pieces.
Simplicity: very low. “Magical energy currents” sounds simple in that you can explain it quickly, but it takes a very long time to explain what things it can’t do and why.
Explanation quality: poor. Meridians can power anything.
Explicit predictions: okay. You have to make your own explicit predictions, but the book very much encourages you to do so.
Acknowledging limitations: mixed.
Relative to other heal-yourself-with-the-placebo-effect systems, The Tapping Solution is modest in its claims about what your mind can do. It goes out of its way to establish that the mind-body connection is in fact a connection; it doesn’t mean your body is a hallucination you can will into whatever form you want.
And then on the next page there’s a story of how a woman cured her lung cancer with EFT. So it’s not amazing on this axis.
Measurability: extremely good. This is where EFT really shines. They claim it’s such a good technique you will forget you ever had a problem, and encourage you to keep track so you won’t forget.
I’m deliberately not giving a lot of details on how to do it yourself, because I think there might be value to going through the book beyond the technique.
I taught this technique to five people, one of whom had a good response to it. Counting myself, that’s 1/3 successes, which is not great. But it’s cheap enough and has high enough potential I still recommend trying it.
Full Catastrophe Living is a little weird, because between the first edition and the second a lot of science came out testing the thesis. For this blog post, I’m reviewing the new, scienced-up edition of FCL. However I have ordered the older edition of the book (thanks, Patreon supporters and half.com) and have dreams of reviewing that separately, with an eye towards identifying what could have predicted the experimental outcome. E.g. if the experimental outcome is positive, was there something special about the model that we could recognize in other self-help books before rigorous science comes in?
I originally planned on fact checking two chapters, the scientific introduction and one of the explanatory chapters. Doing the intro was exhausting and demonstrated a consistent pattern of “basically correct, from a small sample size, finding exaggerated”, so I skipped the second chapter of fact checking. I also skipped the latter two thirds of the book.
You’ve probably heard about mindfulness, but just in case: mindfulness is a meditation practice that involves being present and not holding on to thoughts, originally created within Buddhism. Mindfulness Based Stress Reduction is a specific class created by the author of this book, Jon Kabat-Zinn. The class has since spread across the country; he cites 720 programs in the introduction. Full Catastrophe Living contains a playbook for teaching the class to yourself, the science of why it works (I’m guessing this is new?), a section on stress, and follow-up information on how to integrate meditation into your life.
Claim: Humans are happier when they focus on what they are doing than when they let their mind wander, which is 50% of the time.
Accurately cited, large effect size, possible confounding effects (PDF). The slope of the regression between mind wandering and mind not-wandering was 8.79 on a 100-point scale, and the difference between unpleasant mind wandering and any mind not-wandering task was ~30 points. Pleasant mind wandering was exactly as pleasant as focusing on the task at hand. Focusing accounted for 17.7% of the between-person variation in happiness, compared to 3.2% from choice of task.
People’s minds are more likely to wander when they’re doing something unpleasant, and when they are having trouble coping with that unpleasantness. The study could be identifying a symptom rather than a cause.
The study population was extremely unrepresentative, consisting of people who chose to download an iPhone app.
Claim: Loss of telomeres is associated with stress and aging; meditation lengthens telomeres by reducing stress (location 404).
Research slightly more theoretical than is represented, but the theoretical case is strong (source). First, let’s talk about telomeres. Telomeres are caps on the ends of all of your chromosomes. Because of the way DNA is copied, they shorten a bit on every division. There’s a special enzyme to re-lengthen them (telomerase), but the leading thought right now is that stress inhibits it. Short telomeres are associated with the diseases of aging (heart issues, type 2 diabetes) independent of chronological age. This is hard to study because telomere length is a function of your entire life, not the last week, but it is pretty established science at this point.
Mindfulness reduces stress, so it’s not implausible that it could lengthen telomeres and thus reduce aging. The authors also present some evidence that negative mood reduces the activity of telomerase. This is a very strong theoretical case, but is not quite proven.
Claim: Happiness researcher Dan Gilbert claims meditation is one of the keys to happiness, up there with sleep and exercise (location 461).
Confirmed that Gilbert is a happiness researcher and that he said the quote cited, although I can’t find where he personally researched this.
Claim: “Researchers at Massachusetts General Hospital and Harvard University have shown, using fMRI brain scanning technology, that eight weeks of MBSR training leads to thickening of a number of different regions of the brain associated with learning and memory, emotion regulation, the sense of self, and perspective taking. They also found that the amygdala, a region deep in the brain that is responsible for appraising and reacting to perceived threats, was thinner after MBSR, and that the degree of thinning was related to the degree of improvement on a perceived stress scale.” (location 502)
Accurate citation, but: small sample size (16/26), and for the first study the effect size was quite small (1%) for regions of a priori interest, and the second had quite wide error bands (source 1) (source 2). However the book does refer to these findings as preliminary.
Claim: “They also show that functions vital to our well-being and quality of life, such as perspective taking, attention regulation, learning and memory, emotion regulation, and threat appraisal, can be positively influenced by training in MBSR.” (location 508).
Misleading. These are really broad claims and no specific study is cited. However, source 2 above has the following quote: “The results suggest that participation in MBSR is associated with changes in gray matter concentration in brain regions involved in learning and memory processes, emotion regulation, self-referential processing, and perspective taking.” This is a very carefully phrased statement indicating that mindfulness is in the right ballpark for affecting these things, but is not the same as demonstrating actual change.
Claim: “Researchers at the University of Toronto, also using fMRI, found that people who had completed an MBSR program showed increases in neuronal activity in a brain network associated with embodied present-moment experience, and decreases in another brain network associated with the self as experienced across time. […] This study also showed that MBSR could unlink these two forms of self-referencing, which usually function in tandem.” (location 508).
Accurate citation, small sample size (36) that they made particularly hard to find (source). I can’t decipher the true size of the effect.
Claim: Relative to another health class, MBSR participants had smaller blisters in response to a lab procedure, indicating lower inflammation (location 529).
True, but only because the other class *raised* inflammation (source). Also leaves out the fact that both groups had the same cortisol levels and self-reported stress. So this looks less like MBSR helped, and more like the control program was actively counterproductive.
For the record, this is where I got frustrated.
Claim: “people who were meditating while receiving ultraviolet light therapy for their psoriasis healed at four times the rate of those receiving the light treatment by itself without meditating.” (location 534)
Accurate citation (of his own work), small sample size (pdf).
Claim: “we found that the electrical activity in certain areas of the brain known to be involved in the expression of emotions (within the prefrontal cerebral cortex) shifted in the MBSR participants in a direction (right-sided to left-sided) that suggested that the meditators were handling emotions such as anxiety and frustration more effectively. […]
This study also found that when the people in the study in both groups were given a flu vaccine at the end of the eight weeks of training, the MBSR group mounted a significantly stronger antibody response in their immune system”
Accurate citation (of his own work), slightly misleading, small sample size. Once again, he’s strongly implying a behavioral effect when the only evidence is that MBSR touches an area of the brain. On the other hand, the original paper gets into why they make that assumption, so either it’s correct or we just learned something cool about the brain.
Claim: MBSR reduced loneliness and a particular inflammatory protein among the elderly (location 551).
Not statistically significant (source). More specifically, the loneliness finding was significant but uninteresting, since the treatment was “8 weeks with a regular social activity” and the control was “not.” The inflammation finding had p = .075. There’s nothing magic about p < .05 and I don’t want to worship it, but it’s not a strong result.
I also researched MBSR in general, and found it to have a surprisingly large effect on depression and anxiety.
To the extent Full Catastrophe Living has a model, it’s been integrated so fully into the cultural zeitgeist that I have a hard time articulating it. It could be summarized as “do these practices and some amount of good things from this list will happen to you.” Which kills my hypothesis that having a good model is necessary to getting good results.
You Might Like This Book If…
I don’t know. I found it a slog and only read the first third, but the empirical evidence is very much on mindfulness’s side and I don’t know what better thing to suggest.
Thanks to the internet for making it possible for me to do these kinds of investigations.