Quick look: cognitive damage from well-administered anesthesia 

Recently a client commissioned me to look at the potential cognitive impacts of general anesthesia. I was surprised to find out that it’s not obvious general anesthesia does more damage than spinal or local anesthesia, and my guess is most but not all of the damage is done by the illness or surgery themselves. 

Caveats and difficulties

I’m not a doctor. The following represents something like 5 hours of work, which obviously is not enough time to process even a fraction of the literature. I was focused on the dangers of median uses of anesthesia, where nothing goes obviously wrong and the anesthesiologist considers it a success; I didn’t even attempt to look at the rate of accidents, which can be pretty severe. My friend’s dad’s life was ruined by a fungal contaminant in a spinal injection. And of course, people die from excess general anesthesia. But for this post I only looked at damage done by routine anesthetic usage.

Like all client research, this was tailored to a particular person’s needs and budget, and shouldn’t be considered a general-purpose survey.

It’s pretty hard to tease out the difference between damage done by anesthesia, damage done by whatever necessitated the surgery, and damage done by having your body ripped open and bits moved around. Bodies hate that sort of thing. The few RCTs that exist by necessity focus on a narrow range of minimally invasive surgeries for which there exists a choice in type of anesthesia, and animal studies tended to focus on developing animals rather than adults. Even for procedures where multiple types are possible, patients tend to be pretty opinionated about what they want; one paper even announced they’d given up on reaching their sample size goal because recruiting was too hard.

Studies also often focused on cognition within a few hours of surgery (when people are still at the hospital to test). I think that’s less likely to be “damage” and more likely to be “it’s still wearing off” or “I’m sorry, I just had minor surgery and you want me to take an IQ test?”. This made me throw out a lot of studies.

Few if any of the papers attempted to control for post-operative condition or pain med usage, which seems like an enormous oversight to me.

Tl;dr

My overall take home is that:

  • Little or nothing that necessitates surgery is good for cognition and that needs to be factored into assessments.
  • Surgery itself is enormously stressful, physically and emotionally, and that stress impairs cognition, sometimes in lasting ways. This includes procedures that are not cutting new holes in you, like kidney stone treatments, although presumably it’s worse for open heart surgery.
  • Probably there are additional effects from anesthesia. At least general and spinal, maybe including local. On priors I still believe general and spinal are worse on a purely physical level.
  • Probably a lot of whatever damage there is heals in most people, although people who need surgery are already under heavy load and will be the worst at healing. 
  • There may be treatments that can prevent damage but they’re still in rodent trials right now.
  • I also believe that being awake and aware during surgery can be emotionally traumatic, and trauma is also bad for cognition, so include that in your math. 
    • But I’m not trustworthy on this, seeing as I was terrorized by a series of dentists and now can’t get myself through simple teeth cleaning without some sort of bribe, a human to guard me from the bad dentist monster, and a sedative. 

Papers

I didn’t rigorously track correlational studies, but my sense was they tended to show faster recovery from local and spinal anesthetic, relative to general, presumably because milder cases get milder anesthesia even when the procedures use the same billing code. Additionally a lot of the tests were given too soon after surgery, which I don’t expect to predict long term damage.

In the few studies that randomly assigned patients to spinal, local, or general anesthesia, and surveyed at least 7 days out, it’s really hard to pick a winner. 

Incidence of postoperative cognitive dysfunction after general or spinal anaesthesia for extracorporeal shock wave lithotripsy tries really hard to claim that spinal and general anesthetic are equally damaging to cognition, despite finding a 3x higher rate of cognitive issues after spinal anesthetic. I showed this paper to my statistician father and he gave a rant I wish I had recorded because it would make me famous in the right corner of Twitter. Hell hath no fury like a statistician forced to read a medical paper. He agreed with me that 19.6% (the rate of complications in the spinal group) was much larger than 6.8% (rate of complications in the general group), but dismissed that as merely a felony next to the war crimes against statistics they committed by using the wrong test for statistical significance.
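
For readers who want to poke at this kind of comparison themselves, here is a minimal sketch of one standard option for comparing two small groups, Fisher's exact test, written in plain Python. To be clear about assumptions: this is my pick for illustration, not the test the paper used and not necessarily the one my father had in mind, and the counts in the example are hypothetical, chosen only so the percentages land near 19.6% and 6.8%. The function name is made up for this sketch.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. spinal, general); columns are outcomes
    (impaired, not impaired). Returns the p-value.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def weight(x):
        # Number of ways to get x impaired patients in the first group,
        # holding the row and column totals fixed (hypergeometric numerator).
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Add up every possible table that is no more likely than the observed one.
    extreme = sum(
        weight(x) for x in range(lowest, highest + 1) if weight(x) <= observed
    )
    return extreme / total


# Hypothetical counts, NOT taken from the paper:
# 10 of 51 impaired in one group (~19.6%), 3 of 44 in the other (~6.8%).
p_value = fisher_exact_two_sided(10, 41, 3, 41)
print(f"two-sided p-value: {p_value:.3f}")
```

The point of the sketch is only that with groups this small, the choice of test matters, and a 3x difference in rates can still come with a lot of uncertainty.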

Two meta-analyses both find a small difference in favor of spinal over general, with confidence intervals that overlap no-difference. One found spinal to be ~5% better (26 studies), the other 50% (but only 5 studies, so still overlapping with 0). The latter analysis is tiny in part because it is restricted to tests within a week of surgery. The analysis that looked at local anesthetic also failed to find improvements from using it.

On the other hand, animal studies of anesthesia without surgery regularly show impairment, although they can’t agree if post-anesthesia animals start off worse but catch up, or start off the same but fall behind. Also they found other medications could mitigate the effects. I summarize the animal results in this spreadsheet. These are effect sizes we would clearly notice in humans so I assume they’re using much more anesthetic (although they claim it’s proportionate) or the animals, primarily rodents, are much more sensitive. Also the studies tended to be within days of anesthesia application, removing a chance to heal.

Tangent: UpToDate.com

The original commission was to investigate kidney stone treatments, and what I can say there is that the general medical site UpToDate is pretty good. Every claim I investigated checked out and I didn’t find anything at all established that they didn’t cover.

Thank you to Claire Zabel for commissioning the research and encouraging me to share the findings, and to my Patreon patrons for supporting the public write-up.

Follow up to medical miracle

The response to my medical miracle post has been really gratifying. Before I published I was quite afraid to talk about my emotional response to the problem, and worried that people would strong arm in the comments. The former didn’t happen and the latter did but was overwhelmed by the number of people writing to share their stories, or how the post helped them, or just to tell me I was a good writer. Some of my friends hadn’t heard about the magic pills or realized what a big deal it was, so I got some very nice messages about how happy they were for me.

However, it also became clear I missed a few things in the original post.

Conditions to make luck-based medicine work

In trying to convey the concept of luck-based medicine at all, I lost sight of traits I have that made my slot machine pulls relatively safe. Here is a non-exhaustive list of traits I’ve since recognized are prerequisites for luck based medicine:

  • I can reliably identify which things carry noticeable risks and need to be assessed more carefully. I feel like I’m YOLOing supplements, but that’s because it’s a free action to me to avoid combining respiratory depressants, and I know to monitor CYP3A4 enzyme effects. A comment on LessWrong that casually suggested throwing activated charcoal into the toolkit reminded me that not everyone does this as a free action, and the failure modes of not doing so are very bad (activated charcoal is typically given to treat poison consumption. Evidence about its efficacy is surprisingly equivocal, but to the extent it works, it’s not capable of distinguishing poison, nutrients, and medications).
    • This suggests to me that an easy lever might be a guide to obvious failure modes of supplements and medications, to lower the barrier to supplement roulette. I am not likely to have the time to do a thorough job of this myself, but if you would like to collaborate please e-mail me (elizabeth@acesounderglass.com).
  • A functioning liver. A lot of substances that would otherwise be terribly dangerous are rendered harmless by the human liver. It is a marvel. But if your liver is impaired by alcohol abuse or medical issues, this stops being true. And even a healthy liver will get overwhelmed if you pile the load high enough, so you need to incorporate liver capacity into your plans.
  • A sufficiently friendly epistemic environment. If it becomes common and known that everyone will take anything once, the bar for what gets released will become very low. I’m not convinced this can get much worse than it already has, but it is nonetheless the major reason I don’t buy the random health crap facebook advertises to me. The expected value of whatever it is probably is high enough to justify the purchase price, but I don’t want to further corrupt the system. 
  • Ability to weather small bumps. I’m self-employed and have already arranged my work to trade money for flexibility so this is not a big concern for me, but a few days off your game can be a big deal if your life is inflexible enough. Somehow I feel obliged to say this even though I’ve lost work due to side effects exactly once from a supplement (not even one I picked out; a doctor prescribed it) and at least three times from prescription medications.
  • A system for recognizing when things are helping and hurting, and phasing treatments out if they don’t justify the mental load. It’s good to get in the habit of asking what benefits you should see when, and pinning your doctor down on when they will give a medication up as useless.
    • Although again, I’ve had a bigger problem with insidious side effects from doctor-initiated prescription meds than I ever have with self-chosen supplements.
  • Probably there are other things I do without realizing how critical they are, and you should keep that in mind when deciding how to relate to my advice.

Feel free to add your own conditions in the comments and I’ll add my favorites to this list.

Ketone Esters

Multiple people have asked for details on the ketone esters thing, and I sure hope that’s because I convinced them to try stuff rather than somehow sold ketone esters in particular as good. Answers to the common questions:

  • I use KE4, but I haven’t tried any others. I think when I originally looked it was the only one available without caffeine, but I could be wrong, or that could have changed.
  • When I first started and was doing longer intermittent fasting I’d do 10-15ml at night, 5-10 in the morning, and 5-15 before workouts (all on an empty stomach). I currently only do 5ml, before bed, to smooth out blood sugar issues while sleeping.
    • The change is partially because I’m recovering from an injury and that does not mix with intermittent fasting, and partially because KE seems to have caused durable changes so there’s less point. I went from 3-4 sodas a day to none a few days after starting KE4 and it’s never reverted. The only caffeine I’ve had is incidentally in chocolate, and after the Bospro I’ve barely even had that.

Minimal Potato Diet

Again I am not recommending this, but if you would like to know what I’m doing:

  1. I use small potatoes- ideally the really tiny ones, but half-a-fist size at most. And I aim for a variety of color potatoes. These are out of a not particularly verified belief that skin has more vitamins than the core and that color means vitamins, or at least antioxidants. I also prefer the way the small ones cook.
  2. I cook the potatoes as soon as I receive them. If that’s not possible they might spend a few days in the fridge. When I let them age enough to get eyes they upset my stomach.
    1. A lot of people on the potato diet had to skin their potatoes to prevent feeling ill. I am curious if that would have been required if they’d used very new potatoes.
  3. I cook the potatoes by throwing them onto a cookie sheet and roasting at 350F for 45 minutes. I do this because it’s really quick and I prefer the dry texture.
    1. I cook 3 pounds at a time because that is both the size of the bag they come in and about what my cookie sheet can hold.
    2. I tried gnocchi, but the additional flavor made me get tired of it faster. Also maybe my weight loss slowed around then but the potato weight loss has been weirdly punctuated so I dunno.
    3. I wish I could share a graph of just how weird the weight loss has been – same weight for 1-2 weeks, then 3 pounds in 4 days. Unfortunately, I keep changing my creatine dosage which ruins the aesthetics with a lot of water weight changes.
  4. The cooked potatoes spend at least a day in the fridge before eating, and ideally several. This is out of a slightly verified belief that the post-cooking cold converts some of the starches from digestible to indigestible, which lowers calories while doing something vaguely good for my digestive tract. But since I’m cooking much less often than eating they inevitably log a lot of fridge time anyway.
  5. Originally I ate about 100g/day, mostly in the morning but if I woke up craving something I’d start with that. For a few days now I’ve been experimenting with eating smaller amounts of potato more times per day and that’s maybe driven calorie consumption further down, but far too early to say for certain, and it’s not totally clear that would be desirable.
    1. This is based on my hypothesis that potatoes reduce calorie consumption in me by being a relatively bland food with (small amounts of) lots of different micronutrients, plus some help from the fiber. 
    2. Slime Mold Time Mold thinks it’s potassium and is testing that now. 
  6. I originally described myself as making no other changes. That was 100% true in the beginning, but I will admit I now check in with my food diary calorie total and adjust a bit (including upwards, although I’m not sure about the relative frequency). The point of the food diary is micronutrient tracking but it’s hard to avoid reacting to the calorie number once it’s there. I’m not sure that’s actually affecting things much – on days I happen to have a high count I eat much less the next few days without thinking about it.
  7. My food diary is very clear I am not reliably hitting the RDA for most vitamins. I think you can do it on my calorie count but it would be a lot of effort and planning and I’m on vitamins anyway. Hopefully I get nutrition test results in the next month, although that will be much more a referendum on the Bospro than the potatoes. 
[Image: average nutritional intake for the last two weeks]

A male friend lost 4 pounds on a 50% potato diet and then plateaued (but that could be from an injury). A female friend tried my minimal potato diet and experienced no change.  I think if that worked reliably we would already know about it.

Bonus

Shout out to reader George who connected me with an offline friend who had similar symptoms with the same cure, who has done a ton of research into mechanisms and suggested some follow-ups. They’re not guaranteed to work but this feels like a rich vein to me. Thanks George and offline friend!

Luck-based medicine: My resentful story of becoming a medical miracle

You know those health books with “miracle cure” in the subtitle? The ones that always start with a preface about a particular patient who was completely hopeless until they tried the supplement/meditation technique/healing crystal that the book is based on? These people always start broken and miserable, unable to work or enjoy life, perhaps even suicidal from the sheer hopelessness of getting their body to stop betraying them. They’ve spent decades trying everything and nothing has worked until their friend makes them see the book’s author, who prescribes the same thing they always prescribe, and the patient immediately stands up and starts dancing because their problem is entirely fixed (more conservative books will say it took two sessions). You know how those are completely unbelievable, because anything that worked that well would go mainstream, so basically the book is starting you off with a shit test to make sure you don’t challenge its bullshit later?

Well 5 months ago I became one of those miraculous stories, except worse, because my doctor didn’t even do it on purpose. This finalized some already fermenting changes in how I view medical interventions and research. Namely: sometimes knowledge doesn’t work and then you have to optimize for luck.

I assure you I’m at least as unhappy about this as you are. 

Preface to the Preface

I’ve had nonspecific digestive issues since before I have memories. In pre-school my family joked that I would die as a caveman because there were so few things I would eat, and they were mostly grains. This caused a bunch of subclinical malnutrition issues that took a lot of time to manage and never got completely better. And while I couldn’t articulate this until it went away, food felt gross all the time.

It’s hard to convey just how bad this was for me, because it feels like it undermines everything I did to work around it. I’ve always been functional but decidedly less healthy than my friends. I got sick more often and it hit me harder. I was slower to heal from injuries and scrapes and that limited my interest in the more athletic sort of hobbies.  I couldn’t work the same hours, and working hours traded off really sharply against energetic hobbies. I had to spend a lot of time managing food where other people can just show up and eat, which was a constant source of social stress. My genetics say I was destined to have anxiety issues, but the low level malnutrition and justified feelings of food insecurity despite apparent abundance did not help anything.

Eventually, in my late 20s, I saw a nutrition-focused psychiatrist who listened to my observations (I could only eat protein with soda), immediately formed a hypothesis (I produced insufficient stomach acid), asked questions to rule it out (which I no longer remember), suggested a test (take stomach acid pills and see if they gave me heartburn), and when it came back positive (no heartburn) suggested a course of action (keep taking stomach acid pills) that showed immediate benefits in practice (indigestion removed, but only when I took the pills). My protein and produce intake increased enormously, and I felt overall much better.

This is exactly how I want medicine to work. I gathered good data and took it to an expert who immediately formed a model, definitively tested it, and prescribed a course of action that made mechanistic sense.  If you forget that it took almost 30 years and I took those exact same symptoms to other doctors beforehand, it’s a stunning success. 

But it was not a total success. My protein intake maxed out at 50 grams/day, and that was if I made consuming protein a hobby and nothing went wrong. I was doing much better than I had been, but my nutrient tests showed I still had a lot of issues. Eventually the stomach acid pills stopped working, although that seems to be “my stomach started producing more acid and a different problem became the bottleneck”  rather than the pills ceasing to contain acid. But the problem was not solved, and more of the existing treatment did not help.

Standard Preface

I worked with a number of doctors on fixing the remaining digestive issues for ~another decade. I had a lot of conversations like the following:

Me (over 20 pages of medical history and 30 minutes of conversation): I can’t digest protein or fiber, when I try it feels like something died inside me. 

Them: Oh that’s no good, you need to eat so much protein and vitamins

Me: Yes! Exactly! That’s why I made an appointment with you, an expensive doctor I had to drive very far to get to. I’m so excited you see the problem and for the solution you’re definitely about to propose.

Them: What if you took a slab of protein and chewed it and swallowed it. But like a lot of that.

Me: Then I’d feel like something died inside me, and would still fail to absorb the nutrients which is the actual thing we want me to get from food.

Them: I can’t help you if you’re not willing to help yourself.

Or sometimes…

Me (over 20 pages of medical history and 30 minutes of conversation): I can’t digest protein or fiber, when I try it feels like something died inside me. If I make it my top priority I can get maybe 50 grams of protein a day.

Them: Oh that’s no good, you need 70 minimum, and really more like 100. Also because I’m a naturopath I’m morally obligated to tell you to give up eggs, dairy, and wheat.

Me: That’s gonna be hard seeing as those three are 90% of my protein intake and by far the easiest forms of protein to digest.

Them: What if you ate pea protein?

Me: Well that’s harder so…worse.

Them: What about hemp?

Me: That is even harder than pea protein.

Them: If you’re not going to try why are you even here?

These exchanges were incredibly draining for me, so I didn’t have them that often. Every year or two I’d get my hopes up for a new doctor, pay a shitton of money (these doctors are never covered by insurance) for several emotionally draining appointments, and then get told they couldn’t help me and this was a failure on my part.

After several years of that pattern I gave up and went back to my old PCP. She hadn’t solved the problem either, but she had solved other problems, had ideas to try for this one, and believed it was a physical rather than moral problem. Unfortunately she is very busy, and sometimes pawns me off on her assistant doctors, who are idiots. That second conversation was with one of those, although in the real conversation I was less witty, and was more like “*sob* no *sob* I told you *sob* I CAN’T”. 

I refused to see that doctor again, but this left me little leverage when they assigned me a different sub-doctor to handle a post-covid rash back in April. You know how naturopaths complain about western medicine being mechanical and reactive and not taking the time to reach a systemic understanding? Well this guy, who we will call Dr. Spray-n-pray, was determined to fight for equality by taking the same approach with unregulated supplements. He guessed I had an allergic reaction and threw 5 different antihistamines of varying legitimacy at me, with no mention of testing the hypothesis, monitoring my progress, expected changes, duration of treatment… 

And it worked.

Not on the rash; I eventually had to go to urgent care for that. But shortly after I started the pills, I found myself eating 50 grams of protein in a sitting and then going back for more the next meal. I also started chowing down on produce, and at some point realized I couldn’t remember the last time I’d had dessert. I had known I had some aversion issues with food but didn’t realize how gross I found it until the feeling went away and I could just eat without feeling contaminated. About here is when I started a food diary and found I was regularly hitting 100g of protein/day. When I crashed my scooter I ate 350 grams of protein over two days, which suggests I could do that any time I wanted but chose not to, and that my body was getting all the protein it felt it needed, all of the time.

I’m not sure I can convey what a big deal this is either. I would have paid several years’ salary for this cure without thinking. It is now possible for me to feel okay at an emotional level it wasn’t before. Plus, you know, I can actually get the nutrients I need to run my body and stuff.  My injuries after that scooter accident healed noticeably faster than past injuries. The fact that I haven’t caught an illness since April’s covid isn’t conclusive, since it’s summer and I haven’t done anything high risk, but it is interesting. 

[I do have covid antibody results from December (8 months after my vaccine) and August (4 months after catching covid), and my levels have gone way up, but that’s more likely due to the more recent and stronger immune stimulus.]

But that evidence came later. Back in May the timing of the miracle suggested that one of Dr. Spray-n-pray’s pills was responsible. This was more or less confirmed when I weaned off the various pills and the subtle grossness around food started to return. I could also feel growing sugar cravings. So it was important to figure out what the miracle pill was and get back on it immediately.

[If any of you are thinking “well it could have been a coincidence”: no it fucking couldn’t. I did not carry this around for 35 years and try everything to fix it only to have it suddenly go into remission for no reason. I’ll believe covid fixed it before I believe that.]

I had always assumed the reason doctors turned on me was that it was easier than accepting that they couldn’t solve my problem. But this one had fixed my problem! Not on purpose or anything, but I was fully prepared to pretend it was. Now we just had to figure out what had worked and why, in case it suggested any additional actions. I made a spreadsheet tracking the changes as best I could – when my diet changed (using grocery order data), when I’d started and stopped which pills. Surely my data plus his doctor ego would help us get to the bottom of this.

At the time of my follow-up appointment I had a strong guess which supplement had helped based on timing, but it didn’t make any sense. The active ingredient was Boswellia (specifically BosPro brand (affiliate link). I’m afraid to try another in case it breaks the spell). Boswellia is sometimes described by alt medicine websites as helping digestive issues, but in the same way they describe every supplement as helping digestive issues. “Helps anxiety, allergies, autoimmune disorders, inflammation, and digestion” should just be a stamp. This isn’t even necessarily illegitimate – the body is complicated and lots of things are entangled, especially with inflammation.

But I’ve tried a lot of these supplements at one point or another and there was absolutely no reason to predict this one would be different, even if I had researched it ahead of time. Examine.com is pretty positive on Boswellia but doesn’t list digestion as an issue it solves. Everything is connected to everything else in the body and it was still pretty hard for me to make a causal chain between Boswellia’s alleged mechanisms and improvements in my digestion. So I was extremely excited for Dr. Spray-n-pray to explain why it had worked.

All this was on my mind when I finally got to ask Dr. Spray-n-pray why his treatment had worked. He mumbled something about inflammation and moved on. He had zero interest in my spreadsheet or a more mechanistic understanding of what had changed. I confirmed the miracle was from BosPro when I resumed taking it and the digestive improvements returned (including the creeping feeling of grossness going away). It’s now 5 months since I started taking it and it still works but I have no idea why.

This is not how I want medicine to work, at all. A medic who clearly was not trying for a systemic understanding recommended a lot of stuff and one of them happened to fix a problem as unrelated as could be that I’d spent a decade+ searching for without success? Even knowing definitively that it works we have no idea why, and what would help or hinder it? And there’s ~0 evidence this would help other people with the same condition?

This is bullshit. But bullshit is working where logic feared to tread.

Other Evidence

This experience isn’t what got me on the path of luck-based medicine though. I was already at that point when the supplements were prescribed, which is why I took them instead of doing 5 hours of research and ignoring Dr. Spray-n-pray’s suggestions as the ravings of an idiot. There were a lot of contributors to my shift, but a few stand out.

A few years ago I ran a series of epistemic spot checks on various self-help books, and found that how helpful they were had no correlation with how rigorous or true their theoretical backing was.

Then last year I ran that ketone ester study. I and a handful of people I know get insane gains from using ketone esters – better than Ritalin with none of the side effects – but when I ran an RCT (n=8-12 depending on how you count) no one reported any benefits. 

Or take Slime Mold Time Mold’s all-potatoes-all-the-time diet study (which happened after I started on the magic pills, but is too good an example to pass up). I have an extremely long list of complaints about their hypothesis and follow up study:

  • They failed to contextualize it as a monodiet and discuss the classic monodiet problems.
  • Potatoes aren’t nutritionally complete and don’t have enough protein for people to thrive. They gestured at some of the nutritional deficiencies but I think not hard enough, and they believe potatoes have more protein than reported but have not pointed to any evidence to that effect.
  • They tracked weight loss over 28 days but will not be doing a follow-up for six months. Since the default after rapid weight loss caused by an unsustainable diet is immediate regain, this is unconscionable.
  • I haven’t had time to dig into the object-level facts in the argument between SMTM and a persistent critic, but with my monkey social brain it sure does look like SMTM is blowing off well-founded criticism (given in a super aggressive manner).
  • They treat weight loss as an unalloyed good no matter how fast or what the person’s starting weight was.
    • I have not looked into the popular “weight loss not safe above 2 pounds per week” claim and it wouldn’t shock me if it were made up, but if I had an intervention with double that impact I’d spend an hour investigating the claim.
    • Weight loss beyond a certain body fat percentage is bad. You need that stuff.
  • They did warn people about solanine poisoning but I think they should be more concerned about it.
  • Analysis featured a lot of stories along the lines of “Did X on Wednesday and lost 2 pounds on Thursday” and fat loss does not work like that. Two pounds overnight is either water weight or has a lookback period longer than 24 hours.
    • I’m deeply confused about that second part, I don’t understand why or how weight-loss-that-is-definitely-not-changes-in-water-retention comes in chunks. If you have an answer I’m quite curious.
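
To put rough numbers on the "fat loss does not work like that" complaint, here is a back-of-envelope sketch. It assumes the common rule of thumb of about 3,500 kcal per pound of body fat, which is an approximation I am bringing in for illustration rather than something from SMTM or from my own data, and the function name is made up.

```python
KCAL_PER_POUND_FAT = 3500  # rule-of-thumb approximation; an assumption


def daily_deficit_needed(pounds_lost, days):
    """Daily calorie deficit required if the whole loss were body fat."""
    return pounds_lost * KCAL_PER_POUND_FAT / days


# "Lost 2 pounds on Thursday": the deficit needed for that to be fat in one day.
overnight = daily_deficit_needed(2, 1)
# The same 2 pounds spread over a week, for comparison.
weekly = daily_deficit_needed(2, 7)

print(f"2 lb in 1 day needs a deficit of {overnight:.0f} kcal/day")
print(f"2 lb in 7 days needs a deficit of {weekly:.0f} kcal/day")
```

Under that assumption, an overnight two pound drop would need a one-day deficit of several thousand calories, far more than most people eat in a day, which is why it has to be water or a delayed effect.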

That’s a lot of epistemic sins. OTOH, their potato diet results inspired me to try the minimal potato diet, which consists of eating some potatoes every day (I started with ~100g of baby potatoes), and I’ve lost 15 pounds in 3 months. That level of weight loss with zero sacrifices buys you a lot of epistemic forgiveness, especially when my miraculous dramatic dietary improvements did fuck all to the number on the scale.

[ People already writing their “potatoes can’t possibly be the cause it must be psychosomatic” comments in their head: I see you. Your hypothesis is perfectly reasonable; in your position it would be my first reaction too. But in this particular case you’re going to need to explain why potatoes caused that magic mental shift when giving up soda, a dramatic improvement in diet and removal of dessert entirely, complete emotional reorientation to food, a mild prescription stimulant, and varying levels of exercise did nothing, and ketone esters worked better than all of those but much worse than potatoes. Comments not attempting this will be deleted or mocked as I see fit.]

[If you are thinking “ah, but clearly those all did contribute and the potatoes were just the last step”: I agree that’s likely. If I’d started minimal potato diet before BosPro it either wouldn’t have worked or would have been extremely bad for me. But since it seems to work for at least some other people who didn’t have all this baggage I think we need to update in that direction.]

Or take every person who got a second opinion on their cancer and was recommended diametrically opposing treatment plans. Doctors as a class are not as epistemically virtuous as I’d like, but that’s not (always) why they propose wildly divergent treatment plans. In most cases it’s because the answer isn’t obvious, or at best has only been obvious for a few years.

And then there’s the absolute shitshow that is nutrition research. No one knows what the average optimum nutrient level is and even if we did it wouldn’t be that helpful for figuring out the optimum level for a given individual, because humans are so unbelievably variable.

I could go on here, but if you’re reading my blog you’re probably already on board with shit being extremely complicated and I don’t want to belabor the point.

Moral of the story: when intellect fails, try luck guided by intuition

Some medicine is very deterministic. Antibiotics, most of the time. That daylong IV drip when I had norovirus that probably turned the infection from deadly to a kind of annoying 36 hours. We may not know the optimum level of a given nutrient but most severe deficiency diseases can be solved by giving you the thing you’re severely deficient in. My impression is statins work pretty reliably.

But a lot of medicine just seems to be kind of random. People go through 10 antidepressants and then somehow the 11th one works great. Ketone esters increase my energy level so much I gave up soda and caffeine entirely but do nothing for most people. All those books where the cure was a miracle for someone, and it can’t just be a placebo because there’s no reason for the 35th placebo to be the one that works but nothing else makes sense.

All of which leads me to conclude that once you have exhausted the reliable part of medicine without solving your problem, looking for a mechanistic understanding or even empirical validation of potential solutions is a waste of time. The best use of energy is to try shit until you get lucky.

Not at random or anything. My guess is the world contains metis and you do better-than-chance preferentially trying things that helped one guy on a message board for your condition (even though it was shown to make no difference in real studies) or going to alt-modality practitioners (even the one with proactively stupid justifications they insist on sharing). The latter is especially true if you can find a practitioner who accepts that their treatments don’t always work and has a system to notice that and change course, but I think maybe even the really gung-ho ones sometimes have good ideas (you just have to set up your own system for deciding when to quit). Just don’t get hung up on “do we understand why this works?” or “does this work for other people?”

Also please remember that side effects and drug interactions are a thing. Anything with a real effect can hurt you. I gave a very caveated suggestion of BosPro to someone on Twitter and it caused something akin to niacin flush in them. This is the same brand that does nothing to me but makes me better at digestion and uninterested in sugar.

So I guess the full and accurate statement of my beliefs is “Try solving problems with understanding first, but accept when you’ve hit diminishing returns and consider if your energy isn’t better spent increasing your surface area to luck”.

Parting shots

Fuck you every doctor who told me my digestive problems were in my head or my fault for being a bad patient and you couldn’t help me until I solved the problem that drove me to you. You were factually incorrect and you should feel terrible.

For potential clients in particular

People sometimes approach me for medical literature reviews aimed at their specific problem. There are forms of these I will do, but those forms do not include producing a mechanistic model and high-probability treatment for someone’s persistent, sub-clinical, amorphous problem that medicine has failed to solve. There are a few reasons accepting these commissions would be wasting the clients’ money, and one of them is that by the time they come to me they have found all the low hanging deterministic fruit. The best I can do is spend a ton of time generating lists of things that might work. Sometimes I do offer that, but people tend to prefer my other offer of a referral to a researcher that’s better at individualized treatment.

The Balto/Togo theory of scientific development

Tragically I gave up on the Plate Tectonics study before answering my most important question: “Is Alfred Wegener the Balto of plate tectonics?”

Let me back up.

Balto

Balto is a famous sled dog. He got a statue in NYC for leading a team of dogs through a blizzard to deliver antibody serum to Nome, Alaska in 1925, ending a diphtheria outbreak. Later Disney made a movie about how great he was.

Except that run was a relay, and Balto only got famous because he did the last leg, which had the most press coverage but was also the easiest. The real hero was Togo, the dog who led the team through the hardest terrain and covered by far the most miles as well. Disney later made a movie about him that makes no mention of Balto for the first 90%, and then goes out of its way to talk about what a shit dog he was, that’s why he didn’t get included in any of the important teams, but Togo had had to do so many hard things they needed a backup team for the trivial last leg so Balto would have to do.

Togo’s owner died mad about the US mainland believing Balto was a hero. But since all the breeders knew who did the hard part Togo enjoyed a post-Nome level of reproductive success that Genghis Khan could only dream about, so I feel like he was happy with his choices.

plus he did eventually get some statues

But it’s not like Togo did this alone either. He led one team in a relay, and there were 20 humans and 150 dogs that contributed to the overall run. Plus someone had to invent the serum, manufacture it, and get it to the start of the dog relay at Nenana, Alaska. So exactly how much credit should Togo get here?

The part with Wegener

I was pretty sure Alfred Wegener, popularly credited as the discoverer/inventor of continental drift and mentioned more prominently than any other scientist in discussions of plate tectonics, is a Balto.

First of all, continental drift is not plate tectonics. Continental drift is an idea that maybe some stuff happened one time. Plate tectonics is a paradigm with a mechanism that makes predictions and explains a lot of data no one knew was related until that moment.

Second, Wegener didn’t discover any of the evidence he cited, he wasn’t the first to have the idea, and it’s not even clear he did much of the synthesis of the evidence. His original paper refers to “Concerning South America and Africa, biologists and geologists are in close agreement that a Brazilian–African continent existed in the Mesozoic”

So he didn’t invent the idea, gather the data, or even really synthesize the evidence. His guess at the mechanism was wrong. But despite spending hours digging into the specific discoverers and synthesizers that contributed to plate tectonics, the only name I remember is Wegener’s. Classic Balto.

On the other hand, some of the people who gathered the data used to discover plate tectonics were motivated by the concept of continental drift, and by Wegener specifically. That seems like it should count for something. My collaborator Jasen Murray thinks it counts for a lot.

Jasen would go so far as to argue that shining a beacon in unknown territory that inspires explorers to look for treasure in the right place makes you the Togo, racing through the fractured ice rapids of social ridicule and self-doubt to do the real work of getting an idea considered at all. Showing up at the finish line to formalize a theory after there’s enough work to know it’s true is Balto work to him. This makes me profoundly uncomfortable because strongly advocating for something unproven terrifies me, but as counterarguments go that’s pretty weak.

One difficulty is it’s hard to distinguish “ahead of their time beacon shining” from “lucky idiot”, and even Jasen admits he doesn’t know enough to claim Wegener in particular is a Togo. But doing work that is harder to credit because it’s less legible is also very Togo-like behavior, so this proves nothing about the category. 

So I guess one of my new research questions is “how important are popularizers?” and I hate it.

Dependency Tree For The Development Of Plate Tectonics

This post is really rough and mostly meant to refer back to when I’ve produced more work on the subject. Proceed at your own risk.

Introduction

As I mentioned a few weeks ago I am working on a project on how scientific paradigms are developed. I generated a long list of questions and picked plate tectonics as my first case study. I immediately lost interest in the original questions and wanted to make a dependency graph/tech tree for the development of the paradigm, and this is just a personal project so I did that instead.

I didn’t reach a stopping point with this graph other than “I felt done and wanted to start on my second case study”. I’m inconsistent about the level of detail or how far back I go. I tried to go back and mark whether data collection was motivated by theory or practical issues but didn’t fill it in for every node, even when it was knowable. Working on a second case study felt more useful than refining this one further so I’m shipping this version. 

“Screw it I’m shipping” is kind of the theme of this blog post, but that’s partially because I’m not sure which things are most valuable. Questions, suggestions, or additions are extremely welcome as they help me narrow in on the important parts. But heads up the answer might be “I don’t remember and don’t think it’s important enough to look up”. My current intention is to circle back after 1 or 2 more case studies and do some useful compare and contrast, but maybe I’ll find something better.

(Readable version here)

And if you’re really masochistic, here’s the yEd file to play with.

Scattered Thoughts

Why I chose plate tectonics

  • It’s recent enough to have relatively good documentation, but not so recent the major players are alive and active in field politics.
  • It’s not a sexy topic, so while there isn’t much work on it what exists is pretty high quality. 
  • It is *the only* accepted paradigm in its field (for the implicit definition of paradigm in my head).
  • Most paradigms are credited to one person on Wikipedia, even though that one person needed many other people’s work and the idea was refined by many people after they created it. Plate tectonics is the first I’ve found that didn’t do that. Continental drift is attributed to Alfred Wegener, but continental drift is not plate tectonics. Plate tectonics is acknowledged as so much of a group effort that Wikipedia doesn’t give anyone’s name.

Content notes

  • This graph is based primarily on Plate Tectonics: An Insider’s History of the Modern Theory of the Earth, edited by Naomi Oreskes. It also includes parts from this lecture by Christopher White, and Oxford’s Very Short Introduction to Plate Tectonics.
  • Sources vary on how much Alfred Wegener knew when he proposed continental drift. Some say he only had the fossil and continental shape data, but the White video says he had also used synchronous geological layers and evidence of glacial travel.
    • I tried to resolve this by reading Wegener’s original paper (translated into English) but it only left me more confused. He predicted cracks in plates being filled in by magma, but only mentions fossils once. Moreover he only brings them up to point to fossils of plants that are clearly maladapted to the climate of their current location, not the transcontinental weirdnesses. He does casually mention “Concerning South America and Africa, biologists and geologists are in close agreement that a Brazilian–African continent existed in the Mesozoic”, but clearly he’s not the first one to make that argument.
    • I alas ran out of steam before trying Wegener’s book.
    • I was stymied in attempts to check his references by the fact that they’re in German. If you really love reading historic academic German and would like to pair on this, please let me know.
    • I stuck to just the fossil + fit data in the graph, because White is ambiguous when he’s talking about data Wegener had vs. data that came later.
    • White says the bathymetry maps showing the continental shelves had a much better fit than the continents themselves didn’t come out until after Wegener had published, but this paper cites sufficiently detailed maps of North America’s sea floor in 1884. It’s possible no one bothered with South America and Africa until later.
  • A lot of the data for plate tectonics fell out of military oceanography research. Some of the tools used for this were 100+ years old. Others were recently invented (in particular, magnetometers and gravimeters that worked at sea), but the tech those inventions relied on was not that recent. I think. It’s possible a motivated person could have gathered all the necessary evidence much earlier.
  • Sources also vary a lot on what they thought was relevant. The White video uses continental shelf fit (which is much more precise than using the visible coastline) as one of the five pillars of evidence, but it didn’t come up in the overview chapter of the Oreskes book at all.
  • This may be because evidence of continental drift (that is, that the continents used to be in different places, sometimes touching each other) is very different than evidence for plate tectonics (which overwhelmingly focuses on the structure of the plates and mechanism of motion). 

Process notes

  • At points my research got very bogged down in some of the specifics of plate tectonics (in particular, why were transform faults always shown perpendicular to mountain ridges, and how could there be so many parallel to each other?). This ended up being quite time consuming because I was in that dead zone where the question was too advanced for 101 resources to answer but advanced resources assumed you already knew the answer. In the end I had to find a human tutor.
  • This could clearly be infinitely detailed or go infinitely far back. I didn’t have a natural “done” condition beyond feeling bored and wanting to do something else. 
  • I only got two chapters into Oreskes and ⅔ through Very Short Introduction. 
  • I didn’t keep close track but this probably represents 20 hours of work, maybe closer to 30 with a more liberal definition of work. Plus 5-10 hours from other people.
  • In calendar time it was ~7 weeks from starting the Oreskes book to scheduling this for publishing.
  • You can see earlier drafts of the graph, along with some of my notes, on Twitter.

Acknowledgments

Thanks to several friends and especially Jasen Murray for their suggestions and questions, and half the people I’ve talked to in the last six weeks for tolerating this topic.

Thanks to Emily Arnold for spending an hour answering my very poorly phrased questions about transform faults.

Thanks to my Patreon patrons for supporting this work, you guys get a fractional impact share.

Review of Examine.com’s vitamin write-ups

There are a lot of vitamins and other supplements in the world, way more than I have time to investigate. Examine.com has a pretty good reputation for its reports on vitamins and supplements. It would be extremely convenient for me if this reputation was merited. So I asked Martin Bernstoff to spot check some of their reports. 

We originally wanted a fairly thorough review of multiple Examine write-ups. Alas, Martin felt the press of grad school after two shallow reviews and had to step back. This is still enough to be useful so we wanted to share, but please keep in mind its limitations. And if you feel motivated to contribute checks of more articles, please reach out to me (elizabeth@acesounderglass.com).

My (Elizabeth’s) tentative conclusion is that it would take tens of hours to beat an Examine general write-up, but they are not complete in either their list of topics or their investigation into individual topics. If a particular effect is important to you, you will still need to do your own research.

Photo credit DALL-E

Write-Ups

Vitamin B12

Claim: “The actual rate of deficiency [of B12] is quite variable and it isn’t fully known what it is, but elderly persons (above 65), vegetarians, or those with digestion or intestinal complications are almost always at a higher risk than otherwise healthy and omnivorous youth”

Verdict: True but not well cited. Their citation merely asserts that these groups have shortages rather than providing measurements, but Martin found a meta-analysis making the same claim for vegetarians (the only group he looked for).

Toxicology

Verdict: Very brief. Couldn’t find much on my own. Seems reasonable.

Claim: “Vitamin B12 can be measured in the blood by serum B12 concentrations, which is reproducible and reliable but may not accurately reflect bodily vitamin B12 stores (as low B12 concentrations in plasma or vitamin B12 deficiencies do not always coexist in a reliable manner[19][26][27]) with a predictive value being reported to be as low as 22%”

Verdict: True, the positive predictive value was 22%, but with a negative predictive value of 100% at the chosen threshold. But those are only the numbers at one threshold. To know whether this is good or bad, we’d have to get numbers at different thresholds (or, preferably, a ROC-AUC).
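
To make the threshold point concrete, here is a minimal sketch of how positive and negative predictive value move as the cutoff changes. The serum values are invented for illustration (they are not the study’s data), and the function name is just something I made up for this example.

```python
def predictive_values(deficient, healthy, threshold):
    """Return (PPV, NPV) when 'serum B12 below threshold' counts as a positive test."""
    tp = sum(1 for v in deficient if v < threshold)  # truly deficient and flagged
    fn = len(deficient) - tp                         # truly deficient but missed
    fp = sum(1 for v in healthy if v < threshold)    # healthy but flagged
    tn = len(healthy) - fp                           # healthy and cleared
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


# Hypothetical serum B12 values (pmol/L), made up for this example.
deficient = [90, 110, 130, 150]
healthy = [120, 140, 160, 180, 200, 220, 250, 300, 350, 400]

for threshold in (125, 155, 205):
    ppv, npv = predictive_values(deficient, healthy, threshold)
    print(f"threshold {threshold}: PPV={ppv:.2f}, NPV={npv:.2f}")
```

In this toy data, raising the cutoff pushes NPV to 100% while PPV falls, which is why a single (PPV, NPV) pair says little about the test without the rest of the curve.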

Claim: B12 supplements can improve depression

Examine reviews a handful of observational studies showing a correlation, but includes no RCTs. This is in spite of there actually being RCTs like Koning et al. 2016 and a full meta-analysis, neither of which finds an effect.

The lack of effect in RCTs is less damning than it sounds. I (Elizabeth) haven’t checked all of the studies, but the Koning study didn’t confine itself to subjects with low B12 and only tested serum B12 at baseline, not after treatment. So they have ruled out neither “low B12 can cause depression, but so do a lot of other things” nor “B12 can work but they used the wrong form”.

I still find it concerning that Examine didn’t even mention the RCTs, and I don’t have any reason to believe their correlational studies are any better. 

Interactions with pregnancy

Only one study on acute lymphoblastic leukemia. Seems a weird choice. Large meta-analyses exist for pre-term birth and low birth weight, likely much more important. Rogne et al. 2016.

Overall

They don’t seem to be saying much wrong but the write-up is not nearly as comprehensive as we had hoped. To give Examine its best shot, we decided the next vitamin should be on their best write-up. We tried asking Examine which article they are especially confident in. Unfortunately, whoever handles their public email address didn’t get the point after 3 emails, so Martin made his best guess. 

Vitamin D

Upper respiratory tract infections.

They summarize several studies but miss a very large RCT published in JAMA, the VIDARIS trial. All studies (including the VIDARIS trial) show no effect, so they might’ve considered the matter settled and stopped looking for more trials, which seems reasonable.

Claim: Vitamin D helps premenstrual syndrome

“Most studies have found a decrease in general symptoms when given to women with vitamin D deficiency, some finding notable reductions and some finding small reductions. It’s currently not known why studies differ, and more research is needed”

This summary seemed optimistic after Martin looked into the studies:

  • Abdollahi 2019:
    • No statistically significant differences between groups.
    • The authors highlight statistically significant decreases for a handful of symptoms in the Vitamin D group, but the decrease is similar in magnitude to placebo. Vitamin D and placebo both have 5 outcomes which were statistically significant.
  • Dadkhah 2016:
    • No statistically significant differences between treatment groups
  • Bahrami 2018:
    • No control group
  • Heidari 2019:
    • Marked differences between groups, but absolutely terrible reporting by the authors – they don’t even mention this difference in the abstract. This makes me (Martin) somewhat worried about the results – if they knew what they were doing, they’d focus the abstract on the difference in differences.
  • Tartagni 2015:
    • Appears to show notable differences between groups, but terrible reporting. Tests change relative to baseline (?!), rather than differences in trends or differences in differences.

In conclusion, only the poorest research finds effects – not a great indicator of a promising intervention. But Examine didn’t miss any obvious studies.
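
For readers unfamiliar with the complaint about “change relative to baseline”, here is a small sketch with invented symptom scores (not taken from any of the studies above). Both groups improve a lot from baseline, so a within-group test would look impressive, but the difference in differences, which is the number that actually reflects the treatment, is small.

```python
from statistics import mean

# Hypothetical PMS symptom scores (higher = worse), invented for illustration.
vitamin_d_before = [20, 22, 18, 24]
vitamin_d_after = [14, 15, 13, 16]
placebo_before = [21, 19, 23, 21]
placebo_after = [15, 14, 17, 16]


def mean_change(before, after):
    """Average within-group change from baseline (negative = improvement)."""
    return mean(a - b for b, a in zip(before, after))


treatment_change = mean_change(vitamin_d_before, vitamin_d_after)  # -6.5
placebo_change = mean_change(placebo_before, placebo_after)        # -5.5

# The treatment effect is the gap between the two changes, not either change alone.
diff_in_diff = treatment_change - placebo_change                   # -1.0

print(f"vitamin D change: {treatment_change}")
print(f"placebo change:   {placebo_change}")
print(f"difference in differences: {diff_in_diff}")
```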

Claim: “There is some evidence that vitamin D may improve inflammation and clinical symptoms in COVID-19 patients, but this may not hold true with all dosing regimens. So far, a few studies have shown that high dosages for 8–14 days may work, but a single high dose isn’t likely to have the same benefit.”

The evidence Martin found seems to support their conclusions. They’re missing one relatively large, recent study (De Niet 2022). More importantly, all included studies are about hospital patients given vitamin D after admission, which are useless for determining if Vitamin D is a good preventative, especially because some forms of vitamin D take days to be turned into a useful form in the body. 

  • Murai 2021:
    • The regimen was a single, high dose at admission.
    • No statistically significant differences between groups, all the effect sizes are tiny or non-existent.
  • Sabico 2021:
    • Compares Vitamin D 5000 IU/daily to 1000 IU/daily in hospitalized patients.
    • In the Vitamin D group, they show faster
      • Time to recovery (6.2 ± 0.8 versus 9.1 ± 0.8; p = 0.039)
      • Time to restoration of taste (11.4 ± 1.0 versus 16.9 ± 1.7; p = 0.035)
        • The Kaplan-Meier Plot looks weird here, though. What happens on day 14?!
    • All symptom durations, except sore throat, were lower in the 5000 IU group.

All analyses were adjusted for age, BMI and type of D vitamin – which is a good thing, because it appears the 5000 IU group was healthier at baseline.

  • Castillo 2020:
    • Huge effect – half of the control group had to go to the ICU, whereas only one person in the intervention group did so (OR 0.02).
    • Nothing apparently wrong, but I’m still highly suspicious of the study:
      • An apparently well-done randomized pilot trial, early on, published in “The Journal of Steroid Biochemistry and Molecular Biology”. Very worrying that it isn’t published somewhere more prestigious.
      • They gave hydroxychloroquine as the “best available treatment”, even though there was no evidence of effect at the time of the study.
      • They call the study “double masked” – I hope this means double-blinded, because otherwise the study is close to worthless since their primary outcomes are based on doctor’s behavior.
      • The follow-up study is still recruiting.

Conclusion

I don’t know of a better comprehensive resource than Examine.com. It is alas still not comprehensive enough for important use cases, but still a useful shortcut for smaller problems.

Thanks to the FTX Regrant program for funding this post, and Martin for doing most of the work.

Guesstimate Algorithm for Medical Research

This document is aimed at subcontractors doing medical research for me. I am sharing it in the hope it is more broadly useful, but have made no attempts to make it more widely accessible. 

Intro

Guesstimate is a tool I have found quite useful in my work, especially in making medical estimates in environments of high uncertainty. It’s not just that it makes it easy to do calculations incorporating many sources of data; guesstimate renders your thinking much more legible to readers, who can then more productively argue with you about your conclusions. 

The basis of guesstimate is breaking down a question you want an answer to (such as “what is the chance of long covid?”) into subquestions that can be tackled independently. Questions can have numerical answers in the form of a single number, a range, or a formula that references other questions. This allows you to highlight areas of relative certainty and relative uncertainty, to experiment with the importance of different assumptions, and for readers to play with your model and identify differences of opinion while incorporating the parts of your work they agree with.

Basics

If you’re not already familiar with guesstimate, please watch this video, which references this model. The video goes over two toy questions to help you familiarize yourself with the interface.

The Algorithm

The following is my basic algorithm for medical questions:

  1. Formalize the question you want an answer to. e.g. what is the risk to me of long covid?
  2. Break that question down into subquestions. The appropriate subquestion varies based on what data is available, and your idea of the correct subquestions is likely to change as you work.
    1. When I was studying long covid last year, I broke it into the following subquestions
      1. What is the risk with baseline covid?
      2. What is the vaccine risk modifier?
      3. What is the strain risk modifier?
      4. What’s the risk modifier for a given individual?
  3. In guesstimate, wire the questions together. For example, if you wanted to know your risk of hospitalization when newly vaccinated in May 2021, you might multiply the former hospitalization rate times a vaccine modifier. If you don’t know how to do that in guesstimate, watch the video above, it demonstrates it in a lot of detail.
  4. Use literature to fill in answers to subquestions as best you can. Unless the data is very good, these probably include giving ranges and making your best guess as to the shape of the distribution of values.
    1. Provide citations for where you got those numbers. This can be done in the guesstimate commenting interface, but that’s quite clunky. Sometimes it’s better to have a separate document where you lay out your reasoning. 
    2. The reader should be able to go from a particular node in the guesstimate to your reasoning for that node with as little effort as possible.
    3. Guesstimate will use a log-normal distribution by default, but you can change it to uniform or normal if you believe that represents reality better.
  5. Sometimes there are questions that the literature literally can’t answer, or that aren’t worth your time to research rigorously. Make your best guess, and call it out as a separate variable so people can identify it and apply their own best guess.
    1. This includes value judgments, like the value of a day in lockdown relative to a normal day, or how much one hates being sick.
    2. Or the 5-year recovery rate from long covid: no one can literally measure it, and while you could guess from other diseases, the additional precision isn’t necessarily worth the effort.
  6. The final product is both the guesstimate model and a document writing up your sources and reasoning.
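
If it helps to see what guesstimate is doing under the hood, the steps above can be sketched as a tiny Monte Carlo simulation. This is a rough stand-in, not guesstimate’s actual implementation: I’m assuming a range means a 90% interval and fitting a log-normal to it, and every number below is made up. It wires together the hospitalization example from step 3.

```python
import math
import random
import statistics

Z90 = 1.645  # z-score bounding the central 90% of a normal distribution


def lognormal_from_range(low, high, rng):
    """Sample a log-normal whose 5th/95th percentiles are roughly low/high."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z90)
    return rng.lognormvariate(mu, sigma)


def simulate(n=10_000, seed=0):
    """Return sorted samples of hospitalization risk for a vaccinated person."""
    rng = random.Random(seed)
    results = []
    for _ in range(n):
        # Each subquestion gets a range (made-up values, not real data).
        baseline_hospitalization = lognormal_from_range(0.01, 0.05, rng)
        vaccine_modifier = lognormal_from_range(0.05, 0.2, rng)
        # Wiring the subquestions together: here, a simple product.
        results.append(baseline_hospitalization * vaccine_modifier)
    return sorted(results)


samples = simulate()
median = statistics.median(samples)
p5 = samples[len(samples) // 20]
p95 = samples[-(len(samples) // 20)]
print(f"median {median:.4f}, 90% interval {p5:.4f} to {p95:.4f}")
```

Keeping each subquestion as its own named variable is the code equivalent of giving it its own cell: a reader who disagrees with one range can change that line and rerun.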

Example: Trading off air quality and covid.

The final model is available here.

Every year California gets forest fires big enough to damage air quality even if you are quite far away, which pushes people indoors. This was mostly okay until covid, which made being indoors costly in various ways too. So how do we trade those off? I was particularly interested in trading off outdoor exercise vs the gym (and if both had been too awful I might have rethought my stance on how boring and unpleasant working out in my tiny apartment is).

What I want to know is the QALY hit from 10 minutes outdoors vs 10 minutes indoors. This depends a lot on the exact air quality and covid details for that particular day, so we’ll need to have variables for that.

For air quality, I used the calculations from this website to turn AQI into cigarettes. I found a cigarette -> micromort converter faster than cigarette -> QALY so I’m just going to use that. This is fine as long as covid and air quality have the same QALY:micromort ratio (unlikely) or if the final answer is clear enough that even large changes in the ratio would not change our decision (judgment call). 

For both values that use outside data I leave a comment with the source, which gives them a comment icon in the upper right corner.

But some people are more susceptible than others due to things like asthma or cancer, so I’ll add a personal modifier.  I’m not attempting to define this well: people with lung issues can make their best guess. They can’t read my mind though, so I’ll make it clear that 1=average and which direction is bad.

Okay how about 10 minutes inside? That depends a lot on local conditions. I could embed those all in my guesstimate, or I could punt to microcovid. I’m not sure if microcovid is still being maintained but I’m very sure I don’t feel like creating new numbers right now, so we’ll just do that. I add a comment with basic instructions.

How about microcovids to micromorts? The first source I found said 10k per infection, which is a suspiciously round number but it will do for now. I divide the micromorts by 1 million, since each microcovid is 1/1,000,000 chance of catching covid.

They could just guess their personal risk modifier like they do for covid, or they could use this (pre-vaccine, pre-variant) covid risk calculator from the Economist, so I’ll leave a note for that.

But wait- there are two calculations happening in the microcovids -> micromorts cell, which makes it hard to edit if you disagree with me about the risk of covid. I’m going to move the /1,000,000 to the top cell so it’s easy to edit.
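
Spelled out as code, the conversion looks something like the sketch below. The function and constant names are mine, not anything from guesstimate or microcovid, and the 10k figure is the unverified number mentioned above. Each constant lives on its own line for the same reason the /1,000,000 gets its own cell.

```python
import math

MICROCOVIDS_PER_INFECTION = 1_000_000  # one microcovid = 1/1,000,000 chance of catching covid
MICROMORTS_PER_INFECTION = 10_000      # suspiciously round figure from the first source found


def micromorts_from_microcovids(microcovids, personal_modifier=1.0):
    """Convert an activity's microcovids into micromorts.

    personal_modifier: 1.0 = average; higher means covid is more dangerous for you.
    """
    p_infection = microcovids / MICROCOVIDS_PER_INFECTION
    return p_infection * MICROMORTS_PER_INFECTION * personal_modifier


# A hypothetical 200-microcovid gym session comes out to roughly 2 micromorts.
gym_session = micromorts_from_microcovids(200)
print(f"{gym_session:.2f} micromorts")
assert math.isclose(gym_session, 2.0)
```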

But the risk of catching covid outside isn’t zero. Microcovid says outdoors has 1/20th the risk. I’m very sure that’s out of date but don’t know the new number so I’ll make something up and list it separately so it’s easy to edit.

But wait- I’m not necessarily with the same people indoors and out. The general density of people is comparable if I’m deciding to throw a party inside or outside, but not if I’m deciding to exercise outdoors or at a gym. So I should make that toggleable.

Eh, I’m still uncomfortable with that completely made up outdoor risk modifier. Let’s make it a range so we can see the scope of possible risks. Note that this only matters if we’re meeting people outdoors, which seems correct.

But that used guesstimate’s default probability distribution (log normal). I don’t see a reason probability density would concentrate at the low end of the distribution, so I switch it to normal.

Turns out to make very little difference in practice.
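
Here is a rough sketch of what switching distributions does to a single range. I’m assuming the range is read as a 90% interval in both cases, which is my understanding of guesstimate’s behavior rather than something I’ve verified, and the range itself is made up. The log-normal puts its median below the midpoint of the range, while the normal centers on the midpoint.

```python
import math
import random
import statistics

Z90 = 1.645  # z-score bounding the central 90% of a normal distribution


def lognormal_samples(low, high, n, rng):
    """Log-normal with 5th/95th percentiles at roughly low/high."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z90)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


def normal_samples(low, high, n, rng):
    """Normal with 5th/95th percentiles at roughly low/high."""
    mu = (low + high) / 2
    sigma = (high - low) / (2 * Z90)
    return [rng.gauss(mu, sigma) for _ in range(n)]


rng = random.Random(0)
low, high = 0.02, 0.10  # made-up range for the outdoor risk modifier
log_median = statistics.median(lognormal_samples(low, high, 20_000, rng))
norm_median = statistics.median(normal_samples(low, high, 20_000, rng))
print(f"log-normal median: {log_median:.3f}")
print(f"normal median:     {norm_median:.3f}")
```

Whether that gap matters depends on how much the downstream answer leans on this one cell.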

There are still a few problems here. Some of the numbers are more or less made up, and others have sources but I’ve done no work to verify them, which is almost as bad.

But unless the numbers are very off, covid is a full order of magnitude riskier than air pollution for the scenarios I picked. This makes me disinclined to spend a bunch of time tracking down better numbers.

Full list of limitations:

  • Only looks at micromorts, not QALYs
  • Individual adjustment basically made up, especially for pollution
  • Several numbers completely made up
  • Didn’t check any of my sources

Example: Individual’s chance of long covid given infection

This will be based on my post last year, Long covid is not necessarily your biggest problem, with some modification for pedagogical purposes. And made up numbers instead of real ones because the specific numbers have long been eclipsed by new data and strains. The final model is available here.

Step one is to break your questions into subquestions. When I made this model a year ago, we only had data for baseline covid in unvaccinated people. Everyone wanted to know how vaccinations and the new strain would affect things. 

My first question was “can we predict long covid from acute covid?” I dug into the data and concluded “Yes”, the risk of long covid seemed to be very well correlated with acute severity. This informed the shape of the model but not any particular values. Some people disagreed with me, and they would make a very different model. 

Once I made that determination, the model was pretty easy to create: It looked like [risk of hospitalization with baseline covid] * [risk of long covid given hospitalization rate] * [vaccination risk modifier] * [strain hospitalization modifier] * [personal risk modifier]. Note that the model I’m creating here does not perfectly match the one created last year; I’ve modified it to be a better teaching example. 

The risk of hospitalization is easy to establish unless you start including undetected/asymptomatic cases. This has become a bigger deal as home tests became more available and mild cases became more common, since government statistics are missing more mild or asymptomatic cases. So in my calculation, I broke down the risk of hospitalization given covid into the known case hospitalization rate and a separate term based on my estimate of the number of uncaught cases. In the original post I chose some example people and used base estimates for them from the Economist data. In this model, I made something up.

Honestly, I don’t remember how I calculated the risk of long covid given the hospitalization rate. It was very complicated and a long time ago. This is why I write companion documents to explain my reasoning. 

The vaccination modifier was quite easy; every scientist was eager to tell us that. However, there are now questions about vaccines waning over time, and an individual’s protection level is likely to vary. Because of that, in this test model I have entered a range of vaccine efficacies, rather than a point estimate. An individual who knew how recently they were vaccinated might choose to collapse that down. 

Similarly, strain hospitalization modifiers take some time to assess, but are eventually straightforwardly available. Your estimate early in a new strain will probably have a much wider confidence interval than your estimate late in the same wave. 

By definition, I can’t set the personal risk modifier for every person looking at the model. I suggested people get a more accurate estimate of their personal risk using the Economist calculator, and then enter that in the model.

Lastly, there is a factor I called “are you feeling lucky?”. Some people don’t have anything diagnosable but know they get every cold twice; other people could get bitten by a plague rat with no ill effects. This is even more impossible to provide for an individual but is in fact pretty important for an individual’s risk assessment, so I included it as a term in the model. Individuals using the model can set it as they see fit, including to 1 if they don’t want to think about it.

When I put this together, I get this guesstimate. [#TODO screenshot]. Remember the numbers are completely made up. If you follow the link you can play around with it yourself, but your changes will not be saved. If anyone wants to update my model with modern strains and vaccine efficacy, I would be delighted.
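If you would rather read the structure as code, here is a rough Python sketch of the same multiplicative model. It is not the Guesstimate model itself: every number is made up, the variable names are mine, and the uniform ranges stand in for whatever distributions you would actually choose.

```python
import random
import statistics


def simulate_long_covid_risk(personal_modifier=1.0, luck=1.0, n=10000, seed=0):
    """Monte Carlo over the multiplicative model; every number is made up."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        known_case_hosp_rate = 0.05               # made up
        detected_share = rng.uniform(0.3, 0.7)    # made up: fraction of infections that get counted
        hosp_given_infection = known_case_hosp_rate * detected_share
        long_covid_per_hosp = 2.0                 # made up: long covid risk per unit of hospitalization risk
        vaccine_modifier = rng.uniform(0.1, 0.5)  # made up; 0 to 1, lower is less risk
        strain_modifier = 0.5                     # made up; <1 = less risk than baseline
        risk = (hosp_given_infection * long_covid_per_hosp * vaccine_modifier
                * strain_modifier * personal_modifier * luck)
        draws.append(min(risk, 1.0))              # a probability cannot exceed 1
    return draws


draws = simulate_long_covid_risk()
cuts = statistics.quantiles(draws, n=20)  # 19 cut points: 5%, 10%, ..., 95%
print(f"median {statistics.median(draws):.4f}, 90% interval {cuts[0]:.4f} to {cuts[-1]:.4f}")
```

Setting `personal_modifier` and `luck` to 1 leaves the population-level estimate untouched, which matches how the corresponding cells are meant to be used.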

Tips and Tricks

I’m undoubtedly missing many, so please comment with your own and I’ll update or create a new version later.

When working with modifiers, it’s easy to forget whether a large number is good or bad, and what the acceptable range is. It can be good to mark them with “0 to 1, higher is less risky”, or “between 0 and 1 = less risk, >1 = more risk”

If you enter a range, the default distribution is log-normal. If you want something different, change it. 

The formulas in the cells can get almost arbitrarily complicated, although it’s often not worth it. 

No, seriously, write out your sources and reasoning somewhere else because you will come back in six months and not remember what the hell you were thinking. Guesstimate is less a tool for holding your entire model and more a tool for forcing you to make your model explicit.

Separate judgment calls from empirical data, even if you’re really sure you are right. 

Acknowledgements

Thanks to Ozzie Gooen and his team for creating Guesstimate.

Thanks to the FTX Regrant program and a shy regrantor for funding this work.

Impact Shares For Speculative Projects

Introduction

Recently I founded a new project with Jasen Murray, a close friend of several years. At founding the project was extremely amorphous (“preparadigmatic science: how does it work?”) and was going to exit that state slowly, if at all. This made it a bad fit for traditional “apply for a grant, receive money, do work” style funding. The obvious answer is impact certificates, but the current state of the art there wasn’t an easy fit either. In addition to the object-level project, I’m interested in advancing the social tech of funding. With that in mind, Jasen and I negotiated a new system for allocating credit and funding.

This system is extremely experimental, so we have chosen not to make it binding. If we decide to do something different in a few months or a few years, we do not consider ourselves to have broken any promises. 

In the interest of advancing the overall tech, I wanted to share the considerations we have thought of and tentative conclusions. 

DALL-E rendering of impact shares

Considerations

All of the following made traditional grant-based funding a bad fit:

  • Our project is currently very speculative and its outcomes are poorly defined. I expect it to be still speculative but at least a little more defined in a few months.
  • I have something that could be called integrity and could be called scrupulosity issues, which makes me feel strongly bound to follow plans I have written down and people have paid me for, to the point it can corrupt my epistemics. This makes accepting money while the project is so amorphous potentially quite harmful, even if the funders are on board with lots of uncertainty. 
  • When we started, I didn’t think I could put more than a few hours in per week, even if I had the time free, so I’m working more or less my regular freelancing hours and am not cash-constrained. 
  • The combination of my not being locally cash-constrained, money not speeding me up, and the high risk of corrupting my epistemics, makes me not want to accept money at this stage. But I would still like to get paid for the work eventually.
  • Jasen is more cash-constrained and is giving up hours at his regular work in order to further the project, so it would be very beneficial for him to get paid.
  • Jasen is much more resistant to epistemic pressure than I am, although still averse to making commitments about outcomes at this stage.

Why Not Impact Certificates?

Impact certificates have been discussed within Effective Altruism for several years, first by Paul Christiano and Katja Grace, who pitched it as “accepting money to metaphorically erase your impact”. Ben Hoffman had a really valuable addition with framing impact certificates as selling funder credits, rather than all of the credit. There is currently a project attempting to get impact certificates off the ground, but it’s aimed at people outside funding trust networks doing very defined work, which is basically the opposite of my problem. 

What my co-founder and I needed is something more like startup equity, where you are given a percentage credit for the project, and that percentage can be sold later, and the price is expected to change as the project bears fruit or fails to do so. If six months from now someone thinks my work is super valuable they are welcome to pay us, but we have not obligated ourselves to a particular person to produce a particular result.

Completely separate from this, I have always found the startup practice of denominating stock grants in “% of company”, distributing all the equity at the beginning but having it vest over time, and being able to dilute it at any time, kind of bullshit. What I consider more honest is distributing shares as you go and everyone recognizes that they don’t know what the total number of shares will be. This still provides a clean metric for comparing yourself to others and arguing about relative contributions, without any of the shadiness around percentages. This is mathematically identical to the standard system but I find the legibility preferable. 

The System

In Short

  • Every week Jasen and I accrue n impact shares in the project (“impact shares” is better than the first name we came up with, but probably a better name is out there). n is currently 50 because 100 is a very round number. 1000 felt too big and 10 made anything we gave to anyone else feel too small. This is entirely a sop to human psychology; mathematically it makes no difference.
  • Our advisor/first customer accrues a much smaller number, less than 1 per week, although we are still figuring out the exact number. 
  • Future funders will also receive impact shares, although this is an even more theoretical exercise than the rest of it because we don’t expect them to care about our system or negotiate on it. Funding going to just one of us comes out of that person’s share; funding going to both of us or the project at large probably gets issued new shares. 
  • Future employees can negotiate payment in money and impact shares as they choose.
  • In the unlikely event we take on a co-founder level collaborator in the future, probably they will accrue impact shares at the same rate we do but will not get retroactive shares. 

Details

Founder Shares

One issue we had to deal with was that Jasen would benefit from a salary right away, while I found a salary actively harmful, but wouldn’t mind having funding for expenses (this is not logical but it wasn’t worth the effort to fight it). We have decided that funding that is paying a salary is paid for with impact shares of the person receiving the salary, but funding for project expenses will be paid for either evenly out of both of our shared pools, or with new impact shares. 

We are allowed to have our impact shares go negative, so we can log salary payments in a lump sum, rather than having to deal with it each week.

Initially, we weren’t sure how we should split impact shares between the two of us. Eventually, we decided to fall back on the YCombinator advice that uneven splits between cofounders are always more trouble than they’re worth. But before then we did some thought experiments about what the project would look like with only one of us. I had initially wanted to give him more shares because he was putting in more time than me, but the thought experiments convinced us both that I was more counterfactually crucial and we agreed on 60/40 in my favor before reverting to a YC even split at my suggestion. 

My additional value came primarily from being more practical/applied. Applied work without theory is more useful than theory without application, so that’s one point for me. Additionally all the value comes from convincing people to use our suggestions, and I’m the one with the reputation and connections to do that. That’s in part because I’m more applied, but also because I’ve spent a long time working in public and Jasen had to be coaxed to allow his name on this document at all. I also know and am trusted by more funders, but I feel gross including that in the equation, especially when working with a close friend. 

We both felt like that exercise was very useful and grounding in assessing the project, even if we ultimately didn’t use its results. Jasen and I are very close friends and the relationship could handle the measuring of credit like that. I imagine many can’t, although it seems like a bad sign for a partnership overall. Or maybe we’re both too willing to give credit to other people and that’s easier to solve than wanting too much for ourselves. I think what I recommend is to do the exercise and unless you discover something really weird still split credit evenly, but that feels like a concession to practicality humanity will hopefully overcome. 

We initially discussed being able to give each other impact shares for particular pieces of work (one blog post, one insight, one meeting, etc). Eventually, we decided this was a terrible idea. It’s really easy to picture how we might have the same assessment of the other’s overall or average contribution but still vary widely in how we assess an individual contribution. For me, Jasen thinking one thing was 50% more valuable than I thought it was, did not feel good enough to make up for how bad it would be for him to think another contribution was half as valuable as I thought it was. For Jasen it was even worse because having his work overestimated felt almost as bad as having it underestimated. Plus it’s just a lot of friction and assessment of idea seeds when the whole point of this funding system is getting to wait to see how things turn out. So we agreed we would do occasional reassessments with months in between them, and of course we’re giving each other feedback constantly, but to not do quantified assessments at smaller intervals.

Neither of us wanted to track the hours we were putting into the project, that just seemed very annoying. 

So ultimately we decided to give ourselves the same number of impact shares each week, with the ability to retroactively gift shares or negotiate for a change in distribution going forward, but those should be spaced out by months at a minimum. 

Funding Shares

When we receive funding we credit the funder with impact shares. This will work roughly like startup equity: you assess how valuable the project is now, divide that by the number of outstanding shares, and that gets you a price per share. So if the project is currently valued at $10,000 and we have 100 shares outstanding, the collaborator would have to give up 1 share to get $100.
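The bookkeeping is simple enough to sketch. The following is an illustration only, not a tool we use: the class and method names are hypothetical, and the valuation is whatever number the honor system produces.

```python
from dataclasses import dataclass, field


@dataclass
class ImpactLedger:
    """Hypothetical ledger of impact shares per holder."""
    balances: dict = field(default_factory=dict)

    def accrue(self, holder, shares):
        self.balances[holder] = self.balances.get(holder, 0.0) + shares

    def outstanding(self):
        return sum(self.balances.values())

    def price_per_share(self, project_value):
        total = self.outstanding()
        if total <= 0:
            raise ValueError("no shares outstanding to price against")
        return project_value / total

    def sell_from_holder(self, seller, funder, dollars, project_value):
        """Salary-style funding: shares move from the recipient to the funder."""
        shares = dollars / self.price_per_share(project_value)
        # Balances are allowed to go negative, so no check here.
        self.balances[seller] = self.balances.get(seller, 0.0) - shares
        self.accrue(funder, shares)
        return shares

    def issue_new(self, funder, dollars, project_value):
        """Project-level funding: new shares are created at the current price."""
        shares = dollars / self.price_per_share(project_value)
        self.accrue(funder, shares)
        return shares


ledger = ImpactLedger()
ledger.accrue("Jasen", 50)
ledger.accrue("author", 50)
price = ledger.price_per_share(project_value=10_000)
shares_sold = ledger.sell_from_holder("Jasen", "funder", dollars=100, project_value=10_000)
print(price, shares_sold, ledger.balances)
```

With 100 shares outstanding and a $10,000 valuation this reproduces the example above: $100 per share, so $100 of salary costs the recipient one share.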

Of course, startup equity works because the investors are making informed estimates of the value of the startup. We don’t expect initial funders to be very interested in that process with us, so probably we’ll be assessing ourselves on the honor system, maybe polling some other people. This is a pretty big flaw in the plan, but I think overall still a step forward in developing the coordination tech. 

In addition to the lack of outside evaluation, the equity system misses the concept of funder’s credit from Ben Hoffman’s blog post which I think is otherwise very valuable.  Ultimately we decided that impact shares are no worse than the current startup equity model, and that works pretty well. “No worse than startup equity” was a theme in much of our decision-making around this system. 

Advisor Shares

We are still figuring out how many impact shares to give our advisor/first customer. YC has standard advice for this (0.25%-1%), but YC’s advice assumes you will be diluting shares later, so the number is not directly applicable. Our advisor mostly doesn’t care right now, because he doesn’t feel that this is taking much effort from him. 

It was very important to Jasen to give credit to people who got him to the starting line of this project, even if they were not directly involved in it. Recognizing them by giving them some of his impact shares felt really good to him, way more tangible than thanking mom after spiking a touchdown.

Closing

This is extremely experimental. I expect the conventions around this to improve over time, and I expect Jasen and me to improve our personal model as we work. Some of that improvement will come from stating our current ideas and hearing the response, and I didn’t want to wait on starting that conversation, so here we are. 

Thanks to several people, especially Austin Chen and Raymond Arnold, for discussion on this topic.

Cognitive Risks of Adolescent Binge Drinking

The takeaway

Our goal was to quantify the cognitive risks of heavy but not abusive alcohol consumption. This is an inherently difficult task: the world is noisy, humans are highly variable, and institutional review boards won’t let us do challenge trials of known poisons. This makes strong inference or quantification of small risks incredibly difficult. We know for a fact that enough alcohol can damage you, and even levels that aren’t inherently dangerous can cause dumb decisions with long term consequences. All that said… when we tried to quantify the level of cognitive damage caused by college level binge drinking, we couldn’t demonstrate an effect. This doesn’t mean there isn’t one (if nothing else, “here, hold my beer” moments are real), just that it is below the threshold detectable with current methods and levels of variation in the population.

Motivation

In discussions with recent college graduates I (Elizabeth) casually mentioned that alcohol is obviously damaging to cognition. They were shocked and dismayed to find their friends were poisoning themselves, and wanted the costs quantified so they could reason with them (I hang around a very specific set of college students). Martin Bernstorff and I set out to research this together. Ultimately, 90-95% of the research was done by him, with me mostly contributing strategic guidance and somewhere between editing and co-writing this post. 

I spent an hour getting DALL-E to draw this

Problems with research on drinking during adolescence

Literature on the causal medium- to long-term effects of non-alcoholism-level drinking on cognition is, to our strong surprise, extremely lacking. This isn’t just our poor research skills; in 2019, the Danish Ministry of Health attempted a comprehensive review and concluded that:

“We actually know relatively little about which specific biological consequences a high level of alcohol intake during adolescence will have on youth”.

And it isn’t because scientists are ignoring the problem either. Studying medium- and long-term effects on brain development is difficult because of the myriad of confounders and/or colliders for both cognition and alcohol consumption, and because more mechanistic experiments would be very difficult and are institutionally forbidden anyway (“Dear IRB: we would like to violently poison some teenagers for four years, while forbidding the other half to engage in standard college socialization”). You could randomize abstinence, but we’ll get back to that.

One problem highly prevalent in alcohol literature is the abstinence bias. People who abstain from alcohol intake are likely to do so for a reason, for example chronic disease, being highly conscientious and religious, or a bad family history with alcohol. Even if you factor out all of the known confounders, it’s still vanishingly unlikely the drinking and non-drinking samples are identical. Whatever the differences, they’re likely to affect cognitive (and other) outcomes. 

Any analysis comparing “no drinking” to “drinking” will suffer from this by estimating the effect of no alcohol + confounders, rather than the effect of alcohol. Unfortunately, this rules out a surprising number of studies (code available upon request). 

Confounding is possible to mitigate if we have accurate intuition about the causal network, and we can estimate the effects of confounders accurately. We have to draw a directed acyclic graph with the relevant causal factors and adjust analyses or design accordingly. This is essential, but has not permeated all of epidemiology (yet), and especially for older literature, this is not done. For a primer, Martin recommends “Draw Your Assumptions” on edX here.

Additionally, alcohol consumption is a politically live topic, and papers are likely to be biased. Which direction is a coin flip: public health wants to make it seem scarier, alcohol companies want to make it seem safer. Unfortunately, these biases don’t cancel out, they just obfuscate everything.

What can we do when we know much of the literature is likely biased, but we do not have a strong idea about the size or direction?

Triangulation

If we aggregate multiple estimates that are wrong, but in different (and overall uncorrelated) directions, we will approximate the true effect. For health, we have a few dimensions that we can vary over: observational/interventional, age, and species.
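A toy simulation makes the condition explicit. This is a sketch under invented assumptions: each study's bias is drawn from a normal distribution, and `shared_bias` is a hypothetical knob for a bias every study has in common.

```python
import random
import statistics


def pooled_estimate(true_effect, n_sources, bias_sd, shared_bias=0.0, seed=0):
    """Average n_sources estimates, each off by its own independent bias
    plus an optional bias shared by all of them."""
    rng = random.Random(seed)
    estimates = [true_effect + shared_bias + rng.gauss(0, bias_sd)
                 for _ in range(n_sources)]
    return statistics.mean(estimates)


TRUE_EFFECT = 0.0
for n_sources in (3, 30, 3000):
    pooled = pooled_estimate(TRUE_EFFECT, n_sources, bias_sd=1.0)
    print(f"{n_sources:>5} sources, uncorrelated biases: pooled = {pooled:+.3f}")

independent = pooled_estimate(TRUE_EFFECT, 3000, bias_sd=1.0)
shared = pooled_estimate(TRUE_EFFECT, 3000, bias_sd=1.0, shared_bias=0.5)
print(f"with a shared bias of +0.5: pooled = {shared:+.3f}")
```

Independent errors average out as sources accumulate; a bias common to every source does not, which is why we vary study design, age, and species rather than just collecting more studies of one kind.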

Randomized abstinence studies

Ideally, we would have strong evidence from randomized controlled trials of abstinence. In experimental studies like this, there is no doubt about the direction of causality. And, since participants are randomized, confounders are evenly distributed between intervention and control groups. This means that our estimate of the intervention effect is unbiased by confounders, both measured and unmeasured.
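To illustrate why randomization matters, here is a toy simulation in which alcohol has no effect at all on cognition, but a confounder (labeled conscientiousness, purely as an example) raises both test scores and the chance of abstaining. All effect sizes are invented.

```python
import random
import statistics


def abstainer_advantage(randomized, n=20000, seed=0):
    """Mean cognition of abstainers minus drinkers when alcohol's true effect is zero."""
    rng = random.Random(seed)
    abstainers, drinkers = [], []
    for _ in range(n):
        conscientiousness = rng.gauss(0, 1)
        # Cognition depends only on the confounder and noise, never on drinking.
        cognition = 100 + 5 * conscientiousness + rng.gauss(0, 10)
        if randomized:
            abstains = rng.random() < 0.5  # coin flip, unrelated to the confounder
        else:
            abstains = conscientiousness + rng.gauss(0, 1) > 1  # self-selection
        (abstainers if abstains else drinkers).append(cognition)
    return statistics.mean(abstainers) - statistics.mean(drinkers)


observational = abstainer_advantage(randomized=False)
experimental = abstainer_advantage(randomized=True)
print(f"observational comparison: {observational:+.2f} points")
print(f"randomized comparison:    {experimental:+.2f} points")
```

The self-selected comparison shows abstainers scoring several points higher even though drinking does nothing in this toy world, while the randomized comparison sits near zero.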

However, we were only able to find two such studies, both from the 80s, among light drinkers (mean 3 standard units per week), and of a duration of only 2-6 weeks (Bimbaum et al., 1983; Hannon et al., 1987).

Bimbaum et al. did not stick to the randomisation when analyzing their data, opening the door to confounding:

Which should decrease our confidence in their study. They found no effect of abstinence on their 7 cognitive measures.

In Hannon et al., instruction to abstain vs. maintain resulted in a difference in alcohol intake of 12.5 units per week over 2 weeks. On the WAIS-R vocabulary test, abstaining women scored 55.5 ± 6.7 and maintaining women scored 51.0 ± 8.8 (both mean ± SD). On the 3 other cognitive tests performed, they found no difference.
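For scale, the vocabulary difference can be turned into a standardized effect size. This is a rough sketch, not a figure from the paper: it assumes the two groups were the same size (group sizes are not given above), in which case the pooled SD is the root mean square of the two SDs.

```python
import math


def cohens_d_equal_n(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d with a pooled SD, assuming equal group sizes."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Abstaining women vs. maintaining women on the WAIS-R vocabulary test.
d = cohens_d_equal_n(55.5, 6.7, 51.0, 8.8)
print(f"Cohen's d = {d:.2f}")
```

This says nothing about statistical significance, which depends on the sample sizes, and it is one result out of four tests.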

Especially due to the short duration, we should be very wary of extrapolating too much from these studies. However, it appears that for moderate amounts of drinking over a short time period, total abstinence does not provide a meaningful benefit in the above studies.

Observational studies on humans

Due to their observational nature (as opposed to being an experiment), these studies are extremely vulnerable to confounders, colliders, reverse causality etc. However, they are relatively cheap ways of getting information, and are performed in naturalistic settings.

One meta-analysis (Neafsey & Collins, 2011) compared moderate social drinking (< 4 drinks/day) to non-drinkers (note: the definition of moderate varies a lot between studies). They partially compensated for the abstinence bias by excluding “former drinkers” from their reference group, i.e. removing people who’ve stopped drinking for medical (or other) reasons. This should provide a less biased estimate of the true effect. They found a protective effect of social drinking on a composite endpoint, “cognitive decline/dementia” (Odds Ratio 0.79 [0.75; 0.84]).

Interestingly, they also found that studies adjusting for age, education, sex and smoking-status did not have markedly different estimates from those that did not (adjusted OR 0.75 vs. unadjusted OR 0.79). This should decrease our worry about confounding overall.

Observational studies on alcohol for infants

Another angle for triangulation is the effect of moderate maternal alcohol intake during pregnancy on the offspring’s IQ. The brain is never more vulnerable than during fetal development. There are obviously large differences between fetal and adolescent brains, so any generalization should be accompanied with large error bars. However, this might give us an upper bound.

(Zuccolo et al., 2013) perform an elegant example of what’s called Mendelian randomization.

A SNP variant in a gene (ADH1B) is associated with decreased alcohol consumption. Since SNPs are near-randomly assigned (but see the examination of assumptions below), one can interpret it as the SNP causing decreased alcohol consumption. If some assumptions are met, that’s essentially a randomized controlled trial! Alas, these assumptions are extremely strong and unlikely to be totally true – but it can still be much better than merely comparing two groups with differing alcohol consumption.

As the authors very explicitly state, this analysis assumes that:

1. The SNP variant (rs1229984) decreases maternal alcohol consumption. This is confirmed in the data. Unfortunately, the authors do this by chi-square test (“does this alter consumption at all?”) rather than estimating the effect size. However, we can do our own calculations using Table 5:

If we round each alcohol consumption category to the mean of its bounds (0, 0.5, 3.5, 9), we get a mean intake in the SNP variant group of 0.55 units/week and a mean intake in the non-carrier of 0.88 units/week (math). This means that SNP-carrier mothers drink, on average, 0.33 units/week less. That’s a pretty small difference! We would’ve liked the authors to do this calculation themselves, and use it to report IQ-difference per unit of alcohol per week.

2. There is no association between the genotype and confounding factors, including other genes. This assumption is satisfied for all factors examined in the study, like maternal age, parity, education, smoking in 1st trimester etc. (Table 4), but unmeasured confounding is totally a thing! E.g. a SNP which correlates with the current variant and causes a change in the offspring’s IQ/KS2-score.

3. The genotype does not affect the outcome by any path other than maternal alcohol consumption, for example through affecting metabolism of alcohol.

If we believe these assumptions to be true, the authors are estimating the effect of 0.33 maternal alcohol units per week on the offspring’s IQ and KS2-score. KS2-score is a test of intellectual achievement (similar to the SAT) for 11-year-olds with a mean of 100 points and a standard deviation of ~15 points. 

They find that the 0.33 unit/week decrease does not affect IQ (mean difference -0.01 [-2.8; 2.7]) and causes a 1.7 point (with a 95% confidence interval of between 0.4 and 3.0) increase in KS2 score. 
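The per-unit figure we wished the authors had reported can be roughed out by dividing the outcome difference by the exposure difference. This is our back-of-the-envelope sketch, not the authors' analysis: it treats our own 0.33 units/week estimate as exact, and it assumes the effect scales linearly from 0.33 units up to a full unit.

```python
def per_unit_effect(outcome_diff, exposure_diff):
    """Outcome change per 1 unit/week change in exposure (simple ratio)."""
    return outcome_diff / exposure_diff


EXPOSURE_DIFF = 0.33  # units/week less alcohol among SNP carriers (our rough estimate)

# (lower CI bound, point estimate, upper CI bound) for carriers vs. non-carriers
ks2_low, ks2_mid, ks2_high = (per_unit_effect(x, EXPOSURE_DIFF) for x in (0.4, 1.7, 3.0))
iq_low, iq_mid, iq_high = (per_unit_effect(x, EXPOSURE_DIFF) for x in (-2.8, -0.01, 2.7))

print(f"KS2 per unit/week less alcohol: {ks2_mid:.1f} [{ks2_low:.1f}; {ks2_high:.1f}]")
print(f"IQ  per unit/week less alcohol: {iq_mid:.2f} [{iq_low:.1f}; {iq_high:.1f}]")
```

Because the divisor is so small, the intervals blow up to roughly ±8 IQ points per unit, which is another way of seeing how little this comparison can pin down.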

This is extremely interesting. Additionally, the authors complete a classical epidemiological study, adjusting for typical confounders:

This shows that the children of pre-pregnancy heavy drinkers, on average, scored 8.62 (with a standard error of 1.12) points higher on IQ than non-drinkers, 2.99 points (SE 1.06) after adjusting for confounders. However, they didn’t adjust for alcohol intake in other parts of the pregnancy! Puzzlingly, first trimester drinking has an effect in the opposite direction: -3.14 points (SE 1.64) on IQ. However, this was also not adjusted for previous alcohol intake. This means that the estimates in table 1 (pre-pregnancy and first trimester) aren’t independent, but we don’t know how they’re correlated. Good luck teasing out the causal effect of maternal alcohol intake and timing from that.
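Standard errors are easier to compare once converted to intervals. A quick sketch, assuming a normal approximation (estimate ± 1.96 × SE), which may differ slightly from whatever the authors would report:

```python
def ci95(estimate, se):
    """Approximate 95% confidence interval under a normal approximation."""
    half_width = 1.96 * se
    return estimate - half_width, estimate + half_width


estimates = {
    "pre-pregnancy heavy drinking, unadjusted": (8.62, 1.12),
    "pre-pregnancy heavy drinking, adjusted": (2.99, 1.06),
    "first trimester drinking": (-3.14, 1.64),
}
for label, (estimate, se) in estimates.items():
    low, high = ci95(estimate, se)
    print(f"{label}: {estimate:+.2f} [{low:+.2f}; {high:+.2f}]")
```

Under that approximation the first trimester interval runs from about -6.35 to +0.07, so it just barely includes zero.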

Either way, the authors (and I) interpret the effects as being highly confounded; either residual (the confounder was measured with insufficient accuracy for complete adjustment) or unknown (confounders that weren’t measured). For example, pre-pregnancy alcohol intake was strongly associated with professional social class and education (upper-class wine-drinkers?), whereas the opposite was true for first trimester alcohol intake. Perhaps drinking while you know you’re pregnant is low social status?

If you’re like Elizabeth you’re probably surprised that drinking increases with social class. I didn’t dig into this deeply, but a quick search found that it does appear to hold up.

This result is in conflict with that of the Mendelian randomization, but it makes sense. Mendelian randomization is less sensitive to confounding, so maybe there is no true effect. Also, the study only estimated the genetic effect of a 0.33 units/week difference, so the analyses are probably not sufficiently powered. 

Taken together, the study should probably update us towards a lack of harm from moderate (whatever that means) levels of alcohol intake, although how big an update that is depends on your previous position. We say “moderate” because fetal alcohol syndrome is definitely a thing, so at sufficient alcohol intake it’s obviously harmful!

Rodents

There is a decently sized, pretty well-conducted literature on adolescent intermittent ethanol exposure (science speak for “binge drinking on the weekend”). Rat adolescence is somewhat similar to human adolescence; it’s marked by sexual maturation, increased risk-taking and increased social play (Sengupta, 2013). The following is largely based on a deeper dive into the linked references from (Seemiller & Gould, 2020).

Adolescent intermittent ethanol exposure is typically operationalised as a dose producing a blood-alcohol concentration equivalent to ~10 standard alcohol units, given 0.5-3 times/day every 1-2 days during adolescence.

To interpret this, we make some big assumptions. Namely:

  1. Rodent blood-alcohol content can be translated 1:1 to human
  2. Effects on rodent cognition at a given alcohol concentration are similar to those on human cognition 
  3. Rodent adolescence can mimic human adolescence

Now, let’s dive in!

Two primary tasks are used in the literature:

The 5-choice serial reaction time task. 

Rodents are placed in a small box, and one of 5 holes is lit up. Rodents are scored on how well they touch the lit hole. 

Training in the 5-CSRTT varies between studies, but the two studies below consist of 6 training sessions at age 60 days. Initially, rats were rewarded with pellets from the feeder in the box to alert them to the possibility of reward. 

Afterwards, training sessions had gradually increasing difficulty. To begin with, the light stays on for 30 seconds, but the duration gradually decreases to 1 second. Rats progressed to the next training schedule based on any of 3 predefined criteria: 100 trials completed, >80% accuracy or <20% omissions. 
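As a sketch of how such a progression rule might be coded, reading the three criteria as alternatives (any one suffices). The definitions of accuracy (correct out of responses made) and omission rate (omissions out of all trials) are our assumption of the usual convention, not taken from the papers, and the function name is hypothetical.

```python
def ready_to_progress(correct, incorrect, omissions):
    """True if a rat meets any one of the three progression criteria."""
    responses = correct + incorrect
    trials = responses + omissions
    if trials == 0:
        return False
    accuracy = correct / responses if responses else 0.0  # assumed definition
    omission_rate = omissions / trials                    # assumed definition
    return trials >= 100 or accuracy > 0.80 or omission_rate < 0.20


print(ready_to_progress(correct=10, incorrect=10, omissions=10))  # no criterion met
print(ready_to_progress(correct=17, incorrect=3, omissions=10))   # accuracy above 80%
print(ready_to_progress(correct=50, incorrect=30, omissions=20))  # 100 trials completed
```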

Naturally, you can measure a ton of stuff here! Generally, focus is on accuracy and omissions, but there are a ton of others:

From (Boutros et al., 2017) sup. table 1, congruent with (Semenova, 2012)

Now we know how they measured performance; but how did they imitate adolescent drinking?

Boutros et al. administered 5 g/kg of 25% ethanol through the mouth once per day in a 2-day on/off pattern, from age 28 days to 57 days – a total of 14 administrations. Based on blood alcohol content, this is equivalent to 10 standard units at each administration – quite a dose! Surprisingly, they found a decrease in omissions with the standard task, but no other systematic changes, in spite of 50+ analyses on variations of the measures (accuracy, omissions, correct responses, incorrect responses etc.) and task difficulty (length of the light staying on, whether they got the rats drunk etc.). We’d chalk this up to a chance finding.
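To put a number on the multiple comparisons worry, here is the chance of at least one spurious "significant" result across 50 tests. The assumptions are ours: a significance threshold of 0.05, tests treated as independent (these clearly are not, since they reuse the same rats and related measures), and no true effects anywhere.

```python
ALPHA = 0.05   # assumed per-test significance threshold
N_TESTS = 50   # lower bound on the number of analyses

# Probability that at least one of N_TESTS null results crosses the threshold.
familywise_error = 1 - (1 - ALPHA) ** N_TESTS
expected_false_positives = ALPHA * N_TESTS
bonferroni_threshold = ALPHA / N_TESTS

print(f"P(at least one false positive) = {familywise_error:.2f}")
print(f"expected false positives       = {expected_false_positives:.1f}")
print(f"Bonferroni-corrected threshold = {bonferroni_threshold:.3f}")
```

Under those assumptions a lone significant finding is the expected outcome of pure noise, not a surprise.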

Semenova et al. used the same training schedule, but administered 5 g/kg of 25% ethanol through the mouth every 8h for 4 days – a total of 12 administrations. They found small differences in different directions on different measures, but have the same multiple comparisons problem. Looks like noise to us.

The Barnes Maze 

Rodents are placed in the middle of an approximately 1m circle with 20-40 holes at the perimeter and are timed on how quickly they arrive at the hole with a reward (and escape box) below it. For timing spatial learning, the location of the hole is held constant. In (Coleman et al., 2014) and (Vetreno & Crews, 2012), rodents were timed once a day for 5 days. They were then given 4 days of rest, and the escape hole was relocated exactly 180° from the initial location. They were then timed again once a day, measuring relearning.


Figure: Tracing of the route taken by a control mouse right after the location was reversed, from Coleman et al., 2014.

Both studies found no effect of adolescent intermittent ethanol exposure on initial learning rate or errors. 

Vetreno found alcohol-exposed rats took longer to escape on their first trial but did equally well in all subsequent trials:

Whereas Coleman found a ~3x difference in performance on the relearning task, with similar half-times:

Somewhat suspiciously, even though Vetreno et al. was performed 2 years later than Coleman et al. and they share the same lab, they do not reference Coleman et al.

This does, technically, show an effect. However, given the small effect size, the number of metrics measured, file drawer effects, and the disagreement with the rest of the literature, we believe this is best treated as a null result.

Conclusion

So, what should we do? From the epidemiological literature, if you care about dementia risk, it looks like social drinking (i.e. excluding alcoholics) reduces your risk by ~20% compared to not drinking. All other effects were part of a heterogeneous literature with small effect sizes on cognition. Taken together, long-term cognitive effects of conventional alcohol intake during adolescence should play only a minor role in determining alcohol intake.

Thanks to an FTX Future Fund regrantor for funding this work.

Birnbaum, I. M., Taylor, T. H., & Parker, E. S. (1983). Alcohol and Sober Mood State in Female Social Drinkers. Alcoholism: Clinical and Experimental Research, 7(4), 362–368. https://doi.org/10.1111/j.1530-0277.1983.tb05483.x

Boutros, N., Der-Avakian, A., Markou, A., & Semenova, S. (2017). Effects of early life stress and adolescent ethanol exposure on adult cognitive performance in the 5-choice serial reaction time task in Wistar male rats. Psychopharmacology, 234(9), 1549–1556. https://doi.org/10.1007/s00213-017-4555-3

Coleman, L. G., Liu, W., Oguz, I., Styner, M., & Crews, F. T. (2014). Adolescent binge ethanol treatment alters adult brain regional volumes, cortical extracellular matrix protein and behavioral flexibility. Pharmacology Biochemistry and Behavior, 116, 142–151. https://doi.org/10.1016/j.pbb.2013.11.021

Hannon, R., Butler, C. P., Day, C. L., Khan, S. A., Quitoriano, L. A., Butler, A. M., & Meredith, L. A. (1987). Social drinking and cognitive functioning in college students: A replication and reversibility study. Journal of Studies on Alcohol, 48(5), 502–506. https://doi.org/10.15288/jsa.1987.48.502

Neafsey, E. J., & Collins, M. A. (2011). Moderate alcohol consumption and cognitive risk. Neuropsychiatric Disease and Treatment, 7, 465–484. https://doi.org/10.2147/NDT.S23159

Seemiller, L. R., & Gould, T. J. (2020). The effects of adolescent alcohol exposure on learning and related neurobiology in humans and rodents. Neurobiology of Learning and Memory, 172, 107234. https://doi.org/10.1016/j.nlm.2020.107234

Semenova, S. (2012). Attention, impulsivity, and cognitive flexibility in adult male rats exposed to ethanol binge during adolescence as measured in the five-choice serial reaction time task: The effects of task and ethanol challenges. Psychopharmacology, 219(2), 433–442. https://doi.org/10.1007/s00213-011-2458-2

Sengupta, P. (2013). The Laboratory Rat: Relating Its Age With Human’s. International Journal of Preventive Medicine, 4(6), 624–630. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3733029/

Vetreno, R. P., & Crews, F. T. (2012). Adolescent binge drinking increases expression of the danger signal receptor agonist HMGB1 and toll-like receptors in the adult prefrontal cortex. Neuroscience, 226, 475–488. https://doi.org/10.1016/j.neuroscience.2012.08.046

Zuccolo, L., Lewis, S. J., Davey Smith, G., Sayal, K., Draper, E. S., Fraser, R., Barrow, M., Alati, R., Ring, S., Macleod, J., Golding, J., Heron, J., & Gray, R. (2013). Prenatal alcohol exposure and offspring cognition and school performance. A ‘Mendelian randomization’ natural experiment. International Journal of Epidemiology, 42(5), 1358–1370. https://doi.org/10.1093/ije/dyt172

Quick Look: Asymptomatic Herpes Shedding

Tl;dr: Individuals shed and thus probably spread oral HSV1 while completely asymptomatic.

Introduction

“Herpes virus” can refer to several viruses in the herpes family, including chickenpox and Epstein-Barr (which causes mono). All herpesviridae infections are for life: once infected, the virus will curl up in its cell of choice, possibly to leap out and begin reproduction again later. If the virus produces visible symptoms, it is called symptomatic. If the virus is producing viable virions that can infect other people, it’s called shedding. How correlated symptoms and shedding are is the topic of this post. 

When people say “herpes” without further specification, they typically mean herpes simplex 1 or 2. HSV1 and 2 are both permanent infections of nerve cells that can lie dormant forever, or intermittently cause painful blisters on mucous membranes (typically mouth or genitals, occasionally eyes, very occasionally elsewhere). There are also concerns about subtle long-term effects, which I do not go into here.

There are two competing pieces of conventional wisdom on HSV: “you can shed infectious virus at any time, even without a sore. Most people who catch herpes catch it from an asymptomatic individual” and “99.9% of shedding occurs during or right before a blister and there are distinct signs you can recognize if you’re paying attention. If you can recognize an oncoming blister the chances of infecting another human are negligible.” At the request of a client I performed two hours of research to judge between these.

It is definitely true that doctors will only run tests looking for the virus directly (as opposed to antibodies) if you have an active sore. However, when researchers proactively sampled asymptomatic individuals using either genetic material tests (PCR/NAAT, which look for viral DNA in a sample) or viral culture (which attempts to breed virus from your test sample in a petri dish), they reliably found that some people are shedding virus.

HSV1 prefers the mouth but is well known to infect genitals as well. HSV2 is almost exclusively genital. Due to a dearth of studies I’ve included some HSV2 and genital HSV1 studies. 

Studies

Tronstein et al: This paper stupidly lumped in “0% shedding” with “>0% shedding” and I hate them. Ignoring that, they found that 10% of all days recorded from individuals with asymptomatic genital HSV2 involved shedding, and these were distributed on a long tail, with the peak at 0-5%. I cannot tell if they lumped 0% and 0.1% together because 0% never happens, or because they hate science. 

your buckets are bad and you should feel bad

Bowman et al: 14% of previously symptomatic genital HSV2 patients shed isolatable virus (sampled every 8 weeks over ~3 years) while on antivirals. This study reports “isolating” virus without further details; I expect this means viral culture.

Sacks et al: citing another paper: shedding across 6% of days in oral HSV1 patients (using viral culture). It also found the following asymptomatic shedding rates for genital herpes

Spruance: oral HSV1 patients shed isolatable virus 7.4% of the time (including while symptomatic). 60% of this occurred while experiencing mild symptoms that could have indicated an upcoming sore, but never developed into a sore.

Tateishi et al: tested 1000 samples from oral surgery patients (not filtered for HSV infection status). 4.7% had PCR-detectable herpes DNA, and 2.7% had culturable virus. This includes patients without herpes (about 50% of people in Japan, where the research was done), but oral surgery is stressful and often stems from issues that make it easier to shed herpes, so I consider those to ~cancel out.
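For reference, here is a minimal sketch of what the naive prevalence adjustment on its own would do to those numbers, before any offsetting for surgery-related stress. It assumes every positive sample came from an HSV-positive patient and that prevalence in the sample is exactly 50%; the function name is hypothetical and the calculation is mine, not the paper's.

```python
# Naive adjustment: if only infected people can shed, the rate among infected
# people is the overall rate divided by prevalence. The 50% prevalence is the
# rough figure quoted above; exact prevalence in the study sample is unknown.

def rate_among_infected(rate_all_samples: float, prevalence: float) -> float:
    """Rescale a whole-sample positive rate to the infected subgroup."""
    if not 0.0 < prevalence <= 1.0:
        raise ValueError("prevalence must be in (0, 1]")
    return rate_all_samples / prevalence


pcr_adjusted = rate_among_infected(0.047, 0.5)      # PCR-detectable DNA
culture_adjusted = rate_among_infected(0.027, 0.5)  # culturable virus

print(f"PCR-positive among infected:     {pcr_adjusted:.1%}")
print(f"Culture-positive among infected: {culture_adjusted:.1%}")
```

The adjustment alone would roughly double both rates. Whether surgical stress inflates shedding by a similar factor is a judgment call, not something this arithmetic can settle.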

Conclusion

My conclusion: it is definitely possible to shed HSV while asymptomatic, including if you are never symptomatic. The daily shedding rate is something like 3-12%, although with lots of interpersonal variability. This doesn’t translate directly to an infectiousness rate: human mouths might be harder or easier to infect than petri dishes (my guess is harder, based on the continued existence of serodiscordant couples). It may be possible for people who are antibody positive for HSV to never shed virus but we don’t know because no one ran the right tests. 

Thanks to anonymous client for funding the initial research and my Patreon patrons for supporting the public write-up.