IQ Tests and Poverty

Recently I read Poor Economics, which is excellent at doing what it promises: explaining the experimental data we have for what works and does not work in alleviating third world poverty, with some theorizing as to why.  If that sounds interesting to you, I heartily recommend it.  I don’t have much to add to most of it, but one thing that caught my eye was their section on education and IQ tests.

In Africa and India, adults believe that the return to education is S-shaped (meaning each additional unit of education is more valuable than the one before, at least up to a point).  This leads them to concentrate their efforts on the children who are already doing the best.  This happens at multiple levels: poor parents pick one child to receive an education and put the rest to work much earlier, and teachers put more of their energy into their best students.  Due to a combination of confirmation bias and active maneuvering, the children of rich parents are much more likely to be picked as The Best, regardless of their actual ability.  Not only does this get them more education, but education is viewed as proof one is smart, so they're double winners.  This leaves some very smart children of poor parents operating well below their potential.
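To make "S-shaped" concrete, here is a minimal sketch that assumes the payoff to schooling follows a logistic curve. The curve, its midpoint of six years, and its steepness are made up purely for illustration and do not come from Poor Economics. The point is only the shape: the gain from each extra year rises until the midpoint and falls after it.

```python
import math


def value_of_schooling(years, midpoint=6.0, steepness=1.0):
    """Hypothetical S-shaped (logistic) payoff to a given number of years of school.

    The midpoint and steepness are illustrative assumptions, not real estimates.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (years - midpoint)))


def marginal_gains(max_years=12, **kwargs):
    """Extra payoff from each additional year: value(y) - value(y - 1), for y = 1..max_years."""
    return [
        value_of_schooling(y, **kwargs) - value_of_schooling(y - 1, **kwargs)
        for y in range(1, max_years + 1)
    ]


if __name__ == "__main__":
    for year, gain in enumerate(marginal_gains(), start=1):
        print(f"year {year:2d}: +{gain:.3f}")
```

On the rising part of the curve, the next year of school is worth more to a child who already has more, which is the logic behind backing whichever child looks furthest along.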

One solution to this is IQ tests.  Infosys, an Indian IT contractor, managed to get excellent workers very cheaply by giving IQ tests to adults and hiring those who scored well, regardless of education.  The authors describe experiments in Africa giving IQ tests to young children so that teachers will invest more in the smart but poor children.  This was one of the original uses of the SATs in America: identifying children who were very bright but didn't have the money or connections to go to Ivy League feeder high schools.

This is more or less the opposite of how critics view standardized testing in the US.  They believe the tests are culturally biased such that a small sliver of Americans will always do better, and that basing resource distribution on those tests disenfranchises the poor and people outside the white suburban subculture.  What's going on here?

One possible explanation is that one group or the other is wrong, but both sides actually have pretty good evidence.  The IQ tests are obviously being used for the benefit of very smart poor children in the 3rd world.  And even tests without language can’t get around the fact that being poor takes up brainspace, and so any test will systematically underestimate poor children. So let’s assume both groups are right at least some of the time.

Maybe it's the difference in educational style that matters?  In the 3rd world, teachers are evaluated based on their best student.  In the US, No Child Left Behind codified the existing emphasis on getting everyone to a minimum benchmark.  Kids evaluated as having lower potential than they actually do may receive less education than they should, but they still get some, and in many districts gifted kids get the least resources of any point on the bell curve.

Or it could be because the tests are trying to do very different things.  The African and Indian tests are trying to pick out the extremely intelligent who would otherwise be overlooked.  The modern US tests are trying to evaluate every single student and track them accordingly.  When the SATs were invented they had a job much like the African tests; as more and more people go to college its job is increasingly to evaluate the middle of the curve.  It may be that these are fundamentally different problems.

This has to say something interesting about the meaning of intelligence or usefulness of education, but I’m not sure what.

Review: Mindset (Carol Dweck)

I went into this pretty skeptical, based on Scott Alexander's analysis of the science.  But the reality was worse than I imagined.  First, she never even defines terms like talent or ability.  I would use ability to mean "current level of performance" and talent to mean something like "innate propensity to excel at a task, as manifested in initial ability, ease of learning, or ceiling on ability."  She… maybe uses ability to mean both those things?  She'll talk about initial ability or talent and then increased ability or talent after practice, but that doesn't mean the same amount of effort will get everyone to the same place, or that all places are reachable by all people.  For that matter, she never defines mindset.  She talks about it like a fairly fixed trait (meaning it stays constant from one situation to another), but her own studies show it being changed by a four-second speech.

Second, you can't just make a list of good things and a list of bad things and wrap all the good things under your label and the bad things under its opposite.  Here is a list of statements I believe will be uncontroversial:

  • A person who treats failure as a learning opportunity will learn more and be happier than a person who treats it as a mandate to curl into a ball and cry.
  • Ditto for viewing feedback as a source of information, rather than a referendum on you as a person.
  • Sometimes people start out bad at a task, practice, and then get really good at it.
  • This is more likely to happen if the person believes practice can improve their skill.
  • Children (and probably all people) tend to do better when their successes are ascribed to something they can control than to forces outside their control.

These things don’t necessarily go together.  For example, it is entirely possible to believe almost anything is learnable, and then beat yourself up for failure because you should have learned it already.   I’ve seen me do it.  “I’ll do better next time” can just as easily become a mantra to avoid mindfulness as to encourage it.

Third, I can’t even with the chapter on corporations.  Jack Welch brought stack ranking aka “rank and yank” to the masses, and she uses him as an example of not only having growth mindset, but fostering it throughout his company.

[An artist’s rendering of working at GE]

After this I refused to trust her anecdotes, and Scott already took down the studies.   You might think that left the book with nothing, but surprisingly it didn’t.  Her descriptions of the individual facets of growth or fixed mindset and how they affect people were useful and informative, even if I don’t think they have anything to do with each other.  And I think growth vs. fixed mindset might actually be a useful schema for institutions.  It certainly captures a lot of what’s wrong with American schools.

And as inspirational reading, it’s pretty good.  I would love to live in a world where one determined teacher takes 40 students from illiterate to Shakespeare, and stereotype threat is countered with a short speech.  In a world that overvalues innate talent, a push too far in the other direction may still leave us better off.  But that doesn’t make it correct.

The Efficacy of Deliberate Practice

I am really, really trying to stay away from takedown pieces, but multiple books on the importance of practice and irrelevance of talent cite Ericsson, Krampe, and Tesch-Römer's "The Role of Deliberate Practice in the Acquisition of Expert Performance" as proof of the importance of deliberate practice (and the irrelevance of innate talent, but that's a different issue).  This study compared the best, excellent, and good violinists, and their methods of study.  The authors claim the most accomplished students had more accumulated hours of practice than the least accomplished, and that while they all currently practiced the same amount, the most accomplished students spent more of it in deliberate practice, thus proving its importance.

Let me make this quick: the study had an n of 30, spread out over three groups, all of which were recruited from a music school, and the measure of success was not "successful career" but "professor prediction of successful career".  So even if the sample size was big enough to prove anything (which it wasn't) and the sample was representative of the population (which it wasn't), they would still only have proved that professors like people who study a lot.  We're not even getting into how they estimated cumulative lifetime hours of practice for people who picked up an instrument at four years old.  Having this be a foundational study for multiple books is embarrassing.
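To put a rough number on "not big enough," here is a sketch of a standard power calculation. It assumes two groups are compared with a two-sided test of means, ten people per group (an even split of the 30), the conventional alpha of 0.05, and standardized effect sizes (Cohen's d) of 0.5 and 0.8, which are textbook "medium" and "large" values, not figures from the paper. It uses the normal approximation, which slightly flatters small samples compared with a real t-test.

```python
import math
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    Normal approximation: power ~ Phi(d * sqrt(n / 2) - z_{1 - alpha/2}).
    Ignores the (negligible) chance of rejecting in the wrong direction.
    The effect size d is an assumption supplied by the caller.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect_size * math.sqrt(n_per_group / 2) - z_crit)


def n_for_power(effect_size, power=0.8, alpha=0.05):
    """Per-group sample size needed to reach the target power, same approximation."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil(2 * (z_sum / effect_size) ** 2)


if __name__ == "__main__":
    for d in (0.5, 0.8):
        print(f"d={d}: power with 10 per group ~ {approx_power(d, 10):.2f}, "
              f"n per group for 80% power ~ {n_for_power(d)}")
```

Under these assumptions, a medium-sized difference would be detected only about one time in five with ten people per group, and even a large one less than half the time; hitting the usual 80% power for a medium effect takes roughly 63 people per group.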

My most recent encounter with this study was in The Talent Code, which cites many studies showing the best people in a field engage in deliberate practice and zero experiments showing people improved after being taught deliberate practice.  I looked on Google Scholar and found mostly papers on how to deliberately practice teaching, not how to teach deliberate practice, and a few that taught using it (without a control group).  Deliberate practice looks extremely plausible and I plan on applying more of it myself, but seriously, how has this idea been around for 20 years without anyone doing the most basic experiments on it?

Educational Games

There are a lot of games out there that attempt to be educational.  I break them down into the following categories.

Gamification:

You do the exact same work, but receive stickers or points or badges for doing so.

Example: Khan Academy badges, and arguably all grades.  Extra Credits describes an intricate system in one of their videos.

I was a grade grubber for years, and I'll admit I still kind of miss the structure of school.  But gamification wears off really quickly, and Alfie Kohn has made a career out of arguing that extrinsic rewards are inherently harmful.  The one benefit I see in Extra Credits' system is that it would reward students for other students' performance, cutting down on bullying the smart kids.  It may also encourage the strongest students to help the weakest ones.  Or it might make everyone hate the weak students, or help them cheat so they can get a pizza party.  Kids will do a lot for a pizza party.  And they may start to resent the smart kids for not helping enough.  So I guess I'm against this, but that may stem from years in the worst possible educational environment.

In that same video EC suggests two features that are much less likely to backfire: tailored difficulty curves and immediate feedback.  These strike me as much more valuable, but games built on them will mostly fall into other categories.

Fluency Builders:

These are games that don’t really teach you anything you could use on a test, and often have fictionalized elements, but do build conceptual fluency, which can make it easier to learn real things later. Pretty much any game set in the real world qualifies for this, but my personal favorite is Oregon Trail for introducing millions of school children to the concept of dysentery.

Counterpoint: I misspelled dysentery when uploading this photo. Twice.

A lot of the games on Extra Credits' Steam shelf fall into this category.  I was initially pretty dismissive of this, but I've changed my mind.

There are a lot of reasons that middle class + white children do better at school than poor + minority children, but one of them is the amount and type of knowledge they’re exposed to at home.  Poor parents flat out talk to their children less, which gives them less time to transmit knowledge.  They’re also less likely to have the kind of knowledge their children will be tested on at school.  As Sharon Astyk so heartbreakingly puts it, her foster children needed to be taught how to be read to but had a highly developed internal map of food-containing garbage cans.

There's no video game for learning to not chew on books.  But there are lots of video games with maps.  A big part of my 6th grade social studies class was blank map tests, where we would be given a blank map and have to label all the countries.  We had a decent teacher, so I suspect this was fluency building and not drilling for drilling's sake: when we read about Egypt and Greece and Rome, she wanted us to be able to put events in geographical context.  I didn't know where every country was, but I did know, or at least recognize, the names of most countries.  This put me strictly ahead of the girl who called Syria "cereal".  When we took tests I only had to put effort into remembering locations; she had to put effort into locations, and names, and possibly what a country was.  And it's really hard to put in that effort when you don't see a point and this other girl is doing so much better than you without even trying.  Carmen Sandiego, or any video game with a strong sense of real-world place, could have given her a way to catch up.  Even if it wasn't fun, it wouldn't have had the same ugh field around it that studying did.

This is related but not identical to what Extra Credits describes as "familiarity builders", where the goal is basically to make something interesting enough that people look up the actual facts on Wikipedia later.

Drill and Killers:

These games overlay what they’re trying to teach on top of typical game mechanics.  These are more than fluency builders because they use the same skills you’d use on a test or in real life, but they don’t teach you anything new, they just give you practice with what you already know.  Examples: Math Blasters, Mario Teaches Typing.

How good these are depends on who you ask.  I suspect you need a certain minimal fluency to make them at all fun, which makes the difficulty curve important.  And they’re less fun or game-like than the other types on this list.  But some things just have to be drilled, and video games are a more fun way to do that than flash cards.

Abstract Skill Builders:

Games that teach useful abstract skills.  Those skills would need translating to be of any use on a test or in real life, but the games do build up some part of the brain.

Example: Logical Journey of the Zoombinis, which teaches pattern matching, logical thinking, and arguably set theory.

On one hand, abstract skills like these are very hard to teach.  On the other hand, people are very bad at transferring skills from one domain to another (which is why some people can make change just fine but have trouble with contextless arithmetic, or can do arithmetic but not word problems).  On the third hand, people are very bad at transferring skills from one domain to another, so if there's a tool that helps them learn to do that, it would be very valuable.


You don’t technically have to learn anything to play these games, but you will be rewarded if you do.

Example: Sine Rider. Technically you can get by with guess and check, but you will win faster if you actually understand trigonometric equations.

Seamless Integration:

A step further than skill builders or drillers, these games have both flavor and mechanics based in the subject you are trying to teach.   These games are not necessarily complete substitutes for textbooks because it’s hard for them to be comprehensive, but they do completely teach whatever it is they’re trying to teach.

Examples: Immune Defense, from which I actually learned several things about the immune system.  Dragonbox, which teaches algebra.

I am *this close* to remembering which is antibody and which is antigen

Humans are complicated, children are even more complicated

[Had more dental surgery this week and am currently suffering from pain-induced ADD.  Expect less research and more wild speculation]

Consider pre-emptive testing for psychiatric or developmental issues in children.  If you're too aggressive, you end up misdiagnosing a lot of perfectly normal deviations from the exact median as developmental issues in need of treatment.  Development is complicated: different systems come online at different rates and in different orders in different kids, and they should be allowed to do that without being corralled into fitting a predetermined schedule.

But if you're not aggressive enough, the kids develop coping mechanisms that hide the disability, making it harder to diagnose and treat.  Sometimes people treat this as solving the problem (especially for conditions that are often conflated with character flaws, like ADHD or some forms of depression), but they are wrong.  At best, lack of treatment holds people back from their true potential; at worst, it twists up their internal structure in ways that break at the worst possible time (usually grad school).  It's a big problem for twice-exceptional children, who have both brain-based deficiencies and a lot of raw intelligence, and, I suspect, for people with atypical presentations of their disabilities.  E.g. girls with ADHD or autism spectrum issues, boys with depression* or trauma from sexual abuse.**

Even perfectly accurate testing won’t fix this, because developmental asynchronies do not necessarily indicate a future problem, and treating them can prevent the issue from fixing itself.  The real issue is distinguishing natural, healthy leveling out from the development of costly compensation mechanisms, and we don’t know how to do that.

*Assuming the common adult male pattern of depression being expressed as anger holds true for boys as well.

**I think, couldn’t actually find data on this.

The Talent Code: Two Truths and a Lie

The Talent Code (Daniel Coyle) makes three claims: that myelination is instrumental in learning, that skill is built by methodically breaking down actions into component parts and perfecting them, and that these two facts have something to do with each other.


Some background: your brain is made of nerve cells, which connect to each other and to other nerves outside the skull.  We have only the foggiest idea what brain cells do, but we're pretty sure the external nerve cells are for controlling muscle movement and reporting sensory data.  Nerve cells communicate with each other by extending a long arm (called an axon) from their body to meet another nerve cell.  Signals travel down an axon electrically, and cross the gap to the next cell (the synapse) chemically.  Like any electrical signal, nerve signals are subject to resistance and decay.  To counter this, many axons are wrapped in myelin, a mostly fatty substance that insulates the axon and speeds up transmission.

I had never heard of myelin being involved in learning, and in fact it's not on the Wikipedia page, but deeper googling reveals that there is some fairly compelling research to back this up.  Einstein's brain reportedly had an unusually high proportion of glial cells (which, among other things, produce myelin).  The volume of white matter (made up mostly of myelinated axons and glial cells) in fine-motor-control areas of pianists' brains correlates with self-reported practice hours.  Most compellingly, mice prevented from producing new myelin are unable to learn a new task but retain what they learned previously.  And it makes a certain amount of intuitive sense that a substance that protects and speeds up nerves would be involved in learning.  However, I don't see anything here that tells us how a specific act of learning affects the myelination of specific cells.  Coyle's explanation of this is so dumbed down I immediately want to trounce it, but as far as I can tell it's a reasonable summary of the data for his purposes.

His recommendation to practice by breaking down a skill into component parts and refining them to perfection seems entirely reasonable to me.  He cites a little bit of science for this, but mostly it's just his observations of various talent hotbeds (the Spartak Tennis Club in Russia, KIPP schools).  He believes these hotbeds stem from a combination of this cultivated practice and "ignition", the ability to make a kid believe they can be successful at something.  No doubt those are both helpful, but I don't see any evidence that those factors and only those factors distinguish the talent hotbeds.

This was originally going to be part of a longer series on several books with “talent” in the title, but there is only so much “intelligence is irrelevant, practice is everything” followed by absolutely no guidance on practice I can read.  So, here you go.

Bug or Feature? SAT edition

A few weeks ago there was a Less Wrong thread about truly brilliant people, especially mathematicians, who often got good but not perfect SAT scores.  The consensus was that the SATs were a better test of how long you can go without making a mistake than of genius.  At the time I read this I (who got good but not perfect SAT scores) was all "yeah, the SATs are bad at measuring brilliance.  And I did better in more advanced classes than I did in the intro ones, because the intro ones were about how close you came to matching their expectations, and the advanced ones were about original thought.  In fact the smartest people will do worse, because this is so trivial to them it is boring.  I sure hope the SATs feel bad for failing to recognize my (I mean, their) brilliance."

I was about 10% of the way through Safe Patients, Smart Hospitals when I realized that if I am recovering from dangerous surgery and need a central line*, it is more important that my doctor can follow the safety checklist without getting bored than that he be capable of original thought.  Like, way, way more important.  We need doctors capable of original thought somewhere, so they can invent new procedures and drugs and things, but outside of their magisteria they do more harm than good.

Gregory House would be terrible at inserting central lines. That’s why he has Taub.

So maybe the SATs are doing a valuable service by injecting a little bit of what it takes to succeed in the real world into their otherwise-pretty-much-an-IQ-test.  And maybe we should start selecting doctors for what they actually do most of the time.  Alternately, maybe we should move central-line-type work to techs and computer algorithms and use doctors for research and cases weird enough to be on TV.  But what we should definitely not do is select people for brilliance and make lives depend on their ability to work methodically.

*Central lines deliver fluids better than ordinary (peripheral) IVs but are more vulnerable to infections, which can be fatal, especially in people recently weakened by trauma or illness, which is everyone who is getting a central line.  You can greatly reduce the chance of an infection by following a fairly simple list of steps like "use gloves" and "sterilize skin", but these are often skipped.

Gender-based variation in grading and teacher attitudes

Jezebel (via NYT): “Girls Outscore Boys on Math Tests, Unless Teachers See Their Names”
New York Times:  “How Elementary School Teachers’ Biases Can Discourage Girls From Math and Science”
Study Abstract:  “We’re going to skip explaining how we proved gender bias and just talk about its effects”
Actual Study (no public link): “Young Israeli girls outscore boys on anonymously graded national math exams but receive lower classroom grades, but eventually begin to underscore them in national exams as well.  The size of the discrepancy in scores is positively correlated with discrepancy in teacher attitude reported by boys and girls.  This pattern does not hold for English or Hebrew.”

I went into reading this study with guns blazing, but it actually looks quite well done and robust.  You could argue that the teachers and tests are evaluating different things and the teachers' goals are not necessarily worse, but

  1. Stereotypically, girls are better at pleasing teachers than boys.  And that is in fact the pattern we see in Hebrew and English.
  2. Low-biased teacher grades were correlated with a decrease in performance among girls in later grades (beyond what would be predicted by low grades alone). The best case scenario is that the teachers are spotting some hidden weakness in the girls that the lower grade tests didn't cover.  Except…
  3. Grade bias was positively correlated with negative student reports of the teachers' attitudes, and specifically with discrepancies in the attitudes reported by girls and boys.

So the actual study is pretty impressive, and astonishingly so for being in the field of education.  Touché, Lavy and Sand.  I also found it interesting that bias against girls was strongly correlated with the socioeconomic status of the girls in the class as a whole, but not with any individual girl's SES.  E.g. having a poor girl from a large family with uneducated parents lowered the grades of other girls in the same class, regardless of their own status, which suggests all kinds of unpleasant things.

The popular reporting on this paper is less impressive.  Jezebel flat out lies, implying that the same test was graded blindly and with the name (but no other data) available, which led to 100 comments asking how math grades could even vary that much, and 100 other comments saying "partial credit for showing work".  The New York Times isn't quite so egregious but does describe the input as "The students were given two exams, one graded by outsiders who did not know their identities and another by teachers who knew their names."  That's technically true, but implies that the two exams were much more similar than they actually were.  I expect this kind of crap from Jezebel, but the New York Times shouldn't have to sensationalize results that are already this interesting.