Labyrinths of Reason
by William Poundstone
The unexpected hanging is a cautionary tale of deduction. The prisoner deduces that it is impossible to hang him on Sunday and is left only with Saturday. His fatal error is thinking that eliminating the impossible guarantees that something possible will be left over. Sometimes every road leads to contradiction.
The lawyer glimpsed more of the truth in concluding that the order couldn’t be carried out. Neither lawyer nor prisoner took the crucial final step: If the prisoner accepts the impossibility of carrying out the order, then the executioner can hang him any day, even the last, and it will be unexpected.
1. There are a number of formulas that produce primes for a while and then fail. One of the best-known is n² − 79n + 1601, which works for all values of n up to 79, then produces a nonprime number at 80. Such are the dangers of inductive generalizations in mathematics.
2. To which Alan L. MacKay replied: “How can we have any new ideas or fresh outlooks when ninety percent of all the scientists who have ever lived have not yet died?”
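The polynomial in note 1 is easy to verify directly. Here is a minimal sketch in Python (my own, using plain trial division; the helper names are illustrative, not from the book):

```python
# A quick verification of the polynomial in note 1 (a sketch of mine,
# not from the book). f(n) = n^2 - 79n + 1601 is prime for every n
# from 0 through 79, then fails at n = 80, where f(80) = 1681 = 41 * 41.

def is_prime(m: int) -> bool:
    """Trial division; adequate for numbers this small."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def f(n: int) -> int:
    return n * n - 79 * n + 1601

assert all(is_prime(f(n)) for n in range(80))  # n = 0..79: every value prime
assert not is_prime(f(80))                     # f(80) = 1681 = 41**2, composite
```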
YOU ARE THE HEAD of a university psychology department conducting a bizarre experiment on human subjects. Person A sits at a desk, working on a psychological test. Person B sits opposite him, watching his progress. In front of B is a push button. B has been told that pushing the button causes A to receive an excruciatingly painful electric shock (though no permanent injury). Periodically, Professor Jones walks over to A’s desk, notes an incorrect answer, and instructs B to press the button.
A is really Jones’s confederate. The button is not connected to anything, and A is faking his pain when the button is pressed. Jones is conducting the experiment only to see if B will go along with his instructions to “punish” A. Jones’s pet theory is that most people will countenance cruelty if it is approved by an authority figure.
Jones has tried the experiment with ten different B’s and eight of them have pressed the button.
Professor Jones is himself unaware of this Kafkaesque irony: He, Jones, is the real subject of your experiment. You are interested in the “fudge factor”—or “experimenter bias effect”—in psychological experiments. When a researcher expects a certain result in a psychological experiment, he is more likely to get that result. Research tends to support the researcher’s pet theories—which means something is wrong.
In other lines of research, the experimenter bias effect can be reduced or eliminated. Tests of new drugs are “double-blind” experiments in which some subjects receive the drug, some receive a placebo, and neither subject nor experimenter knows which is which until after the results are in. That prevents the experimenter from communicating enthusiasm for the new drug to those receiving it.
But double-blind controls are almost impossible in some psychological studies. The experimenter necessarily knows what’s going on. Take Jones. He expects his subjects to “turn Nazi” and therefore most of them do. Whereas Professor Smith, who believes people are basically decent, did the same experiment and reported that only one person in ten would push the button. The problem isn’t conscious fraud; it’s subconscious fudging. Both Smith and Jones tend to interpret ambiguous outcomes in favor of their desired conclusion. When Jones tells his subjects to push the button, he is harsher, more imperative than Smith was. Possibly Smith and Jones selected their B’s so as to get the wanted outcome. Neither researcher is aware of it, but they are creating self-fulfilling prophecies.
If the experimenter bias effect is widespread, it will have drastic implications for research on human subjects. So you convinced a big foundation to fund your experiment. The subjects of your experiment are other psychologists who have no idea what’s really going on. The foundation gave you enough money to fund Jones’s experiment, and Smith’s, and many others. You don’t care one whit about what Jones and Smith and all your other subjects find out in their experiments. The idea is solely to measure any suspicious correlation between a researcher’s preconceptions and his results. You have observed many, many psychologists, of varied personalities, running all conceivable types of experiments on unknowing human subjects. The evidence is clear: The experimenter bias effect is both overwhelming and universal. In 90 percent of all cases, the outcome of psychological experiments is whatever the experimenter expected.
And that’s the problem.
This result is exactly what you expected. If your study is correct, then the results of psychological experiments on human beings are invalid. Your study is a psychological experiment on human subjects. Therefore, your study is invalid. But if your study is invalid, then there is no reason to believe in the experimenter bias effect, and quite possibly your study is valid, in which case it’s invalid …
“All generalizations are dangerous, even this one,” goes an epigram of Alexandre Dumas fils with more than a passing resemblance to the above. The “expectancy paradox” is also reminiscent of the paradoxical situation in Joseph Heller’s novel Catch-22:
There was only one catch and that was Catch-22, which specified that a concern for one’s own safety in the face of dangers that were real and immediate was the process of a rational mind. Orr was crazy and could be grounded. All he had to do was ask; and as soon as he did, he would no longer be crazy and would have to fly more missions. Orr would be crazy to fly more missions and sane if he didn’t, but if he was sane he had to fly them. If he flew them he was crazy and didn’t have to; but if he didn’t want to he was sane and had to.
Compare that with the famous and probably apocryphal story about Protagoras (c. 480–411 B.C.), founder of sophism. Protagoras was the first teacher in ancient Greece to charge money for his lessons. One student of the law struck a bargain with Protagoras: He would pay his tuition upon winning his first law case. If the student lost his first case, he would pay nothing. The student tried to get out of the deal by refusing to accept cases. Protagoras had to sue the student to get his money—and the student defended himself. If the student lost the suit, he would not have to pay (under the terms of the bargain, he had lost his first case); and if he won, he would not have to pay (under the court’s own verdict).
(So the story goes, anyway. One may imagine that if the student prevailed on the matter of whether he could postpone taking his first case, Protagoras could immediately demand his fee, and if necessary sue him again for cut-and-dried breach of contract.)
An element common to each of these paradoxes is categories or sets that can contain themselves as members. The crux of the expectancy paradox is that the experiment pertains to the class of experiments on humans, and the experiment is itself in that class. The classic illustration of sets containing themselves as members is Bertrand Russell’s “barber paradox”: In a certain town, the barber shaves everyone who doesn’t shave himself. That is, he shaves only those men who don’t shave themselves, and every such man. Does the barber shave himself? There is no way the barber can live up to his reputation. If the barber doesn’t shave himself, he must shave himself, and if he does shave himself, he can’t shave himself.
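Russell’s rule can be put in a single line of logic: the barber shaves x if and only if x does not shave himself. Substituting the barber for x leaves a condition no truth value satisfies, as this toy sketch (my own formalization, not Poundstone’s) shows:

```python
# A toy formalization of the barber paradox (mine, not Poundstone's).
# The rule: the barber shaves x if and only if x does not shave himself.
# Substituting x = barber demands shaves_self == (not shaves_self),
# which no truth value satisfies.

for shaves_self in (True, False):
    satisfies_rule = (shaves_self == (not shaves_self))
    print(f"barber shaves himself = {shaves_self}: "
          f"rule satisfied = {satisfies_rule}")
# Both cases print "rule satisfied = False": the rule cannot be met.
```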
All the above are paradoxes in puzzle’s clothing. It sounds at first like there is some resolution to be found, and that once you find it, you’ll be able to say, “Aha! This is what would really happen.” Then you realize that it’s hopeless. No matter what you assume, you end up with an impossibility.
One common reaction to the type of paradoxes above is to wonder if they are “possible”—that is, if they could ever occur in the real world. In some cases, the answer is certainly yes. Protagoras’ lawsuit could have taken place (presenting the judge with a difficult decision); the military could (probably does) have confusing and contradictory rules. A barber could shave everyone else in town who doesn’t shave himself—leading townsfolk to make Russell’s claim about him—though he still could not truly fulfill the claim.
Real experiments have supported the experimenter bias effect (which has even rated an acronym: EBE). In 1963, Robert Rosenthal and K. Fode reported three studies showing a significant effect. Rosenthal and Fode assigned a number of college students to conduct sham experiments on human subjects. The subjects were shown photographs of assorted individuals and asked to decide if the individuals had been “experiencing success” or “experiencing failure.” About half the student experimenters were led to believe that their subjects would favor “success” responses; the other group was told to expect “failure” responses. Then the reported results of the sham experiments were compared. Since the sham experiments should have produced the same results each time, the differences were presumed due to the experimenter’s expectations. Later studies by Rosenthal further investigated the effect. Rosenthal went so far as to suggest that future experiments on humans might have to be conducted via automated procedures to avoid the taint of bias.
Other researchers were unable to replicate Rosenthal’s findings. The matter came to a head in a 1969 issue of the Journal of Consulting and Clinical Psychology. The journal published, back to back, a study by Theodore Xenophon Barber and colleagues carefully duplicating Rosenthal’s experiments but finding absolutely no evidence for the bias effect; a defensive rebuttal by Rosenthal; and a peevish counterrebuttal by Barber. The undercurrent of irritability sublimated in scientific nitpicking resulted in such deadpan statements as the following (from Barber, in response to Rosenthal’s objection that Barber had replicated the experiment at an all-female school): “If Rosenthal is seriously contending the Experimenter Bias Effect is more readily obtained in coeducational state universities than in other types of colleges or universities, he should present data to support the contention.”
Subsequent studies have further weakened the case for a widespread bias effect. At least forty studies published from 1968 to 1976 found no statistically significant experimenter expectancy effect, and six others provided but weak evidence of it.
For the expectancy paradox to exist in the real world, it would have to be determined that the expectancy effect is both universal and unavoidable. There would be no problem if it’s just some psychologists who fall victim to the effect. Then the experimenter could be a careful, coolheaded psychologist measuring the foibles of his sloppy colleagues. Just as paradox requires that a Cretan utter “All Cretans are liars,” it is necessary that an experiment of a certain type assert the unreliability of all experiments of that type.
In reality, it is unlikely that the expectancy effect would be ubiquitous. For that reason, even the actual studies purporting to demonstrate the effect are not necessarily caught up in the maelstrom of paradox.
Okay. But what would it mean if it were indeed determined that the results of all experiments on human beings are invalid, including the experiment that determined that fact? Could that happen?
There is a distinction between falsehood and invalidity. If an experimental result is false, it’s false, but if the experiment is merely invalid (through careless procedure, lack of controls, etc.), its results may be true or false. An invalid experiment may support a hypothesis that happens to be true (call this a “Gettier experiment”).
In the liar paradox, an assumption of truth leads to falsehood, and an assumption of falsehood leads to truth. But are we talking about the truth/falsehood or validity/invalidity of the expectancy effect experiment here? It is not immediately clear. Let’s list all the possibilities, as we might do with a logic puzzle.
(a) Assume the study’s results are true. If they are, then psychological experiments on humans can’t be trusted. (The study does not purport to show that the results of psychological experiments are invariably wrong, just that you can’t go by them.) Therefore the experiment can’t be trusted either. Its conclusion could be true—and in fact is true by our assumption—but the study is not valid evidence for it. The study is a Gettier experiment, and this is a possible if ironic state of affairs.
(b) Assume the study’s conclusion is false. Then there is no universal expectancy effect. The study’s conclusion could be and presumably is false for some other reason. (If the conclusion is false, then the study must also be invalid.) Again, a possible state of affairs.
(c) Assume the study is valid. Then its conclusion is true and the experiment is invalid: a contradiction.
(d) Assume the study is invalid. Then its conclusion may be true or false: no contradiction there.
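The four cases can also be checked mechanically. Here is a small sketch (my own formalization, resting on two assumed constraints: a valid study has a true conclusion, and a true conclusion makes every experiment on humans, this study included, invalid):

```python
# A mechanical check of cases (a)-(d), under my formalization of the two
# constraints at work: (1) a valid study has a true conclusion, and
# (2) a true conclusion makes every experiment on humans, this study
# included, invalid.

from itertools import product

for conclusion_true, study_valid in product((True, False), repeat=2):
    consistent = not (study_valid and not conclusion_true)             # (1)
    consistent = consistent and not (conclusion_true and study_valid)  # (2)
    verdict = "possible" if consistent else "contradiction"
    print(f"conclusion true = {conclusion_true}, "
          f"study valid = {study_valid}: {verdict}")

# Only the assignments with study_valid = False survive, matching cases
# (a), (b), and (d): every consistent reading leaves the study invalid.
```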
In short, if someone did a study purporting to show that the experimenter expectancy effect is universal, the tenable criticisms would be that either (a) the conclusion expresses a serendipitous truth that is, however, not justified by the study, which is invalid; or (b) the conclusion is false, and the study invalid; or (d) the study is invalid, period. No matter what, you are forced to conclude that the study is invalid.