feeling the future
Most science papers don’t begin with a description of psi, those “anomalous processes of information or energy transfer” that have no material explanation. (Popular examples of psi include telepathy, clairvoyance and psychokinesis.) It’s even less common for a serious science paper, published in an elite journal, to show that psi is a real phenomenon. But that’s exactly what Daryl Bem of Cornell University has demonstrated in his new paper, “Feeling the Future: Experimental Evidence for Anomalous Retroactive Influences on Cognition and Affect,” which was just published in The Journal of Personality and Social Psychology.
Bem’s experimental method was extremely straightforward. He took established psychological protocols, such as affective priming and recall facilitation, and reversed the sequence, so that the cause became the effect. For instance, he might show students a long list of words and ask them to remember as many as possible. Then the students were asked to type a selection of words that had been randomly chosen from the same list. Here’s where things get really weird: the students were significantly better at recalling words that they would later type.
Or consider this experiment, which is a direct test of precognition. Bem provided the following instructions to subjects:
This is an experiment that tests for ESP. It takes about 20 minutes and is run completely by computer. First you will answer a couple of brief questions. Then, on each trial of the experiment, pictures of two curtains will appear on the screen side by side. One of them has a picture behind it; the other has a blank wall behind it. Your task is to click on the curtain that you feel has the picture behind it. The curtain will then open, permitting you to see if you selected the correct curtain. There will be 36 trials in all. Several of the pictures contain explicit erotic images (e.g., couples engaged in nonviolent but explicit consensual sexual acts). If you object to seeing such images, you should not participate in this experiment.
The location of the image was selected at random by the computer, which means that, by chance alone, students should have correctly guessed the location of the pornography 50 percent of the time. However, over 100 sessions, the subjects consistently performed above chance, correctly locating the porn 53.1 percent of the time. Interestingly, their hit rate on non-erotic pictures did not deviate from chance. (They found neutral pictures, for instance, 49.8 percent of the time.)
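One way to see why a few percentage points above 50 can matter is to ask how often pure guessing would do at least that well. The sketch below computes an exact binomial tail probability for a two-choice task. It assumes independent 50/50 trials, and the trial count in the usage lines is hypothetical, not Bem's design; it is a textbook test swapped in for illustration, not necessarily the analysis Bem used.

```python
from math import comb


def chance_tail_probability(hits: int, trials: int) -> float:
    """P(X >= hits) for X ~ Binomial(trials, 1/2).

    The probability that blind 50/50 guessing gets at least `hits` correct
    out of `trials`, assuming the trials are independent.
    """
    if not 0 <= hits <= trials:
        raise ValueError("need 0 <= hits <= trials")
    # Each of the 2**trials guess sequences is equally likely, so the tail is
    # (number of sequences with >= hits correct) / 2**trials.
    favourable = sum(comb(trials, k) for k in range(hits, trials + 1))
    return favourable / 2**trials  # exact integer ratio, rounded once to float


if __name__ == "__main__":
    trials = 1000  # hypothetical number of erotic-picture trials
    hits = 531     # 53.1 percent of those trials
    p = chance_tail_probability(hits, trials)
    print(f"hit rate {hits / trials:.1%}, P(at least this many by chance) = {p:.4f}")
```

The same 3.1-point edge becomes less and less likely to arise by chance as the number of trials grows, which is why sample size matters as much as the hit rate itself.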
The power of Bem’s paper is cumulative. In total, he describes the results of nine different experiments, conducted on more than 1000 subjects. All of the experiments revealed slight yet statistically significant psi anomalies, with an average effect size of 0.21 across all experiments.
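An effect size such as the 0.21 above is a standardized difference: how far the average score sits from the chance baseline, measured in units of the scores' standard deviation (Cohen's d). The sketch below assumes a one-sample design with a chance baseline of 0. The text does not say exactly how Bem computed or averaged his effect sizes, so treat this as the textbook definition rather than his procedure; the function names are hypothetical.

```python
import math
import statistics


def cohens_d_one_sample(scores, baseline=0.0):
    """Cohen's d for a one-sample design: (mean - baseline) / sample SD."""
    return (statistics.mean(scores) - baseline) / statistics.stdev(scores)


def d_from_t(t, n):
    """Recover d from a one-sample t statistic.

    Since t = (mean - baseline) / (sd / sqrt(n)), it follows that t = d * sqrt(n).
    """
    return t / math.sqrt(n)


def t_from_d(d, n):
    """Inverse of d_from_t: the t statistic a sample of size n would show for effect d."""
    return d * math.sqrt(n)


if __name__ == "__main__":
    # t(99) = 1.92 with N = 100 is the Study 8 result listed in Table 2.
    print(round(d_from_t(1.92, 100), 3))
    # A d of 0.21 in a sample of 100 corresponds to a t of about 2.1.
    print(round(t_from_d(0.21, 100), 2))
```

Because t grows with the square root of the sample size, a small d can still reach significance once enough participants are pooled, which is part of why the cumulative sample matters.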
However, the real contribution of this paper isn’t even these statistically significant results. Instead, it’s Bem’s attempt to create rigorous, well-controlled tests of psi that can be replicated by independent investigators. Because here is the dirty secret of anomalous phenomena like telepathy and clairvoyance: They’ve been demonstrated dozens of times, often by reputable scientists. (Bem is an extremely well-respected psychologist, best known for his work on self-perception.) Why, then, do serious scientists dismiss the possibility of psi? Why do rational people assume that parapsychology is bullshit? Because these exciting results have consistently failed the test of replication.
And this is why Bem’s paper is so important: It provides the first testable framework for the investigation of anomalous psychological properties. Unlike most tests of psi or ESP, Bem’s research builds upon well-known experimental paradigms, and minimizes the contact between the experimenter and the subject. The data collection was automated and accurate; the paper passed peer review. (Charles Judd, who oversaw the review process at JPSP, said: “This paper went through a series of reviews from some of our most trusted reviewers.”) Only time will tell if the data holds up. But at least time will tell us something. Bem ends the paper with a reference to Lewis Carroll:
Near the end of her encounter with the White Queen, Alice protests that “one can’t believe impossible things,” a sentiment with which the 34% of academic psychologists who consider psi to be impossible would surely agree. The White Queen famously retorted, “I daresay you haven’t had much practice. When I was your age, I always did it for half-an-hour a day. Why, sometimes I’ve believed as many as six impossible things before breakfast.”
A Replication of the Procedures from Bem (2010, Study 8) and a Failure to Replicate the Same Results
Jeff Galak
Carnegie Mellon University
Leif D. Nelson
University of California, Berkeley - Haas School of Business
October 29, 2010
We replicated the procedure of Experiment 8 from Bem (2010), which had originally demonstrated retroactive facilitation of recall. We failed to replicate the result. The paper includes a description of our procedure and analysis as well as a brief discussion of some reasons why we obtained a different result than in the original paper.
Number of Pages in PDF File: 12

First posted online: October 29, 2010. Most recently updated: October 29, 2010.
Address correspondence to either author at jgalak@... or leif_nelson@....
Electronic copy available at: http://ssrn.com/abstract=1699970

Recently, Bem (2010) published an extremely thought-provoking article demonstrating the existence of precognition, “the conscious cognitive awareness… of a future event that could not otherwise be anticipated through any known inferential process.” Through nine meticulously constructed experiments, using a range of tasks, Bem finds consistent support for the idea that people have precognitive abilities. As Bem notes, the purpose of the paper was not only to report evidence relevant to precognition but also to develop procedures “that can be replicated by independent investigators” (p. 3). We set out to replicate one of those procedures.

For our experiment we chose to replicate Experiment 8, the retroactive facilitation of recall. We detail the exact procedure below, but in rough sketch: people were shown a list of words and then asked to freely recall as many as possible. Each participant then practiced a randomly selected half of the words.
Evidence of precognition would be observed if people freely recalled more of the words that they subsequently practiced than of the words that they subsequently did not practice.

It is worth briefly considering why we chose this particular paradigm. Seven of the nine experiments hinge on an affective response: arousal to erotic images, a preference to avoid a negative image, and so on. There is nothing wrong with any of those procedures, but they require some judgment calls from the experimenter in selecting stimuli. As Bem reports, for example, finding stimuli that lead to sensitization or habituation can be tricky. It requires pretesting, of course, but it also leaves open an easy explanation for a null finding: “maybe the stimuli weren’t chosen correctly to be sensitive to the presence or absence of the effect.” The comparatively simple retroactive memory experiments (Experiments 8 and 9) seem to offer an easier setup: choose 12 familiar words from each of four categories and let randomization take care of the rest. Even this procedure has its problems, however. Most notably, humans are needed to help score the participants’ output. This raises two issues. First, it requires a little more labor from the experimenter, which makes it somewhat harder to run larger samples or multiple tests. Second, it leaves room for bias. As far as we can tell, this bias is trivial or nonexistent: a participant who recalls the word Apple but types it as Aple has clearly recalled the word. Furthermore, as we describe below, all coding is done blind to whether the words came from the practice or control sets, and is therefore unbiased.
Nevertheless, given that the history of psi research is marked by subtle influences of experimenter bias, it would be preferable to have a procedure that removed even that small human element from the analysis.

Method

Participants from an online participant pool (n = 112; median age = 38; 88 female; 73% White, 7% Chinese, 5% Black or African American, 5% Hispanic, 10% other) were recruited to participate in an experiment on extrasensory perception (ESP). Participants were compensated with entry into a lottery for $100, a standard incentive offered to participants in this pool.¹

Participants first read and agreed to a consent form, which again mentioned that the study was investigating ESP, and then read a brief introductory statement almost identical to the one used by Bem (2010): “This experiment tests for ESP (extra sensory perception) by administering several tasks involving common everyday words. The experiment takes about 15 minutes to complete. The program will give you specific instructions as you go. At the end of the session, the computer will explain to you how this procedure tests for ESP.” When participants had finished reading the statement (after a forced time delay), they clicked continue and proceeded to the next screen.

On the two subsequent screens, participants answered the same stimulus-seeking items that Bem administered. Both were phrased as “To what extent is the following statement true of you:”; the first statement was “I am easily bored” and the second was “I often enjoy seeing movies I’ve seen before.” Responses were collected on a 5-point scale anchored at 1 (Very Untrue) and 5 (Very True).

Participants then went through a 3-minute relaxation procedure as described in the original paper: they looked at an astronomical photograph while listening to relaxing music. When the 3 minutes had ended, participants clicked a button to acknowledge that they were ready and received instructions about the task.
Participants were told: “Next, we would like you to look at a list of 48 common nouns one at a time, for 3 seconds. While looking at each word, please visualize the corresponding object. For example, if the word is ‘house’, please imagine a house. It is absolutely critical that you focus on only this task and do not perform any other tasks (e.g. check email). When you are ready to begin, please click continue.”

After participants clicked continue, they were shown the series of words, each for 3 seconds. As in Bem's study, the words were drawn from four categories: food, animals, occupations, and clothes (see Table 1 for the full list). Mirroring Bem's procedure, the words were presented in a predetermined random order (the same order for all participants). After all 48 words had been presented, participants were asked to type any words that they recalled. They had as much time as they wanted, and when they were finished they clicked a button to go to the next stage.

At that point the program randomly assigned 24 words to be practiced: 6 randomly chosen from each of the 4 groups of 12 words. The practice session asked people to look at the list of 24 words and, on successive screens, first click on the six words from a specified category (at which point the words became highlighted) and then retype those words in six boxes below.
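The assignment step can be sketched as follows. The function and variable names are hypothetical, the word lists are copied from Table 1, and Python's `random` module stands in for whatever randomizer the original program used. The property that matters is that the practice/control split is drawn only after the recall responses are in, so by ordinary causation it cannot influence them.

```python
import random

# The 48 nouns from Table 1, 12 per category.
WORDS_BY_CATEGORY = {
    "food": ["apple", "bagel", "bread", "hamburger", "lasagna", "omelet",
             "orange", "pizza", "salad", "sandwich", "spaghetti", "steak"],
    "animals": ["alligator", "cat", "cow", "dog", "dolphin", "frog",
                "goat", "horse", "lion", "monkey", "pig", "rabbit"],
    "occupations": ["accountant", "athlete", "bartender", "doctor", "engineer", "fireman",
                    "fisherman", "janitor", "musician", "plumber", "policeman", "teacher"],
    "clothes": ["coat", "dress", "hat", "jeans", "pants", "shirt",
                "shoes", "shorts", "skirt", "socks", "suit", "underwear"],
}


def assign_practice_words(words_by_category, per_category=6, rng=None):
    """Randomly split each category into practice and control words.

    Intended to run only after free recall is finished, so the split cannot
    depend on what a participant recalled. Returns (practice, control) as sets.
    """
    rng = rng or random.Random()
    practice, control = set(), set()
    for words in words_by_category.values():
        chosen = set(rng.sample(words, per_category))  # 6 of the 12 words
        practice |= chosen
        control |= set(words) - chosen                # the other 6
    return practice, control


if __name__ == "__main__":
    # Seeded here only so the illustration is reproducible.
    practice, control = assign_practice_words(WORDS_BY_CATEGORY, rng=random.Random(2010))
    print(sorted(practice))
```

Sampling within each category, rather than 24 words from the pooled list, guarantees the 6-per-category balance the procedure describes.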
They could not continue the experiment until they had correctly clicked on the appropriate six words and typed them in the corresponding boxes.

Table 1
List of Words Used by Category

Food        Animals     Occupations  Clothes
apple       alligator   accountant   coat
bagel       cat         athlete      dress
bread       cow         bartender    hat
hamburger   dog         doctor       jeans
lasagna     dolphin     engineer     pants
omelet      frog        fireman      shirt
orange      goat        fisherman    shoes
pizza       horse       janitor      shorts
salad       lion        musician     skirt
sandwich    monkey      plumber      socks
spaghetti   pig         policeman    suit
steak       rabbit      teacher      underwear

Note. Words are presented alphabetically in this table but were presented in random order (across and within categories) to participants.

When the practice session was complete, participants answered one more question: “It is very important for us to know if you were not paying 100% attention to this study (e.g. checking email, going to the bathroom). You will not be penalized in any way if you did other tasks and you will be entered into the lottery regardless of how you respond. So please be honest! Did you, at any point during this study, do something else (e.g. check email)?” Participants could check a box corresponding to either “No, I paid 100% attention to the study” or “Yes, I did other things during the study.”

Results

To assess whether we observed retroactive facilitation of recall, we first had to determine which of the recalled words had been practiced and which had not. On the surface this seems like a trivial task, but because spelling errors were rather prevalent, the coding could not be fully automated. Instead, we coded the words in a two-stage process. First, all entered words that perfectly matched any of the 48 words in the set were coded as coming from either the practice set or the control set (about 90% of all words fell into one of these two categories). This was done automatically by a computer program.
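The automatic first stage of coding, together with the scoring that follows it, might look like the sketch below. It assumes matching ignores case and surrounding whitespace and that a word recalled twice counts once; both are our assumptions, not stated in the paper, and all names are hypothetical. Unmatched entries are returned without any practice/control label, which is what keeps the later manual check blind. The scores use Bem's weighted differential recall, DR = (P - C) x (P + C), where P and C count recalled practice and control words; DR% is taken here as DR divided by its largest possible magnitude, 24 x 24 = 576, which is our reading of the description of DR% relative to the range from -576 to 576.

```python
MAX_DR = 24 * 24  # all 24 practice words recalled and no control words


def code_recalled_words(typed_entries, practice, control):
    """Stage 1 coding: sort exact matches automatically, set the rest aside.

    Returns (practice_hits, control_hits, needs_review). needs_review holds only
    the raw strings, so a human coder checking them never sees which set a word is in.
    """
    practice_hits, control_hits, needs_review = set(), set(), []
    for raw in typed_entries:
        word = raw.strip().lower()  # assumption: case and outer spaces are ignored
        if not word:
            continue
        if word in practice:
            practice_hits.add(word)  # a set, so a repeated word counts once
        elif word in control:
            control_hits.add(word)
        else:
            needs_review.append(raw)  # a misspelling (e.g. "spageti") or a non-list word
    return practice_hits, control_hits, needs_review


def weighted_dr(p, c):
    """Weighted differential recall: (P - C) * (P + C), which equals P**2 - C**2."""
    return (p - c) * (p + c)


def dr_percent(p, c):
    """DR as a percentage of its largest possible magnitude (576); 0 is chance."""
    return 100 * weighted_dr(p, c) / MAX_DR


def simple_dr(p, c):
    """Unweighted differential recall: P - C."""
    return p - c


if __name__ == "__main__":
    practice = {"apple", "cat", "doctor"}  # tiny hypothetical split for illustration
    control = {"bagel", "dog", "hat"}
    p_hits, c_hits, review = code_recalled_words(["Apple", " dog", "spageti"], practice, control)
    p, c = len(p_hits), len(c_hits)
    print(p, c, review, weighted_dr(p, c), dr_percent(p, c), simple_dr(p, c))
```

Because DR = P² - C², the weighting gives a practice/control gap more influence for participants who recalled many words overall, whereas the simple score P - C treats every gap the same.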
Next, any entered words that did not match any of the 48 words were checked manually, one at a time, to determine whether they were simply misspellings (e.g., spageti) or words that were not in the set (e.g., home). In all cases the determination of whether a word was a misspelling was entirely clear, and in all cases the coder was blind to whether the words were drawn from the practice set or the control set.

Bem (2010) computed a weighted differential recall score (DR) for each participant using the formula

DR = (P - C) × (P + C)

where P is the number of recalled practice words and C is the number of recalled control words. In the paper, for descriptive purposes, Bem frequently reports this number as DR%, the percentage by which the score deviated from chance (0) toward the highest or lowest possible score (576 or -576). We conducted the identical analysis on our data and also report DR% (see Table 2). In addition to the weighted differential recall score, we report a simple unweighted score: the number of recalled practice words minus the number of recalled control words. For both measures, random chance predicts a score of 0, and each was analyzed with a one-sample t-test against 0.

We did not find any evidence of precognition: people recalled slightly fewer words from the practice set than from the control set (see Table 2). One concern was that the experiment was conducted over the internet, so it is unclear to what extent people fully attended to the key elements of the procedure. We used two methods to exclude potentially inattentive participants. First, we asked them to self-report whether they had stopped paying attention at some point during the experiment. Eight people said that they had. Second, we looked at how long people spent on the recall task.
Our reasoning was that if people went through that task unusually quickly, they might not have been particularly focused on it. The distribution of times was necessarily skewed (people could take as long as they wanted, but could not go faster than 0 seconds), so no participants were more than two standard deviations below the mean (which would have implied a negative time on task). We instead used a cutoff of one standard deviation below the mean, which excluded 7 people from the sample. As can be seen in Table 2, neither of these exclusions (alone or in combination) had any appreciable influence on the effect.

Bem (2010) reported a relationship between sensation seeking and precognitive ability. He reports a positive correlation across the nine experiments in the paper (r = .18) and in Experiment 8 in particular (r = .22, p = .014). We did not replicate that result. With increases in sensation seeking there was a tiny and entirely nonsignificant decrease in precognitive ability (as reflected by DR%), r = -.063, p = .51.

Table 2
Experiment Results

Sample | N | P | C | Weighted DR: Mean (DR%) | Weighted DR: Statistic | Simple DR: Mean | Simple DR: Statistic | P > C | P = C | P < C
Bem (2010, Study 8) results | 100 | | | 2.27% | t(99) = 1.92, p = .029 | | | | |
Full sample | 112 | 8.09 | 8.43 | -1.35% | t(111) = -1.31, p = .194 | -.34 | t(111) = -1.12, p = .265 | 38.4% (43 of 112) | 13.4% (15 of 112) | 48.2% (54 of 112)
Removing self-identified inattentive people | 104 | 8.30 | 8.60 | -1.34% | t(103) = -1.21, p = .230 | -.30 | t(103) = -.93, p = .357 | 39.4% (41 of 104) | 13.5% (14 of 104) | 47.1% (49 of 104)
Removing people who were too fast on the recall portion (< 1 SD) | 103 | 8.59 | 8.99 | -1.52% | t(102) = -1.36, p = .178 | -.40 | t(102) = -1.22, p = .226 | 37.9% (39 of 103) | 12.6% (13 of 103) | 49.5% (51 of 103)
Removing people in either of those two categories | 95 | 8.86 | 9.22 | -1.52% | t(94) = -1.26, p = .212 | -.36 | t(94) = -1.02, p = .310 | 38.9% (37 of 95) | 12.6% (12 of 95) | 48.4% (46 of 95)

Notes. P = the number of practice words correctly recalled (out of 24 possible). C = the number of control words correctly recalled (out of 24 possible). The last three columns give the percentage of participants who recalled more, equally many, or fewer practice words than control words. Statistics: Bem uses one-tailed tests (with good justification) in his paper. Because our replication was not testing a directional hypothesis (we were equally open to evidence for precognition and anti-precognition), we report p-values from two-tailed tests.

Discussion

Why do we not see any evidence of precognition? There are obviously many possible reasons why we failed to obtain a result similar to Bem's, ranging from the mundane (e.g., our sample was more heterogeneous than Bem’s) to the exotic (e.g., the quantum mechanics that allow for the detection of future events are also contingent on the specific physical features of the original experiment rooms). For the purposes of this paper we really care about only one possibility: Do we fail to detect precognition because precognition does not exist? To this question we emphatically answer, “We don't know. On the one hand, we fail to replicate the effect, but on the other hand, our single failure to replicate is hardly sufficient to seriously undermine an entire paper.”

Bem presented nine experiments confirming the existence of precognition. We present one experiment (similarly powered, but burdened by other idiosyncrasies) that seems to show a null effect. In every other psychological domain, that would rightfully be identified as a mild challenge to the original hypothesis, but hardly a severe threat. Even if we knew for certain that precognition existed, pure randomness would frequently produce a null effect in this experiment, or even the mild reversal we document.

Rather, we interpret our finding as providing exactly the sort of publicly available evidence Bem called for: an effort to scientifically investigate the existence of precognition using a generally valid and agreed-upon methodology.
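The claim that randomness alone would often produce a null result, even if precognition were real, is a statement about statistical power. The sketch below uses a normal approximation to a one-sample, two-tailed test on difference scores: for a hypothetical true effect d and sample size n, it returns the probability of a significant result in the predicted direction, a nonsignificant result, a significant result in the opposite direction, and an observed mean at or below zero. The inputs are illustrative assumptions (for instance, d near the 0.21 average reported for Bem's experiments and n = 112 as in this replication), the function name is hypothetical, and the normal approximation only approximates an exact t-test, though the two are close at these sample sizes.

```python
from statistics import NormalDist


def replication_outcome_probabilities(d, n, alpha=0.05):
    """Normal-approximation outcome probabilities for a one-sample, two-tailed test.

    d     -- assumed true standardized effect (mean difference / SD)
    n     -- number of participants
    alpha -- two-tailed significance level
    """
    z = NormalDist()                 # standard normal
    shift = d * n ** 0.5             # expected value of the test statistic, d * sqrt(n)
    crit = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    sig_positive = 1 - z.cdf(crit - shift)
    sig_negative = z.cdf(-crit - shift)
    return {
        "significant_positive": sig_positive,
        "nonsignificant": 1 - sig_positive - sig_negative,
        "significant_negative": sig_negative,
        "mean_at_or_below_zero": z.cdf(-shift),
    }


if __name__ == "__main__":
    for d in (0.0, 0.1, 0.2):  # hypothetical true effects
        probs = replication_outcome_probabilities(d, n=112)
        print(d, {k: round(v, 3) for k, v in probs.items()})
```

Reading the "nonsignificant" and "mean_at_or_below_zero" entries side by side separates two different questions: how often a study of this size would simply miss a small real effect, and how often it would point in the wrong direction.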
His hope, and ours, is that perhaps a handful of researchers will make similar efforts to clarify the effect. Without a doubt, if the effects are real and valid, they would constitute a substantial advance in psychology. Even if that finding feels unlikely,² its importance would seem to merit the effort of investigation.

For simplicity, we offer two easy subheadings:
What do we claim? That we conducted a very close replication of Bem (2010, Study 8) and failed to obtain a reliable result.
What do we NOT claim? That we have disproven Bem (2010). (We are merely trying to add more data relevant to the question.)³

References
Bem, D. J. (2010). Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology, in press.
Geiger, H., & Marsden, E. (1909). On a diffuse reflection of the α-particles. The Royal Society, 82(557), 495-500.
Hróbjartsson, A., & Gøtzsche, P. C. (2010). Placebo interventions for all clinical conditions. Cochrane Database of Systematic Reviews 2010, Issue 1. Art. No.: CD003974.