An Empirical Evaluation of a Set of Recommendations for Extrasensory Perception Experimental Research

One of the main criticisms of extrasensory perception (ESP) research is the lack of replication of positive results across laboratories. In this paper we report a study (N=100) in which we tested a set of practices recommended by researchers in the area, with the aim of developing a robust 'recipe' for ESP experimental research. In an experimental condition that included these practices we observed a 30% rate of correct guesses (z=0.82, p=0.21, one-tailed), compared with a 22% rate in a control condition (z=-0.49, p=0.31, one-tailed). We discuss why the results obtained so far with free-response protocols are not strong enough to satisfy mainstream science.

that researchers so far have not been able to outline an experimental protocol to replicate the phenomenon consistently across laboratories (Hyman, 2010; Milton & Wiseman, 2001). Researchers in the area, in contrast, argue that although 'total' replicability may not have been achieved, there have been more significant studies than expected by chance (Bem, Palmer, & Broughton, 2001; Storm & Ertel, 2001, 2002; Storm, Tressoldi, & Di Risio, 2010a, 2010b).
Critics also express concern that a file-drawer problem may exist in the field. In any area of science, studies that report significant results are more likely to be published than non-significant studies, which can make a null effect look significant in a meta-analysis. Researchers in parapsychology were among the first to become sensitive to this problem, and in 1975 the Parapsychological Association (PA) adopted a policy against withholding non-significant results. Nowadays the main peer-reviewed journals in the area maintain a strict policy of not discriminating between significant and non-significant studies. Rosenthal (1979) suggested a method, known as the fail-safe N or file-drawer analysis, for estimating how resistant a finding is to the file-drawer problem. Honorton (1985) used this technique in the first meta-analysis conducted on a Ganzfeld dataset and reported that 423 non-significant studies would have been needed to cancel out the significance of the Ganzfeld database. The most recent meta-analysis (Storm, Tressoldi, & Di Risio, 2010a) included 59 free-response ESP studies conducted from 1992 to 2008. The authors calculated that 293 non-significant, unpublished studies would be needed to bring this database to chance level. It is quite unlikely that such a large number of non-significant, unpublished studies exists, given how time- and resource-consuming this type of experiment is (a single Ganzfeld session may take as long as one hour, not counting questionnaire scoring, data analysis, etc.) and given the PA's and specialised journals' policy against selective reporting.
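Rosenthal's fail-safe N can be illustrated with a short sketch. It assumes the common one-tailed form, in which the unpublished studies are taken to average z=0 and the combined Stouffer z must fall to the critical value 1.645 (so the denominator is 1.645² ≈ 2.706). The function name and the input z-scores are hypothetical and are not taken from the Ganzfeld databases cited above.

```python
def fail_safe_n(z_scores, z_alpha=1.645):
    """Rosenthal-style fail-safe N (one-tailed form, assumed here).

    Number of unpublished studies averaging z = 0 that would pull the
    combined Stouffer z, sum(z) / sqrt(k + N), down to z_alpha.
    """
    k = len(z_scores)
    total = sum(z_scores)
    # Solve total / sqrt(k + N) = z_alpha for N.
    return total ** 2 / z_alpha ** 2 - k


# Hypothetical example: ten studies, each with z = 1.0.
print(round(fail_safe_n([1.0] * 10), 1))
```

For these ten hypothetical studies the sketch gives roughly 27 null studies. The 423 and 293 figures above come from the cited meta-analyses, not from this code.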
Several studies have searched for neurological signals concomitant with ESP. In 1979, Rao and Feola conducted an early review of the literature on the relationship between ESP and the brain's electroencephalographic (EEG) activity. These researchers concluded that EEG alpha levels and ESP are associated and that, therefore, non-effortful or relaxed attention may be conducive to ESP. McDonough, Don, and Warren (2002) detected gamma activity in association with ESP in a forced-choice task, in a replication of a previous study on event-related brain potentials (Don, McDonough, & Warren, 1998). More recently, researchers have used EEG along with functional magnetic resonance imaging (fMRI) to study correlations between the brain activities of pairs of participants placed in separate rooms. In a recent literature review, Charman (2006) reported that EEG analyses found evident changes in the receiver's brain activity in response to sensory stimulation of the sender. However, some studies have not found evidence for this effect. May, Spottiswoode, and James (1994) conducted an experiment to detect event-related desynchronization resulting from an ESP stimulus, but after 70 trials contributed by three subjects they found no evidence of a response to the ESP stimulus. Moulton and Kosslyn (2008), using fMRI, failed to find any neurological response to ESP stimuli, in a study that they describe as the strongest evidence so far against ESP.
Throughout the years, researchers have explored a range of procedures in an effort to obtain unequivocal evidence of ESP. Participant selection seems to be one of the favourite practices. Meta-analyses of previous studies (e.g., Bem & Honorton, 1994; Broughton, Kanthamani, & Khilji, 1989; Honorton & Schechter, 1987; Storm & Ertel, 2001, 2002; Storm, Tressoldi, & Di Risio, 2010a) have identified a series of factors that appear to influence success. Some of these factors may be superficial. For example, extraversion may relate merely to the ability to be at ease in the testing situation, while the practice of mental disciplines may reflect a general interest in inner experiences and introspection. Other factors, such as subjective paranormal experiences and high scores on the feeling and perception poles of the Myers-Briggs Type Indicator, may be more central if an ESP process exists in humans. The possible effect of many of these variables can be understood in relation to the noise reduction model (Honorton, 1977, 1978). In the noise reduction model, ESP is conceptualised as a weak signal that is frequently masked by internal somatic and external sensory "noise". Increasing the signal-to-noise ratio should therefore help detect any psi signal, and this can be achieved by reducing internal and external stimulation. In relation to this, one of the conditions most commonly believed to be desirable in ESP experiments is relaxation, as a means of enhancing the signal-to-noise ratio by reducing somatic and cognitive noise. The experimental evidence, however, is not as clear as the theory would lead one to expect. Several researchers report a positive association between participants' performance and their degree of relaxation (Braud & Braud, 1973; Sargent, 1982; Stanford & Mayer, 1974). Braud (1977) found a curvilinear relationship between these two variables and argued that there seems to be an optimum level of arousal for successful performance in this type of experiment. However, other authors have failed to find a significant association between these two variables (George, 1982; Morris & Morrell, 1985; Musso & Granero, 1982).
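The signal-versus-noise reasoning above can be made concrete with a toy simulation. This is an illustrative assumption, not Honorton's formal model and not anything tested in this study. Each of the four judging options receives a "match score". The target's score is a fixed weak signal plus Gaussian noise, each decoy's score is noise alone, and the receiver picks the highest-scoring option. The function name and all parameter values are hypothetical.

```python
import random


def simulated_hit_rate(signal, noise_sd, trials=20000, n_options=4, seed=1):
    """Toy four-choice judging model (hypothetical, for illustration only).

    Target match score = signal + Gaussian noise; each decoy = noise only.
    A hit occurs when the target outscores every decoy.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        target_score = signal + rng.gauss(0.0, noise_sd)
        best_decoy = max(rng.gauss(0.0, noise_sd) for _ in range(n_options - 1))
        if target_score > best_decoy:
            hits += 1
    return hits / trials


# Same weak signal, decreasing noise: the hit rate rises above the 25% baseline.
for sd in (5.0, 2.0, 1.0, 0.5):
    print(f"noise SD {sd}: hit rate {simulated_hit_rate(1.0, sd):.3f}")
```

Under these assumptions the hit rate approaches the 25% chance level as noise dominates, and it rises as noise falls. This is the intuition behind reducing internal and external stimulation.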
Based on the noise reduction model, Honorton and Harper (1974) recommended the use of a sensory attenuation technique, the Ganzfeld, which is nowadays the experimental procedure most commonly used to test the existence of ESP. The Ganzfeld is a sensory isolation technique originally used by Gestalt psychologists for the study of perception (e.g., Avant, 1965). Ganzfeld experiments commonly involve two participants (one in the role of a telepathic sender and the other of a receiver) located in separate rooms. The receiver is placed in a sensory attenuation environment, while the sender is shown a target stimulus, such as a picture, postcard, or video clip, that has been randomly selected from a large pool of possible targets. The sender is asked to "silently communicate" this target to the receiver. Meanwhile, the receiver reports the spontaneous mental images, feelings, and subjective impressions that come to mind. Then a randomly ordered set containing the actual target and three decoys is shown to the receiver, who is asked to rate the degree to which each matches the thoughts, feelings, and images he or she experienced during the response period.
Using the direct-hit measure of scoring, the receiver scores a hit if he or she chooses the actual target and a miss if he or she selects a decoy. By chance alone, receivers should select the actual target 25% of the time, and a statistically significant deviation above this baseline is taken to indicate a communication anomaly. Meta-analyses of Ganzfeld studies show a small but highly significant effect of information transfer between a sender and a receiver (Bem & Honorton, 1994; Bem, Palmer, & Broughton, 2001; Storm & Ertel, 2001, 2002). Furthermore, some studies suggest that the Ganzfeld may be more conducive to ESP than non-Ganzfeld conditions. In a recent paper, Storm, Tressoldi, and Di Risio (2010a) report a meta-analysis of three types of study: those that used the standard Ganzfeld technique, those that used non-Ganzfeld noise reduction techniques (such as meditation, relaxation, or hypnosis), and other non-Ganzfeld studies without noise reduction. The authors report that the mean effect size of the Ganzfeld database (mean ES=0.14, 95% CI: 0.07, 0.02; Stouffer's z=5.48, p=2.13×10^-8) was significantly higher than that of the non-Ganzfeld, no-noise-reduction database (mean ES=-0.029, 95% CI: -0.07, 0.01; Stouffer's z=-2.29, p=0.98), but not significantly higher than that of the non-Ganzfeld, noise reduction database (mean ES=0.11, 95% CI: 0.01, 0.21; Stouffer's z=3.35, p=2.08×10^-4).
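The direct-hit z-scores reported in this paper appear to follow a normal approximation to the binomial. The sketch below assumes a chance rate of 0.25, 50 sessions per condition, and no continuity correction. It also includes an unweighted Stouffer combination of independent z-scores, the statistic quoted for the meta-analytic databases. The function names are hypothetical, and the meta-analyses may have used weighted variants.

```python
import math


def direct_hit_z(hits, trials, p_chance=0.25):
    """Normal-approximation z for a direct-hit count (no continuity correction)."""
    expected = trials * p_chance
    sd = math.sqrt(trials * p_chance * (1 - p_chance))
    return (hits - expected) / sd


def upper_tail_p(z):
    """One-tailed p-value for an above-chance hypothesis."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def stouffer_z(z_scores):
    """Unweighted Stouffer combination of independent study z-scores."""
    return sum(z_scores) / math.sqrt(len(z_scores))


z = direct_hit_z(15, 50)  # 15 direct hits in 50 sessions (30%)
print(round(z, 2), round(upper_tail_p(z), 2))
```

With 15 hits in 50 sessions this gives z≈0.82 and a one-tailed p≈0.21, matching the values reported for the condition with the recommended practices.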
They also found that studies that selected participants (believers in the paranormal, meditators, etc.) showed higher hit rates than studies with unselected participants, but only in the Ganzfeld condition. In a reply, Hyman (2010) criticises the authors' methodology and accuses them of making a largely heterogeneous database appear homogeneous. Hyman remarks, once more, that evidence in this kind of research has not yet reached a level of consistency that meets scientific criteria. Storm, Tressoldi, and Di Risio (2010b) respond that Hyman presents a one-sided account and argue that they followed standard statistical procedures to identify and deal with outliers in the database.
Characteristics of the information that participants in ESP tests are required to retrieve also seem to play a role in the outcome of the experiment. Bem and Honorton (1994) report significantly higher scores for trials in which video clips were used instead of art prints. Other studies that have used multisensory targets have also returned successful results. In one of the Maimonides dream studies (Krippner, Ullman, & Honorton, 1971), the researchers used a design in which the participant "experienced" a multisensory target. They used an elaborate random number system to choose a word from Hall and Van de Castle's (1966) manual, Content Analysis; this word was then matched with an art print, and a multisensory experience was designed around it. Although these authors report highly successful results, no control condition was used in this study. In a previous Ganzfeld study (Pérez-Navarro, Lawrence, & Hume, 2009) we compared our participants' performance using objects and pictures as targets and observed significantly higher scores for trials in which objects were used.
Some other ideas and strategies for improving laboratory ESP results can also be found in the literature. Regarding the social aspects of the experiment, for example, holding an informal chat with participants before the testing session, in order to relax or motivate them, is among the most recurrent practices in the literature (see Dalton, 1997, for a review). From a series of visits to different laboratories, Delanoy (1997) identified four broad categories of practices that ESP researchers tended to adopt: 1) procedures concerned with laboratory design, 2) orientation towards participants, 3) participant-experimenter interaction, and 4) experimenter orientation and preparation. Delanoy reports that, in general, the researchers at the laboratories visited aimed for a comfortable and reassuring environment that, at the same time, conveyed an image of professionalism. The creation of a comfortable sitting area where participants would be welcomed before the experiment was another important practice pointed out by the researchers.
Delanoy also notes that participant-oriented behaviours, such as waiting for their arrival, not leaving them unattended, offering them refreshments, and other courtesies, would make participants feel valued and could help to decrease anxiety or worries about the experiment. A good participant-experimenter interaction was also viewed as an important factor contributing to experimental success. Parker (2000) recommends feeding the receiver's ongoing mentation back to the sender. This could contribute to experimental success by providing the sender with real feedback on his or her task and/or by diminishing external distractions. Parker reports one study without auditory monitoring and four with it, showing a substantial difference in hit rates (20% for the unmonitored study vs. an average of 40% for the monitored ones). Nevertheless, these results are not conclusive, as there was only one study without auditory monitoring.
There could also be a vast amount of knowledge, arising from informal practice or discussion and not quite suitable for formal publication, latent in the research community. In a previous study (Pérez-Navarro, 2005) I contacted a large number of active researchers and academics in the area, by conventional post or email, to invite them to put forward their views on potential means of improving experimental ESP results. A considerable set of viable strategies was collected. These mainly referred to psychological management and preparation of participants, experimental design, data treatment, targets, ecological validity, and instrumental measures. Although this work did not produce a "recipe" for experimental success per se, it provided a starting point for further systematic research. In the present study we compare two experimental conditions. In one (experimental condition A) we integrated a set of practices recommended by these researchers, and in the other (experimental condition B) we followed a similar protocol that did not include these practices. We hypothesised that the hit rate in both experimental conditions would be significantly higher than expected by chance. We also expected that integrating the researchers' advice in experimental condition A would result in a significantly higher hit rate than that achieved in experimental condition B. When the implementation of a recommendation could be quantified, a correlation was calculated between the degree to which that practice was adopted in the session and the experimental outcome. All recommendations were hypothesised to contribute positively to the participants' performance. All hypotheses were one-tailed, and the alpha level was set at 0.01 because of the multiple contrasts planned for this study.

Method
Design
In this study we used a between-subjects design. Participants were randomly assigned either to an experimental condition that integrated a set of practices recommended by researchers in the area (experimental condition A) or to an experimental condition that did not include any of these practices (experimental condition B). Where feasible, the association between a recommended practice and the participants' ESP scores was quantified and explored in a correlation analysis. The session outcome (dependent variable) was defined using direct hits. The participant was asked to indicate which of the four stimuli (pictures in condition B, objects in condition A) most resembled his/her experience during the period of sensory attenuation. If the participant pointed at the stimulus that the sender was trying to communicate, a hit was counted; otherwise, the trial was coded as a miss.

Participants
A sample of 100 volunteers was recruited by advertising the study on the University of Greenwich campus. The study was advertised as an ESP study, but no further information about the experiment apart from its estimated duration was provided at this stage. Participants were enrolled in a variety of courses, though most were psychology students. Individuals were scheduled for a session and encouraged to come with a friend or relative so that one could act as receiver and the other as sender. Thirty-three participants were male and sixty-seven female, with ages ranging from 18 to 45 (mean 21.6 years, SD 5.3 years).

Measures, Apparatus, and Materials
A thirty-minute white noise soundtrack was created with the software CoolEdit and played to the receiver via headphones through a PC. Visual attenuation was achieved by projecting a red lamp onto a pair of translucent acetate eye covers from approximately 40 cm from the individual's face. In experimental condition A, a wireless radio transmitter system set up in the receiver's room fed the receiver's report back to the sender; the system received the input through the PC and transmitted it to the sender's headset. A random number generator (RNG) was used to randomise target selection, experimental condition, and the order in which the stimuli were presented to the receiver after the session. The same RNG was used in both experimental conditions.
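The randomisation steps described above can be sketched as follows. This is an illustration under stated assumptions: the paper does not specify its RNG, so `random.SystemRandom` (operating-system entropy) stands in for it. The helper `draw_session` is hypothetical. It draws the condition, set (1 to 10), target letter (a to d), and a shuffled judging order for one session.

```python
import random

SETS = range(1, 11)        # ten sets per stimulus pool
TARGET_LETTERS = "abcd"    # four members per set


def draw_session(rng):
    """Hypothetical helper: randomise condition, set, target, and judging order."""
    condition = rng.choice("AB")
    set_number = rng.choice(SETS)
    target = rng.choice(TARGET_LETTERS)
    judging_order = list(TARGET_LETTERS)
    rng.shuffle(judging_order)   # order in which the four stimuli are shown
    return condition, set_number, target, judging_order


rng = random.SystemRandom()   # assumption: the study's actual RNG is not specified
print(draw_session(rng))
```

Independent draws like these will not guarantee exactly 50 sessions per condition. A balanced allocation would instead shuffle a list containing 50 'A' and 50 'B' labels.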
Target Stimuli: Two pools of stimuli were used in this study: pictures (experimental condition B) and objects (experimental condition A). Forty photographs were selected by the experimenter from a larger pool so that they contained elements and themes that could be interesting and attention-catching to the participants. These pictures were randomly organised into ten sets of four pictures each, and each set was kept in an envelope. Pictures were labelled on the back with the set number (1 to 10) and a letter (a to d) for later random selection, and each envelope was labelled with the number of the set it contained. In experimental condition A we used forty objects, selected from those that had been most successful in previous studies and randomly organised into ten sets of four. They consisted of small toys, souvenirs, and everyday utensils. Each set was kept in a small box labelled with the set number, and each object within a set was labelled with a letter from a to d for later random selection. The sets (pictures and objects) were arranged so that no member of a set resembled any other member of the same set, taking into account class (e.g., two toys should not be in the same set), colour brightness, shape, etc.

Procedure
When individuals approached the experimenter with an interest in taking part in the study, they were scheduled for an ESP test. On their arrival at the laboratory, they were randomly assigned to either experimental condition A or B. Experimental condition A included the characteristics outlined below, which were not included in experimental condition B.
Targets: We used multisensory targets (objects) instead of pictures. We selected the most successful sets of objects from our previous studies, according to the number of participants who had achieved a 'hit' relative to the number of times the set had been used.
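The selection criterion above (hits relative to the number of times a set was used) can be sketched as a simple ranking. The set identifiers and counts below are hypothetical. The paper does not say how ties or rarely used sets were handled, so this sketch simply skips sets that were never used.

```python
def rank_sets_by_success(history):
    """Rank sets by past hit proportion, best first.

    history maps a set id -> (hits, times_used); unused sets are skipped.
    """
    rates = {set_id: hits / used
             for set_id, (hits, used) in history.items() if used > 0}
    return sorted(rates, key=rates.get, reverse=True)


previous = {"S1": (3, 6), "S2": (1, 8), "S3": (4, 5)}   # hypothetical counts
print(rank_sets_by_success(previous))
```

Because sets used only a few times give noisy proportions, a minimum number of uses could reasonably be required before a set is ranked.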

Pre-experiment informal chat: On our participants' arrival we spent between 10 and 20 minutes in an informal chat in order to establish rapport and to reduce any anxieties by clarifying any questions they might have about the experiment. Above all, we tried to be welcoming and friendly.

Relaxation techniques: We included 15 minutes of guided relaxation exercises, based on Jacobson's (1962) progressive relaxation technique, before the sensory attenuation.
Feedback to the sender: We provided the sender with feedback on the receiver's ongoing report, through a radio transmitter, during the sensory attenuation. As a quantitative measure of this practice, we recorded the number of times the receiver spoke to describe his/her mental imagery or subjective impressions during the sensory attenuation.
Personalised setting: If participants were not completely happy with the experimental setting, we allowed them to make slight changes until they felt comfortable. The most frequent changes concerned the lighting, the volume of the white noise, and the position of the chair. We quantified this variable as the number of changes requested by the participant.
Sender-receiver pairings: When possible, we used males as receivers and females as senders (Dalton, 1994). Following this author, the four possible pairings were coded ordinally for analysis: male receiver with female sender as 4, female receiver with male sender as 3, female-female as 2, and male-male as 1.
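The pairing scheme above amounts to a small lookup table. A minimal sketch, assuming sex is recorded as "M" or "F" for each role (the names `PAIRING_CODE` and `pairing_code` are hypothetical):

```python
# Ordinal codes for (receiver sex, sender sex), following the scheme in the text.
PAIRING_CODE = {
    ("M", "F"): 4,  # male receiver, female sender (recommended pairing)
    ("F", "M"): 3,  # female receiver, male sender
    ("F", "F"): 2,  # female-female
    ("M", "M"): 1,  # male-male
}


def pairing_code(receiver_sex, sender_sex):
    """Return the ordinal pairing code used as a predictor in the correlation analysis."""
    return PAIRING_CODE[(receiver_sex, sender_sex)]


print(pairing_code("M", "F"))
```

Because the codes are ordinal, correlating them with the outcome treats the pairings as lying on a single scale. Separate 0/1 dummy variables would be needed to avoid that assumption.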
Post-session review: After the sensory attenuation and before judging, we took time with the participants to review their report, allowing them to make changes and/or extend their comments. The number of amendments and/or additions each participant made was counted.

Time of the session: We avoided running the sessions around 18.50 ± 4 h local sidereal time, as recommended by the researchers. This recommendation was based on a series of studies published by Spottiswoode (1997).
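Checking a planned session against this rule requires converting clock time to local sidereal time (LST). The sketch below uses the standard low-precision formula for Greenwich mean sidereal time, GMST ≈ 18.697374558 + 24.06570982441908·D hours, where D is the number of days since the J2000.0 epoch. It then adds the east longitude divided by 15. Two assumptions apply. First, "18.50" is read as 18.5 h, although it could mean 18 h 50 min, so the centre is a parameter. Second, the longitude is set to roughly 0° for Greenwich. The function names are hypothetical.

```python
from datetime import datetime, timezone


def local_sidereal_hours(when_utc, east_longitude_deg):
    """Approximate local sidereal time in hours (low-precision GMST formula).

    when_utc must be a timezone-aware datetime.
    """
    jd = when_utc.timestamp() / 86400.0 + 2440587.5   # Julian date from Unix time
    d = jd - 2451545.0                                # days since J2000.0
    gmst = (18.697374558 + 24.06570982441908 * d) % 24.0
    return (gmst + east_longitude_deg / 15.0) % 24.0


def in_avoided_window(lst_hours, centre=18.5, half_width=4.0):
    """True if LST lies within centre ± half_width hours, wrapping at 24 h."""
    diff = abs((lst_hours - centre + 12.0) % 24.0 - 12.0)
    return diff <= half_width


now = datetime.now(timezone.utc)
lst = local_sidereal_hours(now, east_longitude_deg=0.0)  # Greenwich is near 0° longitude
print(f"LST {lst:.2f} h, avoid session: {in_avoided_window(lst)}")
```

The modular arithmetic in `in_avoided_window` handles windows that cross 0 h. With the values given in the text, the window (14.5 h to 22.5 h) does not actually wrap.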
Two experimenters were involved in the study: the first author of this article (experimenter A) and a co-experimenter (experimenter B). At the time of the session, experimenter A accompanied the receiver to the laboratory while experimenter B gave the instructions to the sender in a distant room. Experimenter B then opened an envelope containing a randomly generated code for set and target selection and gave the corresponding stimulus to the sender. At the same time, experimenter A, in the laboratory, gave the instructions to the receiver in a standard manner, set up the PC and radio transmitter, and started the session. Experimenter A remained in a room next to the receiver's room, listening to the individual's report through headphones and writing down his/her comments. Thirty minutes after the beginning of the session, experimenter B sent experimenter A an SMS giving the number of the set (but not the letter a, b, c, or d) that contained the target stimulus. Experimenter A did not check this message until the period of sensory attenuation was completed. At the end of the sensory attenuation, in experimental condition A, the experimenter reviewed the individual's report, adding any further clarifications and comments from the participant. Experimenter A then displayed on a table, in randomised order, a duplicate of the set of stimuli that experimenter B had identified as containing the target. The individual was asked to examine these four choices, labelled A, B, C, and D, and to indicate which one most closely resembled his/her mental imagery and subjective experience during the sensory attenuation. At this point experimenter A knew only which set contained the target and was blind to which of the four choices was correct. The protocol required that the experimenter not help the individual with his/her decision in any way, and nobody was allowed to enter the laboratory until the participant's response had been registered.
Finally, when the judging process had been completed, experimenter A accompanied the receiver to the sender's room to find out the identity of the target.

Results
Target selection was tested for equiprobability of target, set number, and position of the target in the judging sequence. In experimental condition A, the distribution of targets over the 50 sessions was consistent with randomness for both the four target alternatives (i.e., A, B, C, D; χ²=1.20, p=0.75) and the set number (1 to 10; χ²=4.4, p=0.88). In experimental condition B, target alternatives and set numbers also appeared to be randomly distributed (χ²=1.52, p=0.67 and χ²=3.6, p=0.93, respectively). The position of the target stimulus among the decoys in the judging sequence was also consistent with randomness (χ²=2.32, p=0.51 and χ²=1.04, p=0.79 for experimental conditions A and B, respectively), making it unlikely that participants chose the correct stimulus because of position preferences.
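The equiprobability checks above are chi-square goodness-of-fit tests against equal expected counts. The paper does not report the underlying counts, so the sketch below computes only the statistic and its upper-tail p-value. It uses only the standard library, through the series expansion of the regularized lower incomplete gamma function (in practice, `scipy.stats.chisquare` would normally be used). As a consistency check, `chi2_sf(1.20, 3)` ≈ 0.75 for four target letters (df=3) and `chi2_sf(4.4, 9)` ≈ 0.88 for ten sets (df=9), in line with the reported p-values.

```python
import math


def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def chi2_sf(x, df):
    """Upper-tail chi-square probability, via the series for the
    regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:       # add series terms until negligible
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return 1.0 - lower


counts = [12, 13, 12, 13]   # hypothetical target-letter counts for 50 sessions
stat = chi_square_uniform(counts)
print(round(stat, 2), round(chi2_sf(stat, len(counts) - 1), 3))
```

This series converges for any x but loses relative precision for very small upper-tail probabilities. That is not a concern for checks like these, where p is large.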
Overall, participants were more successful under the experimental condition that integrated the researchers' recommendations (15 direct hits, 30%, z=0.82, p=0.21) than under the one that did not (11 direct hits, 22%, z=-0.49, p=0.31). Although this difference did not reach statistical significance (z=0.92, p=0.18), it was in the expected direction. With the sample size used in this study and an alpha level of 0.01, the power of this analysis for an expected effect size of approximately 0.15 (as suggested by previous meta-analyses) would be 0.07. The percentage of participants who chose the target stimulus as either their first or second choice (64% in experimental condition A and 48% in experimental condition B) was not significantly different from chance expectation (50%) in either condition at an alpha level of 0.01 (z=1.97, p=0.02 and z=-0.28, p=0.61, respectively). The difference between the two experimental conditions on this measure was not significant either (z=1.61, p=0.05); the power of this analysis was 0.21. Across the total sample, 56% of participants chose the target stimulus as either their first or second choice, which is not significantly different from chance expectation either (z=1.20, p=0.11).
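The paper does not name the test used for the between-condition comparisons. A pooled two-proportion z test without continuity correction is one plausible choice, and it reproduces the reported figures closely. Under that assumption it gives z≈0.91, p≈0.18 for direct hits (reported: z=0.92, p=0.18) and z≈1.61, p≈0.05 for first-or-second choices (reported: z=1.61, p=0.05). The function name is hypothetical.

```python
import math


def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Pooled two-sample z for a difference in hit proportions (no continuity correction)."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se


for label, args in [("direct hits", (15, 50, 11, 50)),
                    ("first or second choice", (32, 50, 24, 50))]:
    z = two_proportion_z(*args)
    p_one_tailed = 0.5 * math.erfc(z / math.sqrt(2))   # condition A > condition B
    print(label, round(z, 2), round(p_one_tailed, 2))
```

The small gap between 0.91 and the reported 0.92 may reflect rounding or a slightly different variance estimate; the paper does not say which.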
Among the measures that could be quantified in experimental condition A, only the previous success of the target stimulus correlated significantly with the session outcome at an alpha level of 0.01 (r=0.39, p=0.004). Two other variables, feedback to the sender and post-session review, showed correlations in the expected direction with p-values below 0.05 (r=0.36, p=0.01 and r=0.32, p=0.02, respectively). Male-female pairing and personalised setting showed small, non-significant coefficients (r=0.11, p=0.44 and r=0.10, p=0.47, respectively) (see Table 1).
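Because the session outcome is a hit (1) or a miss (0), each Pearson coefficient above is in effect a point-biserial correlation between a quantified practice and that binary outcome. A minimal sketch follows, with a helper that converts r into the t statistic (n−2 degrees of freedom) used to test r=0. The data are hypothetical and do not reproduce the study's coefficients.

```python
import math


def pearson_r(x, y):
    """Pearson correlation; with a 0/1 outcome this is the point-biserial r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic with n - 2 degrees of freedom for testing r = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical data: post-session amendments vs. hit (1) / miss (0).
amendments = [0, 2, 1, 4, 3, 0, 5, 1]
hit = [0, 1, 0, 1, 0, 0, 1, 0]
r = pearson_r(amendments, hit)
print(round(r, 2), round(r_to_t(r, len(hit)), 2))
```

The p-value would then come from the t distribution with n−2 degrees of freedom, for example via `scipy.stats.t.sf`.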

Discussion
The Ganzfeld is the result of long-standing efforts to develop an experimental protocol that systematically replicates the phenomenology claimed in spontaneous case reports. However, studies that have used this technique do not seem to have produced results strong enough to convince the scientific community.
In this study we designed and tested a new protocol based on a series of recommendations given by active researchers in the area.Although the difference between this experimental protocol and a control condition was observed in the hypothesised direction, it did not reach statistical significance at alpha=0.01.
Nevertheless, the percentage of correct guesses obtained when the researchers' recommendations were integrated into the experimental protocol (30%) is comparable to those previously reported in meta-analytic work (32.2% reported by Bem & Honorton, 1994; 31% by Bem et al., 2001; 31.6% by Storm & Ertel, 2001; and 32.2% by Storm et al., 2010a). An average hit rate of 32% would correspond to an effect size of 0.14, which, according to Cohen (1992), would be classified as a 'small' effect. The integration of the researchers' recommendations into our experimental protocol did not produce clear results. Therefore, either we are still far from understanding the underlying mechanisms of ESP that would allow us to unfold a fully visible version of the phenomenon in the laboratory, or we are simply dealing with a very weak or non-existent effect.
A regression analysis conducted with five recommended practices revealed that the degree to which the participant was allowed to review and extend his/her report after the sensory attenuation was the only item that contributed significantly to the relative success of the improved protocol. However, it was not an aim of this study to evaluate the efficacy of these practices individually, which would have required a different type of design. Instead, we tried to estimate the overall gain from adopting these recommendations by comparing the two experimental conditions. Thus, other pieces of advice, such as holding an informal chat before the experiment, target type (pictures vs. objects), including relaxation techniques, or mode of data analysis (direct hits vs. z-scores), were not included in the regression analysis: because they were present in every session of the improved condition, they did not vary and could not be quantified. It should also be noted that, although participant pre-selection on the basis of personality traits (e.g., extraversion, openness, belief in the paranormal, etc.) was one of the most recommended items in our 2005 survey, we did not select participants for this study, mainly because individuals were to be assigned randomly to the experimental conditions.
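The paper does not report the details of its regression. In an ordinary least-squares fit with an intercept, the multiple correlation R equals the correlation between the observed outcomes and the fitted values. The standard-library sketch below solves the normal equations by Gaussian elimination with partial pivoting. In practice a library such as NumPy or statsmodels would be used. The predictor layout, function names, and data are hypothetical.

```python
import math


def solve(matrix, vector):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(vector)
    m = [row[:] + [vector[i]] for i, row in enumerate(matrix)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                               # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def multiple_r(predictors, outcome):
    """Multiple R of an OLS fit with intercept: corr(outcome, fitted values)."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(rows, outcome)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in rows]
    my, mf = sum(outcome) / len(outcome), sum(fitted) / len(fitted)
    sxy = sum((y - my) * (f - mf) for y, f in zip(outcome, fitted))
    syy = sum((y - my) ** 2 for y in outcome)
    sff = sum((f - mf) ** 2 for f in fitted)
    return sxy / math.sqrt(syy * sff)


# Hypothetical rows: (feedback count, amendments) per session; outcome 1 = hit.
X = [[3, 0], [5, 2], [2, 1], [6, 4], [4, 3], [1, 0], [7, 5], [2, 1]]
y = [0, 1, 0, 1, 0, 0, 1, 0]
print(round(multiple_r(X, y), 3))
```

With a binary outcome this is a linear probability model. A logistic regression would be an alternative, but the paper does not say which form was used.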
The multiple regression coefficient R was significant at an alpha level of 0.01. Although this could be interpreted as evidence for ESP even though the individual correlations and the percentage of hits are not significant, it does not differ from what is reported in the literature. Our concern is that, to date, Ganzfeld-based protocols have not taken us very far because, at best, we keep accumulating marginally significant or at-chance results. Even if we assumed that meta-analysis has proven ESP, there would still be a problem of visibility, which nowadays seems to be the main obstacle for research in this area in terms of financial support, interdisciplinary co-operation, and the effective dissemination and acceptance of findings. We encourage researchers to keep exploring alternative features of the experimental protocol in order to achieve consistently strong results in the laboratory: for example, by using more ecologically valid designs such as remote viewing or dream studies, by using neurological indicators, or by studying selected populations such as artists or emotionally bonded subjects.

Table 1: Pearson's correlation coefficients between participants' performance and the degree to which the measures were present in the optimised protocol.