- Explain the difference between between-subjects and within-subjects experiments, list some of the pros and cons of each approach, and decide which approach to use to answer a particular research question.
- Define random assignment, distinguish it from random sampling, explain its purpose in experimental research, and use some simple strategies to implement it.
- Define what a control condition is, explain its purpose in research on treatment effectiveness, and describe some alternative types of control conditions.
- Define several types of carryover effect, give examples of each, and explain how counterbalancing helps to deal with them.
In this section, we look at some different ways to design an experiment. The primary distinction we will make is between approaches in which each participant experiences one level of the independent variable and approaches in which each participant experiences all levels of the independent variable. The former are called between-subjects experiments and the latter are called within-subjects experiments.
In a between-subjects experiment, each participant is tested in only one condition. For example, a researcher with a sample of 100 college students might assign half of them to write about a traumatic event and the other half write about a neutral event. Or a researcher with a sample of 60 people with severe agoraphobia (fear of open spaces) might assign 20 of them to receive each of three different treatments for that disorder. It is essential in a between-subjects experiment that the researcher assign participants to conditions so that the different groups are, on average, highly similar to each other. Those in a trauma condition and a neutral condition, for example, should include a similar proportion of men and women, and they should have similar average intelligence quotients (IQs), similar average levels of motivation, similar average numbers of health problems, and so on. This is a matter of controlling these extraneous participant variables across conditions so that they do not become confounding variables.
The primary way that researchers accomplish this kind of control of extraneous variables across conditions is called random assignment, which means using a random process to decide which participants are tested in which conditions. Do not confuse random assignment with random sampling. Random sampling is a method for selecting a sample from a population, and it is rarely used in psychological research. Random assignment is a method for assigning participants in a sample to the different conditions, and it is an important element of all experimental research in psychology and other fields too.
In its strictest sense, random assignment should meet two criteria. One is that each participant has an equal chance of being assigned to each condition (e.g., a 50% chance of being assigned to each of two conditions). The second is that each participant is assigned to a condition independently of other participants. Thus one way to assign participants to two conditions would be to flip a coin for each one. If the coin lands heads, the participant is assigned to Condition A, and if it lands tails, the participant is assigned to Condition B. For three conditions, one could use a computer to generate a random integer from 1 to 3 for each participant. If the integer is 1, the participant is assigned to Condition A; if it is 2, the participant is assigned to Condition B; and if it is 3, the participant is assigned to Condition C. In practice, a full sequence of conditions—one for each participant expected to be in the experiment—is usually created ahead of time, and each new participant is assigned to the next condition in the sequence as he or she is tested. When the procedure is computerized, the computer program often handles the random assignment.
One problem with coin flipping and other strict procedures for random assignment is that they are likely to result in unequal sample sizes in the different conditions. Unequal sample sizes are generally not a serious problem, and you should never throw away data you have already collected to achieve equal sample sizes. However, for a fixed number of participants, it is statistically most efficient to divide them into equal-sized groups. It is standard practice, therefore, to use a kind of modified random assignment that keeps the number of participants in each group as similar as possible. One approach is block randomization. In block randomization, all the conditions occur once in the sequence before any of them is repeated. Then they all occur again before any of them is repeated again. Within each of these “blocks,” the conditions occur in a random order. Again, the sequence of conditions is usually generated before any participants are tested, and each new participant is assigned to the next condition in the sequence. Table 6.2 “Block Randomization Sequence for Assigning Nine Participants to Three Conditions” shows such a sequence for assigning nine participants to three conditions. The Research Randomizer website (http://www.randomizer.org) will generate block randomization sequences for any number of participants and conditions. Again, when the procedure is computerized, the computer program often handles the block randomization.
Table 6.2 Block Randomization Sequence for Assigning Nine Participants to Three Conditions
Random assignment is not guaranteed to control all extraneous variables across conditions. It is always possible that just by chance, the participants in one condition might turn out to be substantially older, less tired, more motivated, or less depressed on average than the participants in another condition. However, there are some reasons that this is not a major concern. One is that random assignment works better than one might expect, especially for large samples. Another is that the inferential statistics that researchers use to decide whether a difference between groups reflects a difference in the population takes the “fallibility” of random assignment into account. Yet another reason is that even if random assignment does result in a confounding variable and therefore produces misleading results, this is likely to be detected when the experiment is replicated. The upshot is that random assignment to conditions—although not infallible in terms of controlling extraneous variables—is always considered a strength of a research design.
Treatment and Control Conditions
Between-subjects experiments are often used to determine whether a treatment works. In psychological research, a treatment is any intervention meant to change people’s behavior for the better. This includes psychotherapies and medical treatments for psychological disorders but also interventions designed to improve learning, promote conservation, reduce prejudice, and so on. To determine whether a treatment works, participants are randomly assigned to either a treatment condition, in which they receive the treatment, or a control condition, in which they do not receive the treatment. If participants in the treatment condition end up better off than participants in the control condition—for example, they are less depressed, learn faster, conserve more, express less prejudice—then the researcher can conclude that the treatment works. In research on the effectiveness of psychotherapies and medical treatments, this type of experiment is often called a randomized clinical trial.
There are different types of control conditions. In a no-treatment control condition, participants receive no treatment whatsoever. One problem with this approach, however, is the existence of placebo effects. A placebo is a simulated treatment that lacks any active ingredient or element that should make it effective, and a placebo effect is a positive effect of such a treatment. Many folk remedies that seem to work—such as eating chicken soup for a cold or placing soap under the bedsheets to stop nighttime leg cramps—are probably nothing more than placebos. Although placebo effects are not well understood, they are probably driven primarily by people’s expectations that they will improve. Having the expectation to improve can result in reduced stress, anxiety, and depression, which can alter perceptions and even improve immune system functioning (Price, Finniss, & Benedetti, 2008).
Placebo effects are interesting in their own right (see Note 6.28 “The Powerful Placebo”), but they also pose a serious problem for researchers who want to determine whether a treatment works. Figure 6.2 “Hypothetical Results From a Study Including Treatment, No-Treatment, and Placebo Conditions” shows some hypothetical results in which participants in a treatment condition improved more on average than participants in a no-treatment control condition. If these conditions (the two leftmost bars in Figure 6.2 “Hypothetical Results From a Study Including Treatment, No-Treatment, and Placebo Conditions”) were the only conditions in this experiment, however, one could not conclude that the treatment worked. It could be instead that participants in the treatment group improved more because they expected to improve, while those in the no-treatment control condition did not.
Figure 6.2 Hypothetical Results From a Study Including Treatment, No-Treatment, and Placebo Conditions
Fortunately, there are several solutions to this problem. One is to include a placebo control condition, in which participants receive a placebo that looks much like the treatment but lacks the active ingredient or element thought to be responsible for the treatment’s effectiveness. When participants in a treatment condition take a pill, for example, then those in a placebo control condition would take an identical-looking pill that lacks the active ingredient in the treatment (a “sugar pill”). In research on psychotherapy effectiveness, the placebo might involve going to a psychotherapist and talking in an unstructured way about one’s problems. The idea is that if participants in both the treatment and the placebo control groups expect to improve, then any improvement in the treatment group over and above that in the placebo control group must have been caused by the treatment and not by participants’ expectations. This is what is shown by a comparison of the two outer bars in Figure 6.2 “Hypothetical Results From a Study Including Treatment, No-Treatment, and Placebo Conditions”.
Of course, the principle of informed consent requires that participants be told that they will be assigned to either a treatment or a placebo control condition—even though they cannot be told which until the experiment ends. In many cases the participants who had been in the control condition are then offered an opportunity to have the real treatment. An alternative approach is to use a waitlist control condition, in which participants are told that they will receive the treatment but must wait until the participants in the treatment condition have already received it. This allows researchers to compare participants who have received the treatment with participants who are not currently receiving it but who still expect to improve (eventually). A final solution to the problem of placebo effects is to leave out the control condition completely and compare any new treatment with the best available alternative treatment. For example, a new treatment for simple phobia could be compared with standard exposure therapy. Because participants in both conditions receive a treatment, their expectations about improvement should be similar. This approach also makes sense because once there is an effective treatment, the interesting question about a new treatment is not simply “Does it work?” but “Does it work better than what is already available?”
The Powerful Placebo
Many people are not surprised that placebos can have a positive effect on disorders that seem fundamentally psychological, including depression, anxiety, and insomnia. However, placebos can also have a positive effect on disorders that most people think of as fundamentally physiological. These include asthma, ulcers, and warts (Shapiro & Shapiro, 1999). There is even evidence that placebo surgery—also called “sham surgery”—can be as effective as actual surgery.
Medical researcher J. Bruce Moseley and his colleagues conducted a study on the effectiveness of two arthroscopic surgery procedures for osteoarthritis of the knee (Moseley et al., 2002). The control participants in this study were prepped for surgery, received a tranquilizer, and even received three small incisions in their knees. But they did not receive the actual arthroscopic surgical procedure. The surprising result was that all participants improved in terms of both knee pain and function, and the sham surgery group improved just as much as the treatment groups. According to the researchers, “This study provides strong evidence that arthroscopic lavage with or without débridement [the surgical procedures used] is not better than and appears to be equivalent to a placebo procedure in improving knee pain and self-reported function” (p. 85).
Research has shown that patients with osteoarthritis of the knee who receive a “sham surgery” experience reductions in pain and improvement in knee function similar to those of patients who receive a real surgery.
Army Medicine – Surgery – CC BY 2.0.
In a within-subjects experiment, each participant is tested under all conditions. Consider an experiment on the effect of a defendant’s physical attractiveness on judgments of his guilt. Again, in a between-subjects experiment, one group of participants would be shown an attractive defendant and asked to judge his guilt, and another group of participants would be shown an unattractive defendant and asked to judge his guilt. In a within-subjects experiment, however, the same group of participants would judge the guilt of both an attractive and an unattractive defendant.
The primary advantage of this approach is that it provides maximum control of extraneous participant variables. Participants in all conditions have the same mean IQ, same socioeconomic status, same number of siblings, and so on—because they are the very same people. Within-subjects experiments also make it possible to use statistical procedures that remove the effect of these extraneous participant variables on the dependent variable and therefore make the data less “noisy” and the effect of the independent variable easier to detect. We will look more closely at this idea later in the book.
Carryover Effects and Counterbalancing
The primary disadvantage of within-subjects designs is that they can result in carryover effects. A carryover effect is an effect of being tested in one condition on participants’ behavior in later conditions. One type of carryover effect is a practice effect, where participants perform a task better in later conditions because they have had a chance to practice it. Another type is a fatigue effect, where participants perform a task worse in later conditions because they become tired or bored. Being tested in one condition can also change how participants perceive stimuli or interpret their task in later conditions. This is called a context effect. For example, an average-looking defendant might be judged more harshly when participants have just judged an attractive defendant than when they have just judged an unattractive defendant. Within-subjects experiments also make it easier for participants to guess the hypothesis. For example, a participant who is asked to judge the guilt of an attractive defendant and then is asked to judge the guilt of an unattractive defendant is likely to guess that the hypothesis is that defendant attractiveness affects judgments of guilt. This could lead the participant to judge the unattractive defendant more harshly because he thinks this is what he is expected to do. Or it could make participants judge the two defendants similarly in an effort to be “fair.”
Carryover effects can be interesting in their own right. (Does the attractiveness of one person depend on the attractiveness of other people that we have seen recently?) But when they are not the focus of the research, carryover effects can be problematic. Imagine, for example, that participants judge the guilt of an attractive defendant and then judge the guilt of an unattractive defendant. If they judge the unattractive defendant more harshly, this might be because of his unattractiveness. But it could be instead that they judge him more harshly because they are becoming bored or tired. In other words, the order of the conditions is a confounding variable. The attractive condition is always the first condition and the unattractive condition the second. Thus any difference between the conditions in terms of the dependent variable could be caused by the order of the conditions and not the independent variable itself.
There is a solution to the problem of order effects, however, that can be used in many situations. It is counterbalancing, which means testing different participants in different orders. For example, some participants would be tested in the attractive defendant condition followed by the unattractive defendant condition, and others would be tested in the unattractive condition followed by the attractive condition. With three conditions, there would be six different orders (ABC, ACB, BAC, BCA, CAB, and CBA), so some participants would be tested in each of the six orders. With counterbalancing, participants are assigned to orders randomly, using the techniques we have already discussed. Thus random assignment plays an important role in within-subjects designs just as in between-subjects designs. Here, instead of randomly assigning to conditions, they are randomly assigned to different orders of conditions. In fact, it can safely be said that if a study does not involve random assignment in one form or another, it is not an experiment.
There are two ways to think about what counterbalancing accomplishes. One is that it controls the order of conditions so that it is no longer a confounding variable. Instead of the attractive condition always being first and the unattractive condition always being second, the attractive condition comes first for some participants and second for others. Likewise, the unattractive condition comes first for some participants and second for others. Thus any overall difference in the dependent variable between the two conditions cannot have been caused by the order of conditions. A second way to think about what counterbalancing accomplishes is that if there are carryover effects, it makes it possible to detect them. One can analyze the data separately for each order to see whether it had an effect.
When 9 Is “Larger” Than 221
Researcher Michael Birnbaum has argued that the lack of context provided by between-subjects designs is often a bigger problem than the context effects created by within-subjects designs. To demonstrate this, he asked one group of participants to rate how large the number 9 was on a 1-to-10 rating scale and another group to rate how large the number 221 was on the same 1-to-10 rating scale (Birnbaum, 1999). Participants in this between-subjects design gave the number 9 a mean rating of 5.13 and the number 221 a mean rating of 3.10. In other words, they rated 9 as larger than 221! According to Birnbaum, this is because participants spontaneously compared 9 with other one-digit numbers (in which case it is relatively large) and compared 221 with other three-digit numbers (in which case it is relatively small).
Simultaneous Within-Subjects Designs
So far, we have discussed an approach to within-subjects designs in which participants are tested in one condition at a time. There is another approach, however, that is often used when participants make multiple responses in each condition. Imagine, for example, that participants judge the guilt of 10 attractive defendants and 10 unattractive defendants. Instead of having people make judgments about all 10 defendants of one type followed by all 10 defendants of the other type, the researcher could present all 20 defendants in a sequence that mixed the two types. The researcher could then compute each participant’s mean rating for each type of defendant. Or imagine an experiment designed to see whether people with social anxiety disorder remember negative adjectives (e.g., “stupid,” “incompetent”) better than positive ones (e.g., “happy,” “productive”). The researcher could have participants study a single list that includes both kinds of words and then have them try to recall as many words as possible. The researcher could then count the number of each type of word that was recalled. There are many ways to determine the order in which the stimuli are presented, but one common way is to generate a different random order for each participant.
Between-Subjects or Within-Subjects?
Almost every experiment can be conducted using either a between-subjects design or a within-subjects design. This means that researchers must choose between the two approaches based on their relative merits for the particular situation.
Between-subjects experiments have the advantage of being conceptually simpler and requiring less testing time per participant. They also avoid carryover effects without the need for counterbalancing. Within-subjects experiments have the advantage of controlling extraneous participant variables, which generally reduces noise in the data and makes it easier to detect a relationship between the independent and dependent variables.
A good rule of thumb, then, is that if it is possible to conduct a within-subjects experiment (with proper counterbalancing) in the time that is available per participant—and you have no serious concerns about carryover effects—this is probably the best option. If a within-subjects design would be difficult or impossible to carry out, then you should consider a between-subjects design instead. For example, if you were testing participants in a doctor’s waiting room or shoppers in line at a grocery store, you might not have enough time to test each participant in all conditions and therefore would opt for a between-subjects design. Or imagine you were trying to reduce people’s level of prejudice by having them interact with someone of another race. A within-subjects design with counterbalancing would require testing some participants in the treatment condition first and then in a control condition. But if the treatment works and reduces people’s level of prejudice, then they would no longer be suitable for testing in the control condition. This is true for many designs that involve a treatment meant to produce long-term change in participants’ behavior (e.g., studies testing the effectiveness of psychotherapy). Clearly, a between-subjects design would be necessary here.
Remember also that using one type of design does not preclude using the other type in a different study. There is no reason that a researcher could not use both a between-subjects design and a within-subjects design to answer the same research question. In fact, professional researchers often do exactly this.
- Experiments can be conducted using either between-subjects or within-subjects designs. Deciding which to use in a particular situation requires careful consideration of the pros and cons of each approach.
- Random assignment to conditions in between-subjects experiments or to orders of conditions in within-subjects experiments is a fundamental element of experimental research. Its purpose is to control extraneous variables so that they do not become confounding variables.
- Experimental research on the effectiveness of a treatment requires both a treatment condition and a control condition, which can be a no-treatment control condition, a placebo control condition, or a waitlist control condition. Experimental treatments can also be compared with the best available alternative.
Discussion: For each of the following topics, list the pros and cons of a between-subjects and within-subjects design and decide which would be better.
- You want to test the relative effectiveness of two training programs for running a marathon.
- Using photographs of people as stimuli, you want to see if smiling people are perceived as more intelligent than people who are not smiling.
- In a field experiment, you want to see if the way a panhandler is dressed (neatly vs. sloppily) affects whether or not passersby give him any money.
- You want to see if concrete nouns (e.g., dog) are recalled better than abstract nouns (e.g., truth).
- Discussion: Imagine that an experiment shows that participants who receive psychodynamic therapy for a dog phobia improve more than participants in a no-treatment control group. Explain a fundamental problem with this research design and at least two ways that it might be corrected.
Birnbaum, M. H. (1999). How to show that 9 > 221: Collect judgments in a between-subjects design. Psychological Methods, 4, 243–249.
Moseley, J. B., O’Malley, K., Petersen, N. J., Menke, T. J., Brody, B. A., Kuykendall, D. H., … Wray, N. P. (2002). A controlled trial of arthroscopic surgery for osteoarthritis of the knee. The New England Journal of Medicine, 347, 81–88.
Price, D. D., Finniss, D. G., & Benedetti, F. (2008). A comprehensive review of the placebo effect: Recent advances and current thought. Annual Review of Psychology, 59, 565–590.
Shapiro, A. K., & Shapiro, E. (1999). The powerful placebo: From ancient priest to modern physician. Baltimore, MD: Johns Hopkins University Press.
This is a derivative of Research Methods in Psychology by a publisher who has requested that they and the original author not receive attribution, which was originally released and is used under CC BY-NC-SA. This work, unless otherwise expressly stated, is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
A blind or blinded-experiment is an experiment in which information about the test is masked (kept) from the participant, to reduce or eliminate bias, until after a trial outcome is known. It is understood that bias may be intentional or unconscious, thus no dishonesty is implied by blinding. If both tester and subject are blinded, the trial is called a double-blind experiment.
Blind testing is used wherever items are to be compared without influences from testers' preferences or expectations, for example in clinical trials to evaluate the effectiveness of medicinal drugs and procedures without placebo effect, observer bias, or conscious deception; and comparative testing of commercial products to objectively assess user preferences without being influenced by branding and other properties not being tested.
Blinding can be imposed on researchers, technicians, or subjects. The opposite of a blind trial is an open trial. Blind experiments are an important tool of the scientific method, in many fields of research—medicine, psychology and the social sciences, natural sciences such as physics and biology, applied sciences such as market research, and many others. In some disciplines, such as medicinal drug testing, blind experiments are considered essential.
In some cases, while blind experiments would be useful, they are impractical or unethical; an example is in the field of developmental psychology: although it would be informative to raise children under arbitrary experimental conditions, such as on a remote island with a fabricated enculturation, it is a violation of ethics and human rights.
The terms blind (adjective) or to blind (transitive verb) when used in this sense are figurative extensions of the literal idea of blindfolding someone. The terms masked or to mask may be used for the same concept; this is commonly the case in ophthalmology, where the word 'blind' is often used in the literal sense.
Some argue that the use of the term "blind" for academic review or experiments is offensive. Some recommend the alternate term "masked" or "anonymous" for this reason.
The French Academy of Sciences originated the first recorded blind experiments in 1784: the Academy set up a commission to investigate the claims of animal magnetism proposed by Franz Mesmer. Headed by Benjamin Franklin and Antoine Lavoisier, the commission carried out experiments asking mesmerists to identify objects that had previously been filled with "vital fluid", including trees and flasks of water. The subjects were unable to do so. The commission went on to examine claims involving the curing of "mesmerized" patients. These patients showed signs of improved health, but the commission attributed this to the fact that these patients believed they would get better—the first scientific suggestion of the now well-known placebo effect.
In 1799 the British chemist Humphry Davy performed another early blind experiment. In studying the effects of nitrous oxide (laughing gas) on human physiology, Davy deliberately did not tell his subjects what concentration of the gas they were breathing, or whether they were breathing ordinary air.
Blind experiments went on to be used outside of purely scientific settings. In 1817, a committee of scientists and musicians compared a Stradivarius violin to one with a guitar-like design made by the naval engineer François Chanot. A well-known violinist played each instrument while the committee listened in the next room to avoid prejudice.
One of the first essays advocating a blinded approach to experiments in general came from Claude Bernard in the latter half of the 19th century, who recommended splitting any scientific experiment between the theorist who conceives the experiment and a naive (and preferably uneducated) observer who registers the results without foreknowledge of the theory or hypothesis being tested. This suggestion contrasted starkly with the prevalent Enlightenment-era attitude that scientific observation can only be objectively valid when undertaken by a well-educated, informed scientist.
Double-blind methods came into especial prominence in the mid-20th century.
Single-blind describes experiments where information that could introduce bias or otherwise skew the result is withheld from the participants, but the experimenter will be in full possession of the facts.
In a single-blind experiment, the individual subjects do not know whether they are so-called "test" subjects or members of an "experimental control" group. Single-blind experimental design is used where the experimenters either must know the full facts (for example, when comparing sham to real surgery) and so the experimenters cannot themselves be blind, or where the experimenters will not introduce further bias and so the experimenters need not be blind. However, there is a risk that subjects are influenced by interaction with the researchers – known as the experimenter's bias. Single-blind trials are especially risky in psychology and social science research, where the experimenter has an expectation of what the outcome should be, and may consciously or subconsciously influence the behavior of the subject.
A classic example of a single-blind test is the Pepsi Challenge. A tester, often a marketing person, prepares two sets of cups of cola labeled "A" and "B". One set of cups is filled with Pepsi, while the other is filled with Coca-Cola. The tester knows which soda is in which cup but is not supposed to reveal that information to the subjects. Volunteer subjects are encouraged to try the two cups of soda and polled for which ones they prefer. One of the problems with a single-blind test like this is that the tester can unintentionally give subconscious cues which influence the subjects. In addition, it is possible the tester could intentionally introduce bias by preparing the separate sodas differently (e.g., by putting more ice in one cup or by pushing one cup closer to the subject). If the tester is a marketing person employed by the company which is producing the challenge, there's always the possibility of a conflict of interest where the marketing person is aware that future income will be based on the results of the test.
"Double blind" redirects here. It is not to be confused with double bind.
Double-blind describes an especially stringent way of conducting an experiment which attempts to eliminate subjective, unrecognized biases carried by an experiment's subjects (usually human) and conductors. Double-blind studies were first used in 1907 by W. H. R. Rivers and H. N. Webber in the investigation of the effects of caffeine.
In most cases, double-blind experiments are regarded to achieve a higher standard of scientific rigor than single-blind or non-blind experiments.
In these double-blind experiments, neither the participants nor the researchers know which participants belong to the control group, nor the test group. Only after all data have been recorded (and, in some cases, analyzed) do the researchers learn which participants were which. Performing an experiment in double-blind fashion can greatly lessen the power of preconceived notions or physical cues (e.g., placebo effect, observer bias, experimenter's bias) to distort the results (by making researchers or participants behave differently from in everyday life). Random assignment of test subjects to the experimental and control groups is a critical part of any double-blind research design. The key that identifies the subjects and which group they belonged to is kept by a third party, and is not revealed to the researchers until the study is over.
Double-blind methods can be applied to any experimental situation in which there is a possibility that the results will be affected by conscious or unconscious bias on the part of researchers, participants, or both. For example, in animal studies, both the carer of the animals and the assessor of the results have to be blinded; otherwise the carer might treat control subjects differently and alter the results.
Computer-controlled experiments are sometimes also erroneously referred to as double-blind experiments, since software may not cause the type of direct bias between researcher and subject. Development of surveys presented to subjects through computers shows that bias can easily be built into the process. Voting systems are also examples where bias can easily be constructed into an apparently simple machine based system. In analogy to the human researcher described above, the part of the software that provides interaction with the human is presented to the subject as the blinded researcher, while the part of the software that defines the key is the third party. An example is the ABX test, where the human subject has to identify an unknown stimulus X as being either A or B.
A triple-blind study is an extension of the double-blind design; the committee monitoring response variables is not told the identity of the groups. The committee is simply given data for groups A and B. A triple-blind study has the theoretical advantage of allowing the monitoring committee to evaluate the response variable results more objectively. This assumes that appraisal of efficacy and harm, as well as requests for special analyses, may be biased if group identity is known. However, in a trial where the monitoring committee has an ethical responsibility to ensure participant safety, such a design may be counterproductive since in this case monitoring is often guided by the constellation of trends and their directions. In addition, by the time many monitoring committees receive data, often any emergency situation has long passed.
Double-blinding is relatively easy to achieve in drug studies, by formulating the investigational drug and the control (either a placebo or an established drug) to have identical appearance (color, taste, etc.). Patients are randomly assigned to the control or experimental group and given random numbers by a study coordinator, who also encodes the drugs with matching random numbers. Neither the patients nor the researchers monitoring the outcome know which patient is receiving which treatment, until the study is over and the random code is revealed.
Effective blinding can be difficult to achieve where the treatment is notably effective (indeed, studies have been suspended in cases where the tested drug combinations were so effective that it was deemed unethical to continue withholding the findings from the control group, and the general population), or where the treatment is very distinctive in taste or has unusual side-effects that allow the researcher and/or the subject to guess which group they were assigned to. It is also difficult to use the double blind method to compare surgical and non-surgical interventions (although sham surgery, involving a simple incision, might be ethically permitted). A good clinical protocol will foresee these potential problems to ensure blinding is as effective as possible. It has also been argued that even in a double-blind experiment, general attitudes of the experimenter such as skepticism or enthusiasm towards the tested procedure can be subconsciously transferred to the test subjects.
Evidence-based medicine practitioners prefer blinded randomised controlled trials (RCTs), where that is a possible experimental design. These are high on the hierarchy of evidence; only a meta analysis of several well designed RCTs is considered more reliable.
Modern nuclear physics and particle physics experiments often involve large numbers of data analysts working together to extract quantitative data from complex datasets. In particular, the analysts want to report accurate systematic error estimates for all of their measurements; this is difficult or impossible if one of the errors is observer bias. To remove this bias, the experimenters devise blind analysis techniques, where the experimental result is hidden from the analysts until they've agreed—based on properties of the data set other than the final value—that the analysis techniques are fixed.
One example of a blind analysis occurs in neutrino experiments, like the Sudbury Neutrino Observatory, where the experimenters wish to report the total number N of neutrinos seen. The experimenters have preexisting expectations about what this number should be, and these expectations must not be allowed to bias the analysis. Therefore, the experimenters are allowed to see an unknown fraction f of the dataset. They use these data to understand the backgrounds, signal-detection efficiencies, detector resolutions, etc.. However, since no one knows the "blinding fraction" f, no one has preexisting expectations about the meaningless neutrino count N' = N × f in the visible data; therefore, the analysis does not introduce any bias into the final number N which is reported. Another blinding scheme is used in B meson analyses in experiments like BaBar and CDF; here, the crucial experimental parameter is a correlation between certain particle energies and decay times—which require an extremely complex and painstaking analysis—and particle charge signs, which are fairly trivial to measure. Analysts are allowed to work with all the energy and decay data, but are forbidden from seeing the sign of the charge, and thus are unable to see the correlation (if any). At the end of the experiment, the correct charge signs are revealed; the analysis software is run once (with no subjective human intervention), and the resulting numbers are published. Searches for rare events, like electron neutrinos in MiniBooNE or proton decay in Super-Kamiokande, require a different class of blinding schemes.
The "hidden" part of the experiment—the fraction f for SNO, the charge-sign database for CDF—is usually called the "blindness box". At the end of the analysis period, one is allowed to "unblind the data" and "open the box".
In a police photo lineup, an officer shows a group of photos to a witness or crime victim and asks him or her to pick out the suspect. This is basically a single-blind test of the witness's memory, and may be subject to subtle or overt influence by the officer. There is a growing movement in law enforcement to move to a double-blind procedure in which the officer who shows the photos to the witness does not know which photo is of the suspect.
In recruiting musicians to perform in orchestras and so on, blind auditions are now routinely done: the musicians perform behind a screen so that their physical appearance and gender cannot prejudice the listener judging the performance.
- ^Oxford English Dictionary, 2nd ed.
- ^"Ableist language and philosophical associations - New APPS: Art, Politics, Philosophy, Science". www.newappsblog.com. Retrieved 2017-03-19.
- ^"Why 'blind alley', 'blind faith' and 'blind refereeing' may be offensive - New APPS: Art, Politics, Philosophy, Science". www.newappsblog.com. Retrieved 2017-03-19.
- ^"The Word "Blind" Is Still Misused in Everyday Speech- Let's Get Rid of It! | Vision Loss & Personal Recovery". www.visionlossandpersonalrecovery.com. Retrieved 2017-03-19.
- ^"Double-blind study - Oxford Reference". doi:10.1093/oi/authority.20110803095728138.
- ^Robertson, Christopher T.; Kesselheim, Aaron S. (2016-01-30). Blinding as a Solution to Bias: Strengthening Biomedical Science, Forensic Science, and Law. Academic Press. p. 78. ISBN 9780128026335.
- ^ abHolmes, Richard (2009). The Age of Wonder: How the Romantic Generation Discovered the Beauty and Terror of Science. [full citation needed]
- ^Fétis, François-Joseph (1868). Biographie Universelle des Musiciens et Bibliographie Générale de la Musique, Tome 1 (Second ed.). Paris: Firmin Didot Frères, Fils, et Cie. p. 249. Retrieved 2011-07-21.
- ^Dubourg, George (1852). The Violin: Some Account of That Leading Instrument and its Most Eminent Professors... (Fourth ed.). London: Robert Cocks and Co. pp. 356–357. Retrieved 2011-07-21.
- ^Daston, Lorraine (2005). "Scientific Error and the Ethos of Belief". Social Research. 72 (1): 18.
- ^Alder, Ken (2006), Kramer, Lloyd S.; Maza, Sarah C., eds., "A Companion to Western Historical Thought", The History of Science, Or, an Oxymoronic Theory of Relativistic Objectivity, Blackwell Companions to History, Wiley-Blackwell, p. 307, ISBN 978-1-4051-4961-7, retrieved 2012-02-11,
- ^Rivers WHR and Webber HN. “The action of caffeine on the capacity for muscular work” Journal of Physiology 36: 33-47: 1907.
- ^Aviva Petrie; Paul Watson (28 February 2013). Statistics for Veterinary and Animal Science. Wiley. pp. 130–131. ISBN 978-1-118-56740-1.
- ^Friedman, L.M., Furberg, C.D., DeMets, D.L. (2010). Fundamentals of Clinical Trials. New York: Springer, pp. 119-132. ISBN 9781441915856
- ^"Male circumcision 'cuts' HIV risk". BBC News. 2006-12-13. Retrieved 2009-05-18.
- ^McNeil Jr, Donald G. (2006-12-13). "Circumcision Reduces Risk of AIDS, Study Finds". The New York Times. Retrieved 2009-05-18.
- ^"Skeptical Comment About Double-Blind Trials". The Journal of Alternative and Complementary Medicine. Retrieved 2010-05-04.
- ^Psychological sleuths – Accuracy and the accused on apa.org
- ^Under the Microscope – For more than 90 years, forensic science has been a cornerstone of criminal law. Critics and judges now ask whether it can be trusted.