The Design of Experiments

Everyone engages in the process of data collection

Everyone collects data relevant to vague hypotheses or ideas about the world. Each person is an amateur personality theorist at times, hypothesizing, for example, that extroverts make better salesmen than do introverts, or that individuals born under the sign of Leo make good actors. As he acquires new data, the amateur theorist becomes either more or less convinced that his hypotheses are valid, but he seldom sets out to test their validity systematically by gathering and analyzing all the data available. A scientist's hypotheses tend to be more reliable than those of amateur theorists because the scientist does systematically what others do haphazardly. As you read this module, try to answer the following questions:


Suppose a person believes in certain relationships between personality traits and the Zodiac signs; how does he usually go about testing the relationship? First, he probably has only a vague idea of the hypothesis to be tested. He might "know" that an Aries should be impulsive, aggressive, and survival-oriented. But what is the evidence? Maybe he counted the people that fit his preconceptions and forgot about, or made exceptions for, those that the pattern did not describe. Then, he may have developed a general statement to reconcile the observations and preconceptions. Notice, however, that he did not systematically test the ideas. His observations were of the people he encountered naturally, from day to day, and he probably omitted some of those as exceptions.

The existence of a relationship does not demonstrate cause and effect

Next, we can assume our observer began operating on the basis of a general statement about Aries individuals acting and reacting in certain uniform ways. At this point, the tendency would be to overlook discrepancies, or to justify them with statements such as, "Oh well, he probably is a primitive Aries!" Notice that terms have not been defined explicitly; in fact, the definition of aggressiveness probably varies with the personal characteristics of the individuals observed. Now, with his preconceptions "proven," the amateur will act in a particular way toward individuals born under the Aries sign, which, of course, has an effect on the way they respond. Here we have a self-fulfilling prophecy. You should see many faults in this casual approach to testing hypotheses. Certainly, it does not meet the requirements of a scientific inquiry. The terms were not defined, the observations were made on a sample that was not representative of the population at large, the data were not recorded in an orderly fashion, and the hypothesis was probably changed several times along the way. The methodology of science would have dictated a different course of action.

Data collected informally would seldom qualify as scientific evidence

A scientific approach would, of course, have to avoid the criticisms already mentioned as well as satisfy other criteria. The scientific approach to the problem would probably involve correlational techniques in the analysis of observations of a randomly selected sample of Aries individuals. This approach would determine whether or not the traits that Aries individuals show on conventional personality scales are related to the traits attributed to the sign, but it still could not demonstrate that the Zodiac sign caused the particular pattern of personality traits.



An hypothesis states a relation of dependency among variables. It takes the form of a prediction that a change in one variable will be accompanied by, or will produce, changes in another variable. The first step in designing an experiment to test an hypothesis is to decide what independent and dependent variables will be tested. You will recall that the conditions the experimenter manipulates are the independent variables, and the conditions that change as a result of that manipulation are the dependent variables. For example, if a researcher varies the amount of alcohol consumed by test subjects and records their reaction time, the amount of alcohol ingested is the independent variable (the experimenter controls its ingestion) and the reaction time is the dependent variable.

What we know in an experiment is the change we make in the independent variable.
What we want to learn is the change produced in the dependent variable.

Suppose the research objective is to determine whether nicotine has any effect on measured anxiety levels. There are many factors that can influence the outcome of such an experiment. One possible factor of importance is that a subject might be influenced by the knowledge that he has received a dosage. A well-designed experiment would identify the effect of that awareness on the dependent variable (i.e., anxiety level). One of the first steps would be to select two groups of subjects who are as nearly identical as possible (e.g., in terms of age, existing anxiety levels, and physical condition). Such a matching procedure controls for many irrelevant variables which could affect the outcome of the experiment. To control for the possible effects of "awareness of dosage," one group would receive pills containing nicotine, and the other group would receive pills containing a placebo, which is identical in appearance to the nicotine pill but which contains no active ingredient.

An experimenter uses a control group to isolate the effect of changing one independent variable

The experimental design now appears as follows:

Group                 Treatment         Criterion
Experimental group    given nicotine    measured anxiety level
Control group         given placebo     measured anxiety level

The presence or absence of nicotine is the independent variable in this design and the measure of anxiety is the dependent variable.
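The matching procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name matched_assignment, the field name baseline_anxiety, and the scores are all hypothetical, and matching is done on a single variable, whereas the text mentions several (age, existing anxiety levels, physical condition). Subjects are ranked on the matching variable, adjacent subjects are paired, and chance decides which member of each pair receives nicotine.

```python
import random


def matched_assignment(subjects, key, seed=None):
    """Split subjects into experimental and control groups by matched pairs.

    subjects: list of dicts; key: name of the matching variable.
    With an odd number of subjects, the highest-ranked one is left out.
    """
    rng = random.Random(seed)
    ranked = sorted(subjects, key=lambda s: s[key])
    experimental, control = [], []
    # Walk through the ranking two subjects at a time.
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # chance decides who gets nicotine
        experimental.append(pair[0])  # would receive the nicotine pill
        control.append(pair[1])       # would receive the placebo
    return experimental, control


# Hypothetical subjects with made-up baseline anxiety scores.
subjects = [
    {"id": n, "baseline_anxiety": score}
    for n, score in enumerate([12, 30, 18, 25, 14, 29, 21, 17], start=1)
]
experimental, control = matched_assignment(subjects, "baseline_anxiety", seed=0)
```

Because each pair contributes one subject to each group, the two groups start out with nearly equal average scores on the matching variable, so a later difference in measured anxiety is harder to attribute to pre-existing differences.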

The dependent variable must be defined in measurable terms

It is not unusual in experimental design to have more than one control group. For example, consider the hypothesis that there is a learned component in extrasensory perception (ESP). If ESP is a real phenomenon, could someone learn to become better at it just as they learn to improve other skills? To test the hypothesis, we should first select a dependent variable; what is the skill to be learned? One meaningful variable might be the ability of an individual to guess correctly which figure appears on a card an experimenter is looking at. We might have the experimenter look at ten cards, one at a time, and have the subjects guess the card after each viewing. The score the subject made on this test would be the measure of our dependent variable.

The objective of the experiment is to determine the effects of learning; the next step, therefore, is to identify the independent variable. What type of learning might influence one's ability to guess the figure on a card? One possibility is to try direct reinforcement for correct guesses. Subjects might make 100 practice trials prior to taking the test, being reinforced with 25 cents for every correct guess. A control group of subjects, who receive no practice trials, would provide a basis for comparison of the results. If reinforcement influences a learned component of ESP, then the experimental group should do better on the test than the control group.

However, one could still criticize the experimental design by saying, "It may not be the reinforcement that makes the difference; it may be that having 100 practice trials prior to the test acts as a 'warm-up.'" In order to determine whether this is the case, we need another control group, which would experience the same number of practice trials but would not be reinforced for correct guesses. The experimental design would now look like this.

Group                 Practice Trials                                                              Criterion Test Trials
Experimental group    100 practice trials with reinforcement given for each "correct" response     Guess 10 items
Control group 1       no practice trials                                                           Guess 10 items
Control group 2       100 practice trials and no reinforcer                                        Guess 10 items

There are many possible outcomes for this experiment.
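To make the logic of the two control groups concrete, the sketch below maps a few of the possible outcomes onto the conclusions they would support. It is a simplified illustration, not an analysis procedure from this module: the function name interpret and the margin parameter are hypothetical, and a fixed margin stands in for the significance test a real study would use. Inputs are each group's mean score on the 10-item criterion test.

```python
def interpret(experimental, no_practice, practice_only, margin=1.0):
    """Read the three-group ESP design from mean criterion-test scores.

    margin is a hypothetical threshold for calling two means different;
    a real analysis would use a significance test instead.
    """
    # Control group 2 vs. control group 1: does practice alone help?
    practice_helps = practice_only - no_practice > margin
    # Experimental group vs. control group 2: does reinforcement add anything?
    reinforcement_helps = experimental - practice_only > margin

    if reinforcement_helps and practice_helps:
        return "practice and reinforcement both raise scores"
    if reinforcement_helps:
        return "reinforcement raises scores beyond practice alone"
    if practice_helps:
        return "practice alone (a warm-up effect) accounts for the gain"
    return "no evidence that practice or reinforcement raises scores"
```

The key comparison is between the experimental group and control group 2, since those two groups differ only in reinforcement.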

In practice, every experimental design falls short of the ideal

Although it is better than the approach of the amateur theorist, this experimental design also has certain shortcomings. For example, the experiment tested two very limited hypotheses, i.e., whether reinforcement during practice produced more improvement than no practice, and whether practice alone had any effect at all. Moreover, the experiment measured only one possible manifestation of ESP. With a different criterion, e.g., being able to predict which card would come up next, the experiment might have supported a different conclusion. Finally, there may still be other uncontrolled factors that were not identified in the experiment. For example, in experiments of this type, subjects may recognize very subtle cues that the experimenter gives off via facial and body gestures. It may have been that during the practice trials the subject learned to "read" the experimenter's expression, in which case the improved scores on the criterion test may be due not to ESP but to an improved ability to react to the physical cues that the experimenter inadvertently provided.

Figure 1. A rat in a Skinner box (standard operant chamber).

In some experimentation, the act of collecting data can influence the results of the study. Consider, for example, a researcher conducting a study of childrearing practices by counting the number of times a mother punishes her children. The very presence of a researcher in the home, making check marks on a data pad, may cause the mother to resist punishing the child.


More Complex Designs

Theoretically, there is no limit to the number of experimental and control group situations that may be studied. For example, we might compare the performance of four groups, each of which receives a different experimental treatment. One group could learn from a teaching machine, another from a lecturer, a third from independent study, a fourth from a small discussion group. Here the dependent variable is the amount learned and the independent variable is the method of learning. It is possible to design an experiment using 50 or 100 experimental conditions, but such designs usually are too expensive and too logistically difficult to be practical.


A single subject can provide both the experimental and control conditions

Sometimes it is necessary to compare the behavior of a single subject under two conditions. This is often the case in drug research, when the objective is to determine what warnings to put on which labels. Operant conditioning has provided a methodology for assessing drug effects. Suppose that a group of rats is to be given tranquilizers, the independent variable. The dependent variable is the rate at which food-deprived rats press a bar to obtain food. The first task is to learn the baseline response level for each rat; some rats are normally more active than others. Figure 1 shows the typical Skinner-box arrangement used in conducting operant conditioning studies and in determining an animal's baseline performance.

After observing the baseline performance of each rat until it reaches a stable level (Figure 2A), we introduce a tranquilizer and measure the change from the baseline until the rat is pressing the bar at a new stable level (Figure 2B). To verify that the new level resulted from the tranquilizer, we might continue the observations to see if the baseline performance is restored after the tranquilizer wears off (Figure 2C).
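The baseline, drug, and recovery sequence just described can be summarized with a short sketch. Everything specific here is an assumption made for illustration: the function names, the three-session window, the 10 percent tolerance used to call a response rate "stable," and the sample rates are hypothetical and do not come from the study procedures themselves.

```python
from statistics import mean


def is_stable(rates, window=3, tolerance=0.10):
    """True if each of the last `window` session rates lies within
    `tolerance` (as a fraction) of the mean of those sessions."""
    if len(rates) < window:
        return False
    recent = rates[-window:]
    center = mean(recent)
    return all(abs(r - center) <= tolerance * center for r in recent)


def drug_effect(baseline, drug, recovery):
    """Compare one animal's mean response rate across the three phases."""
    base, treated, after = mean(baseline), mean(drug), mean(recovery)
    return {
        # Negative values mean the drug lowered the response rate.
        "change": treated - base,
        # Recovery counts as restored if it is closer to baseline
        # than to the level observed under the drug.
        "recovered": abs(after - base) < abs(after - treated),
    }


# Hypothetical bar presses per minute for a single rat.
baseline_sessions = [40, 52, 49, 50, 51]
stable = is_stable(baseline_sessions)
effect = drug_effect([49, 50, 51], [30, 31, 29], [48, 50, 49])
```

Each animal is compared only with its own baseline, which is why differences in activity between rats do not distort the result.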

This is a simplified version of the research procedures that provide the reasoning for warning labels (Table 1) attached to various products. Sleeping pills reduce the operant rate of responding, causing it to fall beneath the original baseline; hence, the warning about operating automobiles, machinery, etc.


Triaminic expectorant. CAUTION: This preparation may cause drowsiness. Do not drive or operate machinery while taking this medication.

Ex-Lax laxative. CAUTION: Frequent or prolonged use of this or any other laxative may result in dependence on laxatives.

Cope -- tablets for relief of nervous tension headaches. This preparation may cause drowsiness. Do not drive a car or operate machinery while taking this medication.

No-Doz keep alert tablets. Important: No stimulant should be substituted for sleep in activities requiring physical alertness.

Coricidin -- cold and hayfever tablets. This preparation may cause drowsiness. Do not operate machinery or drive a car while taking this medication.

Fedrazil antihistaminic nasal-decongestant. CAUTION. Antihistamines may cause drowsiness. If it occurs, do not drive a car or operate machinery ...

Table 1


Correlational techniques permit a measurement of the strength of a relationship between variables

The experimental designs described earlier provide verification of cause-and-effect relationships because the relevant variables can be controlled, or manipulated, by the experimenter. When we can change just one variable at a time, we can measure the effect of that change on another variable. However, when the variables are not under the control of the researcher, one can only establish correlations, or relationships, to show that changes in one variable are associated with changes in another. The example of an Aries personality illustrated a crude use of correlational techniques. Sometimes we cannot manipulate the variables; we cannot, for example, change a person's zodiac sign or his personality. Still, a relationship may be significant even without evidence to demonstrate causality. Remember, experimental research results in expressions of the variability in a dependent variable as a function of variations in an independent variable. Correlational techniques allow precise statements of the strength of a relationship between two variables. There may not be a cause-and-effect relationship; both variables may change as a result of changes in a third (perhaps unknown) variable. But the relationship is nevertheless significant if we can depend on changes in one variable being accompanied by changes in another.
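One widely used measure of the strength of a relationship, not named in this module, is the Pearson correlation coefficient. The sketch below computes it from paired observations; the function name pearson_r is an assumption made for illustration. The coefficient ranges from -1 to +1, with values near either extreme indicating a strong linear relationship and values near zero indicating a weak one. A high value says nothing about which variable, if either, is the cause.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)
```

In the Aries example, x might hold each person's score on a conventional aggressiveness scale and y a rating of how closely the person fits the sign's description; both measures would have to be defined before any data were collected.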


One does not necessarily have to be objective to be right, but if we know a person has reason to be biased, it is especially important to examine his evidence. Accountants have a phrase, "Figures don't lie, but liars can figure." Facts neither explain nor select themselves. If research is done objectively, the results lead to a belief. It often happens, however, that we have a belief first and, perhaps without even being aware of it, select for observation the events that are most likely to support that belief.


Sometimes biases are easy to recognize. For example, there is now a substantial surplus of elementary school teachers in the United States. This would not be the group one would select to do an impartial study of the effect of class size on learning effectiveness, since any decision to increase class sizes would put even more teachers out of work.

In other cases, biases are more subtle and more difficult to recognize. Experienced researchers realize that they cannot even be sure of all their own prejudices, and they therefore take care to design research projects using techniques that minimize the possibility of bias, either deliberate or accidental.

The halo effect biases performance evaluation.

Researchers may encounter several sources of bias. We have already mentioned the problem of self-fulfilling prophecies; what we believe to be true becomes the truth because our actions provide the stimulus for another's response. When parents act as though their children are totally untrustworthy, it is not surprising if this turns out to be the case. A closely related bias is the halo effect, in which a person's reputation (good or bad) becomes a "halo" which influences the treatment he receives from others. The halo effect becomes significant in performance evaluation. It usually results in a loss of objectivity when such evaluations are made. Suppose you made an excellent grade on the first English examination, on which most of your classmates performed poorly. The teacher would tend to carry this perception of you over to the grading of the second examination. In other words, your second test may be graded less critically because you are perceived to be an excellent student (with a halo). The opposite may also occur, which may explain why it is so difficult to raise one's grade after a poor first performance; the teacher is evaluating the second performance as that of someone already "proven" to be a dull student.

A second source of experimental bias is referred to as the Hawthorne effect. This refers to a condition in which a significant change in the dependent variable results from any type of special attention to the experimental group. The effect was identified by researchers studying the motivating effect of changes in conditions at the Hawthorne Works of the Western Electric Company. They found that production went up with any variable they manipulated, e.g., pay raises, rest periods, changes in illumination, even variables that worsened work conditions. For example, when they lowered the light intensity to that of pale moonlight, production still went up. The variable that was producing these positive changes was the attention being shown to the workers (Roethlisberger and Dickson, 1939).

A third source, which Bachrach (1972) calls hypothesis myopia, sometimes prevents scientists from discovering new facts. This involves working on a problem with such a strong preconceived idea of what the results should be that one becomes blinded to alternative solutions or ideas.


Data collection is only a part of the total research task

The selection of an experimental design must be based on many factors, such as the hypothesis to be tested and the nature of the variables. Many designs are available; only a few of the major ones have been introduced here. Collecting data does not make one a scientist. Data must be analyzed, interpreted, and organized into a theoretical framework. Some social scientists start by accumulating data, and then develop theories from the data collected. Others begin with theories that give direction to their search for data. Both approaches have merit and either may fit a given situation or scientist better than the other. However, research is not complete without both a theory and some evidence of its validity.

Now take Progress Check 1.



1. Correlational research allows one to determine:
a. if the independent variable is a function of the dependent variable.
b. if the dependent variable changes with manipulation of the independent variable.
c. the strength of the relationship between two variables.
d. which variable is producing the effect.

2. Independent variables are:
a. manipulated by the experimenter.
b. investigated to see if they produce the dependent behavior changes.
c. (neither)
d. (both)

3. Experimental research:
a. uses random samples from well-defined populations.
b. investigates limited questions.
c. demands systematic collection of data.
d. (all of the above)
e. (none of the above)

4. A control group in which no treatment is given is:
a. always used in experimental studies.
b. an integral aspect of certain experimental designs, but not used in others.
c. a useless baseline with which to compare our results.
d. generally composed of a sample from a different population than the experimental group.

5. In determining the effects of drugs, each animal is likely to:
a. serve as his own control.
b. be evaluated against animals in a control group representing the same population that he does.
c. be evaluated against animals selected randomly from a different population.
d. (none of the above)

6. Self-fulfilling prophecies:
a. are substantiated because we act as if they are true.
b. are no problem in scientific inquiries.
c. (both)
d. (neither)

7. Which of the following pose problems for researchers?
a. Hypothesis myopia
b. Hawthorne effect
c. Halo effect
d. Zeigarnik effect





Individuals test ideas about behavior (e.g., obese people are jolly), and so do scientists. Some major differences in their approaches are: the scope of the problems they undertake, the method of data collection, the selection of samples to be studied, the precision with which variables (like "obesity" and "jollity") are defined, and the development of an experimental design that allows one to determine a causal relationship or the strength of a relationship. Assume that you want to test the hypothesis that obese people are jolly, and answer the following questions.

This study will need a(n):
a. experimental design.
b. correlational design.


Jollity is an independent variable.
a. True
b. False


We experimentally manipulate our obesity variable.
a. True
b. False


If jolly is defined as scoring in the upper third on an objective test designed for this purpose, we have to be aware of the Halo effect.
a. True
b. False



1 False: we select people that already meet our criterion of obesity. It would be possible to experimentally manipulate obesity in rats, but then how would we define a "jolly rat"?

2 c

3 b

4 False: the test may not be an accurate measure of jollity, but objective testing avoids the Halo effect.

5 False: neither variable is the independent variable since this will be a correlational study.


The following (Keilman, 1971) is the journal abstract of an experimental study:

Eighteen male retardates were presented individually with a multiple-choice paired-associate task, using one of the following modes of presentation: tutorial or teaching machine. The results clearly indicated that the tutorial mode was superior on the following measures: trials to criterion, number of correct responses during original learning, and absolute retention values after a 1-week interval had elapsed. Further research is needed to determine if the actual presentation procedure or type of feedback is responsible for the differences in these two methods.

Now let us identify some of the concepts that have been mentioned in this module with reference to this study. First identify the independent variable from the choices below.

a. Trials to criterion
b. Number of correct choices during original learning
c. Mode of presentation

Were there two experimental groups or a control group and an experimental group?

___________________________________________________________ 2

To what population can we appropriately generalize the results on the basis of the above information?


Is the halo or Hawthorne effect likely to be a problem in this study? Why?

__________________________________________ 3



1 Male retarded individuals

2 two experimental groups with one receiving the tutorial treatment while the other received the treatment involving instruction from the teaching machine.

3 No, for the halo effect, because the experimenter is apparently counting the number of correct responses or the number of trials that it takes a person to learn the task. These measures can be objectively scored. No, for the Hawthorne effect, because the experimenter is comparing the results of the two methods of presentation against one another rather than against a control group that receives no change in treatment.



Match the following terms:

1 _____ Independent variable
2 _____ Dependent variable
3 _____ Halo effect
4 _____ Population
5 _____ Correlational study
6 _____ Experimental study

(a) "Attention" is the variable producing the effect
(b) Determines the strength of a relationship
(c) Determines whether a given variable is producing an effect or not
(d) The experimenter's prior knowledge of a person's ability affects his present performance rating for that person
(e) The treatment under the experimenter's control
(f) Results are generalized to this
(g) Manipulations of the treatment variable may produce corresponding changes in this variable

7 Correlational studies determine causality.
(a) True
(b) False

8 Subjects may not serve as their own control in any experimental study.
(a) True
(b) False



