Descriptive Statistics
The first problem in analysis is to interpret what actually happened
Although graphs provide some information about trends, it is still useful to describe data formally so that other researchers will know precisely what results were obtained. Your objective in this module is to learn how to analyze data statistically in order to understand what actually happened during an experiment. How to draw valid inferences from the results is the subject of the next module. Try to answer the following questions as you read this module.
CENTRAL TENDENCY
Descriptive statistics may involve the calculation of one or more of the measures of central tendency: mean, median, and mode. These are methods for determining an "average" score for a set of data.
The mode is defined as the most frequently occurring response, score, or event, and might be referred to as "the typical case."
The median divides the distribution into two equal parts, as one-half of the scores fall below it and one-half above it. In other words, half of the scores have a greater value than the median and half have a lesser value.
The mean, symbolized X̄ (also written M), is the arithmetic average, calculated by summing all the scores in a distribution and dividing by the total number of scores.
A skewed curve gives different values for different measures of central tendency
In symmetrical distributions (typified by a bell-shaped curve) the mode, mean, and median have nearly the same value. When this is true, the mean is most often used because it represents the typical case, or performance, and because it can be handled most easily in other statistical operations that are usually necessary in the evaluation of psychological data. However, when a distribution is skewed (not symmetrical) the mean gives a markedly different view of the distribution than the median. This can be seen in Table 2, which records the scores on an examination in Introductory Psychology.
Table 2
Scores on an Introductory Psychology examination:
98, 97, 80, 73, 72, 65, 65, 65, 64, 61, 59
Mode = 65
Median = 65
Mean = (Sum of all scores) / (Number of scores) = 799/11 = 72.6, or approximately 73
The formula for finding the mean is usually written as: M = ΣX/N
Where Σ means to sum what follows; in this case, ΣX equals the sum of all the scores recorded.
X is a symbol for the distribution, i.e., Introductory Psychology scores on an examination.
N is equal to the number of scores (or persons), i.e., 11.
M is the usual symbol for the mean of distribution X.
The median is the middlemost score, or the sixth case in this example.
The mode is the most frequently occurring score, which is 65 in this example.
Although the mean for this data is 72.6, the median is just 65. Obviously, the instructor's view of what the "average" student did on this test would depend on what measure of "average" he uses. In this case, the instructor would be more likely to curve the examination results on the median rather than on the mean.
The reason there is such a great discrepancy between median and mean is that the former identifies only a single score from the distribution while the latter depends upon all the scores. It may be that no one in the class actually received the mean score. Statistics do not lie, of course, but they can give false impressions to the unwary.
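The three measures for Table 2 can be checked with a few lines of Python. This is a minimal sketch rather than part of the original module; it assumes Python's standard `statistics` module, and the variable names are illustrative.

```python
import statistics

# Scores from Table 2 (Introductory Psychology examination).
scores = [98, 97, 80, 73, 72, 65, 65, 65, 64, 61, 59]

mode = statistics.mode(scores)      # most frequently occurring score
median = statistics.median(scores)  # middle score of the ordered list
mean = sum(scores) / len(scores)    # sum of all scores divided by N

print(f"Mode = {mode}, Median = {median}, Mean = {mean:.1f}")
# prints: Mode = 65, Median = 65, Mean = 72.6
```

The mean lies well above the median because the two highest scores (98 and 97) pull it upward, while the median does not change with how extreme those scores are.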
MEASURES OF DISPERSION
Experimental results are not described completely without a computation of the measure of dispersion. First look at the distributions in Table 3.
Table 3
Distribution of X Scores.
90, 80, 80, 70, 70, 70, 70, 60, 60, 50
Distribution of Y Scores.
80, 75, 75, 70, 70, 70, 70, 65, 65, 60
Note: Since these distributions are symmetrical, the mean, median, and mode are equal. These measures of central tendency are not equal in skewed distributions.
The range for distribution X equals 40 (90 - 50), while the range for Y equals 20.
Dispersion can be expressed as a range or in terms of standard deviations
Both distributions have the same mean, but they are not equal; one is fat, while the other is slender. These distributions vary in the degree of dispersion; in other words, the scores are spread over a wider range in one distribution than in the other. The statistics that provide information about dispersion are called the range and the standard deviation.
The range is the simplest, but not necessarily the best, measure of dispersion. To obtain the range, one simply subtracts the smallest score from the largest score in the distribution. Since it relies on only two scores, the two extreme ones, the range is a very crude and unstable measure. It would be very unlikely that the range calculated from the results of one experiment would ever be duplicated in another, or that it would be characteristic of the range encountered in a total population.
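As a rough illustration, the range of each distribution in Table 3 can be computed as follows. The function name `score_range` is an assumption of this sketch, and the extra score of 20 in the last line is hypothetical, added only to show how strongly one extreme score affects the range.

```python
# Distributions from Table 3.
x_scores = [90, 80, 80, 70, 70, 70, 70, 60, 60, 50]
y_scores = [80, 75, 75, 70, 70, 70, 70, 65, 65, 60]


def score_range(scores):
    """Return the largest score minus the smallest score."""
    return max(scores) - min(scores)


print(score_range(x_scores))  # prints 40
print(score_range(y_scores))  # prints 20

# Hypothetical: a single extreme score (20) added to Y triples its range.
print(score_range(y_scores + [20]))  # prints 60
```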
Psychologist Clifford T. Morgan (1961) described the standard deviation as follows:
The standard deviation is the measure par excellence of the variability of measurements in a distribution. This is such a good measure that, if the frequency distribution is reasonably normal, the distribution can be reconstructed by knowing only two numbers, the mean and the standard deviation. This is true because mathematicians have a precise formula for the normal-probability curve, and the only two unknowns in it are the mean and the standard deviation. Given these, one can draw the normal curve that best fits the particular frequency distribution. Thus, in so far as a distribution is normal, the mean and the standard deviation completely describe and specify it.
Computing the standard deviation of a distribution takes more work, but it is a far more useful measure of dispersion.
The standard deviation is calculated as the square root of the average squared deviations from the mean. The procedure for calculating the standard deviation for a particular distribution requires the following steps.
1. Subtract the mean value (M) from each score (X - M) to obtain deviation scores, symbolized x (lower case). For example, the deviation score for 90 in Table 3 is x = (X - M), or 90 - 70 = 20. For the other scores, the deviations are: 10, 10, 0, 0, 0, 0, -10, -10, -20.
2. Square the deviation scores, x²: 400, 100, 100, 0, 0, 0, 0, 100, 100, 400
3. Add all these squared deviation scores and divide by N, the number of scores:
Σx²/N = 1200/10 = 120
4. Find the square root of this value: √120 = 10.95. Thus, the standard deviation = 10.95
The standard deviation for distribution X is 10.95. From the preceding steps, you can see that the formula for obtaining the standard deviation is √(Σx²/N).
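The four steps can be sketched in Python as follows. This is an illustration under the definition used in this module (dividing by N), and the variable names are assumptions of the sketch.

```python
import math

# Distribution X from Table 3.
x_scores = [90, 80, 80, 70, 70, 70, 70, 60, 60, 50]
n = len(x_scores)
mean = sum(x_scores) / n                            # M = 70.0

deviations = [score - mean for score in x_scores]   # Step 1: x = X - M
squared = [d ** 2 for d in deviations]              # Step 2: x squared
variance = sum(squared) / n                         # Step 3: 1200 / 10 = 120.0
std_dev = math.sqrt(variance)                       # Step 4: square root

print(f"{std_dev:.2f}")  # prints 10.95
```

Because this procedure divides by N, it corresponds to `statistics.pstdev` in Python's standard library; `statistics.stdev` divides by N - 1 and would give a slightly larger value.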
Now suppose we wish to calculate the standard deviation for the Y distribution. We would follow the same steps. The table below illustrates the procedure. For practice, fill in all the blanks and compute the standard deviation for the Y distribution.
            (Step 1)              (Step 2)
            (Subtract the mean)   (Square the deviations)
Y-scores    Y - M = y             y²
80           10                   100
75            5                   ___
75            5                   ___
70            0                   ___
70            0                   0
70            0                   ___
70            0                   ___
65           -5                   ___
65           -5                   ___
60          -10                   100
ΣY = 700    Σy = 0                (Step 3) (Add the squared deviations) Σy² = ___
(Step 4) Standard deviation = square root of Σy²/N = ___
(Remember, N is the number of scores in the sample)
Compare your work with the following solution
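One way to check a completed table is to generate it. The Python sketch below follows Steps 1 to 4 for the Y distribution, dividing by N as in the text; the variable names and the column layout are assumptions of this sketch.

```python
import math

# Distribution Y from Table 3.
y_scores = [80, 75, 75, 70, 70, 70, 70, 65, 65, 60]
n = len(y_scores)
mean = sum(y_scores) / n  # M = 70.0

print(f"{'Y':>4} {'y = Y - M':>10} {'y^2':>6}")
sum_sq = 0.0
for score in y_scores:
    y = score - mean      # Step 1: deviation score
    sum_sq += y ** 2      # Step 2: squared deviation, accumulated for Step 3
    print(f"{score:>4} {y:>10.0f} {y ** 2:>6.0f}")

std_dev = math.sqrt(sum_sq / n)  # Steps 3 and 4
print(f"Sum of y^2 = {sum_sq:.0f}, standard deviation = {std_dev:.2f}")
# last line printed: Sum of y^2 = 300, standard deviation = 5.48
```

The Y distribution has the same mean as X but half the standard deviation (5.48 against 10.95), which matches its narrower spread.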
1. Which of the following are measures of central tendency?
a. Range
b. Mode
c. Median
d. Mean
2. If a distribution is markedly skewed, which measure of central tendency is
most appropriate?
a. Mode
b. Median
c. Mean
d. (none of the above)
For the following distribution, calculate the measures below:
5, 4, 1, 5, 0, 5, 3, 3, 2, 2. (Note: You may refer to the text for help.)
3. Mean =
4. Median =
5. Mode =
6. Range =
7. Standard deviation of X =
ANSWER KEY PAGE 71
5 OR MORE CORRECT PAGE 61
FEWER THAN 5 CORRECT PAGE 57
Each of the three measures of central tendency, commonly referred to as the
average, provides a single point along the score scale that represents the
trend of the entire distribution. Remember that the mode is the most
frequently occurring value in the distribution. The median divides the list of
scores such that half of the scores have a greater value than the median and
half are less than the median. The median is easier to find if the scores are
ordered first. The mean is simply the sum of the scores divided by the
number of scores. Use the following distribution for the next series of
exercises.
10, 9, 2, 4, 7, 1, 4, 4, 5, 4
Find the mode.
______________________________________ (Answer 4)
Now find the median for the above distribution.
______________________________________ (Answer 2)
Now find the mean.
______________________________________ (Answer 1)
The first step in obtaining the standard deviation is to subtract the
mean from each of the scores in the distribution. For example, if the mean is
70, then from the scores 80 and 40 we may obtain an x of 10 and -30,
that is 80 - 70 and 40 - 70.
The x for 90 would be:
a. 20
b. - 20
c. 70
d. 90
______________________________________ (Answer 6)
After you subtract the mean, you square each value of x.
X      x = (X - M)    x²
80         10         100
40        -30         900
60       _____       _____
______________________________________ (Answer 5)
Here is a table for the distribution. We have calculated some of the
values. Complete the table and determine the standard deviation.
X      (X - M) = x    x²
10          5         25
 9
 7
 5          0          0
Range and standard deviation are measures of variability. They give
information about how dispersed the scores are. We find the range by
subtracting the smallest score from the largest score in the distribution.
The range for our distribution is ____________________ (Answer 3)
ANSWERS
2. The median is 4 since this value is halfway between the two middlemost scores when the scores are ordered as below:
1, 2, 4, 4, 4, 4, 5, 7, 9, 10
X      (X - M) = x    x²
10          5         25
 9          4         16
 7          2          4
 5          0          0
 4         -1          1
 4         -1          1
 4         -1          1
 4         -1          1
 2         -3          9
 1         -4         16
ΣX = 50    Σx = 0     Σx² = ___
The distribution in the previous example is:
a. positively skewed.
b. negatively skewed.
c. symmetrical.
Graph the scores of the above distribution, plotting frequencies
against scores.
[Blank graph for plotting: frequency (0 to 4) on the vertical axis, SCORES on the horizontal axis]
NOW TAKE PROGRESS CHECK 2
1 a 2 c 3 b
1. A distribution with a small standard deviation has _________________
variability than a distribution with a large standard deviation.
a. more
b. less
2. The median is found:
a. by ordering the scores and identifying the middlemost value.
b. by adding up the scores and dividing by N.
c. by finding the most frequently occurring score.
d. by subtracting the smallest score from the largest score.
3. The standard deviation measures:
a. dispersion.
b. central tendency.
c. variability.
d. degree of skewness.
For the following distribution, calculate the measures below:
8, 2, 7, 7, 5, 3.
4. M
5. Median
6. Mode
7. Range
8. Standard deviation of X
6 OR MORE CORRECT PAGE 61
FEWER THAN 6 CORRECT INSTRUCTOR CONFERENCE