**Descriptive and Correlation Statistics**

One of the most important tools used in modern psychology is statistics. Statistical concepts were first used by Galton in 1885, but the use of these techniques was not widespread until after 1920. Today, however, a knowledge of basic statistics is important to an understanding of even elementary psychology. This module introduces the use of statistics to summarize and analyze data. The concepts taught here will make it possible for you to understand how psychologists measure individual differences.

As you read the text, try to answer the following questions.

- What are measures of central tendency?
- How can data be presented graphically?
- What is a scatter plot?
- What does it mean when we say the correlation between two tests is highly positive?
- How does a correlation coefficient indicate correlation?
- When is a test reliable? Valid?

As in any empirical science, valid conclusions in psychology depend on the accumulation of large amounts of data. For example, administering a test or performing an experiment requires obtaining many scores and measuring many quantities. If one hundred subjects are tested, one hundred scores must be obtained and analyzed.

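**Figure 1. Raw scores on a nine-item memory span test (100 subjects)**

55576545454555466576 76767766666563767668 67657567647867557665 56565776678665765778 56555576957677666546

**Figure 2. Frequency distribution**

Score | Frequency |
---|---|
3 | 1 |
4 | 6 |
5 | 28 |
6 | 36 |
7 | 24 |
8 | 4 |
9 | 1 |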

*Data must be organized in a meaningful way before it can be analyzed.*

Figure 1 lists some test scores as an example of the raw data with which a psychologist works. In this case, each score represents the number of questions answered correctly by a subject on a memory span test of nine items. Usually we find it convenient to summarize such data in either a table or a graph. Figure 2 shows the same raw scores summarized in a frequency distribution table.
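As a minimal sketch (Python is used here only for illustration), tallying the Figure 1 scores reproduces the frequency distribution in Figure 2. The digit string is transcribed from Figure 1; the function name `frequency_distribution` is our own.

```python
from collections import Counter

# Raw memory-span scores transcribed from Figure 1: one digit per subject.
RAW = (
    "55576545454555466576"
    "76767766666563767668"
    "67657567647867557665"
    "56565776678665765778"
    "56555576957677666546"
)

def frequency_distribution(scores):
    """Return {score: frequency}, ordered by score like the table in Figure 2."""
    counts = Counter(scores)
    return dict(sorted(counts.items()))

scores = [int(ch) for ch in RAW]
table = frequency_distribution(scores)
for score, freq in table.items():
    print(score, freq)
```

Each printed pair is one row of Figure 2, from score 3 (1 subject) to score 9 (1 subject).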

The frequency distribution reveals some things that are not readily apparent from the raw data. For example, the subjects seem to "bunch" around the score of 6. This can be seen even more clearly if we convert the frequency distribution into a frequency polygon, as in Figure 3. Figure 3 shows the same data as Figures 1 and 2, but here each score and its frequency are represented as a point on a graph.

**Figure 3. Frequency polygon**
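A frequency polygon connects one point per score, with the score on the horizontal axis and its frequency on the vertical axis. The sketch below lists those points for the Figure 2 data. Closing the polygon at zero one score beyond each end is a common convention that we assume here; Figure 3 may not do this. The name `polygon_vertices` is hypothetical.

```python
# Frequency distribution from Figure 2 (score -> number of subjects).
FREQUENCIES = {3: 1, 4: 6, 5: 28, 6: 36, 7: 24, 8: 4, 9: 1}

def polygon_vertices(freqs):
    """Return the (score, frequency) points a frequency polygon connects.

    Assumption: the polygon is closed at frequency 0 one score below the
    lowest observed score and one score above the highest.
    """
    scores = sorted(freqs)
    points = [(score, freqs[score]) for score in scores]
    return [(scores[0] - 1, 0)] + points + [(scores[-1] + 1, 0)]

print(polygon_vertices(FREQUENCIES))
```

Drawing straight lines between consecutive points gives the polygon; its peak at (6, 36) is the "bunching" described above.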

**Measures of Central Tendency**

Still another way to summarize data is to determine the central tendency of the scores, that is, a single typical value around which they cluster.
There are many ways to describe central tendency, but we will deal only with the three that are
used most frequently.

*Mean.* The mean is the arithmetic average of the scores. This is the most
familiar and frequently used measure of central tendency. To obtain it, add
up all the scores and divide by the number of scores. In the example above
there are 100 scores, the sum of which is 592. Dividing 592 by 100 gives a
mean of 5.92.
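When the data are already in a frequency table, the sum of the scores is each score multiplied by its frequency, added up. A sketch of that calculation for the Figure 2 data follows; the helper name is our own.

```python
# Frequency distribution from Figure 2: score -> number of subjects.
FREQUENCIES = {3: 1, 4: 6, 5: 28, 6: 36, 7: 24, 8: 4, 9: 1}

def mean_from_frequencies(freqs):
    """Arithmetic mean: sum of all scores divided by the number of scores."""
    total = sum(score * count for score, count in freqs.items())  # 592 here
    n = sum(freqs.values())  # 100 here
    return total / n

print(mean_from_frequencies(FREQUENCIES))  # 5.92
```

The same function gives 200 / 40 = 5.0 for Table A in the exercises.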

*Mode.* The mode is defined as the score that occurs most frequently. Figure
3 shows that the score of 6 occurred most often (36 times), so the
mode of this distribution is 6.
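Finding the mode amounts to picking the score with the largest frequency. The sketch below returns a list because a distribution can have two or more scores tied for the highest frequency; that design choice is ours, not the module's.

```python
# Frequency distribution from Figure 2: score -> number of subjects.
FREQUENCIES = {3: 1, 4: 6, 5: 28, 6: 36, 7: 24, 8: 4, 9: 1}

def modes(freqs):
    """Return every score tied for the highest frequency."""
    top = max(freqs.values())
    return [score for score, count in freqs.items() if count == top]

print(modes(FREQUENCIES))  # [6]
```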

*Median.* Median means middle. It is defined as a theoretical point above and
below which exactly 50 percent of the cases lie. The exact method for
calculating the median is too complicated to teach here, but a simpler
technique called the mid-point method can be used to obtain an approximate
value. First, arrange the scores in order. Then count from one end until you
reach the middle score. Since there are 100 scores in this problem, the
middle falls between the 50th and 51st scores. Counting from either end, we
find that both of these scores are 6. Therefore, 6 is the approximate value
of the median.
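The mid-point method can be sketched by walking through the frequency table and keeping a running (cumulative) count, so the scores never have to be written out one by one. When the two middle scores differ, this sketch averages them; that rule is our assumption, since the text above only covers the case where both are 6.

```python
# Frequency distribution from Figure 2: score -> number of subjects.
FREQUENCIES = {3: 1, 4: 6, 5: 28, 6: 36, 7: 24, 8: 4, 9: 1}

def score_at_rank(freqs, rank):
    """Return the score in position `rank` (1-based) when scores are listed in ascending order."""
    cumulative = 0
    for score in sorted(freqs):
        cumulative += freqs[score]
        if rank <= cumulative:
            return score
    raise ValueError("rank exceeds the number of scores")

def midpoint_median(freqs):
    n = sum(freqs.values())
    if n % 2 == 1:
        # Odd count: a single middle score.
        return score_at_rank(freqs, (n + 1) // 2)
    # Even count: the middle lies between scores n/2 and n/2 + 1 (assumed: average them).
    return (score_at_rank(freqs, n // 2) + score_at_rank(freqs, n // 2 + 1)) / 2

print(midpoint_median(FREQUENCIES))  # 6.0
```

The running counts are 1, 7, 35, 71, ..., so the 50th and 51st scores both fall in the block of 6s.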

**CORRELATION**

Psychologists frequently wish to quantify the relationship, if any, between two sets of events or observations. One might wish to know if college grades can be used to predict financial success, or whether there is any relationship between any two sorts of behavior, such as frustration and aggression. Such questions as these can be investigated by correlational studies.

The typical graph used in a correlation study is a scatter plot. In this graph the range of one test (one type of observation) is marked along one axis and the range of another test (another observation) is marked along the other axis. The graph in Figure 4 correlates hypothetical scores on two final examinations taken by the same 80 students, who were enrolled in both Introductory Psychology and Introductory Sociology.

Figure 4. Scatter plot of psychology and sociology test scores. Each tally represents one person, located according to his score on the psychology test and his score on the sociology test.

Each tally represents one student's scores on the two tests. For example, the two tallies in the column at the far right indicate that only two students scored above 90% on the psychology test. One of them scored between 70% and 79% on the sociology test, and the other scored between 60% and 69%.

*Correlation is not sufficient evidence of cause and effect.*

Let us consider the range of correlations that may be encountered and how they would appear in a scatter plot.

**Figure 5. Scatter plot patterns**

**Perfect Positive**

Suppose we were comparing people's height in inches with their height in
centimeters. Both of these units measure the same thing (length). Thus, if
we measured the height of 10 people in both units, we would find a perfect
positive correlation (see Figure 5-A).
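The module does not give a formula for the correlation coefficient. A common choice, and the one we assume matches the -1.00 to +1.00 scale used in the exercises, is the Pearson product-moment coefficient r. The sketch computes it for hypothetical heights of 10 people; because centimeters are always 2.54 times inches, the points fall on a straight rising line and r comes out as +1.00.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of the same length, with at least 2 items")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-products of deviations: positive when both scores sit on the same side of their means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical heights of 10 people in inches, converted to centimeters.
inches = [60, 62, 64, 65, 66, 68, 69, 70, 72, 74]
centimeters = [h * 2.54 for h in inches]
print(round(pearson_r(inches, centimeters), 6))  # 1.0
```

The sign of `sxy` sets the direction of the relationship, and dividing by `sqrt(sxx * syy)` keeps the result between -1.00 and +1.00 whatever units the two measures use.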

**High Positive**

If two different measures are closely related, they produce a high positive
correlation. For example, intelligence test scores and reading skills often
show a high positive correlation. Figure 5-B shows such a correlation.

**Low Positive Correlation**

Many measures tend to vary together. Consider the relationship between
height and weight. Generally, taller people weigh more, but there are
exceptions. The correlation is positive, in that increases in one measure
are associated with increases in the other, but the amounts of increase
differ. A scatter plot of such a relationship would indicate a low positive
correlation. Figure 5-C shows the distribution of heights and weights that
might be found in a typical college classroom.

**Zero Correlation**

Some measures are totally unrelated. In a correlation of height and hair
color, the points of the scatter plot would be scattered randomly, and no
trend could be found in any direction. Figure 5-D shows a zero correlation.

**Negative Correlations**

Negative correlations are interpreted in exactly the same way as positive correlations, but the relationship runs in the reverse direction. That is, high scores on one measure tend to be associated with low scores on the other. An example of a high negative correlation might be the age of an automobile and its selling price. Figure 5-F indicates that, generally, older cars (high age) cost less (low dollars).
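Because the sign of a coefficient gives the direction and its size gives the strength, the verbal labels used in this module can be read straight off the number. The sketch below does that; the cutoffs (0.6 for "high", and treating values within 0.05 of zero as "zero") are illustrative assumptions, not standard values.

```python
def describe_correlation(r, high_cutoff=0.6, zero_band=0.05):
    """Translate a correlation coefficient into this module's verbal labels.

    Assumed, illustrative cutoffs: |r| <= zero_band counts as "zero",
    |r| >= high_cutoff as "high", anything else in between as "low".
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1.00 and +1.00")
    if abs(r) <= zero_band:
        return "zero"
    direction = "positive" if r > 0 else "negative"  # sign gives direction
    if abs(r) == 1.0:
        strength = "perfect"
    elif abs(r) >= high_cutoff:  # size gives strength
        strength = "high"
    else:
        strength = "low"
    return f"{strength} {direction}"

for r in (1.00, 0.80, 0.40, 0.0, -0.40, -0.80, -1.00):
    print(f"{r:+.2f}: {describe_correlation(r)}")
```

Note that -.80 and +.80 get the same strength label: a negative correlation predicts just as well as a positive one of the same size, only in the reverse direction.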

**Final Note**

We must be careful to remember that correlation does not imply
causation. For example, there is a low positive correlation between the rate
at which ice cream melts on the sidewalks of New York and the number of
deaths in Bombay, India. But one does not cause the other; both are affected
by a third factor, the rise in temperature during the summer months.
Correlation may lead us to search for a common cause, but it does not prove
that a cause-and-effect relationship exists.

**RELIABILITY AND VALIDITY**

Much psychological information comes from the administration of tests. Two criteria that a test must meet are reliability and validity. Both are measured by correlations.

*A test is reliable* when repeated administrations to the same subjects have
a high positive correlation. Reliability, then, is a measure of the degree to
which repeated measurements of the same subject give the same reading.
For example, if we gave a group of students an intelligence test on Monday
and then retested them several weeks later, would their second scores be
very similar to their first scores? If they were, we could say that the test is
reliable.

*A test is valid* when it can be shown that it does, in fact, measure what it
claims to measure. Suppose that in a psychology course the instructor
gave a final examination on which every question required a
knowledge of algebra. The students would probably protest that such a test
was not a valid measure of their understanding of psychology. Unfortunately,
the question of validity is not often that obvious. Many tests have "face"
validity; that is, they look as though they should measure what they claim to, but in fact
they may not. For example, suppose our psychology instructor, in
response to student criticism, made up a new final, but this time he used a
number of very tricky questions that really tested the
students' intelligence rather than their knowledge of psychology. His
students could claim that this final is just as invalid as the algebra test.

The best way to validate a test is to compare it with another measure (or criterion) of the same thing. For example, if someone has devised an industrial aptitude test that supposedly predicts how well a person will do on a job, one could correlate the test scores with the performance ratings the same people receive from their supervisors. Either measure may be invalid, but each deserves more confidence if the comparison shows a high positive correlation.

PROGRESS CHECK 1

Now test yourself without looking back.

1. The most frequent score in a given distribution is called the_______________________________

2. The number that represents the sum of all scores divided by the number of scores is the___________________

3. The point in the distribution at which exactly 50% of the scores are higher and 50% are lower is called the________________________________

4. If a high score on one test can be used to predict a low score on another test, the correlation between the tests is_________________________________

5. Reliability is a measure of the degree to which_________________________

6. Define validity_______________________.

7. (graph here)

The graph above shows two (polygons).

8. Name the correlation indicated by each of the following:

a.__________________

b.__________________

c.__________________

d.__________________

EXERCISES

SCORE | FREQUENCY |
---|---|
9 | 1 |
8 | 2 |
7 | 4 |
6 | 8 |
5 | 10 |
4 | 8 |
3 | 4 |
2 | 2 |
1 | 1 |

**Table A**

The value that occurs most frequently in a distribution is the mode.

The mode in the frequency distribution in Table A is _________. 2

The median is the midpoint in a distribution. The median in the distribution in Table A is_______________________________ 5

The mean is the arithmetic average of all the scores. Calculate below the mean for the distribution in Table A.

a. The sum of the scores is__________________

b. The number of scores is__________________

c. The sum divided by the number of scores is_________________________________4

The table below represents a__________________________. 3

SCORE | FREQUENCY |
---|---|
13 | 2 |
12 | 3 |
11 | 9 |
10 | 15 |
8 | 38 |
7 | 45 |
6 | 32 |
5 | 20 |
4 | 11 |
3 | 6 |
2 | 2 |
1 | 1 |

**Table B**

**ANSWERS**

1) 8

2) 5

3) frequency distribution

4) a. 200

b. 40

c. 5

5) 5

In correlational studies researchers try to discover relationships
between sets of data. In a graph used in a correlation study you might expect to find:

a. the subjects listed along one axis.

b. the scores achieved on one test along one axis and the scores achieved on the other test along the other axis.

c. the total score from both tests along the bottom axis.

The following sets of scores were achieved by six students on two tests. Put dots in the scatter plot below for these results.

Subject | Test 1 | Test 2 |
---|---|---|
A | 3 | 4 |
B | 4 | 5 |
C | 3 | 5 |
D | 2 | 4 |
E | 2 | 2 |
F | 2 | 3 |

_________________________________________________5

The graph you have completed is called a:

a. frequency polygon.

b. skewed curve.

c. scatter plot.

d. histogram.

________________________________1

Two sets of data are positively correlated when a high score in one set tends to be associated with a high score in the other and low scores in one set tend to be associated with a low score in the other. Which plot shows a positive correlation?

_______________________________________________2

A high positive correlation indicates that more accurate predictions can be made than in a low positive correlation. Write high positive or low positive beside each of the scatter plots below.

**ANSWERS**

1 C

2 B

3 B

4 a. low positive

b. high positive


If a scatter plot shows a high negative correlation, then which of the following would be true?

a. a high score on one test implies a high score on the
other test.

b. a high score on one test implies a low score on the
other test.

c. predictions can be more accurate than with a low
negative correlation._________________________________3

A perfect positive correlation has a correlation coefficient of +1.00. A perfect negative correlation has a correlation coefficient of -1.00. Write the appropriate coefficient beside each of the following scatter plots.


All correlation coefficients fall between -1.00 and +1.00. No correlation is indicated by a coefficient of 0. Which of the following coefficients would indicate a low negative correlation?

a. 0

b. +1.00

c. -.80

d. -.40

___________________________________________2

**ANSWERS**

1. a. +1.00

b. -1.00

2. d

3. b, c

4. a. perfect positive

b. perfect negative

Name the type of correlation indicated by each plot in the diagram above.

a. ______________________________ .

b. ____________________________ .

c. _____________________________ .

_______________________________________7 .

From this list select the correlation coefficient that might refer to each plot in the diagram in the previous exercise: +3.00, +1.00, +.80, +.40, 0, -.40, -.80, -1.00.

a.__________________________

b.______________________________

c.________________________________

________________________________________________4

Reliability is a measure of the degree to which repeated
measurement of the same subject gives the same reading. If a
test was reliable, the correlation between two measurements
would be

a. high

b. low

c. zero

__________________________________1

Validity is determined by the degree to which a test measures
what it is intended to measure. A test is valid when:

a. there is a high correlation between it and another
measure known to be valid.

b. repeated testing of the same subjects gives the same
score.

c. (both)

____________________________________________6

Write a definition of each of the following.

a. Reliability________________________________________________3

b. Validity___________________________________________________5

**NOW TAKE PROGRESS CHECK 2**

**ANSWERS**

1. a

2. correlation

3 a degree to which repeated measurement of the same subject gives the same results

4. a. +.80

b. +.40

c. +1.00

5. degree to which a test measures what it is intended to measure

6. a

7. a. high positive

b. low positive

c. perfect positive

PROGRESS CHECK 2

1.

90, 73, 82, 86, 78, 91, 72, 84, 74, 90, 87

Using the scores listed above, find the following:

a. Mean__________

b. Mode__________

c. Median_________

2.

SCORE | FREQUENCY |
---|---|
24 | 1 |
23 | 3 |
22 | 5 |
21 | 7 |
20 | 4 |
19 | 8 |
18 | 12 |
17 | 18 |
16 | 12 |
15 | 5 |
14 | 6 |
13 | 6 |
12 | 3 |
11 | 4 |
10 | 1 |

Draw a frequency polygon for the distribution above.

3. Write the degree of correlation indicated by a coefficient of:

a. +1.00____________________________

b. 0____________________

c. -.40__________________

d. -.80______________________

4. What sort of graph is used for correlation studies?______________________

5. A check for the reliability of a test is accomplished by:

a. administering it twice to see if the scores are highly correlated.

b. using two different tests to see if their scores are highly correlated.

c. using a different test for each group.

d. comparing test results with results of another test known to be valid.

6. If the correlation between two sets of scores is high positive, then a high score on one test can be used to predict_________________________________

7. Match.

1) Reliability __________

2) Validity________

3) Correlation coefficients__________

a. The degree to which a test measures what it is intended to measure

b. A measure of relationship between two tests or sets of observations

c. A measure of the degree to which repeated testing of the same subjects gives the same results

d. Used only in cases of significant differences

October 15, 2007