Multivariate Analysis of Variance (MANOVA)

Aaron French, Marcelo Macedo, John Poulsen, Tyler Waterson and Angela Yu


Keywords: MANCOVA, special cases, assumptions, further reading, computations

Introduction

Multivariate analysis of variance (MANOVA) is simply an ANOVA with several dependent variables.  That is to say, ANOVA tests for the difference in means between two or more groups, while MANOVA tests for the difference in two or more vectors of means.

For example, we may conduct a study where we try two different textbooks, and we are interested in the students' improvements in math and physics. In that case, improvements in math and physics are the two dependent variables, and our hypothesis is that both together are affected by the difference in textbooks.  A multivariate analysis of variance (MANOVA) could be used to test this hypothesis. Instead of a univariate F value, we would obtain a multivariate F value (Wilks' λ) based on a comparison of the error variance/covariance matrix and the effect variance/covariance matrix.  Although we only mention Wilks' λ here, there are other statistics that may be used, including Hotelling's trace and Pillai's criterion.  The "covariance" here is included because the two measures are probably correlated and we must take this correlation into account when performing the significance test.

Testing the multiple dependent variables is accomplished by creating new dependent variables that maximize group differences.  These artificial dependent variables are linear combinations of the measured dependent variables.
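The idea of an artificial dependent variable can be sketched numerically. The example below is only an illustration under stated assumptions: the data are simulated, the group sizes, means, and correlation are hypothetical, and the weights are taken as the leading eigenvector of E^-1 H (the standard discriminant solution), which the text above does not spell out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 2 textbooks, 30 students each, 2 correlated DVs
# (improvement in math, improvement in physics).
cov = np.array([[1.0, 0.6], [0.6, 1.0]])
group_a = rng.multivariate_normal([0.0, 0.0], cov, size=30)
group_b = rng.multivariate_normal([0.5, -0.3], cov, size=30)
groups = [group_a, group_b]

grand_mean = np.vstack(groups).mean(axis=0)

# H: between-group (effect) cross-products matrix.
H = sum(
    len(g) * np.outer(g.mean(axis=0) - grand_mean, g.mean(axis=0) - grand_mean)
    for g in groups
)
# E: within-group (error) cross-products matrix.
E = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups)

# The weights w that maximize (w'Hw) / (w'Ew) are the leading
# eigenvector of E^-1 H.
eigvals, eigvecs = np.linalg.eig(np.linalg.solve(E, H))
w = eigvecs[:, np.argmax(eigvals.real)].real


def separation(weights):
    """Between-group over within-group sum of squares for a weighted composite."""
    return float(weights @ H @ weights) / float(weights @ E @ weights)


composite_ratio = separation(w)
math_ratio = separation(np.array([1.0, 0.0]))     # math improvement alone
physics_ratio = separation(np.array([0.0, 1.0]))  # physics improvement alone
print(composite_ratio, math_ratio, physics_ratio)
```

The composite separates the groups at least as well as either measured DV on its own, which is the sense in which the new variable "maximizes group differences."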

Research Questions

The main objective in using MANOVA is to determine whether the response variables (student improvement in the example above) are altered by the observer’s manipulation of the independent variables.  MANOVA can therefore answer several types of research questions:

            1) What are the main effects of the independent variables?

            2) What are the interactions among the independent variables?

            3) What is the importance of the dependent variables?

            4) What is the strength of association between dependent variables?

            5) What are the effects of covariates?  How may they be utilized?

Results

If the overall multivariate test is significant, we conclude that the respective effect (e.g., textbook) is significant. However, our next question would of course be whether only math skills improved, only physics skills improved, or both. In fact, after obtaining a significant multivariate test for a particular main effect or interaction, customarily one would examine the univariate F tests for each variable to interpret the respective effect. In other words, one would identify the specific dependent variables that contributed to the significant overall effect.

MANOVA is useful in experimental situations where at least some of the independent variables are manipulated. It has several advantages over ANOVA. First, by measuring several dependent variables in a single experiment, there is a better chance of discovering which factor is truly important.  Second, it can protect against Type I errors that might occur if multiple ANOVAs were conducted independently. Additionally, it can reveal differences not discovered by ANOVA tests.

However, there are several cautions as well.  It is a substantially more complicated design than ANOVA, and therefore there can be some ambiguity about which independent variable affects each dependent variable.  Thus, the observer must make many potentially subjective assumptions.  Moreover, one degree of freedom is lost for each dependent variable that is added.  The gain of power obtained from decreased SS error may be offset by the loss in these degrees of freedom. Finally, the dependent variables should not be highly correlated. If they are, there is little advantage in including more than one in the test given the resultant loss in degrees of freedom.  Under these circumstances, use of a single ANOVA test would be preferable.

Assumptions

Normal Distribution - The dependent variables should be normally distributed within groups.  Overall, the F test is robust to non-normality if the non-normality is caused by skewness rather than by outliers.  Tests for outliers should be run before performing a MANOVA, and outliers should be transformed or removed.

Linearity - MANOVA assumes that there are linear relationships among all pairs of dependent variables, all pairs of covariates, and all dependent variable-covariate pairs in each cell.  When a relationship deviates from linearity, the power of the analysis is compromised.

Homogeneity of Variances - Homogeneity of variances assumes that the dependent variables exhibit equal levels of variance across the range of predictor variables. Recall that the error variance (SS error) is computed by adding up the sums of squares within each group. If the variances in the groups differ from each other, then adding them together is not appropriate and will not yield an estimate of the common within-group variance.  Homoscedasticity can be examined graphically or by means of a number of statistical tests.

Homogeneity of Variances and Covariances - In multivariate designs, with multiple dependent measures, the homogeneity of variances assumption described earlier also applies. However, since there are multiple dependent variables, it is also required that their intercorrelations (covariances) are homogeneous across the cells of the design. There are various specific tests of this assumption.
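One commonly used test of this assumption is Box's M. The text above does not name a specific test, so the following is a minimal sketch under that assumption. It compares the log-determinant of the pooled covariance matrix with the log-determinants of the group covariance matrices, and it assumes at least two DVs and more cases than DVs in every cell. The function name `box_m` and the simulated data are hypothetical.

```python
import numpy as np


def box_m(groups):
    """Return Box's M, its chi-square approximation, and the degrees of freedom.

    `groups` is a list of (n_i x p) arrays, one per cell of the design,
    with p >= 2 dependent variables.
    """
    g = len(groups)
    p = groups[0].shape[1]
    dfs = np.array([len(x) - 1 for x in groups])      # n_i - 1 for each group
    covs = [np.cov(x, rowvar=False) for x in groups]  # group covariance matrices
    pooled = sum(df * s for df, s in zip(dfs, covs)) / dfs.sum()

    def logdet(s):
        # slogdet returns (sign, log|det|); only the log-determinant is needed.
        return np.linalg.slogdet(s)[1]

    m = dfs.sum() * logdet(pooled) - sum(df * logdet(s) for df, s in zip(dfs, covs))

    # Scaling factor for the chi-square approximation.
    c = (np.sum(1.0 / dfs) - 1.0 / dfs.sum()) * (2 * p**2 + 3 * p - 1) / (
        6.0 * (p + 1) * (g - 1)
    )
    chi2 = m * (1.0 - c)
    df = p * (p + 1) * (g - 1) // 2
    return m, chi2, df


rng = np.random.default_rng(1)
base = rng.normal(size=(40, 2))
same = [base, base + 5.0]                   # shifted means, identical covariances
different = [base, base * [1.0, 3.0]]       # second DV has 9 times the variance

m_same, chi2_same, df_same = box_m(same)
m_diff, chi2_diff, df_diff = box_m(different)
print(m_same, m_diff, df_diff)
```

M is zero when the group covariance matrices are identical and grows as they diverge. The chi-square value would then be compared with a chi-square distribution on `df` degrees of freedom.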

Special Cases

Two special cases arise in MANOVA: the inclusion of within-subjects independent variables and unequal sample sizes in cells.

Unequal sample sizes - As in ANOVA, when cells in a factorial MANOVA have different sample sizes, the sum of squares for effect plus error does not equal the total sum of squares.  This causes tests of main effects and interactions to be correlated.  SPSS offers an adjustment for unequal sample sizes in MANOVA.

Within-subjects design - Problems arise if the researcher measures several different dependent variables on different occasions.  This situation can be viewed as a within-subject independent variable with as many levels as occasions, or it can be viewed as separate dependent variables for each occasion.  Tabachnick and Fidell (1996) provide examples and solutions for each situation. This situation often lends itself to the use of profile analysis, which is explained below.

Additional Limitations

Outliers - Like ANOVA, MANOVA is extremely sensitive to outliers.  Outliers may produce either a Type I or Type II error and give no indication as to which type of error is occurring in the analysis.  There are several programs available to test for univariate and multivariate outliers.
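A common screen for multivariate outliers is the Mahalanobis distance of each case from the centroid of the data. The text does not prescribe a particular method, so this is a sketch under that assumption, with simulated scores and a deliberately planted case.

```python
import numpy as np


def mahalanobis_sq(x):
    """Squared Mahalanobis distance of each row of `x` from the sample centroid."""
    centered = x - x.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(x, rowvar=False))
    # Row-wise quadratic form: d_i^2 = c_i' S^-1 c_i
    return np.einsum("ij,jk,ik->i", centered, inv_cov, centered)


rng = np.random.default_rng(2)
# Hypothetical scores on two positively correlated DVs.
scores = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=50)
# Planted case: high on one DV and low on the other, against the correlation.
scores[0] = [2.5, -2.5]

d2 = mahalanobis_sq(scores)
most_extreme = int(np.argmax(d2))
print(most_extreme, d2[most_extreme])
```

The planted case is not far out on either DV taken alone, but it stands out once the correlation between the DVs is taken into account, which is why univariate screening alone can miss multivariate outliers.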

Multicollinearity and Singularity - When there is high correlation between dependent variables, one dependent variable becomes a near-linear combination of the other dependent variables.  Under such circumstances, it would be statistically redundant and suspect to include both variables in the analysis.

MANCOVA

MANCOVA is an extension of ANCOVA. It is simply a MANOVA where the artificial DVs are initially adjusted for differences in one or more covariates. This can reduce error "noise" when error associated with the covariate is removed.

For Further Reading:

Cooley, W.W. and P.R. Lohnes. 1971. Multivariate Data Analysis. John Wiley & Sons, Inc.

Dunteman, G.H. 1984. Introduction to Multivariate Analysis. Sage Publications: Thousand Oaks, CA.  Chapter 5 covers classification procedures and discriminant analysis.

Morrison, D.F. 1967. Multivariate Statistical Methods. McGraw-Hill: New York.

Overall, J.E. and C.J. Klett. 1972. Applied Multivariate Analysis. McGraw-Hill: New York.

Tabachnick, B.G. and L.S. Fidell. 1996. Using Multivariate Statistics. Harper Collins College Publishers: New York.

Webpages:

Statsoft text entry on MANOVA: http://www.statsoft.com/textbook/stathome.html

EPA Statistical Primer: http://www.epa.gov/bioindicators/primer/html/manova.html

Introduction to MANOVA: http://ibgwww.colorado.edu/~carey/p7291dir/handouts/manova1.pdf

Practical guide to MANOVA for SAS: http://ibgwww.colorado.edu/~carey/p7291dir/handouts/manova2.pdf

Computations

First, the total sum-of-squares is partitioned into the sum-of-squares between groups (SSbg) and the sum-of-squares within groups (SSwg):

SStot = SSbg + SSwg

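This can be expressed as (assuming equal group sizes):

$$\sum_i \sum_j (Y_{ij} - GM)^2 = n \sum_j (\bar{Y}_j - GM)^2 + \sum_i \sum_j (Y_{ij} - \bar{Y}_j)^2$$

where $Y_{ij}$ is the score of individual $i$ in group $j$, $\bar{Y}_j$ is the mean of group $j$, $GM$ is the grand mean, and $n$ is the number of individuals in each group.

The SSbg is then partitioned into variance for each IV and the interactions between them.

In a case where there are two IVs (IV1 and IV2), the equation looks like this:

SSbg = SSIV1 + SSIV2 + SSinteraction

Therefore, the complete equation becomes:

SStot = SSIV1 + SSIV2 + SSinteraction + SSwg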

Because in MANOVA there are multiple DVs, a column matrix (vector) of values for each DV is used. For two DVs (a and b) with n values, this can be represented:

Similarly, there are column matrices for IVs - one matrix for each level of every IV. Each matrix of IVs for each level is composed of means for every DV. For "n" DVs and "m" levels of each IV, this is written:

Additional matrices are calculated for cell means averaged over the individuals in each group.

Finally, a single matrix of grand means is calculated, with one value for each DV, averaged across all individuals.
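The matrices of means described above can be laid out concretely for a balanced design with two IVs. This is a sketch only: the array layout `Y[k, m, i]`, the sizes, and the simulated scores are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical balanced design: K levels of IV1, M levels of IV2,
# n individuals per cell, p dependent variables.
K, M, n, p = 2, 3, 4, 2
Y = rng.normal(size=(K, M, n, p))  # Y[k, m, i] is one individual's vector of DV scores

cell_means = Y.mean(axis=2)          # shape (K, M, p): one mean vector per cell
iv1_means = Y.mean(axis=(1, 2))      # shape (K, p): one mean vector per level of IV1
iv2_means = Y.mean(axis=(0, 2))      # shape (M, p): one mean vector per level of IV2
grand_mean = Y.mean(axis=(0, 1, 2))  # shape (p,): one value per DV

print(cell_means.shape, iv1_means.shape, iv2_means.shape, grand_mean.shape)
```

Each mean is a vector with one entry per DV, so every quantity that is a single number in ANOVA becomes a column matrix here.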

 

Differences are found by subtracting one matrix from another to produce new matrices. For example, the total deviation is found by subtracting the GM matrix from each individual's vector of DV scores; the within-group error term is found in the same way, by subtracting the corresponding cell-mean matrix instead.

Next, each column matrix of differences is multiplied by its transpose (the corresponding row matrix), which yields a square matrix of squares and cross-products.

These matrices are summed over rows and groups, just as squared differences are summed in ANOVA. The result is an S matrix (also known as a "sum-of-squares and cross-products," "cross-products," or "sum-of-products" matrix).

For a two IV, two DV example:

 

Stot = SIV1 + SIV2 + Sinteraction + Swithin-group error
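The partition of the S matrices can be checked numerically. The sketch below assumes a balanced two-IV design with simulated scores; the array layout and the helper `sscp` are hypothetical names introduced for illustration.

```python
import numpy as np


def sscp(dev, shape):
    """Sum of outer products of deviation vectors, with one vector per individual."""
    flat = np.broadcast_to(dev, shape).reshape(-1, shape[-1])
    return flat.T @ flat


rng = np.random.default_rng(4)
K, M, n, p = 2, 3, 5, 2             # levels of IV1, levels of IV2, cases per cell, DVs
Y = rng.normal(size=(K, M, n, p))   # Y[k, m, i] is one individual's vector of DV scores

gm = Y.mean(axis=(0, 1, 2))                  # grand means
iv1 = Y.mean(axis=(1, 2))[:, None, None, :]  # means for each level of IV1
iv2 = Y.mean(axis=(0, 2))[None, :, None, :]  # means for each level of IV2
cell = Y.mean(axis=2)[:, :, None, :]         # cell means

S_total = sscp(Y - gm, Y.shape)
S_iv1 = sscp(iv1 - gm, Y.shape)
S_iv2 = sscp(iv2 - gm, Y.shape)
S_inter = sscp(cell - iv1 - iv2 + gm, Y.shape)
S_within = sscp(Y - cell, Y.shape)

print(np.allclose(S_total, S_iv1 + S_iv2 + S_inter + S_within))
```

The diagonal of each S matrix holds the ordinary ANOVA sums of squares for each DV, and the off-diagonal entries hold the cross-products between DVs. The equality holds exactly only for equal cell sizes, which is the issue raised under unequal sample sizes.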

 

Determinants (generalized variances) of the S matrices are found. Wilks' λ is the test statistic preferred for MANOVA, and is found through a ratio of the determinants:

$$\lambda = \frac{|S_{error}|}{|S_{effect} + S_{error}|}$$

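An estimate of F can be calculated through the following equations:

$$F(df_1, df_2) = \left(\frac{1 - y}{y}\right)\left(\frac{df_2}{df_1}\right)$$

Where,

$$y = \lambda^{1/s}$$

$$s = \sqrt{\frac{p^2 (df_{effect})^2 - 4}{p^2 + (df_{effect})^2 - 5}}$$

$$df_1 = p \, (df_{effect})$$

$$df_2 = s\left[df_{error} - \frac{p - df_{effect} + 1}{2}\right] - \left[\frac{p \, (df_{effect}) - 2}{2}\right]$$

and p is the number of DVs. When $p^2 + (df_{effect})^2 - 5 = 0$, s is taken to be 1.

Finally, we need to measure the strength of the association. Since Wilks' λ is equal to the variance not accounted for by the combined DVs, (1 - λ) is the variance that is accounted for by the best linear combination of DVs:

$$\eta^2 = 1 - \lambda$$

However, because the DVs are recombined for each effect, the values of η² summed across all effects can be greater than one, which makes this measure less useful than:

$$\text{partial } \eta^2 = 1 - \lambda^{1/s}$$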
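The computation from S matrices to an F estimate and a strength-of-association measure can be sketched as follows. This assumes Rao's approximation for the F estimate and partial η² = 1 - λ^(1/s) for the adjusted measure. Both are standard, but they are an assumption about which formulas are intended here. The S matrices and degrees of freedom are hypothetical values chosen to keep the arithmetic simple.

```python
import math

import numpy as np


def wilks_lambda(s_effect, s_error):
    """Ratio of the determinant of S_error to that of S_effect + S_error."""
    return float(np.linalg.det(s_error) / np.linalg.det(s_effect + s_error))


def rao_f(lam, p, df_effect, df_error):
    """Approximate F for Wilks' lambda (Rao's approximation, assumed here)."""
    denom = p**2 + df_effect**2 - 5
    # s is taken as 1 when the expression under the root is undefined.
    s = math.sqrt((p**2 * df_effect**2 - 4) / denom) if denom > 0 else 1.0
    y = lam ** (1.0 / s)
    df1 = p * df_effect
    df2 = s * (df_error - (p - df_effect + 1) / 2.0) - (p * df_effect - 2) / 2.0
    f = ((1.0 - y) / y) * (df2 / df1)
    return f, df1, df2, s


# Hypothetical S matrices for one effect with two DVs.
S_effect = np.array([[2.0, 0.0], [0.0, 0.0]])
S_error = np.array([[2.0, 0.0], [0.0, 2.0]])

lam = wilks_lambda(S_effect, S_error)
f_value, df1, df2, s = rao_f(lam, p=2, df_effect=1, df_error=10)

eta_sq = 1.0 - lam                       # variance accounted for
partial_eta_sq = 1.0 - lam ** (1.0 / s)  # adjusted measure

print(lam, f_value, df1, df2, eta_sq, partial_eta_sq)
```

With these inputs s equals 1, so the two strength-of-association measures coincide. They differ once an effect has more than one degree of freedom and there is more than one DV.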

 

Other statistics can be calculated in addition to Wilks' λ. The following is a short list of some of the popularly reported test statistics for MANOVA:

  • Wilks' λ = pooled ratio of error variance to effect variance plus error variance
    • This is the most commonly reported test statistic, but not always the best choice.
    • Gives an exact F-statistic

  • Hotelling’s trace = pooled ratio of effect variance to error variance

  • Pillai-Bartlett criterion = pooled effect variances
    • Often considered the most robust and powerful test statistic.
    • Gives the most conservative F-statistic.

  • Roy’s Largest Root = largest eigenvalue
    • Gives an upper bound of the F-statistic.
    • Disregard if none of the other test statistics are significant.
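All four statistics listed above can be written in terms of the eigenvalues of the matrix product S_error^-1 S_effect. The sketch below uses those standard eigenvalue forms, which are an assumption since no formulas are given in the list, and it takes Roy's largest root to be the largest eigenvalue itself, as stated above. The function name and the input matrices are hypothetical.

```python
import numpy as np


def manova_statistics(s_effect, s_error):
    """Four common MANOVA statistics from the eigenvalues of S_error^-1 S_effect."""
    eig = np.linalg.eigvals(np.linalg.solve(s_error, s_effect)).real
    eig = np.clip(eig, 0.0, None)  # guard against tiny negative rounding error
    return {
        "wilks": float(np.prod(1.0 / (1.0 + eig))),
        "hotelling": float(np.sum(eig)),
        "pillai": float(np.sum(eig / (1.0 + eig))),
        "roy": float(np.max(eig)),
    }


# Hypothetical S matrices for one effect with two DVs.
S_effect = np.array([[2.0, 0.0], [0.0, 0.0]])
S_error = np.array([[2.0, 0.0], [0.0, 2.0]])

stats = manova_statistics(S_effect, S_error)
print(stats)
```

When an effect has a single degree of freedom there is only one nonzero eigenvalue, and the four statistics lead to the same conclusion. They can disagree when there are several nonzero eigenvalues.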

MANOVA works well in situations where there are moderate correlations between DVs. It is less suitable when the correlations among DVs are very high or very low: if the DVs are too highly correlated, there is not enough variance left over after the first DV is fit, and if the DVs are uncorrelated, the multivariate test will lack power, so the degrees of freedom it sacrifices are not repaid.

 

This page was last updated on 06/04/08