Reconstructing Phylogenies

It is well and good to read phylogenies and to use them to hypothesize about relationships. But how do Systematic Biologists go about the process of reconstructing a phylogeny?

(A) Requirements for comparing taxa
Three components are necessary to undertake a phylogenetic (or any other kind of) classification:
1) The taxa that are to be classified must be determined (unless you have some idea of what the entities are that must be classified, what are you classifying?).

2) Classifications are founded on differences in features among different taxa. Thus, characters (morphological, molecular, or otherwise) must be identified.

3) Once characters are identified, the different character states must be decided upon. For example, just to say that flower color (a character) will be used to classify plants is not very helpful; we need to decide which and how many colors (character states) there are that can be used to differentiate taxa. Ideally, the changes from one character state to another will reflect evolutionary changes.

(B) Organizing your data
Once you have gathered enough data with which to compare the taxa you’re studying, you can create a data matrix. Data matrices are organized in two dimensions, with the names of taxa along one axis and the names of characters along the other axis. Character states are recorded in the corresponding positions in the matrix. For example, here’s a data matrix that you might create when shopping for dinner:

Taxon     Color   Legs   Expense   Skin
Chicken   white   two    $         yes
Duck      red     two    $$$       yes
Pork      white   four   $$        no
Beef      red     four   $$$       no
Tofu      white   none   $         sometimes

This matrix is a simple one; more commonly in biology there are many more taxa and characters, and often the character states are much more complex. Fortunately, we now have (and have had for many years) computers to facilitate the computations and explore all possibilities for comparing taxa based on our characters.
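If it helps to see the same idea outside a spreadsheet, here is a minimal sketch of the dinner matrix as a Python dictionary. The helper names (state, taxa_sharing) are made up for this illustration and are not part of any phylogenetics program.

```python
# The dinner matrix: taxa are the rows, characters are the columns.
characters = ["Color", "Legs", "Expense", "Skin"]
matrix = {
    "Chicken": ["white", "two", "$", "yes"],
    "Duck": ["red", "two", "$$$", "yes"],
    "Pork": ["white", "four", "$$", "no"],
    "Beef": ["red", "four", "$$$", "no"],
    "Tofu": ["white", "none", "$", "sometimes"],
}


def state(taxon, character):
    """Look up the character state recorded for one taxon (hypothetical helper)."""
    return matrix[taxon][characters.index(character)]


def taxa_sharing(character, value):
    """List the taxa that share a given character state (hypothetical helper)."""
    return [taxon for taxon in matrix if state(taxon, character) == value]


print(state("Duck", "Legs"))
print(taxa_sharing("Color", "red"))
```

Each row holds one taxon and each position in the row holds the state of one character, which is the same layout you will type into the matrix editor later in the lab.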

(C) Phylogenetic Assumptions - Parsimony
There are many approaches to reconstructing phylogenies, most of which have their different advantages and purposes. We will use only one approach in today’s lab - Parsimony - and it happens to be the approach most commonly used by biologists. Parsimony argues that a reasonable estimate of evolutionary history is one that requires us to make the fewest additional assumptions about our data. This does not necessarily argue that evolution actually works that way - only that we should use that plan to infer evolutionary history from our data. We use the parsimony approach to seek the tree with the fewest evolutionary steps; that is, the tree with the fewest character state changes. If we measure the “size” of a phylogenetic tree by its number of steps, then the most reasonable tree is the shortest tree. In truth, the TRUE phylogeny may involve more than the minimum number of steps possible. But because we may never know the exact number of evolutionary changes that have occurred in the history of a group, we are reliant on estimating phylogenetic history using the best set of assumptions that we can obtain.

Bottom line: The shortest tree is the best estimate using parsimony criteria.
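To make "counting steps" concrete, here is a rough sketch of one standard way to count character state changes on a given tree (Fitch's method for unordered characters). This is an illustration only and is not necessarily how MacClade does its counting; the taxa A-D, their codes, and the function names are invented for the example.

```python
def fitch_steps(tree, states):
    """Return (possible ancestral states, steps) for one character.

    tree   : a taxon name (a tip) or a 2-tuple of subtrees (a rooted binary tree)
    states : dict mapping taxon name -> coded character state
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_steps(left, states)
    right_set, right_steps = fitch_steps(right, states)
    shared = left_set & right_set
    if shared:
        # The two lineages can agree on a state: no extra change is needed here.
        return shared, left_steps + right_steps
    # No state in common: at least one change must have occurred.
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree, matrix):
    """Sum the steps over every character (column) in the data matrix."""
    n_chars = len(next(iter(matrix.values())))
    total = 0
    for i in range(n_chars):
        column = {taxon: row[i] for taxon, row in matrix.items()}
        total += fitch_steps(tree, column)[1]
    return total


# Hypothetical coded matrix: four taxa, three binary characters.
matrix = {
    "A": [1, 1, 0],
    "B": [1, 1, 0],
    "C": [0, 0, 1],
    "D": [0, 0, 0],
}

tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print(tree_length(tree_1, matrix))  # 3 steps
print(tree_length(tree_2, matrix))  # 5 steps
```

Under the parsimony criterion, tree_1 is preferred over tree_2 for these made-up data because it explains the same matrix with fewer character state changes.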

(D) Phylogenetic Assumptions - Polarity
We stressed earlier that in phylogenetic inference relationships among taxa are based on shared derived traits (synapomorphies), and that shared ancestral traits (symplesiomorphies) are not used to reconstruct phylogenies. Fine, but how do we know, for each character, which traits (character states) are apomorphic and which are plesiomorphic? The answer is, we really don’t know (how could we possibly know unless we already knew precisely how evolution progressed in our group of taxa?); however, once again we can make some assumptions that allow us to estimate relationships. Determining the polarity of characters (which states are apomorphic and which are plesiomorphic) is most commonly done using the outgroup method. The entire group of taxa that you are studying is called the ingroup. An outgroup is any taxon that presumably shares recent common ancestry with the ingroup, but that diverged from the ingroup prior to further diversification within the ingroup. Ideally our outgroup should be the sister group to the ingroup; but again, we rarely know the identity of the sister group. We certainly can guess at some reasonable candidates, and these guesses are our outgroups.

Here’s how the outgroup method to assess character polarity works:
1) Choose an outgroup (often several are chosen)
2) Assume that the outgroup has all plesiomorphic character states (and why is this a reasonable assumption?)
3) Code your characters however you wish (0, 1, 2, 3, .... is typical)
4) When the outgroup method is employed, the program that generates your trees will use all character states that are shared among taxa EXCEPT the character states possessed by the outgroup.

Bottom line: The plesiomorphic traits are whichever are present in the outgroup.
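The outgroup rule can be sketched in a few lines of code. This is a simplified illustration under the assumption of a single outgroup; the taxa, codes, and the function name shared_derived_states are hypothetical and do not come from any particular program.

```python
def shared_derived_states(matrix, outgroup):
    """For each character, group ingroup taxa by the derived state they share.

    Any state that matches the outgroup is treated as plesiomorphic and ignored.
    A derived state found in only one taxon is also dropped, because it says
    nothing about which taxa group together.
    """
    out_states = matrix[outgroup]
    results = []
    for i in range(len(out_states)):
        groups = {}
        for taxon, row in matrix.items():
            if taxon != outgroup and row[i] != out_states[i]:
                groups.setdefault(row[i], []).append(taxon)
        results.append({s: taxa for s, taxa in groups.items() if len(taxa) >= 2})
    return results


# Hypothetical coded matrix: one outgroup and three ingroup taxa.
matrix = {
    "Out": [0, 0, 0],
    "A": [1, 0, 0],
    "B": [1, 1, 0],
    "C": [0, 1, 1],
}

for number, groups in enumerate(shared_derived_states(matrix, "Out"), start=1):
    print("Character", number, groups)
```

In this made-up example, character 1 supports grouping A with B, character 2 supports grouping B with C, and character 3 is derived in C alone, so it supports no grouping at all.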

(E) Let’s Reconstruct a Phylogeny
You already know how to get started. The most difficult steps are the first ones - choosing the group that you want to study, the characters and their states, and finally an outgroup (or 2, or 3). For this exercise you needn’t worry about the accuracy of your phylogeny. You will learn the most by starting from the beginning and making the same kinds of decisions that Systematists make.
• Choose 10 taxa to represent your ingroup.
• Then select at least 10 characters that occur in members of the ingroup.
• Then identify the different character states for each character.
• Then assign a code for each character state within a character.
• Then create a data matrix.
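The last three steps (identify states, assign codes, build the matrix) can be sketched as follows. The taxa and the two plant characters below are invented for illustration, and code_states is a hypothetical helper; in lab you will do this by hand and type the codes into MacClade.

```python
def code_states(observations):
    """Turn raw character states into integer codes 0, 1, 2, ...

    observations : dict mapping taxon -> list of raw states, one per character
    Returns (coded_matrix, codebooks), where codebooks[i] maps each raw state
    of character i to its code. Codes are handed out in order of first
    appearance, so listing the outgroup first gives its states the code 0.
    """
    n_chars = len(next(iter(observations.values())))
    codebooks = [{} for _ in range(n_chars)]
    coded = {}
    for taxon, row in observations.items():
        coded_row = []
        for i, raw in enumerate(row):
            book = codebooks[i]
            if raw not in book:
                book[raw] = len(book)
            coded_row.append(book[raw])
        coded[taxon] = coded_row
    return coded, codebooks


# Hypothetical observations: flower color and leaf arrangement.
observations = {
    "Outgroup": ["white", "alternate"],
    "Taxon A": ["white", "opposite"],
    "Taxon B": ["red", "opposite"],
    "Taxon C": ["blue", "alternate"],
}

coded, codebooks = code_states(observations)
for taxon, row in coded.items():
    print(taxon, row)
print(codebooks[0])
```

The codes themselves are arbitrary labels; what matters is that every taxon with the same state receives the same code within a character.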

(Time Out - A Brief Lesson on MacClade)
You’re going to have a computer do your analysis for you today. You’ll use the program MacClade to create your data matrix, generate parsimony trees, and study the evolutionary history of your ingroup based on your assumptions.

I. Creating a Data Matrix

1. Double click the MacClade icon to open the program - you’ll get a dialogue box asking which file to open, or in your case, create a new file.

2. When you create a new file you’ll get an abbreviated matrix that looks somewhat like the beginning of a blank spreadsheet.
a. To create a blank matrix for n taxa, simply drag the square box at the left down until enough rows are available to type in the names of the taxa.

b. Do the same with the square box at right to create enough columns for the appropriate number of characters.

c. When you have done this you will have an n0 by n1 matrix (n0 taxa by n1 characters) into which you will type the names of your taxa, your characters, and the codes for the character states for each taxon.

d. You can modify the size of your matrix at any time. Remember to save the file after any changes.

II. Using MacClade

• MacClade is user friendly. You should be able to navigate through the program without any problem. Be aware, however, that the only way to become fluent in MacClade (or any other program, for that matter) is to use it. The manual for MacClade is good, but it won’t tell you anything that you can’t learn from prowling through the program.

• Once you have finished creating your data matrix, you can convert the data into a tree. Type <apple-T> to go to the tree window. You’ll be asked if you want the default ladder or the default bush. Select default ladder, and you’ll have a tree. Now this is not the most parsimonious tree; in fact, it’s simply a tree with a uniform branching pattern holding the taxa in the same order as you entered them into the data matrix. No, if you want the shortest tree you’ll have to search for it. Your lab instructor will give you a demo on the two ways that MacClade allows you to search for parsimonious trees. Note that MacClade is good at many things, but finding the shortest trees is not one of them (other programs are far superior). It will, however, suffice for us, and it provides a clear view of how parsimony works.
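Why does the shortest tree have to be searched for rather than simply looked up? A quick calculation shows how fast the number of candidate trees grows. The sketch below uses the standard count of rooted, fully bifurcating trees for n taxa, (2n - 3)!!; the function name is made up for this illustration.

```python
def count_rooted_binary_trees(n_taxa):
    """Number of distinct rooted, fully bifurcating trees: (2n - 3)!! for n >= 2."""
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3.
    for k in range(3, 2 * n_taxa - 2, 2):
        count *= k
    return count


for n in (3, 4, 5, 10):
    print(n, "taxa:", count_rooted_binary_trees(n), "possible trees")
```

For the 10 taxa in your ingroup there are already 34,459,425 possible rooted trees, which is why tree-searching programs rely on shortcuts rather than checking every tree by hand.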

(E - cont.) Back to Our Phylogeny
Once you have obtained your phylogeny, have a good look at it. Are the relationships what you expected, or are there surprises? Make a series of statements about monophyly and relatedness based on your tree.

There is still another important use for your tree. While the tree summarizes relationships among members of your ingroup, it also reflects how the characters have evolved in your ingroup. You can “trace” the evolution of any character by mousing to the “TRACE” submenu at the top of the screen. Ask for a demo.

You can also use MacClade to explore changes in tree topology. Once you get a tree (irrespective of how you got the tree) you can experiment with modifications by literally moving branches, removing lineages, collapsing branches, resolving polytomies, etc. You do this using the tools under the “TOOLS” palette. For example, bring two branches together and then check the difference between the “before” and “after” tree length. Make an evolutionary statement based on the change you made.