Inquiry information || Parsimony algorithm

Документ взят из кэша поисковой машины. Адрес оригинального документа : http://mouse.belozersky.msu.ru/tools/svetka/articles/parsimony.html
Дата изменения: Fri Jun 9 07:47:11 2006
Дата индексирования: Tue Feb 5 10:46:17 2013
Кодировка:

Some additional inquiry information

Back to the help page

Back to the description of this program

Parsimony method

Parsimony is the simplest method for ancestral sequence reconstruction. It is based on the following principle: the ancestral sequences (posed in nodes of a phylogenetic tree) should be such that the minimal required number of character replacements on the branches of the tree is less than for any other choice of sequences.
We assume that we have a multiple alignment of character sequences and a phylogenetic tree with the leaves labeled by the sequences. Our task is to reconstruct the ancestral character for any node of the tree and for any position (column) of the alignment. The parsimony principle sometimes allows to do this unambigously.

Example 1. Suppose we have the following tree and a position which contains the following letters in the contemporary sequences:

Initial tree

In this case the minimal number of replacements is two and there is only one way to reconstruct the letters in all three ancestral nodes so that this minimal number would be realised:

Reconstructiom
Here and further: branches without changes are black-colored

Example 2:

Initial tree

In this case the minimal number of replacements is also 2, but there are several ways of character reconstruction:

Variant 1
Variant 2
Variant 3
Variant 4

Notice that the node 2 is considered to contain the character A by all variants.

Thus, the result of parsimonious reconstruction is following:

Result
Here "*" means an unknown character

Algorithm

The following is done independently for every alignment position.

This algorithm consists of three parts.
Firstly, we go round all tree nodes (up from leaves to the root) and calculate for every node its set of possible characters. This set is defined as follows:
  1) if the node is a leaf, then the set consists of one element - the character from the correspondent sequence;
  2) if the node is not a leaf (i.e. it surely has posteriors), and the intersection of the sets for all posteriors of this node is not empty, then the set is this intersection;
  3) if the node is not a leaf, and the intersection of the sets of the sets for all posteriors is empty, then the set is the union of all posterior sets.

Tree
Tree Sets
Sets of characters in different nodes

Second, we define a preliminary character for every node. The preliminary character is *, if the set of the node contains more than 1 element; otherwise the preliminary character is the only element of the set.

Third, we define an output character for every node. If the preliminary character of the node is some letter (not the asterisk *), then its output character is also this letter. If the preliminary character of the node is *, but the output character of its immediate ancestor is some letter (say, X), and its set of possible characters contains X, then the output character of the node is also X. If the preliminary character of the node is X, but the above conditions are not satisfied, then the output character is *. According to this rule, the output characters could be determined recursively, from the root to leaves.

Tree
Tree Letters
Defined characters in different nodes

Additionally, the program prints the output characters in two possible manners: in uppercase (that means very probable reconstruction) if the output characters off all immediate ancestors of the node coincide with this letter; or in lowercase (less probable reconstruction) if there are asterisks among the preliminary characters of the immediate characters. Note that these preliminary asterisks can be replaced (on the third stage of algorithm), but the character remains small.

Upstairs

When mistakes or interesting facts are found, please tell!