__This web page was produced as an assignment for Genetics 564, an undergraduate capstone course at UW-Madison.__

Phylogenetic trees are used to examine evolutionary relationships within or among species. They are usually built from sequence data (such as DNA, RNA, or protein sequences) and are a useful tool for genomic comparisons. Each tree contains nodes that connect branches. Depending on the type of tree, a node represents a speciation event, a gene duplication event, or a birth event. The branches leading away from a node represent new species or variants of the sequence. Trees can be constructed in several ways, including maximum likelihood, maximum parsimony, Bayesian inference, and distance matrix methods (1).
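
To make the node and branch vocabulary concrete, here is a minimal Python sketch that stores a rooted tree as nested tuples: each tuple is an internal node and each string is a tip. The species and the topology are illustrative assumptions, not one of the trees on this page, and the helper names are hypothetical.

```python
def leaves(tree):
    """Return the tip labels of a tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    tips = []
    for child in tree:
        tips.extend(leaves(child))
    return tips


def count_internal_nodes(tree):
    """Count the internal nodes (the tuples), including the root."""
    if isinstance(tree, str):
        return 0
    return 1 + sum(count_internal_nodes(child) for child in tree)


def to_newick(tree):
    """Write the topology in Newick format, without branch lengths."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


# Illustrative topology only (an assumption, not a result from this page).
example_tree = ((("human", "chimp"), "mouse"), "fly")
newick = to_newick(example_tree) + ";"
```

The Newick string is the plain-text form that many tree viewers read, so a structure like this can be handed to other tools.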

Maximum likelihood finds the tree and model parameters (such as branch lengths and substitution rates) under which the observed sequences are most probable. The likelihood function carries all of the information the data provide about those parameters. A strength of this method is that its assumptions are stated explicitly in the substitution model, so they can be evaluated and improved. A drawback is that it can be very computationally demanding (1).
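
The idea of choosing the most probable parameter value can be sketched on the smallest possible problem: the distance between two aligned nucleotide sequences. The sketch below assumes the Jukes-Cantor (JC69) substitution model, which is my choice for illustration and not something this page used, and the site counts are made up. It searches a grid of candidate distances for the one with the highest likelihood and compares the answer with the known closed-form estimate.

```python
import math


def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood (up to a constant) of distance d under JC69.

    d is the expected number of substitutions per site.
    """
    p_diff = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))  # P(a site differs)
    # A differing site shows one of 3 equally likely other nucleotides.
    return (n_diff * math.log(p_diff / 3.0)
            + (n_sites - n_diff) * math.log(1.0 - p_diff))


def ml_distance_grid(n_sites, n_diff, step=0.001, max_d=3.0):
    """Return the grid value of d with the highest log-likelihood."""
    n_points = int(round(max_d / step))
    candidates = [step * i for i in range(1, n_points + 1)]
    return max(candidates,
               key=lambda d: jc69_log_likelihood(d, n_sites, n_diff))


def jc69_closed_form(n_sites, n_diff):
    """Closed-form JC69 distance, valid when fewer than 75% of sites differ."""
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * n_diff / n_sites)


# Made-up counts: 10 differences across 100 aligned sites.
grid_estimate = ml_distance_grid(100, 10)
exact_estimate = jc69_closed_form(100, 10)
```

A full maximum likelihood tree search repeats this kind of optimization over every branch of many candidate trees, which is where the computational cost comes from.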

Maximum parsimony selects the tree that requires the fewest evolutionary changes. For each site, character states are assigned to the internal nodes of a candidate tree so that the number of changes needed at that site is as small as possible. These counts are summed over all sites, and the tree with the lowest total is selected. The method is simple and its results are easy to interpret, but it has no explicit model, so its assumptions are implicit and it cannot make use of what is known about how sequences evolve (1).
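
Counting the changes a tree needs at one site can be sketched with the Fitch algorithm, one standard way of doing this for rooted binary trees. The page does not say which algorithm its sources use, so treat this as an illustration under that assumption. The taxa `A` to `D` and their states are made up.

```python
def fitch(tree, states):
    """Return (possible_states, changes) for one site.

    tree   -- a tip name (str) or a (left, right) tuple
    states -- dict mapping each tip name to its character at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change.
        return shared, left_changes + right_changes
    # The children disagree: one change is needed at this node.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, site_columns):
    """Sum the minimum number of changes over all sites."""
    return sum(fitch(tree, site)[1] for site in site_columns)


# One made-up site for four hypothetical taxa.
site = {"A": "G", "B": "G", "C": "T", "D": "T"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

score_1 = parsimony_score(tree_1, [site])  # 1 change
score_2 = parsimony_score(tree_2, [site])  # 2 changes
```

Here `tree_1` groups the taxa that share a state, so it needs fewer changes and would be the one parsimony prefers for this site.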

The Bayesian method is similar to maximum likelihood, except that the parameters are treated as random variables with statistical (prior) distributions rather than as unknown fixed constants. Its results are easy to interpret, but the method is computationally demanding (1).
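
Treating a parameter as a random variable can be sketched for a single quantity, the distance between two aligned nucleotide sequences. The sketch assumes a Jukes-Cantor (JC69) likelihood and an exponential prior on the distance. Both are my illustrative choices, and the counts and the prior rate are made up. The posterior is evaluated on a grid so that no sampling library is needed.

```python
import math


def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood (up to a constant) of distance d under JC69."""
    p_diff = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))
    return (n_diff * math.log(p_diff / 3.0)
            + (n_sites - n_diff) * math.log(1.0 - p_diff))


def grid_posterior(n_sites, n_diff, prior_rate=10.0, step=0.001, max_d=2.0):
    """Posterior probabilities of d on a grid, with an Exponential prior."""
    n_points = int(round(max_d / step))
    grid = [step * i for i in range(1, n_points + 1)]
    # log posterior = log likelihood + log prior (constants dropped)
    log_post = [jc69_log_likelihood(d, n_sites, n_diff) - prior_rate * d
                for d in grid]
    peak = max(log_post)  # subtract the peak for numerical stability
    weights = [math.exp(value - peak) for value in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]


def credible_interval(grid, posterior, mass=0.95):
    """Equal-tailed credible interval read off the cumulative posterior."""
    tail = (1.0 - mass) / 2.0
    cumulative = 0.0
    lower = upper = None
    for d, p in zip(grid, posterior):
        cumulative += p
        if lower is None and cumulative >= tail:
            lower = d
        if upper is None and cumulative >= 1.0 - tail:
            upper = d
    return lower, upper


# Made-up counts: 10 differences across 100 aligned sites.
grid, posterior = grid_posterior(100, 10)
posterior_mean = sum(d * p for d, p in zip(grid, posterior))
lower, upper = credible_interval(grid, posterior)
```

The output is a distribution over the parameter, not a single best value, which is why Bayesian results can be read directly as probabilities. Real Bayesian tree programs sample over trees and many parameters at once, and this grid approach would not scale to that.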

Distance matrix methods first estimate the genetic distance between every pair of sequences, using a substitution model, and then build the tree from the resulting matrix of distances. This approach is computationally efficient because it does not require comparing many candidate trees. It works well for large data sets but can give poor results for highly divergent sequences (1).
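
The first step, turning an alignment into pairwise distances, can be sketched with the simplest distance there is: the fraction of aligned positions that differ. The sequences and names below are made up, and skipping gapped positions is an assumption of this sketch that other tools may handle differently.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of aligned, non-gap positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(seq_a, seq_b):
        if x == "-" or y == "-":
            continue  # assumption: ignore positions with a gap
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differing / compared


def distance_matrix(alignment):
    """Return {(name_a, name_b): distance} for every pair of sequences."""
    return {(a, b): p_distance(alignment[a], alignment[b])
            for a, b in combinations(alignment, 2)}


# Made-up toy alignment with hypothetical sequence names.
alignment = {
    "s1": "ACGTACGT",
    "s2": "ACGTACGA",
    "s3": "ACG-TCGA",
}
matrix = distance_matrix(alignment)
```

One minus this distance is the fraction of identical positions. Raw proportions like this underestimate the true amount of change when sequences are very different, which is one reason distance methods struggle with divergent sequences.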

*More detailed information on how phylogenetic trees are created can be found in reference 1*

For these trees, I used two methods, neighbor joining and average distance, and built a tree with each one using both percent identity and BLOSUM62 scoring. Neighbor joining takes a bottom-up approach, starting from a distance matrix and repeatedly joining pairs of sequences. Average distance clusters sequences by the average genetic distance between them. BLOSUM62 scores aligned amino acids by how likely each substitution is, while percent identity counts only identical matches. In the trees I generated, the fly and C. elegans sequences are the least related to human LPHN3, while chimpanzee and rhesus are the most similar, as expected. Dog and cow group together, as do mouse and rat. Although all of the sequences except fly and C. elegans share the same domains, there appears to be a distinction between mammals and non-mammals.
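
The average distance approach can be sketched as follows. Clustering of this kind is commonly known as UPGMA. Whether Jalview's implementation matches this sketch in every detail is an assumption. The taxa `A` to `D` and their distances are made up and are not the values behind figures 2-5. Neighbor joining is not shown.

```python
from itertools import combinations


def upgma(names, pair_dist):
    """Build a tree (nested tuples) by average-distance clustering.

    names     -- list of tip names
    pair_dist -- dict mapping (name_a, name_b) to a distance
    """
    sizes = {name: 1 for name in names}  # cluster -> number of tips in it
    dist = {frozenset(pair): d for pair, d in pair_dist.items()}
    while len(sizes) > 1:
        # Find the two closest clusters.
        a, b = min(combinations(sizes, 2),
                   key=lambda pair: dist[frozenset(pair)])
        size_a = sizes.pop(a)
        size_b = sizes.pop(b)
        merged = (a, b)
        # Distance to the merged cluster is the size-weighted average.
        for other in sizes:
            d_a = dist[frozenset((a, other))]
            d_b = dist[frozenset((b, other))]
            dist[frozenset((merged, other))] = (
                (size_a * d_a + size_b * d_b) / (size_a + size_b))
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


# Made-up distances for four hypothetical taxa.
toy_distances = {
    ("A", "B"): 0.01,
    ("C", "D"): 0.05,
    ("A", "C"): 0.20,
    ("A", "D"): 0.21,
    ("B", "C"): 0.20,
    ("B", "D"): 0.21,
}
toy_tree = upgma(["A", "B", "C", "D"], toy_distances)
```

With these toy values the two closest pairs are joined first and then joined to each other, giving `(("A", "B"), ("C", "D"))`. The same function works with distances derived from percent identity or from a scoring matrix, since it only needs a number for each pair.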

1. Yang Z, Rannala B. Molecular phylogenetics: principles and practice. Nat Rev Genet. 2012 Mar 28;13(5):303-14. doi: 10.1038/nrg3186

Images:

figure 1. http://1.bp.blogspot.com/-q9D_yZogh48/VhPsQoVr6GI/AAAAAAAAABU/OF82NaCSpyU/s1600/640px-Phylogenetic_tree.svg.png

*Figures 2-5 were created in Jalview from Clustal Omega alignments.*
