6.5A: Limitations to the Classic Model of Phylogenetic Trees - Biology

The concepts of phylogenetic modeling are constantly changing, and new findings have exposed limitations of the classic tree model.

Learning Objectives

  • Identify the limitations to the classic model of phylogenetic trees

Key Points

  • Charles Darwin sketched the first phylogenetic tree in 1837.
  • A single trunk on a phylogenetic tree represents a common ancestor and the branches represent the divergence of species from this ancestor.
  • Prokaryotes are assumed to evolve clonally in the classic tree model.
  • Horizontal gene transfer is the transfer of genes between unrelated species and, as such, complicates the simple tree model.
  • Genome fusion between symbiotic or endosymbiotic organisms has been proposed as the ultimate form of gene transfer.

Key Terms

  • phylogenetic: of, or relating to the evolutionary development of organisms
  • clonal: pertaining to asexual reproduction
  • horizontal gene transfer: the transfer of genetic material from one organism to another one that is not its offspring; especially common among bacteria

The concepts of phylogenetic modeling are constantly changing. It is one of the most dynamic fields of study in all of biology. Over the last several decades, new research has challenged scientists’ ideas about how organisms are related. New models of these relationships have been proposed for consideration by the scientific community. Many phylogenetic trees have been shown as models of the evolutionary relationship among species. Phylogenetic trees originated with Charles Darwin, who sketched the first phylogenetic tree in 1837, which served as a pattern for subsequent studies for more than a century. The concept of a phylogenetic tree with a single trunk representing a common ancestor, with the branches representing the divergence of species from this ancestor, fits well with the structure of many common trees, such as the oak. However, evidence from modern DNA sequence analysis and newly-developed computer algorithms has caused skepticism about the validity of the standard tree model in the scientific community.

Classical thinking about prokaryotic evolution, included in the classic tree model, is that species evolve clonally. That is, they produce offspring themselves, with only random mutations driving the descent into the variety of modern and extinct species known to science. This view is somewhat complicated in eukaryotes that reproduce sexually, but the laws of Mendelian genetics still explain the variation in offspring as the result of mutation within the species. The concept of genes being transferred between unrelated species was not considered a possibility until relatively recently. Horizontal gene transfer (HGT), also known as lateral gene transfer, is the transfer of genes between unrelated species. HGT has been shown to be an ever-present phenomenon, with many evolutionists postulating a major role for this process in evolution, thus complicating the simple tree model. Genes have been shown to pass between species that standard phylogeny places as only distantly related, adding a layer of complexity to the understanding of phylogenetic relationships. Finally, as an example of the ultimate gene transfer, theories of genome fusion between symbiotic or endosymbiotic organisms have been proposed to explain an event of great importance: the evolution of the first eukaryotic cell, without which humans could not have come into existence.

103 Perspectives on the Phylogenetic Tree

By the end of this section, you will be able to do the following:

  • Describe horizontal gene transfer
  • Illustrate how prokaryotes and eukaryotes transfer genes horizontally
  • Identify the web and ring models of phylogenetic relationships and describe how they differ from the original phylogenetic tree concept

Phylogenetic modeling concepts are constantly changing. It is one of the most dynamic fields of study in all biology. Over the last several decades, new research has challenged scientists’ ideas about how organisms are related. The scientific community has proposed new models of these relationships.

Many phylogenetic trees are models of the evolutionary relationship among species. Phylogenetic trees originated with Charles Darwin, who sketched the first phylogenetic tree in 1837 ((Figure)a). This served as a prototype for subsequent studies for more than a century. The phylogenetic tree concept with a single trunk representing a common ancestor, with the branches representing the divergence of species from this ancestor, fits well with the structure of many common trees, such as the oak ((Figure)b). However, evidence from modern DNA sequence analysis and newly developed computer algorithms has caused skepticism about the standard tree model’s validity in the scientific community.

Limitations to the Classic Model

Classical thinking about prokaryotic evolution, included in the classic tree model, is that species evolve clonally. That is, they produce offspring themselves, with only random mutations driving the descent into the variety of modern-day and extinct species known to science. This view is somewhat complicated in eukaryotes that reproduce sexually, but the laws of Mendelian genetics still explain the variation in offspring as the result of mutation within the species. Scientists did not consider the concept of genes transferring between unrelated species as a possibility until relatively recently. Horizontal gene transfer (HGT), or lateral gene transfer, is the transfer of genes between unrelated species. HGT is an ever-present phenomenon, with many evolutionists postulating a major role for this process in evolution, thus complicating the simple tree model. Genes pass between species that standard phylogeny places as only distantly related, adding a layer of complexity to understanding phylogenetic relationships.

The various ways that HGT occurs in prokaryotes are important to understanding phylogenies. Although at present some do not view HGT as important to eukaryotic evolution, HGT does occur in this domain as well. Finally, as an example of the ultimate gene transfer, some scientists have proposed genome fusion theories between symbiotic or endosymbiotic organisms to explain an event of great importance: the evolution of the first eukaryotic cell, without which humans could not have come into existence.

Horizontal Gene Transfer

Horizontal gene transfer (HGT) is the introduction of genetic material from one species to another species by mechanisms other than the vertical transmission from parent(s) to offspring. These transfers allow even distantly related species to share genes, influencing their phenotypes. Scientists believe that HGT is more prevalent in prokaryotes, but that this process transfers only about 2% of the prokaryotic genome. Some researchers believe such estimates are premature: we must view the actual importance of HGT to evolutionary processes as a work in progress. As scientists investigate this phenomenon more thoroughly, they may reveal more instances of HGT. Many scientists believe that HGT and mutation are (especially in prokaryotes) a significant source of genetic variation, which is the raw material in the natural selection process. These transfers may occur between any two species that share an intimate relationship ((Figure)).

Prokaryotic and Eukaryotic HGT Mechanisms Summary

              Mechanism              Mode of Transmission    Example
Prokaryotes   transformation         DNA uptake              many prokaryotes
              transduction           bacteriophage (virus)   bacteria
              conjugation            pilus                   many prokaryotes
              gene transfer agents   phage-like particles    purple non-sulfur bacteria
Eukaryotes    from food organisms    unknown                 aphid
              jumping genes          transposons             rice and millet plants
              epiphytes/parasites    unknown                 yew tree fungi
              from viral infections

HGT in Prokaryotes

HGT mechanisms are quite common in the Bacteria and Archaea domains, thus significantly changing the way scientists view their evolution. The majority of evolutionary models, such as in the Endosymbiont Theory, propose that eukaryotes descended from multiple prokaryotes, which makes HGT all the more important to understanding the phylogenetic relationships of all extant and extinct species. The Endosymbiont Theory holds that the mitochondria of eukaryotes and the chloroplasts of green plants and flagellates originated as free-living prokaryotes that invaded primitive eukaryotic cells and became established as permanent symbionts in the cytoplasm.

Microbiology students are well aware that genes transfer among common bacteria. These gene transfers between species are the major mechanism whereby bacteria acquire resistance to antibiotics. Classically, scientists believe that three different mechanisms drive such transfers.

  1. Transformation: bacteria take up naked DNA
  2. Transduction: a virus transfers the genes
  3. Conjugation: a hollow tube, or pilus, transfers genes between organisms

More recently, scientists have discovered a fourth gene transfer mechanism between prokaryotes. Small, virus-like particles, or gene transfer agents (GTAs), transfer random genomic segments from one prokaryote species to another. GTAs are responsible for genetic changes, sometimes at a very high frequency compared to other evolutionary processes. Scientists characterized the first GTA in 1974 using purple, non-sulfur bacteria. These GTAs, which are most likely bacteriophages that lost the ability to reproduce on their own, carry random DNA pieces from one organism to another. Controlled studies using marine bacteria have demonstrated GTAs’ ability to act with high frequency. Scientists have estimated gene transfer events in marine prokaryotes, either by GTAs or by viruses, to be as high as 10^13 per year in the Mediterranean Sea alone. GTAs and viruses are efficient HGT vehicles with a major impact on prokaryotic evolution.

As a consequence of this modern DNA analysis, the idea that eukaryotes evolved directly from Archaea has fallen out of favor. While eukaryotes share many features that are absent in bacteria, such as the TATA box (located in many genes’ promoter region), the discovery that some eukaryotic genes were more homologous with bacterial DNA than Archaea DNA made this idea less tenable. Furthermore, scientists have proposed genome fusion from Archaea and Bacteria by endosymbiosis as the ultimate event in eukaryotic evolution.

HGT in Eukaryotes

Although it is easy to see how prokaryotes exchange genetic material by HGT, scientists initially thought that this process was absent in eukaryotes. After all, prokaryotes are but single cells exposed directly to their environment, whereas the sex cells of multicellular organisms are usually sequestered in protected parts of the body. It follows from this idea that gene transfers between multicellular eukaryotes should be more difficult. Scientists believe this process is rarer in eukaryotes and has a much smaller evolutionary impact than in prokaryotes. In spite of this, HGT between distantly related organisms is evident in several eukaryotic species, and it is possible that scientists will discover more examples in the future.

In plants, researchers have observed gene transfer in species that cannot cross-pollinate by normal means. Transposons, or “jumping genes,” have been shown to transfer between rice and millet plant species. Furthermore, fungal species feeding on yew trees, whose bark is the source of the anti-cancer drug TAXOL®, have acquired the ability to make taxol themselves, a clear example of gene transfer.

In animals, a particularly interesting example of HGT occurs within the aphid species ((Figure)). Aphids are insects that vary in color based on carotenoid content. Carotenoids are pigments that a variety of plants, fungi, and microbes produce, and they serve a variety of functions in animals, which obtain these chemicals from their food. Humans require carotenoids to synthesize vitamin A, and we obtain them by eating orange fruits and vegetables: carrots, apricots, mangoes, and sweet potatoes. Aphids, by contrast, have acquired the ability to make the carotenoids on their own. According to DNA analysis, this ability is due to fungal genes transferring into the insect by HGT, presumably as the insect consumed fungi for food. A carotenoid enzyme, or desaturase, is responsible for the red coloration in certain aphids, and when mutation inactivates this gene, the aphids revert to their more common green color ((Figure)).

Genome Fusion and Eukaryote Evolution

Scientists believe the ultimate in HGT occurs through genome fusion between different prokaryote species when two symbiotic organisms become endosymbiotic. This occurs when one species is taken inside another species’ cytoplasm, which ultimately results in a genome consisting of genes from both the endosymbiont and the host. This mechanism is an aspect of the Endosymbiont Theory, which most biologists accept as the mechanism whereby eukaryotic cells obtained their mitochondria and chloroplasts. However, the role of endosymbiosis in developing the nucleus is more controversial. Scientists believe that nuclear and mitochondrial DNA have different (separate) evolutionary origins, with the mitochondrial DNA derived from the circular genomes of bacteria that ancient prokaryotic cells engulfed. We can regard mitochondrial DNA as the smallest chromosome. Interestingly enough, mitochondrial DNA is inherited only from the mother. The mitochondrial DNA degrades in sperm when the sperm degrades in the fertilized egg, or in other instances the mitochondria located in the sperm’s flagellum fail to enter the egg.

Within the past decade, James Lake of the UCLA/NASA Astrobiology Institute proposed that the genome fusion process is responsible for the evolution of the first eukaryotic cells ((Figure)a). Using DNA analysis and a new mathematical algorithm, conditioned reconstruction (CR), his laboratory proposed that eukaryotic cells developed from an endosymbiotic gene fusion between two species, one from the Archaea and the other from the Bacteria. As mentioned, some eukaryotic genes resemble those of Archaea, whereas others resemble those from Bacteria. An endosymbiotic fusion event, such as Lake has proposed, would clearly explain this observation. However, this work is new and the CR algorithm is relatively unsubstantiated, which causes many scientists to resist this hypothesis.

Lake’s more recent work ((Figure)b) proposes that gram-negative bacteria, which are unique within their domain in that they contain two lipid bilayer membranes, resulted from an endosymbiotic fusion of archaeal and bacterial species. The double membrane would be a direct result of the endosymbiosis, with the endosymbiont picking up the second membrane from the host as it was internalized. Scientists have also used this mechanism to explain the double membranes in mitochondria and chloroplasts. Lake’s work has met with skepticism, and the biological science community still debates his ideas. In addition to Lake’s hypothesis, there are several other competing theories as to the origin of eukaryotes. How did the eukaryotic nucleus evolve? One theory is that the prokaryotic cells produced an additional membrane that surrounded the bacterial chromosome. Some bacteria have their DNA enclosed by two membranes; however, there is no evidence of a nucleolus or nuclear pores. Other proteobacteria also have membrane-bound chromosomes. If the eukaryotic nucleus evolved this way, we would expect one of the two types of prokaryotes to be more closely related to eukaryotes.

The nucleus-first hypothesis proposes that the nucleus evolved in prokaryotes first ((Figure)a), followed by a later fusion of the new eukaryote with bacteria that became mitochondria. The mitochondria-first hypothesis proposes that mitochondria were first established in a prokaryotic host ((Figure)b), which subsequently acquired a nucleus, by fusion or other mechanisms, to become the first eukaryotic cell. Most interestingly, the eukaryote-first hypothesis proposes that prokaryotes actually evolved from eukaryotes by losing genes and complexity ((Figure)c). All of these hypotheses are testable. Only time and more experimentation will determine which hypothesis the data best support.

Web and Network Models

Recognizing the importance of HGT, especially in prokaryote evolution, has caused some to propose abandoning the classic “tree of life” model. In 1999, W. Ford Doolittle proposed a phylogenetic model that resembles a web or a network more than a tree. The hypothesis is that eukaryotes evolved not from a single prokaryotic ancestor, but from a pool of many species that were sharing genes by HGT mechanisms. As (Figure)a shows, some individual prokaryotes were responsible for transferring the bacteria that caused mitochondrial development to the new eukaryotes, whereas other species transferred the bacteria that gave rise to chloroplasts. Scientists often call this model the “web of life.” In an effort to save the tree analogy, some have proposed using the Ficus tree ((Figure)b), with its multiple trunks, as a phylogenetic way to represent a diminished evolutionary role for HGT.
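
The difference between a tree and a web can be made concrete with a small sketch. The lineage names and edges below are hypothetical and only illustrate the idea: in a strict tree every lineage has exactly one parent, while a horizontal transfer adds a second incoming edge and turns the structure into a network.

```python
from collections import defaultdict


def reticulation_nodes(edges):
    """Return lineages that receive genetic material from more than one parent."""
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
    return sorted(node for node, found in parents.items() if len(found) > 1)


# Hypothetical toy lineages with vertical descent only: a strict tree.
tree_edges = [("root", "A"), ("root", "B"), ("B", "C"), ("B", "D")]

# The same lineages plus one horizontal transfer from A into D.
web_edges = tree_edges + [("A", "D")]

print(reticulation_nodes(tree_edges))  # []
print(reticulation_nodes(web_edges))   # ['D']
```

A lineage reported by `reticulation_nodes` is exactly the kind of node a single-trunk tree cannot draw, which is why web-like models have been proposed.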

Ring of Life Models

Others have proposed abandoning any tree-like model of phylogeny in favor of a ring structure, the so-called “ring of life” ((Figure)). This is a phylogenetic model where all three domains of life evolved from a pool of primitive prokaryotes. Lake, again using the conditioned reconstruction algorithm, proposes a ring-like model in which species of all three domains (Archaea, Bacteria, and Eukarya) evolved from a single pool of gene-swapping prokaryotes. His laboratory proposes that this structure is the best fit for data from extensive DNA analyses performed there, and that the ring model is the only one that adequately takes HGT and genomic fusion into account. However, other phylogeneticists remain highly skeptical of this model.

In summary, we must modify Darwin’s “tree of life” model to include HGT. Does this mean abandoning the tree model completely? Even Lake argues that scientists should attempt to modify the tree model to allow it to accurately fit his data, and only the inability to do so will sway people toward his ring proposal.

This doesn’t mean a tree, web, or a ring will correlate completely to an accurate description of phylogenetic relationships of life. A consequence of the new thinking about phylogenetic models is the idea that Darwin’s original phylogenetic tree concept is too simple, but made sense based on what scientists knew at the time. However, the search for a more useful model moves on: each model serves as a hypothesis to test, with the possibility of developing new models. This is how science advances. Researchers use these models as visualizations to help construct hypothetical evolutionary relationships and understand the massive amount of data that requires analysis.

Section Summary

The phylogenetic tree, which Darwin first used, is the classic “tree of life” model describing phylogenetic relationships among species, and the most common model that scientists use today. New ideas about HGT and genome fusion have caused some to suggest revising the model to resemble webs or rings.

Review Questions

The transfer of genes by a mechanism not involving asexual reproduction is called:

Particles that transfer genetic material from one species to another, especially in marine prokaryotes:

  1. horizontal gene transfer
  2. lateral gene transfer
  3. genome fusion device
  4. gene transfer agents

What does the trunk of the classic phylogenetic tree represent?

  1. single common ancestor
  2. pool of ancestral organisms
  3. new species
  4. old species

Which phylogenetic model proposes that all three domains of life evolved from a pool of primitive prokaryotes?

Critical Thinking Questions

Compare three different ways that eukaryotic cells may have evolved.

Some hypotheses propose that mitochondria were acquired first, followed by the development of the nucleus. Others propose that the nucleus evolved first and that this new eukaryotic cell later acquired the mitochondria. Still others hypothesize that prokaryotes descended from eukaryotes by the loss of genes and complexity.

Describe how aphids acquired the ability to change color.

Aphids have acquired the ability to make the carotenoids on their own. DNA analysis has demonstrated that this ability is due to the transfer of fungal genes into the insect by HGT, presumably as the insect consumed fungi for food.


Limitations of Phylogenetic Trees

It may be easy to assume that more closely related organisms look more alike, and while this is often the case, it is not always true. If two closely related lineages evolved under significantly varied surroundings or after the evolution of a major new adaptation, it is possible for the two groups to appear more different than other groups that are not as closely related. For example, the phylogenetic tree in the figure below shows that lizards and rabbits both have amniotic eggs, whereas frogs do not; yet lizards and frogs appear more similar than lizards and rabbits.

This ladder-like phylogenetic tree of vertebrates is rooted by an organism that lacked a vertebral column. At each branch point, organisms with different characters are placed in different groups based on the characteristics they share.

Another aspect of phylogenetic trees is that, unless otherwise indicated, the branches do not account for length of time, only the evolutionary order. In other words, a long branch does not typically mean more time passed, nor does a short branch mean less time passed, unless specified on the diagram. For example, in the figure above, the tree does not indicate how much time passed between the evolution of amniotic eggs and hair. What the tree does show is the order in which things took place. Again using the figure above, the tree shows that the oldest trait is the vertebral column, followed by hinged jaws, and so forth.
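
As a rough illustration of reading order and relatedness (but not elapsed time) from such a ladder-like tree, the sketch below stores only the sequence in which traits arose. The trait list is limited to the traits named in the text, and the trait sets assigned to each animal are assumptions made for the example, not data taken from the figure.

```python
# Order in which traits arose along the trunk; no dates or durations are stored.
TRAIT_ORDER = ["vertebral column", "hinged jaws", "amniotic egg", "hair"]

# Assumed trait sets for three of the animals mentioned in the text.
TRAITS = {
    "frog": {"vertebral column", "hinged jaws"},
    "lizard": {"vertebral column", "hinged jaws", "amniotic egg"},
    "rabbit": {"vertebral column", "hinged jaws", "amniotic egg", "hair"},
}


def shared_derived_traits(a, b):
    """Traits both taxa carry, listed in the order the tree says they arose."""
    return [t for t in TRAIT_ORDER if t in TRAITS[a] and t in TRAITS[b]]


def closest_relative(taxon, candidates):
    """Pick the candidate sharing the most derived traits with `taxon`."""
    return max(candidates, key=lambda other: len(shared_derived_traits(taxon, other)))


print(shared_derived_traits("lizard", "frog"))
print(closest_relative("lizard", ["frog", "rabbit"]))  # rabbit
```

Under these assumptions the lizard groups with the rabbit rather than the frog, even though lizards and frogs may look more alike, and nothing in the structure says how long any step took.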

Remember that any phylogenetic tree is a part of the greater whole, and like a real tree, it does not grow in only one direction after a new branch develops. So, for the organisms in the figure above, just because a vertebral column evolved does not mean that invertebrate evolution ceased; it only means that a new branch formed. Also, groups that are not closely related, but evolve under similar conditions, may appear more phenotypically similar to each other than to a close relative.



Supertree methods allow constructing trees (called supertrees) that combine phylogenetic information represented by a set of smaller phylogenies with partially overlapped taxa. We used Visual TreeCmp to analyse the accuracy of two variants of the proposed SPR-supertree method (Whidden, Zeh, & Beiko, 2014 ) that is based on computing SPR distances between binary trees. The idea of supertree distance methods, which SPR-supertree belongs to, is that we are trying to find a tree that minimizes the sum of its distances in a particular metric to source phylogenies. Since a supertree contains more taxa than each of the source trees and because most of the phylogenetic metrics require trees with the same leaf sets, before applying a particular method to compute the distance, the supertree has to be pruned to fulfil the requirement. This operation can be easily accomplished in Visual TreeCmp by simply enabling ‘Prune trees’ in the user interface.
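
The pruning step can be sketched in a few lines. This is only an illustration under stated assumptions: trees are written as nested tuples of leaf names, the taxa are hypothetical, and the distance is a simple Robinson-Foulds-style count of differing rooted clusters. It is not the Visual TreeCmp implementation.

```python
def leaves(tree):
    """Set of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def prune(tree, keep):
    """Restrict the tree to the leaves in `keep`, collapsing single-child nodes."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [p for p in (prune(child, keep) for child in tree) if p is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)


def clusters(tree):
    """Leaf sets below every internal node except the root."""
    found = set()

    def walk(node, is_root):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child, False) for child in node))
        if not is_root:
            found.add(below)
        return below

    walk(tree, True)
    return found


def rooted_rf(t1, t2):
    """Number of clusters present in exactly one of two trees on the same leaves."""
    return len(clusters(t1) ^ clusters(t2))


supertree = ((("a", "b"), "c"), ("d", "e"))
source_1 = (("a", "c"), "d")   # agrees with the supertree on its three taxa
source_2 = (("a", "d"), "c")   # conflicts with the supertree

for source in (source_1, source_2):
    pruned = prune(supertree, leaves(source))
    print(pruned, rooted_rf(pruned, source))
```

Pruning first is what makes the comparison well defined: both trees passed to `rooted_rf` then carry the same leaf set.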

We then analysed in detail the accuracy of seabirds supertrees computed by two variants of the SPR-supertree method: SPR-RF-TIES and SPR seeded with a Matrix Representation with Parsimony starting tree (SPR-MRP), both proposed in Whidden et al. (2014); see Figure 2. The seabirds dataset (Kennedy & Page, 2002) contains 121 different taxa and consists of seven source trees having from 14 to 90 leaves. Based on the analysis presented in Whidden et al. (2014) we know that the average SPR distance of both supertrees to the source trees equals 17/7 ≈ 2.43 and that they receive the same parsimony score of 208. SPR-MRP is slightly better (avg. dist. approx. 4.36) than SPR-RF-TIES (4.5) according to the RF metric. However, there are many other metrics defined in the literature whose use may lead to a different conclusion. A comparison of the two above-mentioned trees made by using Visual TreeCmp is presented in Figure 2. In Figure 2a we see the distance between the SPR-RF-TIES and SPR-MRP supertrees in seven metrics. The graphical representation of the trees, obtained using an embedded library (Robinson et al., 2016), is presented in Figure 2b. When analysing summary reports containing average distances of the analysed supertrees to the source trees (Figure 2c,d) we noticed that according to almost all metrics (six out of seven) the SPR-MRP tree is better. Although the order is reversed in the case of the Nodal Splitted metric (Cardona, Llabrés, Rosselló, & Valiente, 2010), the relative difference between the results is quite small (in comparison to the standard deviation, for example): 26.68 for SPR-MRP and 26.62 for SPR-RF-TIES.
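
The arithmetic behind this comparison is simple enough to check by hand or in a short sketch. The code below assumes the reported SPR figure is a total of 17 spread over the seven source trees (an interpretation of the text), and it includes only the two metrics whose averages are quoted above; the other five are not reproduced here.

```python
# Assumed reading of the reported value: total SPR distance 17 over 7 source trees.
average_spr = 17 / 7
print(round(average_spr, 2))  # 2.43

# Average distances to the source trees quoted in the text (lower is better).
reported = {
    "RF": {"SPR-MRP": 4.36, "SPR-RF-TIES": 4.5},
    "Nodal Splitted": {"SPR-MRP": 26.68, "SPR-RF-TIES": 26.62},
}

# For each metric, pick the supertree with the smaller average distance.
best = {metric: min(scores, key=scores.get) for metric, scores in reported.items()}
print(best)
```

With only these two metrics the winners differ, which is the point of the passage: the ranking of supertrees can depend on the metric chosen.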

4. Application to the ribosomal RNA tree of life

Comparisons of rRNA sequences have been central to the debates over the deep structure of the tree of life, and in particular the relationship of eukaryotes to the Archaea [10]. Many early analyses favoured a ‘three domains’ tree, in which the Bacteria, Archaea and eukaryotes were each monophyletic domains. By contrast, more recent analyses taking advantage of an improved sampling of archaeal sequence diversity and using better-fitting substitution models have instead favoured the ‘eocyte’ tree of Lake [1], in which the eukaryotic rRNA sequences (taken to represent the host cell lineage for the mitochondrial endosymbiont) emerge from within the archaeal radiation [42�]. We analysed a previously published 16-species concatenated rRNA alignment containing 761 sites from the large subunit rRNA gene and 720 sites from the small subunit [29]. Sequences were aligned with MUSCLE [46], MAFFT [47], ProbCons [48] and Kalign [49], and a consensus alignment inferred using M-Coffee [50]. Poorly aligning sites were identified and removed using BMGE [51] with the default parameters. Analysis of this alignment under the NR model recovered the classic ‘three domains’ topology, in which the eukaryotes emerge as the sister group to a monophyletic Archaea with strong posterior support (figure 2a, PP = 0.93 for archaeal monophyly). Based on recently published analyses of rRNA and protein-coding genes, this tree is currently thought to be incorrect [10,42,44,52,53], although it has historically received support from simpler stationary models (reviewed in [10]). This result suggested that, while the NR model can provide useful rooting information, it is subject to many of the same limitations as other stationary models. Inference under the HB model recovered an eocyte tree, with the eukaryotes emerging as the sister group to the ‘TACK’ superphylum of Archaea (figure 2b, PP = 0.89 for the eukaryote/TACK clade), consistent with recent phylogenomic analyses [42�].
As in the case of the Thermus dataset, mapping posterior inferences of the most GC-rich branches onto the consensus tree provides an intuitive explanation for the differences in results between the NR and HB models. The branches leading to the common ancestor of the Archaea, and to each of the major archaeal clades, are among the most GC-rich in the phylogeny (ranked first (0.756), sixth (0.639) and eighth (0.621) by posterior mean GC content; see figure 2b and electronic supplementary material, figure S2), but the long branch leading to the common ancestor of the eukaryotes has a much more moderate GC content (ranked 20th overall; 0.444). Thus, the eocyte tree requires the placement of a moderate GC branch inside a high GC clade: this is biologically plausible, because we know that sequence composition can change over evolutionary time, but not possible under NR and other stationary models. This result provides some insight into why early analyses with simpler phylogenetic methods often recovered the three domains tree and provides support for the suggestion of Tourasse & Gouy [9] that the eocyte tree might be intrinsically more difficult to recover than the three domains tree.
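
To make the notion of GC content and GC ranking concrete, here is a minimal sketch that computes the observed GC fraction of sequences and ranks them. The sequences and their labels are invented for illustration; the values discussed above are posterior means inferred for branches of the tree under a model, not raw counts like these.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous nucleotides in a sequence."""
    counted = [base for base in seq.upper() if base in "ACGTU"]
    if not counted:
        return 0.0
    return sum(base in "GC" for base in counted) / len(counted)


# Hypothetical toy sequences; gap characters are ignored by gc_content.
toy = {
    "archaeal_like": "GCGCGGCCAT",
    "eukaryote_like": "ATGCATTAGC",
    "bacterial_like": "AUUAGCAUAU",
}

# Rank from most to least GC-rich, as the branches are ranked in the text.
ranked = sorted(toy, key=lambda name: gc_content(toy[name]), reverse=True)
for rank, name in enumerate(ranked, start=1):
    print(rank, name, gc_content(toy[name]))
```

A stationary model assumes one composition for the whole tree, so a moderate-GC lineage nested inside a GC-rich group is hard for it to accommodate.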

Compositional shifts in rRNA during the evolution of Bacteria, Archaea and eukaryotes. (a) On these 16 taxa, inference under the NR model recovers a three domains tree in which the eukaryotes form the sister group to a monophyletic Archaea (PP = 0.93). The root in the consensus tree lies on the branch connecting the Bacteria to all other cells, consistent with analyses of anciently duplicated genes [11�]. (b) Inference under the HB model recovers a tree consistent with the ‘eocyte’ hypothesis, in which the eukaryotic rRNA sequences emerge from within the Archaea. This relationship is supported by analyses of rRNA that include more taxa and analyses of broadly conserved protein-coding genes [10,43,52]. The branches are labelled in red in order of decreasing posterior mean GC content; for example, the branch leading to the common ancestor of the Archaea is the most GC-rich in the tree. Support values are given as Bayesian posterior probabilities, and branch lengths are proportional to the expected number of substitutions per site, as indicated by the scale bar.

For these rRNA genes, the posterior distributions for root splits support the placement of the roots on the consensus trees, showing disagreement between the NR and HB models (see figure 2 and electronic supplementary material, table S2). The NR model favours a root on the branch separating the Bacteria from all other cells, in agreement with traditional paralogue rooting approaches [11�] and analyses of genome networks [54]. By contrast, the HB model places the root within the Bacteria (figure 2b) with posterior support equal to 1. Although the root is unresolved on the consensus tree, the root split with the greatest posterior support (electronic supplementary material, table S2, PP = 0.34) groups all the Bacteria except Rhodopirellula on one side of the root. Some authors have argued for a root within the Bacteria on the basis of polarized indels or other rare genomic changes [55,56], although neither of these proposals unites the planctomycetes (here represented by Rhodopirellula) with the Archaea and eukaryotes. While resolution of this issue will clearly require analyses with a greatly improved sampling of Bacteria, we also sought to investigate the reason for the different root inferences under the NR and HB models. Recall that, in the case of the NR model, the σR parameter provides a measure of reversible departures from the HKY85 model while the σN parameter provides a measure of non-reversibility. Plots showing the weight of evidence in the data for different values of σN and σR for the Thermus and tree of life datasets showed markedly different behaviour (figure 3): while both datasets revealed evidence of non-zero values for σR, providing support for GTR-like over HKY85 exchangeabilities, posterior support for non-zero values of σN is clearly greater in the tree of life. Thus, the tree of life dataset shows substantial evidence of non-reversibility in the substitution process within branches, which is not accounted for in the HB model.
This observation may provide a partial explanation for the failure of the HB model to recover the most widely accepted root position on this dataset. It also suggests that, beyond the compositional heterogeneity that is increasingly recognized as an important and pervasive feature of real sequence data, some alignments may also contain significant evidence of non-reversibility in the substitution process. This finding agrees with the work of Squartini & Arndt [57], who presented evidence for non-reversibility in the evolution of the Drosophila and human lineages, and motivates the development of phylogenetic models that can account for both non-stationarity and non-reversibility, as these may both be salient features of real sequence data.
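
What reversibility means for a substitution process can be checked numerically. The sketch below builds an HKY85-style rate matrix from assumed base frequencies and an assumed transition/transversion ratio, then applies Kolmogorov's cycle criterion on all three-state cycles (sufficient here because every rate is positive). The one-directional rate change at the end is an arbitrary illustration and is not the perturbation scheme of the NR model.

```python
from itertools import combinations
from math import isclose

BASES = "ACGT"
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def hky85_rates(pi, kappa):
    """Off-diagonal HKY85 rates: kappa * pi[j] for transitions, pi[j] otherwise."""
    rates = {i: {} for i in BASES}
    for i in BASES:
        for j in BASES:
            if i != j:
                scale = kappa if frozenset((i, j)) in TRANSITIONS else 1.0
                rates[i][j] = scale * pi[j]
    return rates


def is_reversible(rates):
    """Kolmogorov criterion: rate products around each 3-cycle match in both directions."""
    for i, j, k in combinations(BASES, 3):
        forward = rates[i][j] * rates[j][k] * rates[k][i]
        backward = rates[i][k] * rates[k][j] * rates[j][i]
        if not isclose(forward, backward, rel_tol=1e-9):
            return False
    return True


# Assumed base frequencies and kappa, chosen only for the example.
pi = {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}
rates = hky85_rates(pi, kappa=2.0)
print(is_reversible(rates))  # True

# Speed up A -> C without changing C -> A: the process becomes non-reversible.
rates["A"]["C"] *= 1.5
print(is_reversible(rates))  # False
```

Reversible models such as HKY85 and GTR always pass this check; a model able to fit data like the tree of life alignment would need to allow rate matrices that fail it.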

Figure 3. Evidence of GTR-like structure and non-reversibility in the Thermus–Deinococcus and tree of life datasets. The plots show the standardized marginal likelihood (proportional to the posterior density divided by the prior density) for the reversible (σR) and non-reversible (σN) perturbation standard deviation parameters of the NR model. They summarize the weight of evidence from the data for different values of σR and σN given the choice of model and prior. (a) There is strong evidence of non-zero σR for both datasets, suggesting that GTR exchangeabilities are more plausible than HKY85 exchangeabilities in both cases. (b) There is also evidence for small, but non-zero, values of σN in both datasets, which provides evidence of detectable non-reversibility in the data. This is particularly true for the tree of life dataset, which may partially explain the failure of the HB model to recover the most widely accepted root position (i.e. between the Bacteria and Archaea).


Published by the Royal Society under the terms of the Creative Commons Attribution License, which permits unrestricted use, provided the original author and source are credited.


Lake JA, Henderson E, Oakes M, Clark MW. 1984 Eocytes: a new ribosome structure indicates a kingdom with a close relationship to eukaryotes. Proc. Natl Acad. Sci. USA 81, 3786–3790. (doi:10.1073/pnas.81.12.3786)

1981 Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17, 368–376. (doi:10.1007/BF01734359)

1976 Criteria for optimising phylogenetic trees and the problem of determining the root of a tree. J. Mol. Evol. 8, 95–116. (doi:10.1007/BF01739097)

Tarrío R, Rodríguez-Trelles F, Ayala FJ. 2000 Tree rooting with outgroups when they differ in their nucleotide composition from the ingroup: the Drosophila saltans and willistoni groups, a case study. Mol. Phylogenet. Evol. 16, 344–349. (doi:10.1006/mpev.2000.0813)

Holland BR, Penny D, Hendy MD. 2003 Outgroup misplacement and phylogenetic inaccuracy under a molecular clock – a simulation study. Syst. Biol. 52, 229–238. (doi:10.1080/10635150390192771)

1978 Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. 27, 401–410. (doi:10.2307/2412923)

1998 How good are deep phylogenetic trees? Curr. Opin. Genet. Dev. 8, 616–623. (doi:10.1016/S0959-437X(98)80028-2)

Hirt RP, Logsdon JM, Healy B, Dorey MW, Doolittle WF, Embley TM. 1999 Microsporidia are related to Fungi: evidence from the largest subunit of RNA polymerase II and other proteins. Proc. Natl Acad. Sci. USA 96, 580–585. (doi:10.1073/pnas.96.2.580)

1999 Accounting for evolutionary rate variation among sequence sites consistently changes universal phylogenies deduced from rRNA and protein-coding genes. Mol. Phylogenet. Evol. 13, 159–168. (doi:10.1006/mpev.1999.0675)

Williams TA, Foster PG, Cox CJ, Embley TM. 2013 An archaeal origin of eukaryotes supports only two primary domains of life. Nature 504, 231–236. (doi:10.1038/nature12779)

Iwabe N, Kuma K, Hasegawa M, Osawa S, Miyata T. 1989 Evolutionary relationship of archaebacteria, eubacteria, and eukaryotes inferred from phylogenetic trees of duplicated genes. Proc. Natl Acad. Sci. USA 86, 9355–9359. (doi:10.1073/pnas.86.23.9355)

1989 Evolution of the vacuolar H+-ATPase: implications for the origin of eukaryotes. Proc. Natl Acad. Sci. USA 86, 6661–6665. (doi:10.1073/pnas.86.17.6661)

1995 Root of the universal tree of life based on ancient aminoacyl-tRNA synthetase gene duplications. Proc. Natl Acad. Sci. USA 92, 2441–2445. (doi:10.1073/pnas.92.7.2441)

1999 The rooting of the universal tree of life is not reliable. J. Mol. Evol. 49, 509–523. (doi:10.1007/PL00006573)

Gouy R, Baurain D, Philippe H. 2015 Rooting the tree of life: the phylogenetic jury is still out. Phil. Trans. R. Soc. B 370, 20140329. (doi:10.1098/rstb.2014.0329)

Drummond AJ, Ho SYW, Phillips MJ, Rambaut A. 2006 Relaxed phylogenetics and dating with confidence. PLoS Biol. 4, 699–710. (doi:10.1371/journal.pbio.0040088)

Katz LA, Grant JR, Parfrey LW, Burleigh JG. 2012 Turning the crown upside down: gene tree parsimony roots the eukaryotic tree of life. Syst. Biol. 61, 653–660. (doi:10.1093/sysbio/sys026)

Boussau B, Szöllösi G, Duret L. 2013 Genome-scale coestimation of species and gene trees. Genome Res. 23, 323–330. (doi:10.1101/gr.141978.112)

1987 Statistical analysis of hominoid molecular evolution. Stat. Sci. 2, 191–210. (doi:10.1214/ss/1177013353)

Jayaswal V, Jermiin LS, Robinson J. 2005 Estimation of phylogeny using a general Markov model. Evol. Bioinform. Online 1, 62–80.

Jayaswal V, Ababneh F, Jermiin LS, Robinson J. 2011 Reducing model complexity of the general Markov model of evolution. Mol. Biol. Evol. 28, 3045–3059. (doi:10.1093/molbev/msr128)

1995 On the use of nucleic acid sequences to infer early branchings in the tree of life. Mol. Biol. Evol. 12, 451–458.

2004 Modeling compositional heterogeneity. Syst. Biol. 53, 485–495. (doi:10.1080/10635150490445779)

2006 A Bayesian compound stochastic process for modeling nonstationary and nonhomogeneous sequence evolution. Mol. Biol. Evol. 23, 2058–2071. (doi:10.1093/molbev/msl091)

2008 A site- and time-heterogeneous model of amino acid replacement. Mol. Biol. Evol. 25, 842–858. (doi:10.1093/molbev/msn018)

Huelsenbeck JP, Bollback JP, Levine AM. 2002 Inferring the root of a phylogenetic tree. Syst. Biol. 51, 32–43. (doi:10.1080/106351502753475862)

2006 Efficient likelihood computations with nonreversible models of evolution. Syst. Biol. 55, 756–768. (doi:10.1080/10635150600975218)

Cherlin S, Nye TMW, Boys RJ, Heaps SE, Williams TA, Embley TM. 2015 The effect of non-reversibility on inferring rooted phylogenies.

Heaps SE, Nye TMW, Boys RJ, Williams TA, Embley TM. 2014 Bayesian modelling of compositional heterogeneity in molecular phylogenetics. Stat. Appl. Genet. Mol. Biol. 13, 589–609. (doi:10.1515/sagmb-2013-0077)

2001 A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol. 18, 691–699. (doi:10.1093/oxfordjournals.molbev.a003851)

1986 Some probabilistic and statistical problems in the analysis of DNA sequences. In Lectures on mathematics in the life sciences, vol. 17, pp. 57–86. Providence, RI: American Mathematical Society.

Hasegawa M, Kishino H, Yano T. 1985 Dating of the human–ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22, 160–174. (doi:10.1007/BF02101694)

1994 Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39, 306–314. (doi:10.1007/BF00160154)

2005 Deinococcus radiodurans – the consummate survivor. Nat. Rev. Microbiol. 3, 882–892. (doi:10.1038/nrmicro1264)

1969 Thermus aquaticus gen. n. and sp. n., a nonsporulating extreme thermophile. J. Bacteriol. 98, 289–297.

Omelchenko MV, Wolf YI, Gaidamakova EK, Matrosova VY, Vasilenko A, Zhai M, Daly MJ, Koonin EV, Makarova KS. 2005 Comparative genomics of Thermus thermophilus and Deinococcus radiodurans: divergent routes of adaptation to thermophily and radiation resistance. BMC Evol. Biol. 5, 57. (doi:10.1186/1471-2148-5-57)

Embley TM, Thomas RH, Williams RAD. 1993 Reduced thermophilic bias in the 16S rDNA sequence from Thermus ruber provides further support for a relationship between Thermus and Deinococcus. Syst. Appl. Microbiol. 16, 25–29. (doi:10.1016/S0723-2020(11)80247-X)

2013 Insights into the phylogeny and coding potential of microbial dark matter. Nature 499, 431–437. (doi:10.1038/nature12352)

1997 Relationships between genomic G+C content, RNA secondary structures, and optimal growth temperature in prokaryotes. J. Mol. Evol. 44, 632–636. (doi:10.1007/PL00006186)

2003 A classification of consensus methods for phylogenetics. DIMACS Ser. Discret. Math. Theor. Comput. Sci. 61, 163–184.

2007 Summarizing a posterior distribution of trees using agreement subtrees. Syst. Biol. 56, 578–590. (doi:10.1080/10635150701485091)

Williams TA, Foster PG, Nye TMW, Cox CJ, Embley TM. 2012 A congruent phylogenomic signal places eukaryotes within the Archaea. Proc. R. Soc. B 279, 4870–4879. (doi:10.1098/rspb.2012.1795)

2014 Archaeal ‘dark matter’ and the origin of eukaryotes. Genome Biol. Evol. 6, 474–481. (doi:10.1093/gbe/evu031)

Lasek-Nesselquist E, Gogarten JP. 2013 The effects of model choice and mitigating bias on the ribosomal tree of life. Mol. Phylogenet. Evol. 69, 17–38. (doi:10.1016/j.ympev.2013.05.006)

2011 The archaeal ‘TACK’ superphylum and the origin of eukaryotes. Trends Microbiol. 19, 580–587. (doi:10.1016/j.tim.2011.09.002)

2004 MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797. (doi:10.1093/nar/gkh340)

Katoh K, Kuma K, Toh H, Miyata T. 2005 MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33, 511–518. (doi:10.1093/nar/gki198)

Do CB, Mahabhashyam MSP, Brudno M, Batzoglou S. 2005 ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Res. 15, 330–340. (doi:10.1101/gr.2821705)

Lassmann T, Sonnhammer ELL. 2005 Kalign – an accurate and fast multiple sequence alignment algorithm. BMC Bioinform. 6, 298. (doi:10.1186/1471-2105-6-298)

Wallace IM, O'Sullivan O, Higgins DG, Notredame C. 2006 M-Coffee: combining multiple sequence alignment methods with T-Coffee. Nucleic Acids Res. 34, 1692–1699. (doi:10.1093/nar/gkl091)

2010 BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evol. Biol. 10, 210. (doi:10.1186/1471-2148-10-210)

Cox CJ, Foster PG, Hirt RP, Harris SR, Embley TM. 2008 The archaebacterial origin of eukaryotes. Proc. Natl Acad. Sci. USA 105, 20 356–20 361. (doi:10.1073/pnas.0810647105)

Foster PG, Cox CJ, Embley TM. 2009 The primary divisions of life: a phylogenomic approach employing composition-heterogeneous methods. Phil. Trans. R. Soc. B 364, 2197–2207. (doi:10.1098/rstb.2009.0034)

Dagan T, Roettger M, Bryant D, Martin W. 2010 Genome networks root the tree of life between prokaryotic domains. Genome Biol. Evol. 2, 379–392. (doi:10.1093/gbe/evq025)

2006 Rooting the tree of life by transition analyses. Biol. Direct 1, 19. (doi:10.1186/1745-6150-1-19)

Lake JA, Skophammer RG, Herbold CW, Servin JA. 2009 Genome beginnings: rooting the tree of life. Phil. Trans. R. Soc. B 364, 2177–2185. (doi:10.1098/rstb.2009.0035)

2008 Quantifying the stationarity and time reversibility of the nucleotide substitution process. Mol. Biol. Evol. 25, 2525–2535. (doi:10.1093/molbev/msn169)

2011 Adaptation to environmental temperature is a major determinant of molecular evolutionary rates in Archaea. Mol. Biol. Evol. 28, 2661–2674. (doi:10.1093/molbev/msr098)

Petitjean C, Deschamps P, López-García P, Moreira D. 2014 Rooting the domain Archaea by phylogenomic analysis supports the foundation of the new kingdom Proteoarchaeota. Genome Biol. Evol. 7, 191–204. (doi:10.1093/gbe/evu274)

2004 A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol. Biol. Evol. 21, 1095–1109. (doi:10.1093/molbev/msh112)


Except where noted, content and user contributions on this site are licensed under CC BY-SA 4.0 with attribution required.

Resolving Difficult Phylogenetic Questions: Why More Sequences Are Not Enough

Citation: Philippe H, Brinkmann H, Lavrov DV, Littlewood DTJ, Manuel M, Wörheide G, et al. (2011) Resolving Difficult Phylogenetic Questions: Why More Sequences Are Not Enough. PLoS Biol 9(3): e1000602.

Academic Editor: David Penny, Massey University, New Zealand

Published: March 15, 2011

Copyright: © 2011 Philippe et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: The work was funded by NSERC, CRC, Agence Nationale de la Recherche, ARC Biomod, and DFG. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Abbreviations: BS, bootstrap support; EST, expressed sequence tag; LBA, long branch attraction

In the quest to reconstruct the Tree of Life, researchers have increasingly turned to phylogenomics, the inference of phylogenetic relationships using genome-scale data (Box 1). Mesmerized by the sustained increase in sequencing throughput, many phylogeneticists entertained the hope that the incongruence frequently observed in studies using single or a few genes [1] would come to an end with the generation of large multigene datasets. Yet, as so often happens, reality has turned out to be far more complex, as three recent large-scale analyses, one published in PLoS Biology [2]–[4], make clear. The studies, which deal with the early diversification of animals, produced highly incongruent (Box 2) findings despite the use of considerable sequence data (see Figure 1). Clearly, merely adding more sequences is not enough to resolve the inconsistencies.

Box 1. From Phylogenetics to Phylogenomics

Phylogenetics, the determination of evolutionary relationships among organisms, is central to our understanding of the evolution of life. For instance, the three phylogenies of Figure 1 entail profoundly different interpretations about the complexity of the common ancestor of all animals. Important body plan characters (e.g., neurosensory and digestive systems and muscle cells) are found in cnidarians, ctenophores, and bilaterians but not in sponges and placozoans. According to the phylogenies of Schierwater et al. [4] and Dunn et al. [2], the taxonomic distribution of these characters implies either (i) that the ancestral metazoan already featured these traits and that sponges (and placozoans) have secondarily lost them or (ii) that these characters were acquired several times independently by convergence (e.g., in the cnidarian + ctenophore and in the bilaterian lineages, according to the tree of Figure 1A). In contrast, the phylogeny of Philippe et al. [3] is more congruent with morphological characters and compatible with a simple metazoan ancestor and a later emergence of these characters only once, in the lineage leading to the common ancestor of coelenterates (cnidarians+ctenophores) and bilaterians.

Phylogenies are generally depicted as trees (which are non-reticulated graphs, as in Figure 1) because vertical evolution is undisputedly the primary mechanism of inheritance for genetic material. However, the existence of horizontal transmission (e.g., hybridization of closely related taxa, organelle acquisition through endosymbiosis and horizontal gene transfer) makes phylogenetic trees only pragmatic approximations, which will probably be replaced by phylogenetic networks in the long term (particularly for unicellular organisms).

Recently, phylogenomics, the use of genomic data to infer evolutionary relationships, has emerged as a new domain of phylogenetics. The main strength of phylogenomics is the drastic reduction in random (or sampling) error brought by the use of large (multigene) datasets. Numerous approaches can be used to take advantage of genomic data (for review see [49]). Briefly, new methods based on oligonucleotide content, gene content, or intron positions look promising (as shown by their ability to yield reasonable trees) but require additional theoretical developments to achieve their full potential. That is why the two most popular phylogenomic approaches are simple extensions of the standard phylogenetics methods applied to single-gene datasets. The first, known as the “supermatrix” (or superalignment), consists in concatenating numerous orthologous genes into a single supergene, which is analyzed using standard methods (or slightly modified methods such as separate models allowing for multiple sets of branch lengths [50]). The second, “supertree,” approach takes the opposite path by first inferring a tree for each gene in the dataset and then combining these individual trees into a single supertree. The supermatrix approach is the most commonly used, in agreement with the handful of studies suggesting that it offers greater accuracy than the supertree [13],[51], though this remains to be formally demonstrated.
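The supermatrix idea described above can be sketched in a few lines. This is an illustrative toy, not any published pipeline: the function name `build_supermatrix`, the taxon labels, and the choice of `?` to pad genes that are missing for a taxon are all assumptions made for the example.

```python
def build_supermatrix(gene_alignments):
    """Concatenate per-gene alignments (dicts of taxon -> aligned sequence).

    Taxa absent from a gene are padded with '?' so every row of the
    supermatrix has the same length.
    """
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a gene must have equal aligned length")
        width = lengths.pop()
        for taxon in taxa:
            supermatrix[taxon] += aln.get(taxon, "?" * width)
    return supermatrix

# Two made-up gene alignments; the second lacks the cnidarian sequence.
gene1 = {"sponge": "ACGT", "cnidarian": "ACGA", "bilaterian": "ACTA"}
gene2 = {"sponge": "GG", "bilaterian": "GC"}

for taxon, row in build_supermatrix([gene1, gene2]).items():
    print(taxon, row)
# bilaterian ACTAGC
# cnidarian ACGA??
# sponge ACGTGG
```

The supertree approach would instead infer one tree per entry of `gene_alignments` and combine the resulting trees afterwards.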

Box 2. Glossary

Homology/orthology/paralogy/xenology: Genes that derive from a common ancestor are termed homologs. Two homologous genes are orthologous if they diverged through a speciation event. In contrast, paralogs originate by duplication of a single gene within a given lineage, whereas xenologs result from the horizontal transfer of a gene from a donor species to a receiver species (which might eventually get its original copy replaced by the xenolog).

Homoplasy/convergence: Spurious similarity due to convergence or reversion and not to common ancestry is termed homoplasy. Convergence describes the independent acquisition by separate evolutionary lineages of the same nucleotide (or amino acid) at a given position. This is a direct consequence of multiple substitutions.

Incomplete lineage sorting: The transient retention of ancestral polymorphisms across speciation events. Speciations compressed in time and large reproductive populations both increase the likelihood of this phenomenon. Considering three lineages having rapidly diverged, by chance some sequence positions will be shared between one pair, while others will be shared between another pair, and yet others between the third possible pair, hence blurring the phylogenetic signal on the corresponding branches.

Incongruence: Two (or more) phylogenetic trees are said to be incongruent when they exhibit conflicting branching orders (i.e., topologies) and cannot be superimposed. This implies that at least one node (also known as a bipartition) present in one tree is not found in the other(s), where it is replaced by alternative groupings of taxa.
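As a hedged illustration of this definition, the sketch below lists the non-trivial bipartitions of two small trees and reports those found in only one of them. The nested-tuple tree encoding and the taxon labels A to E are hypothetical and do not reproduce any tree from the studies discussed here.

```python
def leaves(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def bipartitions(tree):
    """Return the non-trivial bipartitions (splits) induced by internal branches."""
    all_taxa = leaves(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaves(node)
        rest = all_taxa - clade
        # Keep only splits with at least two taxa on each side.
        if len(clade) >= 2 and len(rest) >= 2:
            splits.add(frozenset([clade, rest]))
        for child in node:
            visit(child)

    visit(tree)
    return splits

tree1 = ((("A", "B"), "C"), ("D", "E"))
tree2 = ((("A", "C"), "B"), ("D", "E"))

only_in_one = bipartitions(tree1) ^ bipartitions(tree2)
print(len(only_in_one))  # 2
```

The size of this symmetric difference is the quantity usually called the Robinson-Foulds distance. For fully resolved trees on the same taxa, a non-zero value means the topologies are incongruent; for trees with polytomies, an unshared split may reflect lack of resolution rather than conflict.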

Model of sequence evolution: A statistical description of the process of substitution in nucleotide or amino acid sequences. Complex models better approximate the evolutionary process but at the expense of more parameters and computational time. As parameter-rich models require more data to behave properly, they have become really useful with the advent of phylogenomic datasets.

Monophyly: To be considered monophyletic, a taxonomic group must satisfy two conditions: (i) all its taxa must derive from a single ancestor and, reciprocally, (ii) all taxa deriving from this common ancestor must belong to the group.

Non-phylogenetic signal: The combination of different kinds of structured noise (e.g., undetected homoplasies) that compete with the genuine phylogenetic signal during tree reconstruction. Even if the non-phylogenetic content is partly a property of a multiple sequence alignment (notably related to its saturation level), the non-phylogenetic signal actually inferred heavily depends on the method and the model of evolution selected. In probabilistic methods, the non-phylogenetic signal mainly results from the data violating the model of sequence evolution. These violations arise because our models are inevitably oversimplified in comparison to the complexity of the natural evolutionary process. Eventually, the apparent signal analyzed will be a blend of phylogenetic and non-phylogenetic signal.

Outgroup/ingroup: Nearly all tree reconstruction methods produce unrooted trees, in which inferred relationships do not convey any information about the direction of time. To root a tree and turn it into a phylogeny, one has to include in the analysis a group of taxa that are known to be outside the group under study. This reference group is termed the outgroup, while the taxa of interest make the ingroup.

Patristic distance: The sum of the lengths of the branches that connect two nodes in a phylogenetic tree, where those nodes are typically terminal nodes representing extant taxa. It is thus an inferred distance (taking into account multiple substitutions) greater than the uncorrected distance directly computed from the number of differences observed between the two corresponding sequences in the alignment.
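A minimal sketch of this computation follows, assuming a rooted tree stored as a mapping from each node to its parent and the length of the connecting branch. The node names and branch lengths are invented for the example.

```python
def distances_to_ancestors(tree, node):
    """Map the node and each of its ancestors to the branch-length distance from the node."""
    dist = {node: 0.0}
    total = 0.0
    while node in tree:  # the root has no entry in the mapping
        parent, length = tree[node]
        total += length
        dist[parent] = total
        node = parent
    return dist

def patristic_distance(tree, a, b):
    """Sum of branch lengths on the path connecting nodes a and b."""
    up_a = distances_to_ancestors(tree, a)
    up_b = distances_to_ancestors(tree, b)
    # The path turns around at the most recent common ancestor,
    # which is the shared ancestor with the smallest summed distance.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)

# child -> (parent, branch length); hypothetical values.
toy_tree = {
    "A": ("n1", 0.1),
    "B": ("n1", 0.2),
    "n1": ("root", 0.3),
    "C": ("root", 0.5),
}

print(round(patristic_distance(toy_tree, "A", "B"), 3))  # 0.3
print(round(patristic_distance(toy_tree, "A", "C"), 3))  # 0.9
```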

Phylogenetic signal/synapomorphy: The substitutions occurring along a given branch of the evolutionary tree. The strength of the phylogenetic signal is proportional to the number of substitutions occurring along the branch. In non-probabilistic methods, the signal is encoded in synapomorphies, i.e., shared residues (nucleotides or amino acids) at aligned positions that are specific to a set of sequences derived from a common ancestor. In probabilistic methods, the amount of phylogenetic signal actually extracted from a given dataset depends on the model and is expected to increase with the fit of the model to the data (i.e., the ability of the model to explain the data).

Phylogenetic tree: A (connected acyclic) graph describing the estimated evolutionary relationships among a group of species. In molecular trees, branch lengths are proportional to the genetic distances (and hence to some extent to time) inferred from the analysis of a multiple alignment of homologous sequences (nucleotide or amino acid sequences).

Probabilistic methods: A family of tree reconstruction methods from multiple sequence alignments that are grounded in statistical theory and make use of explicit models of sequence evolution. These include maximum likelihood and Bayesian inference approaches and are known to be the most accurate but also the most computationally demanding.

Saturation: When sequences in a multiple alignment have undergone so many multiple substitutions that apparent distances largely underestimate the real genetic distances, the alignment is said to be saturated. Phylogenetic inference works best with datasets that are only slightly saturated. Owing to their reduced state space (four possible bases), nucleotide sequences saturate more rapidly than protein sequences (20 possible amino acids).
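The effect can be illustrated numerically under the Jukes-Cantor (JC69) model, which is an assumption introduced here for simplicity and is not named in the text. Under JC69 the expected proportion of differing nucleotide sites p for a true distance d (substitutions per site) is p = 3/4 (1 - exp(-4d/3)), and the corrected distance is d = -3/4 ln(1 - 4p/3).

```python
import math

def expected_p_distance(d):
    """Expected proportion of differing sites under JC69 for true distance d."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

def jc69_distance(p):
    """Corrected distance (substitutions per site) from an observed proportion p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 correction is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# As the true distance grows, the observed proportion levels off below 0.75.
for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(f"true d = {d:.1f}  observed p = {expected_p_distance(d):.3f}")
```

Near the plateau, a small sampling fluctuation in p translates into a very large change in the corrected distance, which is one way to see why heavily saturated alignments are hard to analyse.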

Site-homogeneous/site-heterogeneous models: Most models of sequence evolution assume that the same evolutionary process takes place at every position (or site) of an alignment. With such models, only the evolutionary rate can be modeled as heterogeneous across sites, usually through a gamma distribution of rates. However, selective constraints are known to be quite heterogeneous across positions, hence seriously violating the hypotheses of site-homogeneous models. On the other hand, site-heterogeneous models assume that the evolutionary process varies widely across sites, in particular the set of acceptable amino acids (e.g., in the CAT model). A number of studies have demonstrated that site-heterogeneous models provide a better fit to phylogenomic datasets and tend to reduce the sensitivity to tree reconstruction artifacts (e.g., LBA).

Figure 1. (A) Schierwater et al. [4] tree. (B) Dunn et al. [2] tree. (C) Philippe et al. [3] tree. Numbers in parentheses after taxon names indicate the number of species included in the dataset for the corresponding taxon. Bootstrap support values above 90% are indicated by a bullet (for nodes) or by underlining (for terminal taxa). It is worth mentioning that the monophyly of Porifera is not unequivocally accepted [28],[46]; only the analysis of 30,000 positions with a rich taxon sampling and a complex model of evolution recovers it with significant statistical support [3]. Although such a sparse phylogenetic signal will require harnessing the full potential of phylogenomics to be confidently solved, this question is outside the scope of this study. Simplified drawings (redrawn from [74]) on the bottom illustrate the huge morphological disparity existing between the five terminal taxa. Porifera correspond to sponges; Cnidaria to sea anemones, jellyfishes, and allies; Ctenophora to comb jellies; and Bilateria to all other animals (characterized by their bilateral symmetry) except Trichoplax (Placozoa), which appears to be morphologically the most simply organized animal phylum.

Here, taking these three studies as a case in point, we discuss pitfalls that the simple addition of sequences cannot avoid, and show how the observed incongruence can be largely overcome and how improved bioinformatics methods can help reveal the full potential of phylogenomics.


Aim: Phylogenetic diversity can provide insight into how evolutionary processes may have shaped contemporary patterns of species richness. Here, we aim to test for the influence of phylogenetic history on global patterns of amphibian species richness, and to identify areas where macroevolutionary processes such as diversification and dispersal have left strong signatures on contemporary species richness.

Location: Global equal-area grid cells of approximately 10,000 km².

Methods: We generated an amphibian global supertree (6111 species) and repeated analyses with the largest available molecular phylogeny (2792 species). We combined each tree with global species distributions to map four indices of phylogenetic diversity. To investigate congruence between global spatial patterns of amphibian species richness and phylogenetic diversity, we selected Faith’s phylogenetic diversity (PD) index and the total taxonomic distinctness (TTD) index, because we found that the variance of the other two indices we examined (average taxonomic distinctness and mean root distance) strongly depended on species richness. We then identified regions with unusually high or low phylogenetic diversity given the underlying level of species richness by using the residuals from the global relationship of species richness and phylogenetic diversity.
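The residual approach described above can be sketched on a toy example. The tree, taxon names, branch lengths, and grid cells below are all hypothetical, and two assumptions are made that the abstract does not confirm: Faith's PD is taken in its rooted form (the summed length of all branches connecting a cell's species to the root), and the richness-PD relationship is fitted with an ordinary least-squares line.

```python
# Toy rooted tree: each node maps to (parent, branch length to parent).
# Taxon names and branch lengths are hypothetical.
TREE = {
    "A": ("AB", 2.0),
    "B": ("AB", 2.0),
    "AB": ("root", 1.0),
    "C": ("root", 3.0),
}

def faith_pd(species):
    """Sum the lengths of all branches linking the species to the root."""
    edges = set()
    for node in species:
        while node in TREE:      # walk upward until the root is reached
            edges.add(node)      # an edge is identified by its child node
            node = TREE[node][0]
    return sum(TREE[child][1] for child in edges)

def linear_residuals(x, y):
    """Residuals of an ordinary least-squares fit of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Hypothetical grid cells and the species occurring in each.
cells = {
    "cell_1": {"A"},
    "cell_2": {"A", "B"},       # two close relatives
    "cell_3": {"A", "C"},       # two distant relatives
    "cell_4": {"A", "B", "C"},
    "cell_5": {"C"},
}
names = list(cells)
richness = [len(cells[name]) for name in names]
pd_values = [faith_pd(cells[name]) for name in names]
residuals = dict(zip(names, linear_residuals(richness, pd_values)))

for name, rich, pd_value in zip(names, richness, pd_values):
    print(name, rich, pd_value, round(residuals[name], 3))
```

In this toy case, cell_2 and cell_3 hold the same number of species, but cell_2 falls below the fitted line (close relatives, low PD for its richness) and cell_3 falls above it (distant relatives, high PD for its richness).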

Results: Phylogenetic diversity as measured by either Faith’s PD or TTD was strongly correlated with species richness globally, while the other two indices showed very different patterns. When either Faith’s PD or TTD was tested against species richness, residuals were strongly spatially structured. Areas with unusually low phylogenetic diversity for their associated species richness were mostly on islands, indicating large radiations of few lineages that have successfully colonized these archipelagos. Areas with unusually high phylogenetic diversity were located around biogeographic contact zones in Central America and southern China, and seem to have experienced high immigration or in situ diversification rates, combined with local persistence of old lineages.

Main conclusions: We show spatial structure in the residuals of the relationship between species richness and phylogenetic diversity, which together with the positive relationship itself indicates strong signatures of evolutionary history on contemporary global patterns of amphibian species richness. Areas with unusually low and high phylogenetic diversity for their associated richness demonstrate the importance of biogeographic barriers to dispersal, colonization and diversification processes.


Similar to the way in which independent contrasts (Felsenstein, 1985) paved the way for major progress in comparative methods for quantitative traits (e.g. PGLS, Grafen, 1989; Blomberg's K, Blomberg et al., 2003; DOT test, Ackerly et al., 2006), the publication of joint character evolution and diversification rates models (Maddison et al., 2007) has triggered the development of an entire suite of methods applicable to a broader range of traits and evolutionary questions (e.g. FitzJohn, 2010; Goldberg et al., 2011; Magnuson-Ford & Otto, 2012). The use of these methods has exposed some of the complexities of testing SDD, such as the potential for transition rate asymmetries to produce patterns similar to key innovations and dead ends (e.g. Johnson et al., 2011). Although differences in rates of gain and loss are biologically realistic for many traits (Ree & Donoghue, 1999; Wiens, 2001), distinguishing these trends from differential diversification was previously difficult in the absence of joint models (Maddison, 2006).
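To see why transition rate asymmetry alone can mimic a diversification effect, consider a deliberately simplified sketch: a binary trait evolving as a two-state continuous-time Markov chain along a single lineage, with no difference in speciation or extinction between states. The rates q01 (gain) and q10 (loss) and their values are hypothetical, and the sketch ignores tree structure entirely; it is not one of the joint models cited above.

```python
import math

def stationary_freq_state1(q01, q10):
    """Long-run probability of state 1 for a two-state Markov chain."""
    return q01 / (q01 + q10)

def prob_state1(t, q01, q10, start_state=0):
    """Probability of being in state 1 after time t."""
    pi1 = stationary_freq_state1(q01, q10)
    decay = math.exp(-(q01 + q10) * t)
    if start_state == 0:
        return pi1 * (1.0 - decay)
    return pi1 + (1.0 - pi1) * decay

# Hypothetical rates: gains are three times as frequent as losses.
q01, q10 = 0.3, 0.1
for t in (1.0, 5.0, 20.0):
    print(f"t = {t:>4}: P(state 1) = {prob_state1(t, q01, q10):.3f}")
print("stationary frequency of state 1:", stationary_freq_state1(q01, q10))
```

With these rates, about three quarters of lineages are expected to end up in state 1 even though the trait has no effect on diversification, which is the kind of pattern that could be mistaken for a key innovation when transitions and diversification are not modeled jointly.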

With these new and potentially powerful methods, however, modern comparative biologists find themselves faced with a fresh set of challenges. As the range of comparative methods continues to expand, there are a myriad of options for building complex evolutionary models for continuous or discrete characters, with a single or multiple characters, with anagenetic and/or cladogenetic trait changes, etc. It is also possible to allow for heterogeneity in processes (e.g. transition rates) across the tree or over time (e.g. Johnson et al., 2011), although there is no well-developed approach for simultaneously identifying the optimal number and placement of break points (as in Alfaro et al., 2009). With this flexibility, it is tempting to saturate analyses with parameters to capture the range of biological phenomena that may play a part in a lineage's history. However, creating empirical data sets with the hundreds of species needed for evaluating complex models may be arduous (especially as the most interesting characters are often time-consuming to score) and, in some cases, this effort may be unnecessary if the same questions can be adequately addressed with simpler models. Thus, we stress the need for careful experimental design that considers the match between the macroevolutionary questions, the study system, and the available methods (Freckleton, 2009). As with any experiment, comparative biologists should take the time to explore their data and consider alternative explanations (such as a codistributed character) in interpreting significant results from diversification analyses (Maddison et al., 2007; Maddison & FitzJohn, in press).

In the coming years, we anticipate continued development to extend existing phylogenetic comparative methods to include diversification parameters and to create new bridges with palaeontological research. For example, in the same way that the MuSSE model allows for SDD to be included in Pagel's (1994) test of correlated evolution, the QuaSSE model for continuous traits could be extended to create the equivalent of phylogenetic generalized least squares (Grafen, 1989) for estimating trait correlations. An SDD extension of phylogenetic path analysis (Hardenberg & Gonzalez-Voyer, 2013) would also be appealing for cases where a researcher predicts that a character affects diversification, but only indirectly through its effects on another character. There is also great interest in integrating fossil information with data from extant taxa (Fritz et al., 2013; Pennell & Harmon, 2013), which will also help increase the power to estimate extinction rates (Quental & Marshall, 2010; Rabosky, 2010; Pyron & Burbrink, 2012); however, much of this effort has thus far focused only on character evolution (e.g. Slater et al., 2012) rather than the joint estimation of transition and diversification rates. Furthermore, there has also been increasing focus on the effect of species interactions and changing abiotic and biotic conditions on patterns of diversity (reviewed in Pyron & Burbrink, 2013; Rabosky, 2013; Morlon, 2014). The integration of SDD, diversity-dependent rates and rate heterogeneity throughout the tree in the same analysis is yet to come.

As these statistical comparative analyses bring greater insight into the types of traits that shape lineage history, a grand challenge is to connect results about macroevolutionary processes with processes observed at an ecological timescale (e.g. Kisel et al., 2012; Rabosky & Matute, 2013). Comparative approaches provide powerful tools for testing evolutionary questions at broad scales, such as whether trait evolution exhibits directional trends or whether functional innovations are required for adaptive radiations. However, understanding the biology that underlies such findings relies on integrating knowledge and approaches from other fields. For example, directional evolution may arise due to the nature of the genetic or developmental changes associated with phenotypic transitions (Igic et al., 2006; Rausher, 2008). Determining the mechanisms by which traits alter diversification rates may be even more challenging. For key innovations, a reasonable first step may be to functionally test how the trait changes ecological performance (Galis, 2001), whereas for evolutionary dead ends, experiments may target whether the trait limits adaptive evolution. Ultimately, integrating phylogenetic comparative methods with other approaches, from development to ecology, will provide a more comprehensive understanding of the proximate causes of SDD. Together, this combined approach studying both macro- and microevolutionary processes will allow us to get to the root of how traits shape trees.

