Thesis (Ph.D., Bioinformatics & Computational Biology) -- University of Idaho, 2015 | One of the most enduring puzzles in evolutionary biology is how processes operating at the level of populations (microevolution) scale up to large-scale patterns of diversity (macroevolution). Recent advances in our ability to infer the historical pattern of evolutionary branching---the phylogeny---for many groups of organisms have provided opportunities to gain new perspectives on this question. In this dissertation I develop statistical methods, computational machinery, and theoretical frameworks that will enable researchers to make more meaningful inferences about the processes that have driven diversity through deep time using phylogenetic data.
In my opening chapter, I develop a theoretical foundation for how researchers can use models of trait evolution to test hypotheses related to the long-controversial theory of punctuated equilibrium, which asserts that speciation causes rapid evolution against a backdrop of stasis. I break the hypothesis down into four key elements and argue that combining these conceptually distinct ideas under the single framework of punctuated equilibrium is distracting and confusing, and more likely to hinder progress than to spur it.
Next, I present a suite of statistical software, written in the R programming language, for fitting evolutionary models to phylogenetic data. This is a complete overhaul of the popular geiger package designed to facilitate analyses of large and complex comparative data sets.
As an example of how phylogenetic models of trait evolution can provide complementary insights to population-level models, I investigate the evolution of sex chromosome-autosome fusions. Using discrete character models and a recently compiled database of sexual systems, I find that Y-autosome fusions occur at a much higher rate than X-, Z-, or W-autosome fusions in fish and squamate reptiles. This result grounded a theoretical investigation into the evolutionary forces driving sex chromosome fusions---the phylogenetic results allowed my collaborators and me to exclude from consideration several existing theories for why fusions become fixed in populations. Specifically, we found that the phylogenetic results cannot be accounted for by either direct or sexually antagonistic selection on their own. We argue that the observed patterns can be best explained when chromosomal fusions occur more frequently in males, are slightly deleterious, and are primarily fixed by drift.
In the final two chapters, I address two outstanding statistical problems that hinder the use and interpretation of phylogenetic models of trait evolution. First, I develop a novel statistical framework for assessing the absolute fit, or adequacy, of phylogenetic models of trait evolution. To date, researchers have focused almost exclusively on the relative explanatory power of alternative models, rather than the ability of a model to provide a good explanation for data on its own terms. I use my approach to evaluate the statistical performance of commonly used trait models on 337 comparative data sets covering three key functional traits of angiosperms ("flowering plants"). In general, the models I considered often provide poor statistical explanations for the evolution of these traits. This was true for many different groups and at many different scales. Whether such statistical inadequacy will qualitatively alter inferences drawn from comparative datasets will depend on the context. Regardless, assessing model adequacy can provide interesting biological insights---how and why a model fails to describe variation in a data set gives us clues about what evolutionary processes may have driven trait evolution across time.
Second, I develop a new technique that leverages taxonomic information to assess and overcome sampling biases in trait data sets; such sampling biases are likely prevalent and have the potential to confound tests of both macroevolutionary and macroecological hypotheses. As an example of the utility of this method, I use it to provide the first estimate of the global distribution of woody and herbaceous plants from a database of 39,313 records and find that the world is likely much woodier than researchers thought.