|Abstract or Summary
- This thesis summarizes four years of work focused primarily on chloroplast phylogenomics of the genus Pinus and related Pinaceae outgroups, using next-generation sequencing on Illumina platforms. Over the course of this work, Illumina read lengths were essentially limited to 25 to 100 base pairs, which presents challenges when assembling genomic regions that are repetitive or divergent from established reference genomes. Our assemblies initially relied on previously constructed high-quality plastome sequences for each of the two Pinus subgenera, yet we were able to show a clear decline in assembly success as divergence from the reference sequences increased. This was most evident in assemblies of Pinaceae outgroups, but the trend was also apparent within Pinus subgenera. To counter this problem, we used a combination of de novo and reference-guided assembly approaches, which allowed us to assemble highly divergent regions more effectively.
From a biological standpoint, our initial focus was on increasing phylogenetic resolution by using nearly complete plastome sequences from select Pinus and Pinaceae outgroup species. This effort did greatly increase phylogenetic resolution, as evidenced by a nearly 60-fold increase in parsimony-informative positions in our dataset compared to previous datasets composed of only several chloroplast loci. In addition, bootstrap support across the resulting phylogenetic tree was consistently high, with ≥95% bootstrap support at 30/33 ingroup nodes in maximum likelihood analysis. We also found a positive correlation between the amount of sequence data applied to our phylogeny and overall bootstrap support values, although trends indicated that some nodes would likely remain recalcitrant even with the application of complete plastomes. This correlation was important to demonstrate, as it reflected trends seen in a meta-analysis of contemporary, infrageneric chloroplast-based phylogenies. Our meta-analysis also indicated that most researchers rely on relatively small regions of the chloroplast genome in these studies and obtain relatively little resolution and support in the resulting phylogenies. Clearly, the application of plastome sequences to these types of analyses has great potential for increasing our understanding of evolutionary relationships at low taxonomic levels.
An unexpected finding of this work involved two putative protein-coding regions in the chloroplast, ycf1 and ycf2, which featured strongly elevated rates of mutation and together accounted for over half of the parsimony-informative sites in exons despite making up only 22% of exon sequence length. Of these two loci, ycf1 was clearly the more problematic to assemble from short-read data, as it featured numerous indels as well as several repetitive regions. We designed primers based on conserved regions that allowed essentially complete amplification of this locus, and we sequenced the ycf1 locus (with Sanger technology) for a representative of each of the 11 Pinus subsections, using accessions from the previous study. Importantly, these primers were also effective across Pinaceae and should facilitate future work throughout the family.
Accessions with full ycf1 sequences were in turn used as subsectional references as we sequenced and assembled plastomes for most of the remaining Pinus species. To produce these sequences efficiently, we relied on a solution-based hybridization strategy developed by Richard Cronn to enrich preparations of total genomic DNA for chloroplast-specific DNA. While the phylogenetic results of a full-plastome, full-genus analysis were certainly of interest, our final focus was on investigating ‘noise’ in our dataset and whether it affected phylogenetic conclusions drawn from the plastome. To determine this, we explored the removal of variable sites from our alignment and the resulting effect on topology and resolution. This allowed us to identify a window of alignment partitions in which nodal bootstrap support remained high across the genus, yet sufficient noise was removed to reveal important patterns in the placement of three clades whose phylogenetic positions have historically been problematic.