Graduate Thesis Or Dissertation
 

Efficient Comparative Sequence Analysis Algorithms for RNA Conserved Structure and Region Detection

Public Deposited

Downloadable Content

Download PDF
https://ir.library.oregonstate.edu/concern/graduate_thesis_or_dissertations/gh93h6982

Descriptions

Attribute NameValues
Creator
Abstract
  • RNAs play important roles in multiple cellular processes, and many of their functions rely on folding to specific structures. To maintain their functions, secondary structures of RNA homologs are conserved across evolution. These conserved structures provide critical targets for diagnostics and therapeutics. Thus, there is a need for developing fast and accurate computational methods to identify conserved structural regions. Commonly, conserved structures involve compensating base pair changes, where two changes in the sequence across evolution that conserve a base pair. These compensating changes provide strong evidence for conserved structures. Meanwhile, they also make it harder to align sequences when structures are unknown. To solve this issue, Sankoff proposed a dynamic algorithm that simultaneously predicts structures and an alignment for two or more sequences. Several software packages provide implementations of the Sankoff algorithm. One drawback of this approach is that the Sankoff algorithm runs in O(n3k) against averaged sequence length n and k sequences. TurboFold II, an extension of TurboFold, provides a more computationally efficient method to align and fold sequences. However, it cannot scale to longer sequences due to cubic runtime against the sequence length. We propose LinearTurboFold, which can output the RNA structural alignment and identify conserved base pairs in linear time. LinearTurboFold leads to equal or better accuracies among the benchmarks. With LinearTurboFold, we explored conserved structural regions among betacoronavirus with the well-known RNA structures. Additionally, we listed novel conserved regions whose functions are not well understood. Most existing implementations of the Sankoff algorithm reduce time complexity to O(n3) by restricting the searching space, which still prevents them from being applied to long sequence. Moreover, those algorithms mainly focus on structure prediction and the alignment model is significantly simple, which results in a poor alignment output. We propose LinearSankoff, which extends the classical Sankoff algorithm with a powerful HMM-based alignment model. Based on the evaluation on RNAStrAlign dataset and comparison with benchmark methods, we can conclude that both the alignment and structure prediction accuracies benefit from the HMM-based alignment model. Regarding efficiency, LinearSankoff is a linear version of the classical Sankoff algorithm by applying beam pruning heuristic algorithm. Besides, LinearSankoff applies A∗ algorithm heuristic to accelerate searching, which leads to comparable accuracy with a small beam size.
License
Resource Type
Date Issued
Degree Level
Degree Name
Degree Field
Degree Grantor
Commencement Year
Advisor
Academic Affiliation
Rights Statement
Publisher
Peer Reviewed
Language
Embargo reason
  • Ongoing Research
Embargo date range
  • 2022-12-09 to 2023-07-10

Relationships

Parents:

This work has no parents.

In Collection:

Items