Graduate Thesis Or Dissertation
 

Walking tree methods for biological string matching

公开 Deposited

可下载的内容

下载PDF文件
https://ir.library.oregonstate.edu/concern/graduate_thesis_or_dissertations/pg15bh229

Descriptions

Attribute NameValues
Creator
Abstract
  • Approximate string matching is commonly used to align genetic sequences (DNA or RNA) to determine their shared characteristics. In contrast with the standard dynamic programming methods which use local edit distance models, the Walking Tree heuristic method was created to handle non-local changes, e.g., translocations, inversions, and duplications, altogether and simultaneously. The Walking Tree Method approximately maximizes the global alignment scores of matched translocations and inversions and minimizes gap penalties. It is heuristic because some special cases of string alignment problems have been shown to be NP-complete, e.g., determining the minimum number of flips needed to sort a sequence. We demonstrated that it produces reasonable alignments by 1) aligning a rearranged sequence with its original, 2) using alignment scores to construct distance trees of several families of organisms to compare with existing phylogenetic trees, and 3) aligning real biological sequences or whole genomes to compare with biologically annotated regions. To construct the alignment, the original version runs in Θ (|P| * |T| * log |P|) sequential runtime, where P and T are the pattern string and the text string, respectively. The runtime was later improved to Θ (|P| * |T|) using snapshots of the tree. We call this version the "Improved Walking Tree Method". We used the "Four Russians" technique to improve it further to sub-quadratic, i.e., Θ (|P| * |T| / log|P|). We call this version the "Fast Walking Tree Method". The alignment of 8 million base pairs can be done in a week using a cluster of 65 Pentium II processors. Its sequential runtime can probably be improved further to Θ (|P| * |T| / (log|P|)²).
Resource Type
Date Available
Date Issued
Degree Level
Degree Name
Degree Field
Degree Grantor
Commencement Year
Advisor
Academic Affiliation
Non-Academic Affiliation
Subject
权利声明
Publisher
Peer Reviewed
Language
Digitization Specifications
  • File scanned at 300 ppi (Monochrome, 256 Grayscale, 24 bit color) using Capture Perfect 3.0.82 on a Canon DR-9080C in PDF format. CVista PdfCompressor 4.0 was used for pdf compression and textual OCR.
Replaces

关联

Parents:

This work has no parents.

属于 Collection:

单件