Walking tree methods for biological string matching Public Deposited

http://ir.library.oregonstate.edu/concern/graduate_thesis_or_dissertations/pg15bh229

Descriptions

Attribute NameValues
Creator
Abstract or Summary
  • Approximate string matching is commonly used to align genetic sequences (DNA or RNA) to determine their shared characteristics. In contrast with the standard dynamic programming methods which use local edit distance models, the Walking Tree heuristic method was created to handle non-local changes, e.g., translocations, inversions, and duplications, altogether and simultaneously. The Walking Tree Method approximately maximizes the global alignment scores of matched translocations and inversions and minimizes gap penalties. It is heuristic because some special cases of string alignment problems have been shown to be NP-complete, e.g., determining the minimum number of flips needed to sort a sequence. We demonstrated that it produces reasonable alignments by 1) aligning a rearranged sequence with its original, 2) using alignment scores to construct distance trees of several families of organisms to compare with existing phylogenetic trees, and 3) aligning real biological sequences or whole genomes to compare with biologically annotated regions. To construct the alignment, the original version runs in Θ (|P| * |T| * log |P|) sequential runtime, where P and T are the pattern string and the text string, respectively. The runtime was later improved to Θ (|P| * |T|) using snapshots of the tree. We call this version the "Improved Walking Tree Method". We used the "Four Russians" technique to improve it further to sub-quadratic, i.e., Θ (|P| * |T| / log|P|). We call this version the "Fast Walking Tree Method". The alignment of 8 million base pairs can be done in a week using a cluster of 65 Pentium II processors. Its sequential runtime can probably be improved further to Θ (|P| * |T| / (log|P|)²).
Resource Type
Date Available
Date Issued
Degree Level
Degree Name
Degree Field
Degree Grantor
Commencement Year
Advisor
Academic Affiliation
Non-Academic Affiliation
Subject
Rights Statement
Publisher
Peer Reviewed
Language
Digitization Specifications
  • File scanned at 300 ppi (Monochrome, 256 Grayscale, 24 bit color) using Capture Perfect 3.0.82 on a Canon DR-9080C in PDF format. CVista PdfCompressor 4.0 was used for pdf compression and textual OCR.
Replaces
Additional Information
  • description.provenance : Approved for entry into archive by Patricia Black(patricia.black@oregonstate.edu) on 2012-07-02T17:16:19Z (GMT) No. of bitstreams: 1HsuTaiC2004.pdf: 1074821 bytes, checksum: bc808b3cb9fde3b5aaac76689c9d7dee (MD5)
  • description.provenance : Submitted by Kevin Martin (martikev@onid.orst.edu) on 2012-06-26T22:18:22ZNo. of bitstreams: 1HsuTaiC2004.pdf: 1074821 bytes, checksum: bc808b3cb9fde3b5aaac76689c9d7dee (MD5)
  • description.provenance : Made available in DSpace on 2012-08-16T21:09:55Z (GMT). No. of bitstreams: 1HsuTaiC2004.pdf: 1074821 bytes, checksum: bc808b3cb9fde3b5aaac76689c9d7dee (MD5) Previous issue date: 2003-06-20
  • description.provenance : Approved for entry into archive by Patricia Black(patricia.black@oregonstate.edu) on 2012-08-16T21:09:55Z (GMT) No. of bitstreams: 1HsuTaiC2004.pdf: 1074821 bytes, checksum: bc808b3cb9fde3b5aaac76689c9d7dee (MD5)

Relationships

Parents:

This work has no parents.

Last modified

Downloadable Content

Download PDF

Items