Technical Report
 

Improved parallel and sequential walking tree methods for biological string alignments

Öffentlich Deposited

Herunterladbarer Inhalt

PDF Herunterladen
https://ir.library.oregonstate.edu/concern/technical_reports/xd07gv26f

Descriptions

Attribute NameValues
Creator
Abstract
  • Approximate string matching is commonly used to align genetic sequences (DNA or RNA) to determine their shared characteristics. Most genetic string matching methods are based on the edit-distance model, which does not provide alignments for inversions and translocations. Recently, a heuristic called the Walking Tree Method [2, 3] has been developed to solve this problem. Unlike other heuristics, it can handle more than one level of inversion, i.e., inversions within inversions. Furthermore, it tends to capture the matched strings’ genes while other heuristics fail. There are two versions of the original walking tree heuristics: the score version gives only the alignment score, the alignment version gives both the score and the alignment mapping between the strings. The score version runs in quadratic time and uses linear space while the alignment version uses an extra log factor for time and space. In this paper, we will briefly describe the walking tree method and the original sequential and parallel algorithms. We will explain why different parallel algorithms are needed for a network of workstations rather than the original algorithm which worked well on a symmetric multi-processor. Our improved parallel method also led to a quadratic time sequential algorithm that uses less space. We give an example of our parallel method. We describe several experiments that show speedup linear in the number of processors, but eventual drop off in speedup as the communication network saturates. For big enough strings, we found linear speedup for all processors we had available. These results suggest that our improved parallel method will scale up as both the size of the problem and the number of processors increase. We include two figures that use real biological data and show that the walking tree methods can find translocations and inversions in DNA sequences and also discover unknown genes.
  • Keywords: heuristic, inversions, sequence alignment, parallel algorithm, walking tree, dynamic programming, genome alignment, translocations
Resource Type
Date Available
Date Issued
Series
Subject
Urheberrechts-Erklärung
Publisher
Peer Reviewed
Language
Replaces

Beziehungen

Parents:

This work has no parents.

Artikel