Technical Report
 

Walking tree heuristics for biological string alignment, gene location, and phylogenies

Public Deposited

Downloadable Content

Download PDF
https://ir.library.oregonstate.edu/concern/technical_reports/wd375x462

Descriptions

Attribute NameValues
Creator
Abstract
  • Basic biological information is stored in strings of nucleic acids (DNA, RNA) or amino acids (proteins). Teasing out the meaning of these strings is a central problem of modern biology. Matching and aligning strings brings out their shared characteristics. Although string matching is well-understood in the edit-distance model, biological strings with transpositions and inversions violate this model's assumptions. We propose a family of heuristics called walking trees to align biologically reasonable strings. Both edit-distance and walking tree methods can locate specifi c genes within a large string when the genes' sequences are given. When we attempt to match whole strings, the walking tree matches most genes, while the edit-distance method fails. We also give examples in which the walking tree matches substrings even if they have been moved or inverted. The edit-distance method was not designed to handle these problems. We include an example in which the walking tree "discovered" a gene. Calculating scores for whole genome matches gives a method for approximating evolutionary distance. We show two evolutionary trees for the picornaviruses which were computed by the walking tree heuristic. Both of these trees show great similarity to previously constructed trees. The point of this demonstration is that WHOLE genomes can be matched and distances calculated. The first tree was created on a Sequent parallel computer and demonstrates that the walking tree heuristic can be effeciently parallelized. The second tree was created using a network of work stations and demonstrates that there is suffient parallelism in the phylogenetic tree calculation that the sequential walking tree can be used effectively on a network.
  • Keywords: translocations, Picornavirus, sequence alignment, genome alignment, phylogeny, heuristic, swaps, inversions, walking tree, dynamic programming
  • Keywords: translocations, Picornavirus, sequence alignment, genome alignment, phylogeny, heuristic, swaps, inversions, walking tree, dynamic programming
Resource Type
Date Available
Date Issued
Series
Subject
Rights Statement
Publisher
Peer Reviewed
Language
Replaces

Relationships

Parents:

This work has no parents.

Items