Abstract:
The meaning of biological sequences is a central problem
of modern biology. Although string matching is well understood
in the edit-distance model, biological strings
with transpositions and inversions violate this model's
assumptions. To align biologically reasonable strings, we
proposed the Walking Tree Method [4,5,6,7,8], an
approximate string alignment method that can handle
insertion, deletions, substitutions, translocations, and
more than one level of inversions. Our earlier versions
were able to align whole bacterial genomes (~1 Mbps)
and discover and verify genes. As extremely long
sequences can now be deciphered rapidly and accurately
without amplification [2,3,15], speeding up the method
becomes necessary. Via a technique that we call it
"recurrence reduction" in which same computations can
be looked up rather than recomputed, we are able to
significantly improve the performance, e.g., 400% for 1-
million base pair alignments.