Technical Report
 

Algorithms for constructing a consensus sequence

Public Deposited

Downloadable Content

Download PDF
https://ir.library.oregonstate.edu/concern/technical_reports/8049gd61j

Descriptions

Attribute NameValues
Creator
Abstract
  • Biological and physical limitations require that DNA be sequenced in fragments. There are several approaches to obtain the appropriate sized fragments of DNA to sequence. The method of sequencing that we are interested in is loosely referred to as shotgun sequencing. Many copies of the genomic DNA to be sequenced are cleaved by one or more restriction endonucleases resulting in a multiset, S, of DNA fragments that are not ordered. DNA fragments are essentially selected at random from this multset and sequenced. A consensus sequence is constructed by joining together fragments which overlap. (One hopes that the consensus sequence is very close to the original sequence.) Since errors occur reading the sequences, the overlaps must be approximate, not exact. This process of reassembly is similar to the NP-complete shortest common superstring problem [GMS80]. To simplify the problem we make the following assumptions. • An integer k can be supplied that defines the minimum acceptable overlap between two sequences. • There is a unique alignment of the sequence fragments such that all suf- fix/prefix overlaps are of length k or greater. • All suffix/prefix overlaps are exact (log inexact) matches. We define the string consensus problem and give three algorithms to solve it. We then define the log inexact string consensus problem and give three algorithms to solve it. We believe that the log inexact string consensus problem is closer to the problem of constructing a consensus sequence from shotgun data that biochemists are trying to solve than the problems previous approximation algorithms for the shortest common superstring problem.
Resource Type
Date Issued
Academic Affiliation
Series
Rights Statement
Publisher
Peer Reviewed
Language

Relationships

Parents:

This work has no parents.

Items