Abstract:
Edit distances are a well-established technique for classification problems. They have been employed successfully in many classification problems including chromosome classification and hand-written digit recognition. Virtually all machine learning algorithms represent the objects to be classified as vectors of features. However, edit distances provide only a measure of the difference between two objects--they do not provide a feature-based representation. Recently, kernel-based algorithms such as support vector machines (SVMs) have been developed. A kernel is a measure of similarity between two objects. This thesis explores various ways in which edit distances can be converted into kernels and combined with support vector machines to solve difficult pattern recognition problems. The thesis compares the performance of SVMs, the k-nearest neighbor algorithm, and the weighted k-nearest neighbor algorithm on a problem of leaf shape classification. The goal of the leaf classification problem is support content-based image retrieval from the Oregon State University Herbarium, which is a collection of plant specimens. The thesis shows that SVMs and standard k-nearest neighbor provide high classification accuracy. SVMs, with an edit distance kernel, provide the most accurate method.