A study of distance-based machine learning algorithms Public Deposited

http://ir.library.oregonstate.edu/concern/graduate_thesis_or_dissertations/zw12z7835

Descriptions

Attribute NameValues
Creator
Abstract or Summary
  • Distance-based algorithms are machine learning algorithms that classify queries by computing distances between these queries and a number of internally stored exemplars. Exemplars that are closest to the query have the largest in uence on the classi cation assigned to the query. Two speci c distance-based algorithms, the nearest neighbor algorithm and the nearest-hyperrectangle algorithm, are studied in detail. It is shown that the k-nearest neighbor algorithm (kNN) outperforms the rst- nearest neighbor algorithm only under certain conditions. Data sets must contain moderate amounts of noise. Training examples from the di erent classes must belong to clusters that allow an increase in the value of k without reaching into clusters of other classes. Methods for choosing the value of k for kNN are investigated. It is shown that one-fold cross-validation on a restricted number of values for k su ces for best performance. It is also shown that for best performance the votes of the k-nearest neighbors of a query should be weighted in inverse proportion to their distances from the query. Principal component analysis is shown to reduce the number of relevant dimen- sions substantially in several domains. Two methods for learning feature weights for a weighted Euclidean distance metric are proposed. These methods improve the performance of kNN and NN in a variety of domains. The nearest-hyperrectangle algorithm (NGE) is found to give predictions that are substantially inferior to those given by kNN in a variety of domains. Experiments performed to understand this inferior performance led to the discovery of several improvements to NGE. Foremost of these is BNGE, a batch algorithm that avoids construction of overlapping hyperrectangles from di erent classes. Although it is generally superior to NGE, BNGE is still signi cantly inferior to kNN in a variety of domains. Hence, a hybrid algorithm (KBNGE), that uses BNGE in parts of the input space that can be represented by a single hyperrectangle and kNN otherwise, is introduced. The primary contributions of this dissertation are (a) several improvements to existing distance-based algorithms, (b) several new distance-based algorithms, and (c) an experimentally supported understanding of the conditions under which various distance-based algorithms are likely to give good performance.
Resource Type
Date Available
Date Copyright
Date Issued
Degree Level
Degree Name
Degree Field
Degree Grantor
Commencement Year
Advisor
Academic Affiliation
Non-Academic Affiliation
Subject
Rights Statement
Language
Replaces
Additional Information
  • description.provenance : Approved for entry into archive by Linda Kathman(linda.kathman@oregonstate.edu) on 2009-02-02T19:00:54Z (GMT) No. of bitstreams: 1 Dietrich_Wettscherech.pdf: 771532 bytes, checksum: 0d02af6d935f0b7b3eeef1f7219e10e1 (MD5)
  • description.provenance : Approved for entry into archive by Linda Kathman(linda.kathman@oregonstate.edu) on 2009-02-02T19:02:57Z (GMT) No. of bitstreams: 1 Dietrich_Wettscherech.pdf: 771532 bytes, checksum: 0d02af6d935f0b7b3eeef1f7219e10e1 (MD5)
  • description.provenance : Submitted by Philip Vue (vuep@onid.orst.edu) on 2009-01-29T22:23:39Z No. of bitstreams: 1 Dietrich_Wettscherech.pdf: 771532 bytes, checksum: 0d02af6d935f0b7b3eeef1f7219e10e1 (MD5)
  • description.provenance : Made available in DSpace on 2009-02-02T19:02:58Z (GMT). No. of bitstreams: 1 Dietrich_Wettscherech.pdf: 771532 bytes, checksum: 0d02af6d935f0b7b3eeef1f7219e10e1 (MD5)

Relationships

In Administrative Set:
Last modified: 08/08/2017

Downloadable Content

Download PDF
Citations:

EndNote | Zotero | Mendeley

Items