Methods for cost-sensitive learning

http://ir.library.oregonstate.edu/concern/graduate_thesis_or_dissertations/qv33s0560

Descriptions

Attribute Name / Values
Abstract or Summary
  • Many approaches for achieving intelligent behavior of automated (computer) systems involve components that learn from past experience. This dissertation studies computational methods for learning from examples, for classification and for decision making, when the decisions have different non-zero costs associated with them. Many practical applications of learning algorithms, including transaction monitoring, fraud detection, intrusion detection, and medical diagnosis, have such non-uniform costs, and there is a great need for new methods that can handle them. This dissertation discusses two approaches to cost-sensitive classification: input data weighting and conditional density estimation. The first method assigns a weight to each training example in order to force the learning algorithm (which is otherwise unchanged) to pay more attention to examples with higher misclassification costs. The dissertation discusses several different weighting methods and concludes that a method that gives higher weight to examples from rarer classes works quite well. Another algorithm that gave good results was a wrapper method that applies Powell's gradient-free algorithm to optimize the input weights. The second approach to cost-sensitive classification is conditional density estimation. In this approach, the output of the learning algorithm is a classifier that estimates, for a new data point, the probability that it belongs to each of the classes. These probability estimates can be combined with a cost matrix to make decisions that minimize the expected cost. The dissertation presents a new algorithm, bagged lazy option trees (B-LOTs), that gives better probability estimates than any previous method based on decision trees. In order to evaluate cost-sensitive classification methods, appropriate statistical methods are needed. 
The dissertation presents two new statistical procedures: BCOST provides a confidence interval on the expected cost of a classifier, and BDELTACOST provides a confidence interval on the difference in expected costs of two classifiers. These methods are applied to a large set of experimental studies to evaluate and compare the cost-sensitive methods presented in this dissertation. Finally, the dissertation describes the application of B-LOTs to a problem of predicting the stability of river channels. In this study, B-LOTs were shown to be superior to other methods in cases where the classes have very different frequencies, a situation that arises frequently in cost-sensitive classification problems.
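The decision rule mentioned in the abstract, combining class probability estimates with a cost matrix to minimize expected cost, can be illustrated with a minimal sketch. This is not code from the dissertation: the function name `min_expected_cost_decision`, the convention that `cost[i, j]` is the cost of predicting class `i` when the true class is `j`, and the example numbers are all assumptions made for illustration.

```python
import numpy as np

def min_expected_cost_decision(probs, cost):
    """Pick, for each example, the class with the lowest expected cost.

    probs: array of shape (n_examples, n_classes), estimated P(class j | x).
    cost:  array of shape (n_classes, n_classes); cost[i, j] is the assumed
           cost of predicting class i when the true class is j.
    """
    probs = np.asarray(probs, dtype=float)
    cost = np.asarray(cost, dtype=float)
    # expected[n, i] = sum_j probs[n, j] * cost[i, j]
    expected = probs @ cost.T
    return expected.argmin(axis=1), expected

# Hypothetical two-class problem: missing class 1 costs 10,
# a false alarm costs 1, correct decisions cost 0.
cost = np.array([[0.0, 10.0],
                 [1.0, 0.0]])
probs = np.array([[0.80, 0.20],
                  [0.95, 0.05]])

decisions, expected = min_expected_cost_decision(probs, cost)
print(decisions)
print(expected)
```

In this made-up example the first point is assigned to class 1 even though class 0 is four times more probable, because the assumed cost of missing class 1 is high. This is why the quality of the probability estimates matters for cost-sensitive decisions.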
Digitization Specifications
  • PDF derivative scanned at 300 ppi (256 Grayscale + 265 b+w), using Capture Perfect 3.0.82, on a Canon DR-9080C. CVista PdfCompressor 4.0 was used for pdf compression and textual OCR.
Additional Information
  • description.provenance : Approved for entry into archive by Linda Kathman(linda.kathman@oregonstate.edu) on 2009-06-11T21:46:19Z (GMT) No. of bitstreams: 1 Margineantu_Dragos_D_2002.pdf: 1911158 bytes, checksum: 19bc20dd41bfd8dac28a6a7ce4037fac (MD5)
  • description.provenance : Approved for entry into archive by Linda Kathman(linda.kathman@oregonstate.edu) on 2009-06-11T21:45:18Z (GMT) No. of bitstreams: 1 Margineantu_Dragos_D_2002.pdf: 1911158 bytes, checksum: 19bc20dd41bfd8dac28a6a7ce4037fac (MD5)
  • description.provenance : Made available in DSpace on 2009-06-11T21:46:19Z (GMT). No. of bitstreams: 1 Margineantu_Dragos_D_2002.pdf: 1911158 bytes, checksum: 19bc20dd41bfd8dac28a6a7ce4037fac (MD5)
  • description.provenance : Submitted by Anna Opoien (aoscanner@gmail.com) on 2009-06-11T18:23:36Z No. of bitstreams: 1 Margineantu_Dragos_D_2002.pdf: 1911158 bytes, checksum: 19bc20dd41bfd8dac28a6a7ce4037fac (MD5)
