Rare category detection using hierarchical mean shift

http://ir.library.oregonstate.edu/concern/graduate_thesis_or_dissertations/2j62s749x

Descriptions

Abstract or Summary
  • Many applications in surveillance, monitoring, scientific discovery, and data cleaning require the identification of anomalies. Although many methods have been developed to identify statistically significant anomalies, a more difficult task is to identify anomalies that are both interesting and statistically significant. Category detection is an emerging area of machine learning that can address this issue using a "human-in-the-loop" approach. In this interactive setting, the algorithm asks the user either to assign a queried data point to an existing category or to declare that it belongs to a previously undiscovered category. The goal of category detection is to discover all the categories in the data with as few queries as possible. In a data set with imbalanced categories, the main challenge is identifying the rare categories or anomalies; hence, the task is often referred to as rare category detection. We present a new approach to rare category detection using a hierarchical mean shift procedure. In our approach, a hierarchy is created by repeatedly applying mean shift with increasing bandwidth to the entire data set. This hierarchy allows us to identify anomalies in the data set at different scales, which are then posed as queries to the user. The main advantage of this methodology over existing approaches is that it does not require any knowledge of data set properties such as the total number of classes or the prior probabilities of the classes. Results on real-world data sets show that our hierarchical mean shift approach performs consistently better than previous techniques.
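The hierarchy described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: it runs a plain Gaussian-kernel mean shift on the full data set at each bandwidth of an increasing schedule, groups points whose modes lie within a hypothetical merge_frac × bandwidth of each other, and, as an assumed query rule, poses the member closest to each cluster's mode as a query, visiting the smallest clusters first at each scale. The function names, the bandwidth values, merge_frac, and the simulated user are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def gaussian_mean_shift(data: Sequence[Point], bandwidth: float,
                        max_iter: int = 100, tol: float = 1e-6) -> List[Point]:
    """Move each point uphill on a Gaussian kernel density estimate of `data`
    (the full data set) and return the mode it converges to."""
    modes = []
    for x in data:
        y = list(x)
        for _ in range(max_iter):
            num = [0.0] * len(y)
            den = 0.0
            for p in data:
                w = math.exp(-math.dist(y, p) ** 2 / (2 * bandwidth ** 2))
                den += w
                for k, v in enumerate(p):
                    num[k] += w * v
            new_y = [n / den for n in num]  # kernel-weighted mean = one mean-shift step
            step = math.dist(new_y, y)
            y = new_y
            if step < tol:
                break
        modes.append(tuple(y))
    return modes


def group_by_mode(modes: Sequence[Point], merge_tol: float) -> Tuple[List[int], List[Point]]:
    """Give points whose modes lie within merge_tol of an existing center the same label."""
    labels: List[int] = []
    centers: List[Point] = []
    for m in modes:
        for c_idx, c in enumerate(centers):
            if math.dist(m, c) < merge_tol:
                labels.append(c_idx)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers


def candidate_queries(data: Sequence[Point], bandwidths: Sequence[float],
                      merge_frac: float = 0.5) -> List[int]:
    """Hypothetical query order: finest scale first, smallest clusters first,
    one representative (the member closest to the cluster's mode) per cluster."""
    seen = set()
    order: List[int] = []
    for h in sorted(bandwidths):  # larger bandwidth = coarser level of the hierarchy
        labels, centers = group_by_mode(gaussian_mean_shift(data, h), merge_frac * h)
        members: Dict[int, List[int]] = {}
        for i, c in enumerate(labels):
            members.setdefault(c, []).append(i)
        for c, idx in sorted(members.items(), key=lambda kv: len(kv[1])):
            rep = min(idx, key=lambda i: math.dist(data[i], centers[c]))
            if rep not in seen:
                seen.add(rep)
                order.append(rep)
    return order


def simulate_user(data: Sequence[Point], true_labels: Sequence[str],
                  bandwidths: Sequence[float], budget: int) -> Dict[str, int]:
    """Answer queries from known labels (a stand-in for the human user) and
    return the query number at which each category was first discovered."""
    discovered: Dict[str, int] = {}
    for n, i in enumerate(candidate_queries(data, bandwidths)[:budget], start=1):
        discovered.setdefault(true_labels[i], n)
    return discovered


if __name__ == "__main__":
    common = [(i * 0.3, j * 0.3) for i in range(4) for j in range(4)]
    rare = [(10.0, 10.0), (10.2, 10.0), (10.0, 10.2)]
    pts = common + rare
    labels = ["common"] * len(common) + ["rare"] * len(rare)
    print(simulate_user(pts, labels, [0.5, 2.0, 8.0], budget=5))
    # expected: {'rare': 1, 'common': 2}
```

With a tight 16-point cluster and a distant 3-point cluster, the smallest cluster at the finest scale is queried first, so the simulated user reveals the rare category on the first query. Note that the sketch never uses the number of classes or their priors, in line with the advantage claimed above. Ordering by cluster size is only one plausible way to surface anomalies at each scale; the thesis may rank candidates differently.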
Additional Information
  • description.provenance : Approved for entry into archive by Julie Kurtz (julie.kurtz@oregonstate.edu) on 2009-01-14T21:37:53Z (GMT) No. of bitstreams: 1 Thesis_pavan.pdf: 309222 bytes, checksum: ce21c4d278b867be65d4c16d020a62ab (MD5)
  • description.provenance : Submitted by Pavan Kumar Vatturi (vatturip@onid.orst.edu) on 2009-01-12T07:17:44Z No. of bitstreams: 1 Thesis_pavan.pdf: 309222 bytes, checksum: ce21c4d278b867be65d4c16d020a62ab (MD5)
  • description.provenance : Approved for entry into archive by Laura Wilson (laura.wilson@oregonstate.edu) on 2009-01-22T22:35:34Z (GMT) No. of bitstreams: 1 Thesis_pavan.pdf: 309222 bytes, checksum: ce21c4d278b867be65d4c16d020a62ab (MD5)
  • description.provenance : Made available in DSpace on 2009-01-22T22:35:35Z (GMT). No. of bitstreams: 1 Thesis_pavan.pdf: 309222 bytes, checksum: ce21c4d278b867be65d4c16d020a62ab (MD5)

Last modified: 08/08/2017
