Improved text classification through label clustering

http://ir.library.oregonstate.edu/concern/undergraduate_thesis_or_projects/zw12z723q

Descriptions

Abstract or Summary
  • This paper introduces an approach to text classification for semi-structured label systems on which standard methods perform poorly. Taking the view that perfect classification for such a system is unattainable, we demonstrate an automated procedure for isolating the learnable elements of the problem. Through analysis of an example dataset, we identify attributes of the label system that hinder performance and demonstrate, using manual methods, that minimizing these attributes improves performance. We further show that label clustering effectively minimizes these attributes. We then show that a combination of frequency, co-occurrence, and document similarity lets us construct label clusters automatically. Finally, we demonstrate that label clusters improve classification performance without excessively limiting the label space.
Additional Information
  • description.provenance : Made available in DSpace on 2015-06-10T13:31:08Z (GMT). No. of bitstreams: 2 license_rdf: 1379 bytes, checksum: da3654ba11642cda39be2b66af335aae (MD5) final-draft.pdf: 3608203 bytes, checksum: 8c4f3403cf559b4b0d1225547569a2c7 (MD5)
  • description.provenance : Approved for entry into archive by Patricia Black (patricia.black@oregonstate.edu) on 2015-06-10T13:31:08Z (GMT) No. of bitstreams: 2 license_rdf: 1379 bytes, checksum: da3654ba11642cda39be2b66af335aae (MD5) final-draft.pdf: 3608203 bytes, checksum: 8c4f3403cf559b4b0d1225547569a2c7 (MD5)
  • description.provenance : Submitted by Christopher Vanderschuere (vandersc@onid.orst.edu) on 2015-06-09T20:44:44Z No. of bitstreams: 2 license_rdf: 1379 bytes, checksum: da3654ba11642cda39be2b66af335aae (MD5) final-draft.pdf: 3608203 bytes, checksum: 8c4f3403cf559b4b0d1225547569a2c7 (MD5)

Last modified: 07/27/2017
