Graduate Thesis Or Dissertation

Evaluating online text classification algorithms for email prediction in TaskTracer

https://ir.library.oregonstate.edu/concern/graduate_thesis_or_dissertations/p2676x89f

Descriptions

Abstract
  • This paper examines how six online multiclass text classification algorithms perform in the domain of email tagging within the TaskTracer system. TaskTracer is a project-oriented user interface for the desktop knowledge worker. TaskTracer attempts to tag all documents, web pages, and email messages with the projects to which they are relevant. In previous work, we deployed an SVM email classifier to tag email messages. However, the SVM is a batch algorithm whose training time scales quadratically with the number of examples. The goal of the study reported in this paper was to select an online learning algorithm to replace this SVM classifier. We investigated Bernoulli Naïve Bayes, Multinomial Naïve Bayes, Transformed Weight-Normalized Complement Naïve Bayes, Term Frequency – Inverse Document Frequency counts, Online Passive Aggressive algorithms, and Linear Confidence Weighted classifiers. These methods were evaluated for their online accuracy, their sensitivity to the number and frequency of classes, and their tendency to make repeated errors. The Confidence Weighted Classifier and Bernoulli Naïve Bayes were found to perform the best. They behaved more stably than the other algorithms when handling the imbalanced classes and sparse features of email data.
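The online accuracy measure mentioned in the abstract can be illustrated with a predict-then-update loop: each message is classified before its true tag is revealed, and the model is then updated on that message. The sketch below is not the thesis code. It assumes the PA-I multiclass variant of the Passive Aggressive algorithm over binary word features, and the class name `OnlinePassiveAggressive`, the function `online_accuracy`, the `aggressiveness` parameter, and the sample messages and project tags are all hypothetical.

```python
class OnlinePassiveAggressive:
    """Multiclass Passive Aggressive (PA-I) over sets of words (illustrative sketch)."""

    def __init__(self, aggressiveness=1.0):
        self.c = aggressiveness   # cap on the step size (the PA-I parameter C)
        self.weights = {}         # tag -> {word: weight}

    def _score(self, tag, words):
        w = self.weights[tag]
        return sum(w.get(word, 0.0) for word in words)

    def predict(self, words):
        if not self.weights:
            return None           # no tag has been seen yet
        return max(self.weights, key=lambda tag: self._score(tag, words))

    def update(self, words, true_tag):
        self.weights.setdefault(true_tag, {})
        rivals = [t for t in self.weights if t != true_tag]
        if not rivals or not words:
            return
        # Highest-scoring wrong tag defines the margin.
        rival = max(rivals, key=lambda t: self._score(t, words))
        margin = self._score(true_tag, words) - self._score(rival, words)
        loss = max(0.0, 1.0 - margin)
        if loss == 0.0:
            return                # passive step: margin already satisfied
        # Binary features: squared norm of x is len(words); two weight
        # vectors move, hence the factor of 2 in the denominator.
        tau = min(self.c, loss / (2.0 * len(words)))
        for word in words:
            self.weights[true_tag][word] = self.weights[true_tag].get(word, 0.0) + tau
            self.weights[rival][word] = self.weights[rival].get(word, 0.0) - tau


def online_accuracy(model, stream):
    """Fraction of messages tagged correctly before the model saw their label."""
    correct = 0
    total = 0
    for words, tag in stream:
        if model.predict(words) == tag:
            correct += 1
        model.update(words, tag)  # learn only after the prediction is scored
        total += 1
    return correct / total if total else 0.0


# Hypothetical messages, each reduced to a set of words and a project tag.
stream = [
    ({"budget", "report"}, "finance"),
    ({"deadline", "paper"}, "thesis"),
    ({"budget", "meeting"}, "finance"),
    ({"paper", "draft"}, "thesis"),
    ({"budget", "forecast"}, "finance"),
]
model = OnlinePassiveAggressive()
accuracy = online_accuracy(model, stream)
print(f"online accuracy: {accuracy:.2f}")
```

Because every message is scored before it is used for training, no separate held-out set is needed, and the early mistakes made while a new tag is still unknown count against the classifier. This is one reason the number and frequency of classes matter in this setting.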
Additional Information
  • description.provenance : Approved for entry into archive by Laura Wilson (laura.wilson@oregonstate.edu) on 2009-06-17T23:16:06Z (GMT) No. of bitstreams: 1 emailPredictionThesisFinal.pdf: 961487 bytes, checksum: c401711f3ef39e38df0f9546d31bd48d (MD5)
  • description.provenance : Submitted by Victoria Keiser (baileyvi@onid.orst.edu) on 2009-06-07T03:27:52Z No. of bitstreams: 1 emailPredictionThesisFinal.pdf: 961487 bytes, checksum: c401711f3ef39e38df0f9546d31bd48d (MD5)
  • description.provenance : Made available in DSpace on 2009-06-17T23:16:06Z (GMT). No. of bitstreams: 1 emailPredictionThesisFinal.pdf: 961487 bytes, checksum: c401711f3ef39e38df0f9546d31bd48d (MD5)
  • description.provenance : Approved for entry into archive by Julie Kurtz (julie.kurtz@oregonstate.edu) on 2009-06-11T17:26:42Z (GMT) No. of bitstreams: 1 emailPredictionThesisFinal.pdf: 961487 bytes, checksum: c401711f3ef39e38df0f9546d31bd48d (MD5)
