End-user feature engineering in the presence of class imbalance

http://ir.library.oregonstate.edu/concern/technical_reports/4q77fs696

Descriptions

Attribute Name | Values
Creator
Abstract or Summary
  • Intelligent user interfaces, such as recommender systems and email classifiers, use machine learning algorithms to customize their behavior to the preferences of an end user. Although these learning systems are somewhat reliable, they are not perfectly accurate. Traditionally, end users who need to correct these learning systems can only provide more labeled training data. In this paper, we focus on incorporating new features suggested by the end user into machine learning systems. To investigate the effects of user-generated features on accuracy, we developed an auto-coding application that enables end users to assist a machine-learned program in coding a transcript by adding custom features. Our results show that adding user-generated features to the machine learning algorithm can result in modest improvements to its F1 score. Further improvements are possible if the algorithm accounts for class imbalance in the training data and deals with low-quality user-generated features that add noise to the learning algorithm. We show that addressing class imbalance improves performance to an extent, but improving the quality of features brings about the most beneficial change. Finally, we discuss changes to the user interface that can help end users avoid the creation of low-quality features.
Resource Type
Date Available
Date Issued
Series
Keyword
Table of Contents
  • Abstract; Author Keywords; ACM Classification Keywords; Introduction; Related Work; Methodology; The AutoCoder Prototype; Explaining Predictions; End-User Feature Engineering; Data Collection; Machine Learning Algorithm; Baseline Algorithm; Incorporating User-Defined Features; Class Imbalance Correction; Over/Undersampling; SMOTE; Extreme SMOTE; Off-line Evaluation of Classifiers; Evaluation Metric; Results; Adding User Features and Addressing Class Imbalance; Influence of Individual User Features; Identifying non-predictive user-defined features; Characteristic 1: Poor test data agreement; Characteristic 2: Under-representation of a user-defined feature in its assigned class in test data; Implications For Design; Conclusion; Acknowledgements; References
Rights Statement
Funding Statement (additional comments about funding)
Language
Replaces
Additional Information
  • description.provenance : Approved for entry into archive by Linda Kathman (linda.kathman@oregonstate.edu) on 2009-11-02T22:44:04Z (GMT) No. of bitstreams: 1 paper209.pdf: 259524 bytes, checksum: 259877bb39a61582149dda58cf8d2d0e (MD5)
  • description.provenance : Submitted by Ian Oberst (obersti@onid.orst.edu) on 2009-11-02T22:37:44Z No. of bitstreams: 1 paper209.pdf: 259524 bytes, checksum: 259877bb39a61582149dda58cf8d2d0e (MD5)
  • description.provenance : Made available in DSpace on 2009-11-02T22:44:04Z (GMT). No. of bitstreams: 1 paper209.pdf: 259524 bytes, checksum: 259877bb39a61582149dda58cf8d2d0e (MD5)
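
The abstract reports results as F1 scores and notes that correcting class imbalance helps. The following is a minimal sketch of why F1 is the more informative metric on imbalanced data, together with plain random oversampling of the minority class. It is not the report's implementation: the function names `f1_score` and `random_oversample` and the toy labels are assumptions made for illustration, and the SMOTE variants listed in the table of contents are not reproduced here.

```python
import random
from collections import Counter


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (made up): nine negative examples and one positive example.
y_true = [0] * 9 + [1]
# A classifier that always predicts the majority class.
y_majority = [0] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_majority)) / len(y_true)
print("accuracy:", accuracy)
print("F1:", f1_score(y_true, y_majority))

balanced_rows, balanced_labels = random_oversample(list(range(10)), y_true)
print("class counts after oversampling:", Counter(balanced_labels))
```

On this toy data the always-negative classifier looks strong on accuracy while its F1 for the rare class is zero, which is the situation the report's class-imbalance corrections are meant to address.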

Last modified: 07/18/2017
