Technical Report
 

End-user feature engineering in the presence of class imbalance

https://ir.library.oregonstate.edu/concern/technical_reports/4q77fs696

Descriptions

Abstract
  • Intelligent user interfaces, such as recommender systems and email classifiers, use machine learning algorithms to customize their behavior to the preferences of an end user. Although these learning systems are somewhat reliable, they are not perfectly accurate. Traditionally, end users who need to correct these learning systems can only provide more labeled training data. In this paper, we focus on incorporating new features suggested by the end user into machine learning systems. To investigate the effects of user-generated features on accuracy, we developed an auto-coding application that enables end users to assist a machine-learned program in coding a transcript by adding custom features. Our results show that adding user-generated features to the machine learning algorithm can modestly improve its F1 score. Further improvements are possible if the algorithm accounts for class imbalance in the training data and handles low-quality user-generated features that add noise to the learning algorithm. We show that addressing class imbalance improves performance to some extent, but improving the quality of the features yields the largest improvement. Finally, we discuss changes to the user interface that can help end users avoid creating low-quality features.
  • Keywords: feature engineering, class imbalance, machine learning, artificial intelligence, end-user programming, HCI
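The abstract reports results in terms of F1 and names class imbalance as a factor. As a hedged illustration only (this is not the report's code, and the report's abstract does not say which imbalance technique was used), the Python sketch below shows why plain accuracy can look good on an imbalanced set of transcript codes while F1 on the rare code is zero, and computes inverse-frequency class weights, one common heuristic for countering imbalance. The function names and the toy codes "other" and "question" are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
             positive: Hashable) -> float:
    """F1 for one class: the harmonic mean of its precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def inverse_frequency_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by n / (k * count) so rarer classes weigh more.

    A common heuristic, assumed here for illustration; not necessarily
    the approach taken in the report.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy data: 9 transcript segments coded "other", 1 coded "question".
    y_true = ["other"] * 9 + ["question"]
    # A classifier that always predicts the majority code.
    y_pred = ["other"] * 10
    print(accuracy(y_true, y_pred))                       # 0.9
    print(f1_score(y_true, y_pred, positive="question"))  # 0.0
    print(inverse_frequency_weights(y_true))
```

A learner that accepts per-class weights can scale each training example's loss by its class weight, so mistakes on the rare code cost more; resampling the training data is a common alternative. Which option suits a given auto-coder is an assumption to check against the full report.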
Funding Statement
  • This work was supported by NSF IIS-0803487 and by the EUSES Consortium via NSF CCR-0325273.
