Graduate Thesis Or Dissertation
 

Modeling Human Learning of Data Quality Rules

Download PDF
https://ir.library.oregonstate.edu/concern/graduate_thesis_or_dissertations/41687r04s

Descriptions

Attribute Name | Values
Abstract
  • Cleaning data is laborious and is further complicated by inconsistencies and other issues in the data, leading many data scientists to use interactive data cleaning software. However, many current state-of-the-art cleaning solutions fail to account for the fact that the human user must explore the data in order to learn about it and, as a result, may periodically provide noisy feedback. We therefore need to understand how humans fundamentally learn data quality rules over the data and how that learning translates into identifying violations of these rules in the data they work with. In this thesis, we perform an empirical analysis of how humans understand and iteratively learn data quality rules, working to both quantify and model this learning process. We find that a Bayesian learning model replicates user behavior well when the model success definition is strict, while a hypothesis testing model also performs well when the success definition is loosened.
