Anomaly Detection Meta-Analysis Benchmarks

http://ir.library.oregonstate.edu/concern/datasets/47429f155

[Suggested Citation: Emmott, Andrew; Das, Shubhomoy; Dietterich, Thomas; Fern, Alan; Wong, Weng-Keen (2016): Anomaly Detection Meta-Analysis Benchmarks. Oregon State University Libraries & Press. Dataset. http://dx.doi.org/10.7267/N97H1GGX]

See the related paper: Emmott A, Das S, Dietterich T, Fern A, Wong WK (2016). A meta-analysis of the anomaly detection problem. arXiv:1503.01158v2

This dataset is hosted at: https://oregonstate.app.box.com/s/5txnlc6z7aifsptu415vrxc03bhhm40x

Benchmarks are derived from several data sets found at the UC Irvine Machine Learning Repository: https://archive.ics.uci.edu/ml/index.html

Descriptions

Attribute Name | Values
Creator
Abstract or Summary
  • This article provides a thorough meta-analysis of the anomaly detection problem. To accomplish this, we first identify approaches to benchmarking anomaly detection algorithms across the literature and produce a large corpus of anomaly detection benchmarks that vary in their construction across several dimensions we deem important to real-world applications: (a) point difficulty, (b) relative frequency of anomalies, (c) clusteredness of anomalies, and (d) relevance of features. We apply a representative set of anomaly detection algorithms to this corpus, yielding a very large collection of experimental results. We analyze these results to understand many phenomena observed in previous work. First, we observe the effects of experimental design on experimental results. Second, results are evaluated with two metrics, ROC Area Under the Curve and Average Precision. We employ statistical hypothesis testing to demonstrate the value (or lack thereof) of our benchmarks. We then offer several approaches to summarizing our experimental results, drawing several conclusions about the impact of our methodology as well as the strengths and weaknesses of some algorithms. Last, we compare results against a trivial solution as an alternate means of normalizing the reported performance of algorithms. The intended contributions of this article are many; in addition to providing a large publicly available corpus of anomaly detection benchmarks, we provide an ontology for describing anomaly detection contexts, a methodology for controlling various aspects of benchmark creation, guidelines for future experimental design, and a discussion of the many potential pitfalls of trying to measure success in this field. A link to the dataset can be found below.
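As an illustration of the two evaluation metrics the abstract names (ROC Area Under the Curve and Average Precision) and of the "trivial solution" baseline used to normalize reported performance, here is a minimal sketch in plain Python. This is not the authors' code; the toy labels and scores are invented for the example, and the baseline shown is the standard fact that a constant or random scorer has expected Average Precision equal to the anomaly frequency.

```python
def roc_auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) formulation:
    the fraction of (anomaly, nominal) pairs where the anomaly
    receives the higher score (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Average Precision: mean of precision@k taken at the
    rank k of each true anomaly in the score-sorted list."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total, ap = 0, sum(labels), 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            ap += hits / k
    return ap / total

# Toy benchmark: 2 anomalies (label 1) among 8 nominal points (label 0).
labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.3, 0.2, 0.1]

print(roc_auc(labels, scores))            # 0.9375
print(average_precision(labels, scores))  # 0.8333...
# Trivial-baseline AP is roughly the anomaly rate:
print(sum(labels) / len(labels))          # 0.2
```

Comparing a detector's Average Precision against the anomaly rate (rather than against 0) is what makes scores comparable across benchmarks whose relative frequency of anomalies differs.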
License
Resource Type
Date Available
Date Issued
Citation
Keyword
Rights Statement
Funding Statement (additional comments about funding)
Peer Reviewed
Language
Replaces
Additional Information

Last modified: 10/10/2017