It is common practice in the unsupervised anomaly detection literature to create experimental benchmarks by sampling from existing supervised learning datasets. We seek to improve this practice by identifying four dimensions important to real-world anomaly detection applications --- point difficulty, clusteredness of anomalies, relevance of features, and relative frequency of...