EmersonSarahStatisticsDetectingDifferentialGene.pdf Public Deposited

  • In many disease settings, it is likely that only a subset of the disease population will exhibit certain genetic or phenotypic differences from the healthy population. Therefore, when seeking to identify genes or other explanatory factors that might be related to the disease state, we might expect the variable of interest to follow a mixture distribution in the disease group. A number of methods have been proposed for testing in situations where only a subgroup of samples or patients exhibits differential expression levels. Our discussion here focuses on how inattention to standard statistical theory can lead to approaches with serious drawbacks. We present and discuss several approaches motivated by theoretical derivations and compare them with an ad hoc approach based upon identification of outliers. We find that the outlier-sum statistic proposed by Tibshirani and Hastie offers little benefit over a t-test even in the most idealized scenarios. It also suffers from a number of limitations, including difficulty of calibration, lack of robustness to underlying distributions, high false positive rates owing to its asymmetric treatment of groups, and poor power or discriminatory ability under many alternatives.
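For readers unfamiliar with the outlier-sum statistic named in the abstract, the following is a minimal Python sketch of how it is commonly described for a single gene. It is not taken from the deposited paper. The function name `outlier_sum`, the 1.4826 scaling of the median absolute deviation, and the cutoff of the 75th percentile plus one interquartile range are assumptions based on the usual presentation of the statistic and may differ in detail from the version the authors evaluate.

```python
import numpy as np


def outlier_sum(healthy, disease):
    """Illustrative outlier-sum statistic for one gene (a sketch, not the paper's code).

    Assumed steps:
      1. Standardize all values by the pooled median and scaled MAD.
      2. Set a cutoff at q75 + IQR of the pooled standardized values.
      3. Sum the standardized disease-group values that exceed the cutoff.
    """
    healthy = np.asarray(healthy, dtype=float)
    disease = np.asarray(disease, dtype=float)
    pooled = np.concatenate([healthy, disease])

    # Robust center and scale computed from both groups together.
    med = np.median(pooled)
    mad = 1.4826 * np.median(np.abs(pooled - med))  # assumed scaling constant
    if mad == 0:
        raise ValueError("MAD is zero; values cannot be standardized")

    z_pooled = (pooled - med) / mad
    q25, q75 = np.percentile(z_pooled, [25, 75])
    cutoff = q75 + (q75 - q25)

    # Only disease-group samples contribute to the sum. This is the
    # asymmetric treatment of groups that the abstract refers to.
    z_disease = (disease - med) / mad
    return float(z_disease[z_disease > cutoff].sum())


if __name__ == "__main__":
    # Hypothetical toy data: one disease sample is far from the rest.
    print(outlier_sum([0, 1, 2, 3, 4], [0, 1, 2, 3, 100]))
```

Because the statistic has no standard reference distribution, it is typically calibrated by permuting group labels, which relates to the calibration difficulty the abstract mentions.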