Humans are remarkably efficient at learning by interacting with other people and observing their behavior. Children learn by watching their parents’ actions and mimicking their behavior. When they are not sure about their parents’ demonstration, they communicate with them, ask questions, and learn from their feedback. On the other hand,...
Due to recent advances in computer technology, the cost of collecting and storing data has dropped drastically. This makes it feasible to collect large amounts of information for each data point. This increasing trend in feature dimensionality justifies the need for research on variable selection. Random forest (RF) has demonstrated...
Diverse scientific fields collect multiple time series data to investigate the dynamical behavior of complex systems: atmospheric and climate science, geophysics, neuroscience, epidemiology, ecology, and environmental science. Identifying patterns of mutual dependence among such data generates valuable knowledge that can be applied for either inferential or forecasting purposes. Vector autoregressive...
Anomaly detection is the task of identifying observations (points) that differ from the majority of other points, which requires some measure of difference, or distance. Many anomaly detection methods rely on “implicit distance” measures: rather than directly calculating an explicitly defined distance, these approaches quantify a point’s “abnormality” by examining...
Randomized trials are the gold standard for the clinical assessment of a new treatment compared to a placebo or standard of care. Often in clinical trials, patients are accrued sequentially rather than all at once. Thus, the data from such a trial becomes available sequentially to the researcher. Monitoring and...
Bayesian optimization (BO) aims to optimize costly-to-evaluate functions by running a limited number of experiments that each evaluate the function at a selected input. Typical BO formulations assume that experiments are selected sequentially, or in fixed batches. Moreover, these experiments can be executed immediately upon request and have the same...
For spatial data visualization, we address two problems and provide solutions: heat map resolution selection and heat map confidence interval presentation. Analysts often present spatial data in gridded heat maps, at some chosen resolution. However, many data types vary in density across the domain. We develop variable-resolution heat maps to...
Turbines at wind projects pose a threat to birds and bats flying at altitudes within the rotor swept area. These animals die from colliding with the turbine blades. Estimating mortality, or the total number of bird or bat fatalities at a wind project, is critical to understanding environmental impacts of...
Differential expression (DE) analysis allows us to identify genes that respond differently under varying experimental conditions, thereby granting us an understanding of the molecular basis of phenotypic variation. Following the identification of significantly differentially expressed genes, it is often of interest to the researcher to elucidate hidden patterns or groups in...
Differential expression (DE) analysis is a key task in gene expression studies, because it uncovers the association between expression levels of a gene and the covariates of interest. This dissertation pertains to two particular aspects of DE analysis: identifying stably expressed genes for count normalization and accounting for correlation between DE...