Respondent-driven sampling (RDS) is used throughout the world to estimate prevalence and population size for hidden populations. Although RDS is an effective method for enrolling people from key populations in studies, it relies on a partially unknown sampling mechanism, and thus each individual’s inclusion probability is unknown. Current estimators for...
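The abstract breaks off before it names the estimators it discusses. As background only, the Volz-Heckathorn (RDS-II) estimator is one widely used estimator of this kind. It works around the unknown inclusion probabilities by *assuming* that each respondent's inclusion probability is proportional to their self-reported network degree, so each respondent is weighted by 1/degree. The sketch below illustrates that assumption. It is not the method proposed in this dissertation, and the function name is hypothetical.

```python
def rds_ii_prevalence(outcomes, degrees):
    """Volz-Heckathorn (RDS-II) prevalence estimate (illustrative sketch).

    Assumes inclusion probability is proportional to reported degree,
    so each respondent receives weight 1/degree:
        p_hat = sum(y_i / d_i) / sum(1 / d_i)
    outcomes: 0/1 indicators of the trait of interest
    degrees:  positive self-reported network degrees
    """
    if len(outcomes) != len(degrees):
        raise ValueError("outcomes and degrees must have the same length")
    if any(d <= 0 for d in degrees):
        raise ValueError("degrees must be positive")
    weights = [1.0 / d for d in degrees]
    weighted_cases = sum(w * y for w, y in zip(weights, outcomes))
    return weighted_cases / sum(weights)


# Low-degree respondents are up-weighted relative to the naive sample mean (0.5).
print(round(rds_ii_prevalence([1, 0, 1, 0], [2, 4, 2, 4]), 4))  # → 0.6667
```

When all degrees are equal, the weights cancel and the estimate reduces to the ordinary sample proportion.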
In this dissertation we present a compilation of the research conducted during the author’s doctoral program. In the first part, we discuss a case study regarding the impact of scholarships on student success at Oregon State University (OSU). Specifically, we look at the graduation and retention rates and aim to...
In this dissertation I will demonstrate a novel application of self-exciting point process models to mass shooting data. I will also introduce two adaptations to the traditional nonparametric Hawkes process modeling framework. One such modification allows for the estimation of the additional productivity introduced by an event that is not...
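A minimal sketch can make the idea of a self-exciting process concrete. The example evaluates the conditional intensity of a univariate Hawkes process, in which each past event temporarily raises the rate of future events. For concreteness it *assumes* a parametric exponential triggering kernel, whereas the dissertation works in a nonparametric framework. Under this kernel, `alpha` is the expected number of events directly triggered by one event, which is one way to read "productivity." The function name and parameters are illustrative.

```python
import math


def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Conditional intensity of a univariate Hawkes process (sketch).

    lambda(t) = mu + sum_{t_i < t} alpha * beta * exp(-beta * (t - t_i))

    mu:    background (immigrant) event rate
    alpha: expected number of direct offspring per event (productivity);
           the kernel alpha * beta * exp(-beta * s) integrates to alpha
    beta:  decay rate of the assumed exponential kernel
    """
    excitation = sum(
        alpha * beta * math.exp(-beta * (t - ti))
        for ti in event_times
        if ti < t  # only past events excite the process
    )
    return mu + excitation


# One event at time 0 raises the rate above the background level at time 1.
print(round(hawkes_intensity(1.0, [0.0], mu=0.5, alpha=0.8, beta=1.0), 4))  # → 0.7943
```

A nonparametric approach would replace the fixed exponential form with a kernel estimated from the data.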
Analyzing sequential observations over time is common in practice. Sequential measurements that describe the behavior of a system over time are usually called time series data, and such data are collected in a wide range of disciplines. Over the years, several research areas have developed around the study of stochastic...
The recent advent of large-scale microbiome studies enabled by high-throughput sequencing calls for innovative statistical methodologies capable of tackling the many challenges presented by microbiome data. Motivated by this need, we focus on the development of statistical tools for two types of problems that arise in microbiome...
Multiple hypothesis testing has long been an active topic in statistical research. Although much work has been done, controlling false discoveries remains a challenging task when the corresponding test statistics are dependent. Various methods have been proposed to estimate the false discovery proportion (FDP) under arbitrary dependence among the test...
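To illustrate the quantity being estimated, the sketch below computes the realized FDP, V / max(R, 1), for a fixed p-value threshold. Here V is the number of rejected true nulls and R is the total number of rejections. It then draws test statistics that share a common factor `W`, which makes them dependent. The equicorrelated-normal setup, the parameter values, and the function names are assumptions chosen for illustration. They are not the dependence structure or estimator studied in the dissertation.

```python
import math
import random
from statistics import NormalDist


def false_discovery_proportion(p_values, is_null, threshold):
    """Realized FDP = V / max(R, 1) for the rule 'reject if p <= threshold'."""
    rejected = [p <= threshold for p in p_values]
    r = sum(rejected)
    v = sum(rej and null for rej, null in zip(rejected, is_null))
    return v / max(r, 1)


def simulate_fdp(m=1000, m1=100, effect=3.0, rho=0.5, threshold=0.01, seed=0):
    """One draw of the FDP under equicorrelated normal test statistics.

    z_i = sqrt(rho) * W + sqrt(1 - rho) * e_i + mu_i, with a shared factor W,
    so every z_i has unit variance and pairwise correlation rho.
    The first m1 hypotheses are non-null with mean `effect`.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    w = rng.gauss(0.0, 1.0)  # common factor shared by all test statistics
    p_values, is_null = [], []
    for i in range(m):
        mu_i = effect if i < m1 else 0.0
        z = math.sqrt(rho) * w + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0) + mu_i
        p_values.append(2.0 * (1.0 - std_normal.cdf(abs(z))))  # two-sided p-value
        is_null.append(i >= m1)
    return false_discovery_proportion(p_values, is_null, threshold)


# Under strong dependence, the realized FDP varies widely from draw to draw.
print([round(simulate_fdp(seed=s), 3) for s in range(5)])
```

The spread of these draws suggests why a procedure that controls only the *expected* FDP (the FDR) can say little about the FDP in a single dependent experiment.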
Agent-based models (ABMs) are widely used in network data analysis; because their simple structure can produce complex outcomes, they are useful tools for understanding the dynamics of networks. In this thesis, we develop an agent-based dynamic network model, and show that it can replicate the expected degree distribution...
In areas such as spatial analysis and time series analysis, it is essential to understand and quantify spatial or temporal heterogeneity. In this dissertation, we focus on a spatially varying coefficient model, in which spatial heterogeneity is accommodated by allowing the regression coefficients to vary in a given spatial domain....
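As a rough illustration of the model class described above, a generic spatially varying coefficient model can be written as follows. The notation is chosen here for illustration and is not taken from the dissertation. In it, $\mathcal{D}$ is the spatial domain, $\mathbf{s}_i$ is the location of observation $i$, and each $\beta_j(\cdot)$ is a coefficient surface over $\mathcal{D}$:

```latex
y(\mathbf{s}_i) = \sum_{j=1}^{p} x_j(\mathbf{s}_i)\,\beta_j(\mathbf{s}_i) + \varepsilon(\mathbf{s}_i),
\qquad \mathbf{s}_i \in \mathcal{D}, \quad i = 1, \dots, n
```

Setting $\beta_j(\mathbf{s}) \equiv \beta_j$ for every $j$ recovers ordinary linear regression with spatially constant effects.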
Successive sampling population size estimation (SS-PSE) is a method used by government agencies and aid organizations around the world to estimate the size of hidden populations using data from respondent-driven sampling (RDS) surveys. SS-PSE addresses a specific need in estimation and helps us evaluate the vulnerability of areas to HIV...
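The successive sampling idea behind SS-PSE can be sketched by simulation. Units are drawn without replacement, and at each step a unit is chosen from those remaining with probability proportional to its network degree. High-degree units therefore tend to be sampled early, and the degrees of later draws tend to fall as the population is depleted. SS-PSE exploits this pattern to learn about population size. The function name, the uniform degree distribution, and the sample sizes are hypothetical. This is a toy illustration of the sampling model, not the SS-PSE estimator itself.

```python
import random


def successive_sample(degrees, n, seed=0):
    """Draw n unit indices without replacement, each step choosing among the
    remaining units with probability proportional to their degree.

    Returns the indices in the order they were selected.
    """
    if n > len(degrees):
        raise ValueError("sample size exceeds population size")
    if any(d <= 0 for d in degrees):
        raise ValueError("degrees must be positive")
    rng = random.Random(seed)
    remaining = list(range(len(degrees)))
    order = []
    for _ in range(n):
        weights = [degrees[i] for i in remaining]
        pick = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        order.append(remaining.pop(pick))  # remove the unit so it cannot recur
    return order


# Hypothetical population of 1000 units with degrees between 1 and 30.
rng = random.Random(1)
pop_degrees = [rng.randint(1, 30) for _ in range(1000)]
order = successive_sample(pop_degrees, 500, seed=2)
early = [pop_degrees[i] for i in order[:50]]
late = [pop_degrees[i] for i in order[-50:]]
print(sum(early) / 50, sum(late) / 50)  # early draws typically have higher mean degree
```

Sampling a large share of the population makes the depletion effect more visible, which is consistent with the idea that the effect carries information about population size.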
Objective: The goal of this project is to recreate aspects of the article “Comparison between convolutional neural networks and random forest for local climate zone classification in mega urban areas using Landsat images” (Yoo et al., 2019), in which methods for predicting local climate zone (LCZ) classes for four large cities throughout the world...