Graduate Thesis Or Dissertation


Localized Variable Selection with Random Forest

  • Recent advances in computer technology have drastically reduced the cost of collecting and storing data, making it feasible to record a large amount of information for each data point. The resulting growth in feature dimensionality motivates research on variable selection. Random forest (RF) has demonstrated the ability to select important variables and to model complex data. However, simulations show that, in some cases, it fails to detect less influential features in the presence of variables with large effects. In this dissertation, we propose two algorithms for localized variable selection: clustering based feature selection (CBFS) and locally adjusted feature importance (LAFI). Both methods aim to find regions where the effects of weaker features can be isolated and measured. CBFS combines RF variable selection with a two-stage clustering method to identify variables whose effects are detectable only in certain regions. LAFI, in contrast, uses a binary tree approach to split the data into bins based on the rankings of the response variable, and applies RF within each bin to find the important variables there. Variables that are selected in more bins are assigned a larger LAFI value. Both variable selection methods are evaluated on simulated and real datasets. Finally, we propose an extension of CBFS for localized prediction.
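
The LAFI idea described in the abstract (split the data into bins by response ranking, select variables within each bin, and score each variable by how many bins select it) can be sketched roughly as follows. This is an illustrative sketch only, not the dissertation's implementation: the function names `rank_bins`, `correlation_selector`, and `lafi_sketch`, the `depth` parameter, and the use of the fraction of bins as the score are all assumptions. To keep the example dependency-free, the per-bin RF selection step is replaced by a simple correlation threshold, which is a stand-in and not a random forest; an RF-based selector (for example, one that thresholds the importances of a fitted forest) could be passed in instead.

```python
from typing import Callable, List, Sequence, Set

Row = Sequence[float]
# A selector takes the rows and responses of one bin and returns
# the indices of the features it considers important in that bin.
Selector = Callable[[List[Row], List[float]], Set[int]]


def rank_bins(y: Sequence[float], depth: int) -> List[List[int]]:
    """Recursively halve observations by response rank (assumed scheme).

    Returns 2**depth lists of row indices, ordered from lowest to highest
    response. Assumes len(y) >= 2**depth so that no bin is empty.
    """
    bins = [sorted(range(len(y)), key=lambda i: y[i])]
    for _ in range(depth):
        bins = [half for b in bins
                for half in (b[: len(b) // 2], b[len(b) // 2:])]
    return bins


def correlation_selector(threshold: float = 0.3) -> Selector:
    """Stand-in for the per-bin RF step: keep features with |Pearson r| >= threshold."""
    def select(X: List[Row], y: List[float]) -> Set[int]:
        n = len(y)
        mean_y = sum(y) / n
        ss_y = sum((v - mean_y) ** 2 for v in y)
        chosen: Set[int] = set()
        for j in range(len(X[0])):
            col = [row[j] for row in X]
            mean_x = sum(col) / n
            ss_x = sum((v - mean_x) ** 2 for v in col)
            if ss_x == 0 or ss_y == 0:
                continue  # constant column or constant response: no signal
            cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(col, y))
            if abs(cov / (ss_x * ss_y) ** 0.5) >= threshold:
                chosen.add(j)
        return chosen
    return select


def lafi_sketch(X: List[Row], y: Sequence[float], depth: int,
                select: Selector) -> List[float]:
    """Score each feature by the fraction of response-rank bins that select it."""
    bins = rank_bins(y, depth)
    counts = [0] * len(X[0])
    for idx in bins:
        for j in select([X[i] for i in idx], [y[i] for i in idx]):
            counts[j] += 1
    return [c / len(bins) for c in counts]
```

Calling `lafi_sketch(X, y, depth=2, select=correlation_selector(0.3))` would split the rows into four response-rank bins and return one score per column of `X`. Passing the selector as an argument keeps the binning and counting logic separate from whichever importance measure is used inside each bin.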
Embargo reason
  • Ongoing Research
Embargo date range
  • 2019-02-24 to 2019-09-25


