Graduate Thesis Or Dissertation
 

Statistical enhancement of support vector machines

https://ir.library.oregonstate.edu/concern/graduate_thesis_or_dissertations/zw12z938r

Abstract
  • Support Vector Machines (SVM) and Random Forests (RF) have consistently outperformed other machine learning algorithms on a variety of problems. SVM can be used for classification and regression on many types of data (e.g., nonlinear or high-dimensional data), but cannot handle missing or mixed data. This research implements a permutation-based variable importance measure and a missing value imputation method for SVM, both founded on similar techniques developed for RF. The results of the SVM variable importance measure are compared to RF results on simulated data sets with known variable importance. The variability of the importance outcomes is examined when different tuning parameter values (for SVM and RF) and kernels (for SVM) are used on benchmark data sets. Two of the benchmark data sets are also used to evaluate the missing value imputation method. The variable importance measure developed in this study produces results comparable to RF on the simulated data sets. However, on the benchmark data sets, its results show greater variability and less consistency than those of RF for the tuning parameter values investigated. SVM often had a smaller test error than RF, indicating that SVM was able to better fit the benchmark data. Unlike the RF results, the SVM variable importance results can be highly sensitive to the choice of tuning parameters. Successive grid searches are needed to tune these parameters and achieve more consistent SVM variable importance results. This research also compares a median-based missing value imputation method to a mean-based approach. The quality of the methods was evaluated by comparing the test set error (or test mean-square error) achieved after application to two benchmark data sets. There is an improvement on a regression data set, but no significant difference in results for a classification example. Further investigation is needed to evaluate this imputation technique.
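The contrast between median-based and mean-based filling can be illustrated with a minimal sketch. This is not the thesis's imputation method, which is founded on RF techniques and is not specified in the abstract; it only shows simple column-wise filling, and the helper name `impute_column` and the toy column are hypothetical.

```python
import statistics
from typing import Callable, List, Optional


def impute_column(
    values: List[Optional[float]],
    fill_stat: Callable[[List[float]], float],
) -> List[float]:
    """Replace missing entries (None) with a statistic of the observed entries.

    Hypothetical helper for illustration only; not the thesis's procedure.
    """
    observed = [v for v in values if v is not None]
    fill = fill_stat(observed)
    return [fill if v is None else v for v in values]


# Toy column with one missing value and one extreme value (assumed data).
column = [1.0, 2.0, None, 3.0, 100.0]

median_filled = impute_column(column, statistics.median)
mean_filled = impute_column(column, statistics.mean)

print(median_filled[2], mean_filled[2])
```

In this toy column the extreme value pulls the mean-based fill far from the bulk of the data, while the median-based fill stays near it, which is one common motivation for preferring a median.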
A variable importance measure for SVM provides insight into which explanatory variables are important in determining the response. SVM has been known to perform better than other machine learning algorithms on some data sets. By developing such a measure, this research has furthered the capabilities of an important algorithm used for data mining.
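The general idea of a permutation-based variable importance measure can be sketched as follows: shuffle one explanatory variable at a time and record how much the prediction error increases. This is a minimal, model-agnostic sketch under stated assumptions, not the thesis's implementation. The function names, the toy data, and the stand-in `toy_predict` rule are all hypothetical; with an SVM, `predict` would be the prediction function of a fitted model.

```python
import random
from typing import Callable, List

Row = List[float]


def error_rate(predict: Callable[[Row], int], X: List[Row], y: List[int]) -> float:
    """Fraction of rows whose predicted label differs from the true label."""
    wrong = sum(1 for row, label in zip(X, y) if predict(row) != label)
    return wrong / len(y)


def permutation_importance(
    predict: Callable[[Row], int],
    X: List[Row],
    y: List[int],
    n_repeats: int = 20,
    seed: int = 0,
) -> List[float]:
    """Mean increase in error rate when each variable is shuffled in turn."""
    rng = random.Random(seed)
    baseline = error_rate(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link to the response.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(error_rate(predict, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy data (assumed): the label depends only on the first variable.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(200)]
y = [1 if row[0] > 0 else 0 for row in X]


def toy_predict(row: Row) -> int:
    """Stand-in for a fitted classifier; uses only the first variable."""
    return 1 if row[0] > 0 else 0


scores = permutation_importance(toy_predict, X, y)
print(scores)
```

Because the stand-in rule ignores the second variable, shuffling that variable leaves the error unchanged, while shuffling the first variable raises it substantially. With a real SVM, the scores would also depend on the kernel and tuning parameters, which is the sensitivity the abstract describes.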