Abstract:
We propose a new classification method for longitudinal data based on a semiparametric approach. Our approach builds a classifier by exploiting the modeled relationship between the response and covariates within each class, and assigns a new subject to the class with the smallest quadratic distance. This overcomes the difficulty of estimating covariance matrices, as required in linear discriminant analysis, while still incorporating correlation into the classifier. Extensive simulation studies and real data applications show that our approach outperforms the support vector machine, logistic regression, and linear discriminant analysis for continuous outcomes, and outperforms the naive Bayes classifier, decision tree, and logistic regression for discrete responses.
Unlike many other generative approaches, our method is derived with distributional assumptions on the first moment only, rather than on the full distribution, and provides a classifier that handles both continuous and discrete responses. Another advantage of our classifier is that it possesses an inferential property under normality: it is built on the quadratic inference function, which is an analog of minus twice the log-likelihood, so the distance measure for the new classifier follows a chi-squared distribution if the data are assumed to follow a multivariate normal distribution. This provides a p-value interpretation of how accurate the classification is for the new subject.
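
The decision rule described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that, for each class, a score vector `g` for the new subject and a weighting matrix `C` have already been obtained from a fitted model (both are hypothetical inputs here), computes the quadratic distance g' C^{-1} g, assigns the subject to the class with the smallest distance, and reports a chi-squared tail probability per class. The degrees of freedom `df` is likewise an assumption of the sketch, taken equal to the length of the score vector.

```python
import math

def solve(C, g):
    """Solve C x = g by Gaussian elimination with partial pivoting."""
    n = len(g)
    # Work on an augmented copy so the inputs are not modified.
    A = [[float(v) for v in C[i]] + [float(g[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("weighting matrix is singular")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (A[i][n] - s) / A[i][i]
    return x

def quadratic_distance(g, C):
    """Quadratic form g' C^{-1} g, computed without forming the inverse."""
    x = solve(C, g)
    return sum(gi * xi for gi, xi in zip(g, x))

def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-squared variable with df degrees of
    freedom, via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term, total, n = 1.0, 1.0, 0
    while term > 1e-15 * total and n < 10000:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0)) * total
    return max(0.0, 1.0 - lower)

def classify(scores, covariances, df):
    """Assign to the class with the smallest quadratic distance."""
    distances = {k: quadratic_distance(scores[k], covariances[k]) for k in scores}
    label = min(distances, key=distances.get)
    p_values = {k: chi2_sf(d, df) for k, d in distances.items()}
    return label, distances, p_values

# Hypothetical inputs for one new subject and two classes.
scores = {"A": [0.5, -0.2], "B": [2.0, 1.5]}
covariances = {"A": [[1.0, 0.3], [0.3, 1.0]], "B": [[1.0, 0.0], [0.0, 1.0]]}
label, distances, p_values = classify(scores, covariances, df=2)
print(label, distances, p_values)
```

In this reading, a large tail probability for a class indicates that the new subject is consistent with that class, while small values for every class suggest that none of the candidate classes fits well.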