Abstract:
Sequential supervised learning problems involve assigning a class label to each item in a sequence. Examples include part-of-speech tagging and text-to-speech mapping. A very general-purpose strategy for solving such problems is to construct a recurrent sliding window (RSW) classifier, which maps some window of the input sequence plus some number of previously predicted items into a prediction for the next item in the sequence. This paper describes a general-purpose implementation of RSW classifiers and discusses the highly practical issue of how to choose the size of the input window and the number of previous predictions to incorporate. Experiments on two real-world domains show that the optimal choices vary from one learning algorithm to another. They also depend on the evaluation criterion (number of correctly predicted items versus number of correctly predicted whole sequences). We conclude that window sizes must be chosen by cross-validation. The results have implications for the choice of window sizes for other models, including hidden Markov models and conditional random fields.
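As a rough illustration of the RSW idea, the following Python sketch builds the feature vector for one position from an input window plus the most recent predictions, then labels a sequence from left to right. It is a minimal sketch under stated assumptions, not the implementation described in the paper: the names rsw_features, rsw_predict, half_width, n_prev, and the padding symbol PAD are all hypothetical, the window is assumed to be symmetric around the current item, and the base classifier is assumed to be any callable that maps a feature tuple to a label.

```python
from typing import Callable, List, Sequence, Tuple

PAD = "_"  # hypothetical padding symbol for positions outside the sequence


def rsw_features(xs: Sequence[str], preds: Sequence[str], t: int,
                 half_width: int, n_prev: int) -> Tuple[str, ...]:
    """Features for position t: input window followed by prior predictions."""
    # Symmetric input window centred on t, padded at the sequence boundaries.
    window = [xs[i] if 0 <= i < len(xs) else PAD
              for i in range(t - half_width, t + half_width + 1)]
    # The n_prev predictions made just before position t, padded at the start.
    history = [preds[i] if i >= 0 else PAD
               for i in range(t - n_prev, t)]
    return tuple(window + history)


def rsw_predict(xs: Sequence[str],
                classify: Callable[[Tuple[str, ...]], str],
                half_width: int, n_prev: int) -> List[str]:
    """Label xs left to right, feeding earlier predictions back as features."""
    preds: List[str] = []
    for t in range(len(xs)):
        preds.append(classify(rsw_features(xs, preds, t, half_width, n_prev)))
    return preds


# Toy base classifier (an assumption for illustration only): it looks solely
# at the previous prediction and alternates between the labels "1" and "0".
def alternate(features: Tuple[str, ...]) -> str:
    return "1" if features[-1] != "1" else "0"


labels = rsw_predict("abcd", alternate, half_width=0, n_prev=1)
```

In this sketch, half_width and n_prev are the two quantities whose choice the abstract discusses. Selecting them by cross-validation would amount to repeating training and evaluation for each candidate pair and keeping the pair that scores best under the chosen criterion.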