Abstract:
Conditional Random Fields (CRFs; Lafferty, McCallum, & Pereira, 2001) provide a flexible
and powerful model for learning to assign labels to elements of sequences in such applications
as part-of-speech tagging, text-to-speech mapping, protein and DNA sequence analysis, and
information extraction from web pages. However, existing learning algorithms are slow, particularly
in problems with large numbers of potential input features. This paper describes a
new method for training CRFs by applying Friedman’s (1999) gradient tree boosting method.
In tree boosting, the CRF potential functions are represented as weighted sums of regression
trees. Regression trees are learned by stage-wise optimizations similar to Adaboost, but with
the objective of maximizing the conditional likelihood P(Y|X) of the CRF model. By growing
regression trees, interactions among features are introduced only as needed, so although the
parameter space is potentially immense, the search algorithm does not explicitly consider the
large space. As a result, gradient tree boosting scales linearly in the order of the Markov model
and in the order of the feature interactions, rather than exponentially like previous algorithms
based on iterative scaling and gradient descent.