This thesis studies the problem of structured prediction (SP), where the agent must predict a structured output for a given structured input (e.g., a part-of-speech tag sequence for an input sentence). Many important applications, including machine translation in natural language processing (NLP) and image interpretation in computer vision, can be naturally formulated as structured prediction problems. These prediction problems have an exponential number of candidate outputs, which poses significant challenges for inference and learning. Search-based structured prediction views the prediction of structured outputs as a search process in the space of candidate outputs, guided by a learned scoring function. Search-based approaches offer several advantages, including the incorporation of expressive representations over inputs and outputs with negligible overhead in inference complexity, and a principled way to trade off accuracy for inference efficiency. In this thesis, I make three contributions that advance search-based structured prediction methods towards the goal of improving their accuracy and computational efficiency.
First, I developed a search-based learning approach called “Prune-and-Score” to improve the accuracy of greedy-policy-based structured prediction in search spaces with large action sets. The key idea is to learn a pruning function that eliminates bad decisions and a scoring function that then selects the best among the remaining decisions. I show the efficacy of this approach on coreference resolution, i.e., clustering mentions that refer to the same entity, which is a hard NLP task.
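The two-stage decision rule behind Prune-and-Score can be sketched as follows. This is an illustrative toy, not the thesis's actual implementation: `prune_fn`, `score_fn`, and the top-k pruning rule are stand-ins for the learned pruning and scoring functions.

```python
def prune_and_score_step(actions, prune_fn, score_fn, k):
    """One greedy decision in a Prune-and-Score style pipeline:
    the pruning function first keeps the k most promising actions,
    then the scoring function picks the best of the survivors."""
    survivors = sorted(actions, key=prune_fn, reverse=True)[:k]
    return max(survivors, key=score_fn)


# Toy usage with hand-crafted stand-in functions (hypothetical, for
# illustration only): prune toward actions near 3, then score by value.
best = prune_and_score_step(
    actions=[1, 2, 3, 4, 5],
    prune_fn=lambda a: -abs(a - 3),  # pruner favors actions near 3
    score_fn=lambda a: a,            # scorer favors larger actions
    k=3,
)
```

The point of the decomposition is that the pruner can be cheap and high-recall while the scorer, applied only to the few survivors, can afford richer features.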
Second, I studied multi-task structured prediction (MTSP) problems in the context of entity analysis, which comprises the three tasks of named entity recognition, coreference resolution, and entity linking. I developed three different search-based learning architectures for MTSP that make different trade-offs between the speed and accuracy of training and inference. I empirically evaluated the proposed architectures on entity analysis and achieved state-of-the-art results.
Finally, I developed the HC-Nets framework by integrating advances in deep learning with search-based SP methods. I formulate structured prediction as a search problem over the complete output space and employ two neural network functions to guide the search: a heuristic function H for generating candidate outputs and a cost function C for selecting the best output from the generated candidates. Unlike prior methods such as structured prediction energy networks and deep value networks, which perform gradient-based inference in a relaxed output space, HC-Nets allows prior knowledge to be incorporated in the form of constraints. The decomposition into heuristic and cost functions makes the representation more expressive and the learning more modular. My experimental results show that HC-Nets achieves better performance than prior methods on multiple benchmark domains.
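The division of labor between H and C can be sketched with a minimal greedy search over a toy numeric output space. This is an assumption-laden illustration, not the HC-Nets algorithm itself: `successors`, the fixed step budget, and the greedy expansion rule are all simplifications; in the framework, H and C are learned neural networks over structured outputs.

```python
def hc_search(initial, successors, H, C, steps):
    """Illustrative heuristic-guided search: follow the heuristic H
    greedily through the output space for a fixed number of steps,
    collecting every output visited, then return the output that the
    cost function C ranks best (lowest cost)."""
    current = initial
    visited = [current]
    for _ in range(steps):
        cands = successors(current)
        if not cands:
            break
        current = min(cands, key=H)   # H guides candidate generation
        visited.append(current)
    return min(visited, key=C)        # C selects among generated outputs


# Toy usage: H steers the search toward 5, but C independently
# prefers outputs near 4, so the selected output need not be the
# last one the heuristic reached.
result = hc_search(
    initial=0,
    successors=lambda x: [x + 1, x + 2],
    H=lambda x: abs(x - 5),
    C=lambda x: abs(x - 4),
    steps=4,
)
```

The sketch shows why the decomposition is useful: H only has to generate good candidates, while C is trained purely to rank complete outputs, so the two functions can be learned and improved independently.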