Parsing with Recurrent Neural Networks

http://ir.library.oregonstate.edu/concern/graduate_thesis_or_dissertations/hm50tv954

Descriptions

Attribute Name | Values
Abstract or Summary
  • Machine learning models for natural language processing have traditionally relied on large numbers of discrete features, built up from atomic categories such as word forms and part-of-speech labels, which are considered completely distinct from each other. Recently, however, the advent of dense feature representations coupled with deep learning techniques has led to powerful new models which can automatically learn to exploit various dimensions of implicit similarity between such discrete linguistic entities. This work extends that line of research as it applies to syntactic parsing, particularly by introducing recurrent network models which can encode the entirety of a sentence in context and by proposing novel parsing systems to take advantage of such models. Syntactic parsing is an inherently difficult problem in natural language processing because of the ambiguous and highly compositional nature of language itself. Perfect agreement is not possible even among expert human annotators. Statistical and machine learning prediction of the syntactic structure of sentences has been the subject of decades of study. Recent advances in applying deep neural models to language problems, however, have led to rapid strides in this domain, with models which are able to automatically exploit a whole new realm of hidden regularities in language. We continue this trend with feature-learning recurrent networks to model entire sentences, which allow the parser to incorporate information from the entire sentence context when making every decision. We also introduce new parsing paradigms designed explicitly to leverage this new representational power, including a state-of-the-art transition-based constituency parser, the first ever to achieve competitive results with greedy decoding. We also introduce a straightforward dynamic oracle for the aforementioned constituency parsing system, and show that it is optimal in both label recall and precision.
This is the first ever provably optimal dynamic oracle for a transition-based constituency parser. In addition to its optimality, our dynamic oracle is computable in amortized constant time per step, a dramatic improvement over its forerunners for arc-standard dependency parsing, which required worst-case cubic time per step. Extending the optimality proof for that dynamic oracle, we show the surprising result that the entire space of possible parser states for a sentence of length n can be reduced to O(n²) using a further simplified feature space. This simplification could have important future impact for search-based or globally-optimized training methods. Finally, we extend our parsing model still further, by applying it to morphologically rich languages, using continuous embeddings over previously predicted morphological features. We find that we achieve very competitive results over a range of languages despite no language-specific architectural or hyper-parameter tuning, including achieving the best reported parsing results on the French Treebank.
Additional Information
  • description.provenance : Made available in DSpace on 2016-12-21T23:32:51Z (GMT). No. of bitstreams: 1 CrossJamesH2016.pdf: 1075740 bytes, checksum: 25a2f1b3b6a95d209a78164e0c5631cb (MD5) Previous issue date: 2016-12-08
  • description.provenance : Approved for entry into archive by Julie Kurtz(julie.kurtz@oregonstate.edu) on 2016-12-14T16:13:50Z (GMT) No. of bitstreams: 1 CrossJamesH2016.pdf: 1075740 bytes, checksum: 25a2f1b3b6a95d209a78164e0c5631cb (MD5)
  • description.provenance : Submitted by James Cross (crossj@oregonstate.edu) on 2016-12-09T19:42:14Z No. of bitstreams: 1 CrossJamesH2016.pdf: 1075729 bytes, checksum: 7969905beade4917cb94c01501a1f2b9 (MD5)
  • description.provenance : Approved for entry into archive by Laura Wilson(laura.wilson@oregonstate.edu) on 2016-12-21T23:32:51Z (GMT) No. of bitstreams: 1 CrossJamesH2016.pdf: 1075740 bytes, checksum: 25a2f1b3b6a95d209a78164e0c5631cb (MD5)
  • description.provenance : Submitted by James Cross (crossj@oregonstate.edu) on 2016-12-13T21:47:46Z No. of bitstreams: 1 CrossJamesH2016.pdf: 1075740 bytes, checksum: 25a2f1b3b6a95d209a78164e0c5631cb (MD5)
  • description.provenance : Rejected by Julie Kurtz(julie.kurtz@oregonstate.edu), reason: Hi James, Rejecting to add the commencement date back on the bottom of the title page to read - Commencement June 2017. Everything else looks good and you have met the deadline for a fall diploma. Once revised, log back into ScholarsArchive and go to the upload page. Replace the attached file with the revised PDF and resubmit. Thanks, Julie. On 2016-12-13T21:40:39Z (GMT)

Last modified: 08/16/2017
