Information about named entities (real-world objects) is usually harvested from different sources and organized as a multi-relational directed graph in Knowledge Bases (KBs). KBs play essential roles in many NLP applications, including question answering, fact-checking, and search engines. KBs are large but still incomplete: relational information among entities is...
Natural Language Comprehension is a challenging domain of Natural Language Processing. To improve a model’s language comprehension/understanding, one approach would be to enrich the structure of the model to enhance its capability in learning the latent rules of the language.
In this dissertation, we will first introduce several deep models...
The advent of deep learning models has led to substantial improvements in a wide range of NLP tasks, achieving state-of-the-art performance without any hand-crafted features. However, training deep models requires a massive amount of labeled data. Labeling new data whenever a new task or domain emerges consumes time and effort...
Automatic event extraction from natural text is an important and challenging task for natural language understanding. Traditional event detection methods rely heavily on manually engineered rich features. Recent deep learning approaches alleviate this problem through automatic feature engineering. But such efforts, like traditional methods, have so far only focused on...
In bioacoustics, automatic animal voice detection and recognition from audio recordings is an emerging topic for animal preservation. Our research focuses on bird bioacoustics, where the goal is to segment bird syllables from the recording and predict the bird species for the syllables. Traditional methods for this task address the...
Bioacoustic analysis can be used to conduct environmental monitoring by detecting the presence of bird species. This analysis usually involves identifying the species from their calls. In most frameworks, bird song syllables are extracted from audio recordings and individual syllables are input to a classifier to identify the species. Extraction...
Easy-first, a search-based structured prediction approach, has been applied to many NLP tasks including dependency parsing and coreference resolution. This approach employs a learned greedy policy (action scoring function) to make easy decisions first, which constrains the remaining decisions and makes them easier. This thesis studies the problem of learning...
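The easy-first idea described above can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function `easy_first`, the hand-written `toy_score` (standing in for a learned action scoring function), and the three-token tagging example are hypothetical and are not taken from the thesis.

```python
from typing import Callable, Dict, List, Tuple


def easy_first(
    items: List[int],
    labels: List[str],
    score: Callable[[int, str, Dict[int, str]], float],
) -> Tuple[Dict[int, str], List[int]]:
    """Greedily commit the highest-scoring (item, label) action first."""
    decided: Dict[int, str] = {}
    order: List[int] = []
    pending = list(items)
    while pending:
        # Re-score every remaining action given the decisions made so far.
        item, label = max(
            ((i, l) for i in pending for l in labels),
            key=lambda a: score(a[0], a[1], decided),
        )
        decided[item] = label
        order.append(item)
        pending.remove(item)
    return decided, order


# Hypothetical hand-written scores; a real system would learn these.
BASE = {
    (0, "N"): 0.6, (0, "V"): 0.1,
    (1, "N"): 0.5, (1, "V"): 0.45,
    (2, "N"): 0.2, (2, "V"): 0.9,
}


def toy_score(item: int, label: str, decided: Dict[int, str]) -> float:
    s = BASE[(item, label)]
    # Earlier decisions constrain later ones: reward differing from a
    # neighbour that has already been labelled.
    for nb in (item - 1, item + 1):
        if nb in decided and decided[nb] != label:
            s += 0.3
    return s


decided, order = easy_first([0, 1, 2], ["N", "V"], toy_score)
```

In this toy run the most confident decision (item 2) is made first, and that decision changes the scores of the actions still pending for item 1.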
Semi-supervised clustering aims to improve clustering performance by considering user supervision in the form of pairwise constraints. In this paper, we study the active learning problem of selecting pairwise must-link and cannot-link constraints for semi-supervised clustering. We consider active learning in an iterative manner where in each iteration queries are...
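As a small illustration of what pairwise supervision looks like, the sketch below represents must-link and cannot-link constraints as pairs and counts how many a given cluster assignment violates. It does not implement the paper's active query-selection strategy; the function name `count_violations` and the example data are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple

Pair = Tuple[Hashable, Hashable]


def count_violations(
    assignment: Dict[Hashable, int],
    must_link: Iterable[Pair],
    cannot_link: Iterable[Pair],
) -> int:
    """Count constraints that a cluster assignment fails to satisfy."""
    # A must-link pair is violated when its points land in different clusters.
    ml = sum(1 for a, b in must_link if assignment[a] != assignment[b])
    # A cannot-link pair is violated when its points share a cluster.
    cl = sum(1 for a, b in cannot_link if assignment[a] == assignment[b])
    return ml + cl


# Hypothetical example: four points assigned to two clusters.
assignment = {"a": 0, "b": 0, "c": 1, "d": 1}
must_link = [("a", "b"), ("b", "c")]
cannot_link = [("a", "d"), ("c", "d")]
violations = count_violations(assignment, must_link, cannot_link)
```

Here the must-link pair (b, c) and the cannot-link pair (c, d) are both violated by the assignment.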
Bayesian Optimization (BO) methods are often used to optimize an unknown function f(·) that is costly to evaluate. They typically work in an iterative manner. In each iteration, given a set of observation points, BO algorithms select k ≥ 1 points to be evaluated. The results of those points are...
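The iterative structure described above (observe, select k points, evaluate, repeat) can be sketched as follows. This is only a loop skeleton under loud assumptions: the acquisition rule is a crude stand-in (value at the nearest observed point plus a distance bonus), not a Gaussian-process model and not the method of the paper, and all names (`select_batch`, `optimize`, `explore`) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def select_batch(
    observed: Dict[float, float],
    candidates: List[float],
    k: int,
    explore: float = 1.0,
) -> List[float]:
    """Pick up to k unobserved candidates using a stand-in acquisition rule."""
    pool = [c for c in candidates if c not in observed]
    pts = dict(observed)  # working copy, extended with guessed values
    chosen: List[float] = []
    for _ in range(k):
        if not pool:
            break

        def acq(x: float) -> float:
            # Stand-in acquisition: nearest observed value plus a bonus
            # for being far from anything already observed or chosen.
            nearest = min(pts, key=lambda p: abs(p - x))
            return pts[nearest] + explore * abs(nearest - x)

        best = max(pool, key=acq)
        chosen.append(best)
        pool.remove(best)
        # Record a guessed value so the rest of the batch spreads out.
        nearest = min(pts, key=lambda p: abs(p - best))
        pts[best] = pts[nearest]
    return chosen


def optimize(
    f: Callable[[float], float],
    candidates: List[float],
    init: List[float],
    k: int,
    iterations: int,
) -> Tuple[float, float]:
    """Run the select-evaluate-update loop and return the best point seen."""
    observed = {x: f(x) for x in init}
    for _ in range(iterations):
        for x in select_batch(observed, candidates, k):
            observed[x] = f(x)  # the costly evaluation happens here
    best_x = max(observed, key=observed.get)
    return best_x, observed[best_x]


# Hypothetical objective and candidate grid, for illustration only.
candidates = [i / 10 for i in range(11)]
objective = lambda x: -(x - 0.3) ** 2
best_x, best_y = optimize(objective, candidates, [0.0, 1.0], k=2, iterations=5)
```

A real BO method would replace the stand-in rule with a probabilistic surrogate and a principled acquisition function; the outer loop would keep the same shape.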
Recent work in machine learning concerns the detection and identification of bird species from audio recordings of their vocalizations. Such analysis can yield valuable ecological information concerning the activity and distribution of species in the wild. Current species-identification methods require individual syllables of bird audio as input, but field-collected audio...
The performance of deep learning frameworks can be significantly improved by considering the particular underlying structure of each dataset. In this thesis, I summarize three of our works on boosting the performance of deep learning models by leveraging the structure of the data. In the first work, we theoretically justify that, for...
In open set recognition, a classifier must label instances of known classes while detecting instances of unknown classes not encountered during training. To detect unknown classes while still generalizing to new instances of existing classes, this thesis introduces a dataset augmentation technique called counterfactual image generation. This approach, based on...
As the link between human microbiomes and health has become more established, the interest in applying statistical approaches to microbiome data to understand the mechanisms behind these links has grown. However, microbiome data is often of unmanageable size, and consequently, producing quality lower dimensional representations of samples is a significant...
In the field of machine learning, clustering and classification are two fundamental tasks. Traditionally, clustering is an unsupervised method, where no supervision about the data is available for learning; classification is a supervised task, where fully-labeled data are collected for training a classifier. In some scenarios, however, we may not...
This paper introduces an approach to text classification for semi-structured label systems that have poor performance with standard methods. With the perspective that perfect classification for such a system is unattainable, we demonstrate an automated procedure to isolate the learnable elements of the problem. Through analysis of an example dataset,...
We consider the problem of supervised classification of bird species from audio recordings in a real-world acoustic monitoring scenario (i.e. audio data is collected in the field with an omnidirectional microphone, without human supervision). Obtaining better data about bird activity can assist conservation efforts, and improve our understanding of their...
Many methods have been explored in the multi-label learning literature, ranging from simple problem transformations to more complex methods that capture correlations among labels. However, almost all existing work fails to address the challenge of incomplete label data. The goal of this project is to extend the work of...
The problem of document classification has been widely studied in machine learning and data mining. In document classification, most of the popular algorithms are based on the bag-of-words representation. Due to the high dimensionality of the bag-of-words representation, significant research has been conducted to reduce the dimensionality via different approaches....
Xiaoli Fern
Macrosomia is a medical term describing a newborn with an excessive birth weight (greater than 4000 g). Fetal macrosomia may lead to pregnancy complications and to an increased risk of health problems for both mother and baby after birth. But the potential complications may be mitigated by a cesarean delivery. As such,...