As the link between human microbiomes and health has become more established, the interest in applying statistical approaches to microbiome data to understand the mechanisms behind these links has grown. However, microbiome data is often of unmanageable size, and consequently, producing quality lower dimensional representations of samples is a significant...
In the field of machine learning, clustering and classification are two fundamental tasks. Traditionally, clustering is an unsupervised method, where no supervision about the data is available for learning; classification is a supervised task, where fully-labeled data are collected for training a classifier. In some scenarios, however, we may not...
This paper introduces an approach to text classification for semi-structured label systems that have poor performance with standard methods. With the perspective that perfect classification for such a system is unattainable, we demonstrate an automated procedure to isolate the learnable elements of the problem. Through analysis of an example dataset,...
Macrosomia is a medical term describing a new baby born with an excessive birth weight (greater than 4000g). Fetal macrosomia may lead to both pregnancy complications, and increased risk of mother's and baby's health problems after birth. But the potential complications may be mitigated by a cesarean delivery. As such,...
We consider the problem of supervised classification of bird species from audio recordings in a real-world acoustic monitoring scenario (i.e. audio data is collected in the field with an omnidirectional microphone, without human supervision). Obtaining better data about bird activity can assist conservation efforts, and improve our understanding of their...
Many methods have been explored in the literature of multi-label learning, ranging from simple problem transformation to more complex method that capture correlation among labels. However, mostly all existing works do not address the challenge with incomplete label data. The goal of this project is to extend the work of...
The problem of document classification has been widely studied in machine learning and data mining. In document classification, most of the popular algorithms are based on the bag-of-words representation. Due to the high dimensionality of the bag-of-words representation, significant research has been conducted to reduce the dimensionality via different approaches....