The problem of document classification has been widely studied in machine learning and data mining. In document classification, most of the popular algorithms are based on the bag-of-words representation. Due to the high dimensionality of the bag-of-words representation, significant research has been conducted to reduce the dimensionality via different approaches....
In this work, I examine the problem of understanding American football in video. In particular, I present several mid-level computer vision algorithms that each accomplish a different sub-task within a larger system for annotating, interpreting, and analyzing collections of American football video. The analysis of football video is useful in...
It is common practice in the unsupervised anomaly detection literature to create experimental benchmarks by sampling from existing supervised learning datasets. We seek to improve this practice by identifying four dimensions important to real-world anomaly detection applications --- point difficulty, clusteredness of anomalies, relevance of features, and relative frequency of...
Accessing information on the Web has become ingrained into our daily lives, and we seek information from many different sources, including conference and journal publications, personal web pages, and others. Increasingly, web-based information retrieval systems such as web-based search engines, library on-line catalog systems, and subscription-based federated search systems are...
In the field of machine learning, clustering and classification are two fundamental tasks. Traditionally, clustering is an unsupervised method, where no supervision about the data is available for learning; classification is a supervised task, where fully-labeled data are collected for training a classifier. In some scenarios, however, we may not...
Citizen Science is a paradigm in which volunteers from the general public participate in scientific studies, often by performing data collection. This paradigm is especially useful if the scope of the study is too broad to be performed by a limited number of trained scientists. Although citizen scientists can contribute...
The thesis focuses on activity recognition from sensor data, which has spurred a great deal of interest due to its impact on health care and security. Previous work on activity recognition from multivariate time series data has mainly applied supervised learning techniques which require a high degree of annotation effort...
Although machine learning systems are often effective in real-world applications, there are situations in which they can be even better when provided with some degree of end user feedback. This is especially true when the machine learning system needs to customize itself to the end user's preferences, such as in...
Direct red Solid Oxide Fuel Cell (SOFC) Turbine hybrid plants have the potential to dramatically increase power plant efficiency, decrease emissions, and provide fast response to transient loads. The US Department of Energy's (DOE) Hybrid Performance Project is an experimental hybrid SOFC plant, built at the National
Energy Technology Laboratory...
The advent of deep learning models leads to a substantial improvement in a wide range of NLP tasks, achieving state-of-art performances without any hand-crafted features. However, training deep models requires a massive amount of labeled data. Labeling new data as a new task or domain emerges consumes time and efforts...