A probabilistic framework and algorithms for modeling and analyzing multi-instance data Public Deposited

http://ir.library.oregonstate.edu/concern/graduate_thesis_or_dissertations/f7623g11r

Descriptions

Attribute NameValues
Creator
Abstract or Summary
  • Multi-instance data, in which each object (e.g., a document) is a collection of instances (e.g., word), are widespread in machine learning, signal processing, computer vision, bioinformatic, music, and social sciences. Existing probabilistic models, e.g., latent Dirichlet allocation (LDA), probabilistic latent semantic indexing (pLSI), and discrete component analysis (DCA), have been developed for modeling and analyzing multiinstance data. Such models introduce a generative process for multi-instance data which includes a low dimensional latent structure. While such models offer a great freedom in capturing the natural structure in the data, their inference may present challenges. For example, the sensitivity in choosing the hyper-parameters in such models, requires careful inference (e.g., through cross-validation) which results in large computational complexity. The inference for fully Bayesian models which contain no hyper-parameters often involves slowly converging sampling methods. In this work, we develop approaches for addressing such challenges and further enhancing the utility of such models. This dissertation demonstrates a unified convex framework for probabilistic modeling of multi-instance data. The three main aspects of the proposed framework are as follows. First, joint regularization is incorporated into multiple density estimation to simultaneously learn the structure of the distribution space and infer each distribution. Second, a novel confidence constraints framework is used to facilitate a tuning-free approach to control the amount of regularization required for the joint multiple density estimation with theoretical guarantees on correct structure recovery. Third, we formulate the problem using a convex framework and propose efficient optimization algorithms to solve it. This work addresses the unique challenges associated with both discrete and continuous domains. In the discrete domain we propose a confidence-constrained rank minimization (CRM) to recover the exact number of topics in topic models with theoretical guarantees on recovery probability and mean squared error of the estimation. We provide a computationally efficient optimization algorithm for the problem to further the applicability of the proposed framework to large real world datasets. In the continuous domain, we propose to use the maximum entropy (MaxEnt) framework for multi-instance datasets. In this approach, bags of instances are represented as distributions using the principle of MaxEnt. We learn basis functions which span the space of distributions for jointly regularized density estimation. The basis functions are analogous to topics in a topic model. We validate the efficiency of the proposed framework in the discrete and continuous domains by extensive set of experiments on synthetic datasets as well as on real world image and text datasets and compare the results with state-of-the-art algorithms.
Resource Type
Date Available
Date Copyright
Date Issued
Degree Level
Degree Name
Degree Field
Degree Grantor
Commencement Year
Advisor
Committee Member
Academic Affiliation
Non-Academic Affiliation
Keyword
Subject
Rights Statement
Peer Reviewed
Language
Replaces
Additional Information
  • description.provenance : Submitted by Behrouz Behmardi (behmardb@onid.orst.edu) on 2012-12-14T20:51:50Z No. of bitstreams: 1 AProbabilisticFrameworkandAlgorithmsforModelingandAnalyzingMulti-instanceData.pdf: 2693256 bytes, checksum: 99ad3b156899fa6aed1a12a73d781ca3 (MD5)
  • description.provenance : Made available in DSpace on 2012-12-14T22:03:43Z (GMT). No. of bitstreams: 1 AProbabilisticFrameworkandAlgorithmsforModelingandAnalyzingMulti-instanceData.pdf: 2693256 bytes, checksum: 99ad3b156899fa6aed1a12a73d781ca3 (MD5) Previous issue date: 2012-11-28
  • description.provenance : Approved for entry into archive by Laura Wilson(laura.wilson@oregonstate.edu) on 2012-12-14T22:03:43Z (GMT) No. of bitstreams: 1 AProbabilisticFrameworkandAlgorithmsforModelingandAnalyzingMulti-instanceData.pdf: 2693256 bytes, checksum: 99ad3b156899fa6aed1a12a73d781ca3 (MD5)
  • description.provenance : Approved for entry into archive by Julie Kurtz(julie.kurtz@oregonstate.edu) on 2012-12-14T21:20:20Z (GMT) No. of bitstreams: 1 AProbabilisticFrameworkandAlgorithmsforModelingandAnalyzingMulti-instanceData.pdf: 2693256 bytes, checksum: 99ad3b156899fa6aed1a12a73d781ca3 (MD5)

Relationships

In Administrative Set:
Last modified: 08/09/2017

Downloadable Content

Download PDF
Citations:

EndNote | Zotero | Mendeley

Items