Conference Proceedings Or Journal
 

WIDIT in TREC-2006 Blog track

Public Deposited

https://ir.library.oregonstate.edu/concern/conference_proceedings_or_journals/5q47rt47n

Descriptions

Attribute Name | Values
Creator
  • Yang, K.; Yu, N.; Valerio, A.; Zhang, H.
Abstract
  • The Web Information Discovery Integrated Tool (WIDIT) Laboratory at the Indiana University School of Library and Information Science participated in the opinion task of the Blog track at TREC-2006. The goal of the opinion task is to "uncover the public sentiment towards a given entity/target", which involves not only retrieving topically relevant blogs but also identifying those that contain opinions about the target. To further complicate matters, the blog test collection contains a considerable amount of noise, such as blogs with non-English content and non-blog content (e.g., advertisements, navigational text), which may misdirect retrieval systems. Based on our hypothesis that noise reduction (e.g., excluding non-English blogs and navigational text) will improve both on-topic and opinion retrieval performance, we explored various noise reduction approaches that can effectively eliminate noise in blog data without inadvertently excluding valid content. After creating two separate indexes (with and without noise) to assess the effect of noise reduction, we tackled the opinion blog retrieval task by breaking it down into two sequential subtasks: on-topic retrieval followed by opinion classification. Our opinion retrieval approach was to first apply traditional IR methods to retrieve on-topic blogs and then boost the ranks of opinionated blogs based on opinion scores generated by opinion assessment methods. Our opinion module consists of four components: the Opinion Term Module, which identifies opinions based on the frequency of opinion terms (i.e., terms that occur frequently only in opinion blogs); the Rare Term Module, which uses uncommon/rare terms (e.g., “sooo good”) for opinion classification; the IU Module, which uses IU (I and you) collocations; and the Adjective-Verb Module, which uses a distributional similarity approach from computational linguistics to learn subjective language from the training data.
Citation
  • Yang, K., Yu, N., Valerio, A., & Zhang, H. (2006). WIDIT in TREC 2006 Blog Track. Paper presented at the Text REtrieval Conference (TREC 2006).
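The abstract's two-stage approach (conventional on-topic retrieval, then boosting the ranks of opinionated blogs by an opinion score) can be illustrated with a toy sketch. Everything here is an assumption made for illustration, not the WIDIT implementation: the opinion lexicon, the `Doc` type, the functions `opinion_term_score` and `rerank`, the `weight` parameter, and the linear score fusion are all hypothetical. Only a crude analogue of the Opinion Term Module (scoring by the frequency of opinion terms) is shown, and topic scores are assumed to already lie in [0, 1].

```python
from dataclasses import dataclass

# Hypothetical opinion lexicon; the paper derives its opinion terms from data.
OPINION_TERMS = frozenset({"love", "hate", "awesome", "terrible", "think", "believe"})


@dataclass
class Doc:
    doc_id: str
    topic_score: float  # stage 1: score from a conventional IR ranking, assumed in [0, 1]
    text: str


def opinion_term_score(text: str, lexicon: frozenset = OPINION_TERMS) -> float:
    """Fraction of tokens that are opinion terms (a simple frequency-based score)."""
    tokens = [t.strip(".,!?\"'").lower() for t in text.split()]
    tokens = [t for t in tokens if t]  # drop tokens that were pure punctuation
    if not tokens:
        return 0.0
    return sum(t in lexicon for t in tokens) / len(tokens)


def rerank(docs: list, weight: float = 0.5) -> list:
    """Stage 2: boost on-topic results by their opinion score.

    The fusion topic_score + weight * opinion_score is a hypothetical choice.
    """
    return sorted(
        docs,
        key=lambda d: d.topic_score + weight * opinion_term_score(d.text),
        reverse=True,
    )


docs = [
    Doc("b1", 0.90, "The new phone ships in March with 64 GB of storage."),
    Doc("b2", 0.80, "I love this phone, the camera is awesome!"),
]
print([d.doc_id for d in rerank(docs)])  # → ['b2', 'b1']
```

The opinionated post `b2` starts below the factual post `b1` but overtakes it once its opinion score (2 of 8 tokens, i.e. 0.25) is added with weight 0.5. A linear fusion is used only because it is the simplest way to express "boosting"; scores from a real retrieval system would typically need normalizing before being combined with a [0, 1] opinion score.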

Relationships

Parents:

This work has no parents.

In Collection:

Articles