Conference Proceedings Or Journal
 

WIDIT in TREC-2006 Blog track


Downloadable Content

Download PDF
https://ir.library.oregonstate.edu/concern/conference_proceedings_or_journals/5q47rt47n

Descriptions

Attribute Name | Values
Creator
Abstract
  • The Web Information Discovery Integrated Tool (WIDIT) Laboratory at the Indiana University School of Library and Information Science participated in the Blog track's opinion task in TREC-2006. The goal of the opinion task is to "uncover the public sentiment towards a given entity/target", which involves not only retrieving topically relevant blogs but also identifying those that contain opinions about the target. To further complicate matters, the blog test collection contains a considerable amount of noise, such as blogs with non-English content and non-blog content (e.g., advertisements, navigational text), which may misdirect retrieval systems. Based on our hypothesis that noise reduction (e.g., exclusion of non-English blogs and navigational text) will improve both on-topic and opinion retrieval performance, we explored various noise reduction approaches that can effectively eliminate noise in blog data without inadvertently excluding valid content. After creating two separate indexes (with and without noise) to assess the effect of noise reduction, we tackled the opinion blog retrieval task by breaking it down into two sequential subtasks: on-topic retrieval followed by opinion classification. Our opinion retrieval approach was to first apply traditional IR methods to retrieve on-topic blogs, and then boost the ranks of opinionated blogs based on opinion scores generated by opinion assessment methods. Our opinion module consists of four components: the Opinion Term Module, which identifies opinions based on the frequency of opinion terms (i.e., terms that occur frequently only in opinion blogs); the Rare Term Module, which uses uncommon/rare terms (e.g., “sooo good”) for opinion classification; the IU Module, which uses IU (I and you) collocations; and the Adjective-Verb Module, which uses the distributional similarity approach from computational linguistics to learn subjective language from training data.
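The abstract does not specify how opinion scores are combined with topical scores. The following is only a minimal sketch of the general two-stage idea (on-topic retrieval, then boosting opinionated results), not the WIDIT implementation. It assumes a small hypothetical opinion lexicon (`OPINION_TERMS`), a hypothetical I/you collocation pattern (`IU_PATTERN`), min-max normalization, and a linear boost controlled by a hypothetical `weight` parameter. It imitates only Opinion Term and IU style signals; the Rare Term and Adjective-Verb modules are not modeled.

```python
import re
from dataclasses import dataclass

# Hypothetical opinion lexicon (assumption); WIDIT derived its terms from training data.
OPINION_TERMS = {"love", "hate", "great", "awful", "think", "believe"}
# Hypothetical "I/you" collocation pattern (assumption), e.g. "i think", "you should".
IU_PATTERN = re.compile(r"\b(i|you)\s+(think|believe|feel|love|hate|should)\b")
TOKEN_RE = re.compile(r"[a-z']+")


@dataclass
class Doc:
    doc_id: str
    text: str
    topic_score: float  # score from the first-stage (on-topic) retrieval


def opinion_score(text: str) -> float:
    """Opinion-term hits plus I/you collocation hits, divided by token count."""
    lowered = text.lower()
    tokens = TOKEN_RE.findall(lowered)
    if not tokens:
        return 0.0
    term_hits = sum(1 for t in tokens if t in OPINION_TERMS)
    iu_hits = len(IU_PATTERN.findall(lowered))
    return (term_hits + iu_hits) / len(tokens)


def min_max(values):
    """Scale scores into [0, 1]; all-equal inputs map to 0."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def rerank(docs, weight=0.5):
    """Re-rank on-topic results by adding a weighted opinion score (assumed linear fusion)."""
    if not docs:
        return []
    topic = min_max([d.topic_score for d in docs])
    opinion = min_max([opinion_score(d.text) for d in docs])
    fused = [(t + weight * o, d) for t, o, d in zip(topic, opinion, docs)]
    fused.sort(key=lambda pair: pair[0], reverse=True)
    return [d.doc_id for _, d in fused]


docs = [
    Doc("b1", "The new phone was released today with a larger screen.", 0.9),
    Doc("b2", "I think the new phone is great, I love it.", 0.8),
    Doc("b3", "Stock prices moved slightly.", 0.2),
]
print(rerank(docs))  # ['b2', 'b1', 'b3']
```

Here the opinionated post `b2` overtakes the slightly more topical but neutral `b1`; with `weight=0.0` the original topical order is kept. In this sketch, `weight` trades topical relevance against opinionatedness.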
License
Resource Type
Date Available
Date Issued
Citation
  • Yang, K., Yu, N., Valerio, A., & Zhang, H. (2006). WIDIT in TREC 2006 Blog Track. Paper presented at the Text REtrieval Conference (TREC).
Conference Name
Non-Academic Affiliation
Rights Statement
Language
Replaces

Relations

Parents:

This work has no parents.

In Collection:
