WIDIT in TREC-2006 Blog track

http://ir.library.oregonstate.edu/concern/defaults/5q47rt47n

Descriptions

Abstract or Summary
  • The Web Information Discovery Integrated Tool (WIDIT) Laboratory at the Indiana University School of Library and Information Science participated in the Blog track’s opinion task in TREC-2006. The goal of the opinion task is to "uncover the public sentiment towards a given entity/target", which involves not only retrieving topically relevant blogs but also identifying those that contain opinions about the target. To further complicate matters, the blog test collection contains a considerable amount of noise, such as blogs with non-English content and non-blog content (e.g., advertisements, navigational text), which may misdirect retrieval systems. Based on our hypothesis that noise reduction (e.g., exclusion of non-English blogs and navigational text) will improve both on-topic and opinion retrieval performance, we explored various noise reduction approaches that can effectively eliminate the noise in blog data without inadvertently excluding valid content. After creating two separate indexes (with and without noise) to assess the noise reduction effect, we tackled the opinion blog retrieval task by breaking it down into two sequential subtasks: on-topic retrieval followed by opinion classification. Our opinion retrieval approach was to first apply traditional IR methods to retrieve on-topic blogs, and then boost the ranks of opinionated blogs based on opinion scores generated by opinion assessment methods. Our opinion module consists of four components: the Opinion Term Module, which identifies opinions based on the frequency of opinion terms (i.e., terms that occur frequently only in opinion blogs); the Rare Term Module, which uses uncommon/rare terms (e.g., “sooo good”) for opinion classification; the IU Module, which uses IU (I and you) collocations; and the Adjective-Verb Module, which uses computational linguistics’ distribution similarity approach to learn subjective language from training data.
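
The two-stage approach in the abstract (on-topic retrieval, then boosting opinionated blogs) can be illustrated with a minimal Python sketch. It is not the WIDIT implementation: the tiny lexicon, the IU collocation list, the `Blog` type, the linear score combination, and the `opinion_weight` parameter are all assumptions made for illustration, and only simplified stand-ins for the Opinion Term and IU modules are shown.

```python
from dataclasses import dataclass

# Hypothetical opinion lexicon; the abstract does not list the actual terms.
OPINION_TERMS = {"love", "hate", "awful", "great", "terrible"}
# Hypothetical "I and you" (IU) collocations, stored as token bigrams.
IU_COLLOCATIONS = {("i", "think"), ("i", "feel"), ("i", "believe"), ("you", "should")}


@dataclass
class Blog:
    doc_id: str
    text: str
    topic_score: float  # assumed to come from a prior on-topic retrieval run


def tokenize(text: str) -> list[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (t.strip(".,!?\"'") for t in text.lower().split())
    return [t for t in tokens if t]


def opinion_term_score(text: str) -> float:
    """Fraction of tokens that appear in the opinion lexicon."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in OPINION_TERMS) / len(tokens)


def iu_score(text: str) -> float:
    """Number of IU collocations (adjacent token pairs) per token."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for pair in zip(tokens, tokens[1:]) if pair in IU_COLLOCATIONS)
    return hits / len(tokens)


def rerank(blogs: list[Blog], opinion_weight: float = 1.0) -> list[Blog]:
    """Boost on-topic results by a weighted opinion score (assumed linear mix)."""
    def combined(blog: Blog) -> float:
        opinion = opinion_term_score(blog.text) + iu_score(blog.text)
        return blog.topic_score + opinion_weight * opinion

    return sorted(blogs, key=combined, reverse=True)
```

For example, given `Blog("a", "The phone was released in March.", 1.0)` and `Blog("b", "I think this phone is great!", 0.9)`, `rerank` places `b` first because its opinion evidence outweighs the small gap in topic score; with `opinion_weight=0.0` the original on-topic order is kept.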
Citation
  • Yang, K., Yu, N., Valerio, A., & Zhang, H. (2006). WIDIT in TREC 2006 Blog Track. Paper presented at the TREC.
Additional Information
  • description.provenance : Made available in DSpace on 2016-04-13T21:46:33Z (GMT). No. of bitstreams: 2 license_rdf: 1232 bytes, checksum: bb87e2fb4674c76d0d2e9ed07fbb9c86 (MD5) indianau.blog.final.pdf: 305385 bytes, checksum: 5f8ea0ce38241d758f9d332f7e95f24c (MD5) Previous issue date: 2006
  • description.provenance : Approved for entry into archive by Michael Boock(michael.boock@oregonstate.edu) on 2016-04-13T21:46:33Z (GMT) No. of bitstreams: 2 license_rdf: 1232 bytes, checksum: bb87e2fb4674c76d0d2e9ed07fbb9c86 (MD5) indianau.blog.final.pdf: 305385 bytes, checksum: 5f8ea0ce38241d758f9d332f7e95f24c (MD5)
  • description.provenance : Submitted by Hui Zhang (hui.zhang@oregonstate.edu) on 2016-04-13T19:40:25Z No. of bitstreams: 2 license_rdf: 1232 bytes, checksum: bb87e2fb4674c76d0d2e9ed07fbb9c86 (MD5) indianau.blog.final.pdf: 305385 bytes, checksum: 5f8ea0ce38241d758f9d332f7e95f24c (MD5)

Last modified: 10/26/2017
