code4lib Conference 2006, February 16, 2006 : morning session

Permanent citation URL: http://hdl.handle.net/1957/2946
Title: code4lib Conference 2006, February 16, 2006 : morning session
Other Titles: Generating recommendations in OPACS : initial results and open areas for exploration
Library text mining
Anatomy of aDORe
Authors: Whitney, Colleen
Sanderson, Robert
Chute, Robert
Keywords: aDORe Archive
Digital storage
Online catalogs
Recommendation systems
Text mining
Information retrieval systems
Data mining
Issue Date: 16-Feb-2006
Series/Report no.: code4lib Conference 2006
Abstract: Generating recommendations in OPACS: initial results and open areas for exploration : In the context of a research and prototyping project, the California Digital Library is using catalog content indexed in XTF, along with over 9 million historical circulation transaction records and other external data, to generate recommendations for an academic audience. Early results are promising. This talk will focus on methods, challenges, and plans for further development. -- Library Text Mining : Using the TeraGrid and the SRB DataGrid, we have sufficient computational and storage facilities to run processing tasks that would normally be prohibitively expensive. By integrating text and data mining tools within the Cheshire3 information architecture, we can parse the natural language present in 20 million MARC records (the University of California's MELVYL collection) and extract information to provide to search/retrieve applications. In this talk, we'll discuss the results of applying new techniques to "old" data. -- Anatomy of aDORe : The aDORe Archive is a write-once/read-many storage approach for Digital Objects and their constituent datastreams. First, XML-based representations of multiple Digital Objects are concatenated into a single, valid XML file called an XMLtape. Second, ARC files, as introduced by the Internet Archive, are used to contain the constituent datastreams of the Digital Objects. The software was developed by the LANL Digital Library Research & Prototyping Team and is available under the GNU LGPL license.
Description: Presentation given on the second day of the code4lib Conference held Feb. 15-17, 2006, at LaSells Stewart Center, Oregon State University, Corvallis, Oregon.
URI: http://hdl.handle.net/1957/2946
Appears in Collections: code4lib Conference 2006
