Article
 

Text mining in the biocuration workflow: applications for literature curation at WormBase, dictyBase and TAIR

https://ir.library.oregonstate.edu/concern/articles/bk128b744

Descriptions

Creator
Abstract
  • WormBase, dictyBase and The Arabidopsis Information Resource (TAIR) are model organism databases containing information about Caenorhabditis elegans and other nematodes, the social amoeba Dictyostelium discoideum and related Dictyostelids and the flowering plant Arabidopsis thaliana, respectively. Each database curates multiple data types from the primary research literature. In this article, we describe the curation workflow at WormBase, with particular emphasis on our use of text-mining tools (BioCreative 2012, Workshop Track II). We then describe the application of a specific component of that workflow, Textpresso for Cellular Component Curation (CCC), to Gene Ontology (GO) curation at dictyBase and TAIR (BioCreative 2012, Workshop Track III). We find that, with organism-specific modifications, Textpresso can be used by dictyBase and TAIR to annotate gene products to GO's Cellular Component (CC) ontology.
  • Keywords: Resource, Biology, Databases, Annotations, Ontology, Caenorhabditis elegans, Information, Community
Resource Type
DOI
Date Available
Date Issued
Citation
  • Van Auken, K., Muller, H., Chisholm, R., Huala, E., Sternberg, P. W., Fey, P., . . . WormBase Consortium. (2012). Text mining in the biocuration workflow: Applications for literature curation at WormBase, dictyBase and TAIR. Database: The Journal of Biological Databases and Curation, 2012, bas040. doi:10.1093/database/bas040
Journal Title
Journal Volume
  • 2012
Journal Issue/Number
  • bas040
Academic Affiliation
Rights Statement
Funding Statement (additional comments about funding)
  • The US National Human Genome Research Institute (HG02223 to WormBase) and the British Medical Research Council (G070119 to WormBase); The US National Human Genome Research Institute (HG004090 to Textpresso); The US National Institutes of Health (GM64426 to dictyBase); The National Science Foundation (DBI-0850219 to TAIR), with additional support from TAIR sponsors (http://www.arabidopsis.org/doc/about/tair_sponsors/413); The National Science Foundation (0822201 to The Plant Ontology); US National Human Genome Research Institute (HG002273 to The Gene Ontology Consortium). PWS is an investigator with the Howard Hughes Medical Institute. Funding for open access charge: US National Human Genome Research Institute [Grant no. HG002273].
Publisher
Peer Reviewed
Language
Replaces
