Article

 

Building a multi-scaled geospatial temporal ecology database from disparate data sources: fostering open science and data reuse

Downloadable Content
https://ir.library.oregonstate.edu/concern/articles/7m01bn45m

Additional files are available online at: http://www.gigasciencejournal.com/content/4/1/28/additional

To the best of our knowledge, one or more authors of this paper were federal employees when contributing to this work. This is the publisher’s final pdf. The published article is copyrighted by the author(s) and published by BioMed Central. The published article can be found at: http://www.gigasciencejournal.com/

Descriptions

Attribute Name | Values
Creator
Abstract
  • Although there are considerable site-based data for individual or groups of ecosystems, these datasets are widely scattered, have different data formats and conventions, and often have limited accessibility. At the broader scale, national datasets exist for a large number of geospatial features of land, water, and air that are needed to fully understand variation among these ecosystems. However, such datasets originate from different sources and have different spatial and temporal resolutions. By taking an open-science perspective and by combining site-based ecosystem datasets and national geospatial datasets, science gains the ability to ask important research questions related to grand environmental challenges that operate at broad scales. Documentation of such complicated database integration efforts, through peer-reviewed papers, is recommended to foster reproducibility and future use of the integrated database. Here, we describe the major steps, challenges, and considerations in building an integrated database of lake ecosystems, called LAGOS (LAke multi-scaled GeOSpatial and temporal database), that was developed at the sub-continental study extent of 17 US states (1,800,000 km²). LAGOS includes two modules: LAGOS_GEO, with geospatial data on every lake with surface area larger than 4 ha in the study extent (~50,000 lakes), including climate, atmospheric deposition, land use/cover, hydrology, geology, and topography measured across a range of spatial and temporal extents; and LAGOS_LIMNO, with lake water quality data compiled from ~100 individual datasets for a subset of lakes in the study extent (~10,000 lakes). Procedures for the integration of datasets included: creating a flexible database design; authoring and integrating metadata; documenting data provenance; quantifying spatial measures of geographic data; quality-controlling integrated and derived data; and extensively documenting the database.
Our procedures make a large, complex, and integrated database reproducible and extensible, allowing users to ask new research questions with the existing database or through the addition of new data. The largest challenge of this task was the heterogeneity of the data, formats, and metadata. Many steps of data integration need manual input from experts in diverse fields, requiring close collaboration.
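The abstract names the integration procedures only at a high level. As an illustration of a few of them, and not the actual LAGOS schema, here is a minimal sketch using Python's built-in sqlite3 module: a lake table that admits only lakes larger than 4 ha, a source-dataset table that records provenance, a water quality table whose foreign key keeps sampled lakes a subset of the lake table (as the LIMNO lakes are a subset of the GEO lakes), and per-source column maps that translate each contributor's column names and units into one common form. All table and column names, the total phosphorus (TP) unit conversions, and the negative-value QC flag are hypothetical.

```python
import sqlite3

# Threshold from the abstract: the geospatial module covers lakes "larger than 4 ha".
MIN_LAKE_AREA_HA = 4.0

# Hypothetical conversion factors for total phosphorus (TP) to a common unit, ug/L.
TP_TO_UG_PER_L = {"ug/L": 1.0, "mg/L": 1000.0}

SCHEMA = """
CREATE TABLE lake (              -- one row per lake (GEO-style module)
    lake_id INTEGER PRIMARY KEY,
    area_ha REAL NOT NULL
);
CREATE TABLE source_dataset (    -- provenance: who supplied each sample
    source_id TEXT PRIMARY KEY,
    provider TEXT NOT NULL
);
CREATE TABLE limno_sample (      -- water quality (LIMNO-style module)
    lake_id INTEGER NOT NULL REFERENCES lake(lake_id),
    source_id TEXT NOT NULL REFERENCES source_dataset(source_id),
    sample_date TEXT NOT NULL,
    tp_ug_l REAL NOT NULL,
    qc_flag TEXT                 -- NULL when the value passed the checks
);
"""


def build_db():
    """Create an in-memory database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FKs off by default
    conn.executescript(SCHEMA)
    return conn


def add_lakes(conn, lakes):
    """Insert (lake_id, area_ha) pairs above the area threshold; return count kept."""
    kept = [(lid, area) for lid, area in lakes if area > MIN_LAKE_AREA_HA]
    conn.executemany("INSERT INTO lake VALUES (?, ?)", kept)
    return len(kept)


def add_source(conn, source_id, provider, rows, column_map):
    """Harmonize one contributed dataset into the common schema.

    column_map translates this source's own column names to the hypothetical
    common fields 'lake', 'date', 'tp' and 'unit'. An unknown unit raises KeyError;
    a sample for a lake not in the lake table raises sqlite3.IntegrityError.
    """
    conn.execute("INSERT INTO source_dataset VALUES (?, ?)", (source_id, provider))
    for row in rows:
        unit = row[column_map["unit"]]
        tp = float(row[column_map["tp"]]) * TP_TO_UG_PER_L[unit]
        flag = "negative_value" if tp < 0 else None  # toy QC rule
        conn.execute(
            "INSERT INTO limno_sample VALUES (?, ?, ?, ?, ?)",
            (row[column_map["lake"]], source_id, row[column_map["date"]], tp, flag),
        )


if __name__ == "__main__":
    conn = build_db()
    print(add_lakes(conn, [(1, 12.5), (2, 3.0), (3, 40.0)]))  # lake 2 is dropped
    # Two sources with different column names and units.
    add_source(conn, "agency_A", "State agency A",
               [{"LakeID": 1, "Date": "2010-07-15", "TP": "0.025", "Units": "mg/L"}],
               {"lake": "LakeID", "date": "Date", "tp": "TP", "unit": "Units"})
    add_source(conn, "univ_B", "University B",
               [{"lake": 3, "sampled": "2011-08-02", "tp_ugl": "18", "u": "ug/L"}],
               {"lake": "lake", "date": "sampled", "tp": "tp_ugl", "unit": "u"})
    for row in conn.execute("SELECT * FROM limno_sample ORDER BY lake_id"):
        print(row)
```

The foreign key does most of the work here: a sample for a lake missing from the lake table is rejected rather than silently stored. Keeping each source's quirks in a column map, rather than in code, means that adding another dataset requires a new map, not a new loader.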
Resource Type
  • Article
DOI
  • 10.1186/s13742-015-0067-4
Date Available
  • 2016-01-06
Date Issued
  • 2015-07-01
Citation
  • Soranno, P. A., Bissell, E. G., Cheruvelil, K. S., Christel, S. T., Collins, S. M., Fergus, C. E., ... & Scott, C. E. (2015). Building a multi-scaled geospatial temporal ecology database from disparate data sources: fostering open science and data reuse. GigaScience, 4(1), 28. doi:10.1186/s13742-015-0067-4
Series
Keyword
Rights Statement
Funding Statement (additional comments about funding)
  • Support was provided by the National Science Foundation MacroSystems Biology Program in the Emerging Frontiers Division of the Biological Sciences Directorate (EF-1065786 to MSU, EF-1065649 to ISU, and EF-1065818 to UW) and the USDA National Institute of Food and Agriculture, Hatch project 176820. KEW also thanks the STRIVE Programme (2011-W-FS-7) from the Environmental Protection Agency, Ireland, for support. We thank the numerous federal and state agency professionals, tribal agency personnel, non-profit agency personnel, and university principal investigators for providing both public and personal datasets to the LAGOS_LIMNO database (listed in Additional file 17). We also thank the national lake survey programs, NSF Long Term Ecological Research (LTER) programs, and citizen monitoring programs that provided data to LAGOS. This is Great Lakes Environmental Research Laboratory contribution number 1371.
Publisher
  • BioMed Central
Peer Reviewed
Language
Replaces
Additional Information
  • description.provenance : Approved for entry into archive by Patricia Black (patricia.black@oregonstate.edu) on 2016-01-06T18:16:45Z (GMT) No. of bitstreams: 2 license_rdf: 1370 bytes, checksum: cd1af5ab51bcc7a5280cf305303530e9 (MD5) HenryEmilyExtensionServiceBuildingMultiScaled.pdf: 3624116 bytes, checksum: 0ac2842e6194c2ed941c0e48ca3a76be (MD5)
  • description.provenance : Made available in DSpace on 2016-01-06T18:16:45Z (GMT). No. of bitstreams: 2 license_rdf: 1370 bytes, checksum: cd1af5ab51bcc7a5280cf305303530e9 (MD5) HenryEmilyExtensionServiceBuildingMultiScaled.pdf: 3624116 bytes, checksum: 0ac2842e6194c2ed941c0e48ca3a76be (MD5) Previous issue date: 2015-07-01
  • description.provenance : Submitted by Patricia Black (patricia.black@oregonstate.edu) on 2016-01-06T18:15:58Z No. of bitstreams: 2 license_rdf: 1370 bytes, checksum: cd1af5ab51bcc7a5280cf305303530e9 (MD5) HenryEmilyExtensionServiceBuildingMultiScaled.pdf: 3624116 bytes, checksum: 0ac2842e6194c2ed941c0e48ca3a76be (MD5)
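The provenance entries record a byte size and an MD5 checksum for each deposited bitstream. A reader who downloads the PDF can use them to confirm the copy is intact. The sketch below assumes the file has been saved locally under its bitstream name, and that the file served today is the same bitstream deposited in 2016, which this record does not itself guarantee. The helper names are hypothetical; only Python's standard hashlib and pathlib are used. MD5 is adequate for catching corrupted or truncated downloads, not for proving a file was not deliberately altered.

```python
import hashlib
from pathlib import Path

# Values copied from the provenance entries for the PDF bitstream.
EXPECTED_MD5 = "0ac2842e6194c2ed941c0e48ca3a76be"
EXPECTED_SIZE = 3624116  # bytes


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """True if the file matches the recorded size and checksum."""
    p = Path(path)
    if p.stat().st_size != expected_size:  # cheap check before hashing
        return False
    return md5_of_file(p) == expected_md5.lower()


if __name__ == "__main__":
    # Hypothetical local copy, named after the deposited bitstream.
    print(verify_download("HenryEmilyExtensionServiceBuildingMultiScaled.pdf"))
```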
