Article

Length Bias Correction in Gene Ontology Enrichment Analysis Using Logistic Regression

Downloadable Content

https://ir.library.oregonstate.edu/concern/articles/p8418p08h

Descriptions

Attribute Name / Values
Creator
  • Mi G, Di Y, Emerson S, Cumbie JS, Chang JH
Abstract
  • When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called "length bias", will influence subsequent analyses such as Gene Ontology enrichment analysis. In the presence of length bias, Gene Ontology categories that include longer genes are more likely to be identified as enriched. These categories, however, are not necessarily biologically more relevant. We show that one can effectively adjust for length bias in Gene Ontology analysis by including transcript length as a covariate in a logistic regression model. The logistic regression model makes the statistical issue underlying length bias more transparent: transcript length becomes a confounding factor when it correlates with both the Gene Ontology membership and the significance of the differential expression test. The inclusion of the transcript length as a covariate allows one to investigate the direct correlation between the Gene Ontology membership and the significance of testing differential expression, conditional on the transcript length. We present both real and simulated data examples to show that the logistic regression approach is simple, effective, and flexible.
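The adjustment described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the response is GO-category membership (0/1) and the predictors are a differential-expression call (0/1) and standardized log transcript length. The data are simulated by a hypothetical `simulate` function whose parameters are invented so that length drives both DE detection and category membership, while membership is independent of DE given length (pure confounding). The model is fitted by Newton-Raphson in plain Python to avoid external dependencies.

```python
import math
import random


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_logistic(X, y, max_iter=50, tol=1e-8):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    X: covariate rows, each starting with 1.0 for the intercept.
    y: 0/1 responses. Returns the list of fitted coefficients.
    """
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = p * (1.0 - p)
            for j in range(k):
                grad[j] += (yi - p) * xi[j]
                for l in range(k):
                    hess[j][l] += w * xi[j] * xi[l]
        step = solve(hess, grad)  # Newton step: (X' W X)^{-1} X'(y - p)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def simulate(n=20000, seed=1):
    """Hypothetical data: length drives both DE calls and GO membership."""
    rng = random.Random(seed)
    z_len, de, go = [], [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # standardized log transcript length
        # Longer genes: more power, so more likely to be called DE.
        d = 1 if rng.random() < sigmoid(-1.0 + 1.5 * z) else 0
        # The category favors long genes but ignores DE status given length.
        g = 1 if rng.random() < sigmoid(-1.5 + 1.5 * z) else 0
        z_len.append(z)
        de.append(d)
        go.append(g)
    return z_len, de, go


z_len, de, go = simulate()
# Unadjusted model: logit P(in GO) = b0 + b1 * DE
b_naive = fit_logistic([[1.0, d] for d in de], go)
# Length-adjusted model: logit P(in GO) = b0 + b1 * DE + b2 * length
b_adj = fit_logistic([[1.0, d, z] for d, z in zip(de, z_len)], go)
print(f"DE coefficient, unadjusted:      {b_naive[1]:.3f}")
print(f"DE coefficient, length-adjusted: {b_adj[1]:.3f}")
```

In this setup the unadjusted DE coefficient should come out clearly positive (apparent enrichment produced by length alone), while the length-adjusted coefficient should sit near zero, which is the conditional-on-length comparison the abstract describes. For real analyses an established GLM routine, such as R's `glm(..., family = binomial)` or statsmodels' `Logit`, is the natural choice, and the paper's exact specification (for example, how length enters the model) may differ from this sketch.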
License
Resource Type
  • Article
DOI
  • 10.1371/journal.pone.0046128
Date Available
  • 2013-02-13
Date Issued
  • 2012-10-02
Citation
  • Mi G, Di Y, Emerson S, Cumbie JS, Chang JH (2012) Length Bias Correction in Gene Ontology Enrichment Analysis Using Logistic Regression. PLoS ONE 7(10): e46128. doi:10.1371/journal.pone.0046128
Journal Title
  • PLoS ONE
Journal Volume
  • 7
Journal Issue/Number
  • 10
Academic Affiliation
Non-Academic Affiliation
Rights Statement
Funding Statement (additional comments about funding)
  • Work of YD, SE and JHC is supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R01GM104977. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Work in the Chang lab is also supported by the National Research Initiative Competitive Grants Program Grant no. 2008-35600-04691 and Agriculture and Food Research Initiative Competitive Grants Program Grant no. 2011-67019-30192 from the USDA National Institute of Food and Agriculture. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Publisher
Peer Reviewed
Language
Replaces
Additional Information
  • description.provenance : Submitted by Deborah Campbell (deborah.campbell@oregonstate.edu) on 2013-02-13T16:41:33Z No. of bitstreams: 3 license_rdf: 22765 bytes, checksum: 56265f5776a16a05899187d30899c530 (MD5) license_text: 0 bytes, checksum: d41d8cd98f00b204e9800998ecf8427e (MD5) MiGuStatisticsLengthBiasCorrection.pdf: 524581 bytes, checksum: 9f397caf09a40ed782d80f656a15d5f4 (MD5)
  • description.provenance : Made available in DSpace on 2013-02-13T17:48:50Z (GMT). No. of bitstreams: 3 license_rdf: 22765 bytes, checksum: 56265f5776a16a05899187d30899c530 (MD5) license_text: 0 bytes, checksum: d41d8cd98f00b204e9800998ecf8427e (MD5) MiGuStatisticsLengthBiasCorrection.pdf: 524581 bytes, checksum: 9f397caf09a40ed782d80f656a15d5f4 (MD5) Previous issue date: 2012-10-02
  • description.provenance : Approved for entry into archive by Deborah Campbell (deborah.campbell@oregonstate.edu) on 2013-02-13T17:48:50Z (GMT) No. of bitstreams: 3 license_rdf: 22765 bytes, checksum: 56265f5776a16a05899187d30899c530 (MD5) license_text: 0 bytes, checksum: d41d8cd98f00b204e9800998ecf8427e (MD5) MiGuStatisticsLengthBiasCorrection.pdf: 524581 bytes, checksum: 9f397caf09a40ed782d80f656a15d5f4 (MD5)

Relationships

Parents:

This work has no parents.
