Bioinformatics Advance Access originally published online on September 27, 2005
Bioinformatics 2005 21(23):4280-4288; doi:10.1093/bioinformatics/bti685
A note on using permutation-based false discovery rate estimates to compare different analysis methods for microarray data
1Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
2Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, St Paul, MN 55108, USA
*To whom correspondence should be addressed.
Motivation: False discovery rate (FDR) is defined as the expected percentage of false positives among all the claimed positives. In practice, with the true FDR unknown, an estimated FDR can serve as a criterion to evaluate the performance of various statistical methods, provided that the estimated FDR approximates the true FDR well or, at least, does not improperly favor or disfavor any particular method. Permutation methods have become popular for estimating FDR in genomic studies. The purpose of this paper is twofold. First, we investigate theoretically and empirically whether the standard permutation-based FDR estimator is biased and, if so, whether the bias inappropriately favors or disfavors any method. Second, we propose a simple modification of the standard permutation method to yield a better FDR estimator, which can in turn serve as a fairer criterion for evaluating various statistical methods.
Results: Both simulated and real data examples are used for illustration and comparison. Three commonly used test statistics are considered: the sample mean, the SAM statistic and Student's t-statistic. The results show that the standard permutation method overestimates FDR. The overestimation is most severe for the sample mean statistic and least severe for the t-statistic, with the SAM statistic lying between the two extremes. This suggests that one has to be cautious when using the standard permutation-based FDR estimates to evaluate various statistical methods. In addition, our proposed FDR estimation method is simple and outperforms the standard method.
Contact: yangxie{at}biostat.umn.ed
Received on June 30, 2005; revised on September 2, 2005; accepted on September 20, 2005
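For readers unfamiliar with the standard permutation-based FDR estimator discussed in the abstract, the following is a minimal sketch of one common form of it, not the authors' code. It assumes a one-sample design (a genes-by-samples matrix of log-ratios) in which null statistics are generated by random sign flips, and it estimates FDR as the average number of null statistics exceeding a threshold divided by the number of observed statistics exceeding it. The function names, the `s0` fudge constant for the SAM-type statistic, and the absence of any adjustment for the proportion of true nulls are all illustrative assumptions. The modification proposed in the paper is not shown because the abstract does not describe it.

```python
import numpy as np


def statistics(x, kind="t", s0=0.0):
    """Per-gene one-sample statistics for a (genes x samples) matrix.

    kind: "mean" (sample mean), "t" (Student's t), or "sam" (t-like
    statistic with a constant s0 added to the standard error).
    s0 is a hypothetical tuning constant chosen by the caller.
    """
    n = x.shape[1]
    mean = x.mean(axis=1)
    if kind == "mean":
        return mean
    se = x.std(axis=1, ddof=1) / np.sqrt(n)
    if kind == "t":
        return mean / se
    if kind == "sam":
        return mean / (se + s0)
    raise ValueError(f"unknown statistic: {kind}")


def permutation_fdr(x, threshold, kind="t", s0=0.0, n_perm=200, seed=0):
    """Standard permutation-style FDR estimate at a fixed threshold.

    Assumption: under the null, each sample's sign is exchangeable,
    so null statistics are obtained by flipping the signs of whole samples.
    """
    rng = np.random.default_rng(seed)
    observed = np.abs(statistics(x, kind, s0))
    total_positives = int((observed > threshold).sum())
    if total_positives == 0:
        return 0.0  # nothing claimed, so nothing can be falsely claimed

    false_counts = []
    for _ in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=x.shape[1])
        null = np.abs(statistics(x * signs, kind, s0))
        false_counts.append((null > threshold).sum())

    # Average count of null statistics beyond the threshold, relative to
    # the number of genes actually claimed as positive.
    expected_false = float(np.mean(false_counts))
    return min(1.0, expected_false / total_positives)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = rng.normal(size=(1000, 8))
    data[:100] += 2.0  # simulated differentially expressed genes
    for kind in ("mean", "sam", "t"):
        thr = 1.0 if kind == "mean" else 3.0
        print(kind, permutation_fdr(data, thr, kind=kind, s0=0.5))
```

Note that in this sketch the sign-flipped data still contain the truly differentially expressed genes, so their permuted statistics are included in the null counts; this is one intuition for why an estimator of this kind may overestimate FDR.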



