dc.description.abstract | Measurement of gene expression using microarrays has been an extremely important research tool in biology and medicine. However, poor reproducibility of array-based results remains a long-standing issue. Although the cause of the problem has not been firmly identified, platform design and test site have been ruled out in a large-scale study by the MicroArray Quality Control project. In such measurements, prehybridization error (biological variance, or BV), introduced during sample processing (e.g. culture and treatment) and platform-specific sample preparation, and the inherent random error of the technology (technical variance, or TV) are coupled and difficult to quantify separately. Increasing evidence points to BV as the primary cause, but the lack of a method for assessing BV keeps the experimentalist in constant doubt about data reliability. Here, we developed a procedure, Measuring Improper Sample Handling (MISH), as a solution to the problem and produced a computer package for its implementation. MISH is a novel, all-statistics procedure and does not require normalization. For demonstration, we applied MISH to study the BV in 350 public data sets. Part of the result may be taken as a characterization of the BV of the Affymetrix GeneChip Human Genome U133 Plus 2.0 Array platform. We found that BV was the dominant error in the data sets studied and that, for data sets from biological replicates, sample processing introduced the most error. Our analysis showed that a large number of public cohort data sets had low sensitivity on contrasts, which may well explain why studies on the same diseases yielded highly dissimilar lists of differentially expressed genes (DEGs). This suggests that the reproducibility issue will remain a concern for measurements based on next-generation sequencing, and on any future technology that does not focus on improving sample processing.
| en_US |