01/08/2013 Biology Computer Science Mathematics Medicine
DOI: 10.1016/j.jbi.2013.05.008 SemanticScholar ID: 21782338 MAG: 2049871739

Partial least squares and logistic regression random-effects estimates for gene selection in supervised classification of gene expression data

Publication Summary

Highlights: Gene selection is needed in supervised classification of gene expression. Respecting the characteristics of data is a key aspect to consider in gene selection. Selection based on logistic regression estimates is recommended in general. Selection based on PLS estimates is recommended when the number of samples is low. Selection based on modified t-statistics performs well on data with high variability.

Our main interest in supervised classification of gene expression data is to infer whether the expressions can discriminate biological characteristics of samples. With thousands of gene expressions to consider, gene selection has been advocated to decrease classification error by including only the discriminating genes. We propose basing the gene selection on partial least squares (PLS) and logistic regression random-effects (RE) estimates before the selected genes are evaluated in classification models. We compare this selection with selection based on two-sample t-statistics, the current practice, and on modified t-statistics. The results indicate that gene selection based on logistic regression RE estimates is recommended in a general situation, while selection based on the PLS estimates is recommended when the number of samples is low. Gene selection based on the modified t-statistics performs well when the genes exhibit moderate-to-high variability with moderate group separation. Respecting the characteristics of the data is a key aspect to consider in gene selection.
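To make the baseline concrete, the following is a minimal sketch of gene selection by two-sample t-statistics, the current practice that the summary compares against. It is an illustration under stated assumptions, not the authors' code: the unequal-variance (Welch) form of the statistic, the function names `welch_t` and `select_genes`, the cutoff `k`, and the synthetic data are all hypothetical choices made here. The PLS, logistic regression RE, and modified t-statistic selections are not shown.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Two-sample t-statistic, unequal-variance (Welch) form (an assumption)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def select_genes(expr, labels, k):
    """Return the indices of the k genes with the largest |t|.

    expr:   list of per-gene lists, one expression value per sample
    labels: list of 0/1 group labels, one per sample
    """
    scores = []
    for g, row in enumerate(expr):
        group0 = [x for x, y in zip(row, labels) if y == 0]
        group1 = [x for x, y in zip(row, labels) if y == 1]
        scores.append((abs(welch_t(group0, group1)), g))
    scores.sort(reverse=True)  # largest absolute t-statistic first
    return [g for _, g in scores[:k]]


# Synthetic illustration: 50 genes, 20 samples, two groups of 10.
random.seed(0)
labels = [0] * 10 + [1] * 10
expr = [[random.gauss(0, 1) for _ in labels] for _ in range(50)]

# Make genes 3 and 7 discriminating by shifting their group-1 values.
for g in (3, 7):
    expr[g] = [x + 5.0 * y for x, y in zip(expr[g], labels)]

selected = select_genes(expr, labels, k=2)
print(sorted(selected))
```

In a classification workflow the selected indices would then be used to subset the expression matrix before a classifier is fitted; the number of genes to keep is a tuning choice that this sketch does not address.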

CAER Authors

Dr. Arief Gusnanto

University of Leeds
