Benchmarking Classification Models for Cancer Prediction from Gene Expression Data: A Novel Approach and New Findings

Geetha RAMANI, Shomona Gracia JACOB
Anna University (CEG Campus),
Guindy, Chennai, India – 600 025,
rgeetha@yahoo.com, graciarun@gmail.com

Abstract: Gene selection from gene expression data for cancer prediction has been an area of intensive research aimed at identifying the minimal, optimal set of candidate genes that yields accurate predictive performance. Two major problems arise in this process: the high dimensionality of the data relative to the small number of instances, and the need to categorize records into multiple classes. In this paper we propose a novel approach, Rank-Weight Feature Selection, that combines the filtering capacity of more than one feature selection algorithm to detect the minimal set of predictive genes that improves predictor performance in categorizing and predicting diverse oncogenic gene expression data. The filtered features (genes) are weighted by the number of feature relevance algorithms that report them as significant. The ranked genes are then used to validate the proposed method with ten classifiers over five diverse gene expression datasets. The results show that the proposed approach achieves higher predictive performance with fewer features than previously reported results, using the most relevant and minimal set of genes. The results are also used to recommend classifiers based on their accuracy and reliability in predicting cancer data.

Keywords: Cancer prediction, Gene Expression, Feature Relevance, Multi-class classification.
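
The weighting step described in the abstract, in which each gene is weighted by the number of feature relevance algorithms that report it as significant, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the filter algorithms, the cut-off, or the tie-breaking rule, so the function name `rank_weight_select`, the `min_votes` threshold, the alphabetical tie-break, and the filter and gene labels below are all hypothetical.

```python
from collections import Counter


def rank_weight_select(selections, min_votes=2):
    """Rank genes by how many filters flagged them as significant.

    selections: dict mapping a filter name to the genes that filter selected.
    min_votes:  hypothetical cut-off; genes with fewer votes are dropped.
    Returns a list of (gene, weight) pairs, highest weight first.
    """
    votes = Counter()
    for genes in selections.values():
        # Use a set so a filter cannot vote twice for the same gene.
        votes.update(set(genes))
    # Sort by descending weight, then by gene name (assumed tie-break).
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(gene, weight) for gene, weight in ranked if weight >= min_votes]


# Hypothetical output of three feature relevance filters.
selections = {
    "filter_a": ["G1", "G2", "G3"],
    "filter_b": ["G2", "G3", "G4"],
    "filter_c": ["G3", "G5"],
}
ranked = rank_weight_select(selections)
print(ranked)
```

Under these assumptions, genes selected by more filters rank higher, and the retained subset would then be passed to the classifiers for validation.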

CITE THIS PAPER AS:
R. Geetha RAMANI, Shomona Gracia JACOB, Benchmarking Classification Models for Cancer Prediction from Gene Expression Data: A Novel Approach and New Findings, Studies in Informatics and Control, ISSN 1220-1766, vol. 22 (2), pp. 133-142, 2013. https://doi.org/10.24846/v22i2y201303