Tuesday, October 23, 2018

Parallelized Classification of Cancer Sub-types from Gene Expression Profiles
Using Recursive Gene Selection

Lokeswari VENKATARAMANA1, Shomona Gracia JACOB1*, Rajavel RAMADOSS2
1 Sri Sivasubramaniya Nadar College of Engineering, Department of CSE,
Kalavakkam, Chennai, 603110, India
lokeswaricts@gmail.com, graciarun@gmail.com (*Corresponding author)
2 Sri Sivasubramaniya Nadar College of Engineering, Department of ECE,
Kalavakkam, Chennai, 603110, India
rajavelr@ssn.edu.in

ABSTRACT: Cancer is a chronic disease caused mainly by irregularities in genes, so it is important to identify the oncogenes responsible. Biological data such as gene expression levels, protein sequences, RNA sequences, pathway analyses, pan-cancer analyses and structural biomarkers can aid cancer diagnosis, classification and prognosis. This research focuses on classifying cancer subtypes using Microarray Gene Expression (MGE) levels. MGE data are high-dimensional, with many genes but very few samples, so dimensionality reduction is needed to select the relevant genes and remove the redundant ones. The Recursive Feature Selection (RFS) method is proposed because it repeats the gene selection process until the best gene subset is found. This best subset is then used for classification with different models, which are evaluated using 10-fold cross-validation. To scale to large volumes of gene expression data, a parallelized classification model was explored on the Spark framework and compared with a non-parallelized classification model on Weka. The results showed that the parallelized model performed better than the non-parallelized model in terms of both accuracy and execution time. The RFS method combined with the parallelized classifier was also compared with previous approaches and outperformed them.
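
The general shape of such a workflow (repeated gene elimination, with each candidate subset scored by 10-fold cross-validation) can be sketched in plain Python. This is an illustration only and not the authors' RFS method or their Spark implementation: the scoring statistic, the nearest-centroid classifier, the halving schedule, the toy data and all function names are assumptions chosen to keep the example self-contained.

```python
from statistics import mean, pstdev

def gene_scores(X, y, genes):
    """Score each gene by |difference of class means| / (sum of class std devs).
    Assumes exactly two class labels (an assumption of this sketch)."""
    a, b = sorted(set(y))
    scores = {}
    for g in genes:
        xa = [row[g] for row, lab in zip(X, y) if lab == a]
        xb = [row[g] for row, lab in zip(X, y) if lab == b]
        spread = (pstdev(xa) + pstdev(xb)) or 1e-12  # avoid division by zero
        scores[g] = abs(mean(xa) - mean(xb)) / spread
    return scores

def predict(train_X, train_y, row, genes):
    """Nearest-centroid prediction restricted to the selected genes."""
    best_label, best_dist = None, float("inf")
    for lab in sorted(set(train_y)):
        members = [r for r, l in zip(train_X, train_y) if l == lab]
        centroid = [mean(r[g] for r in members) for g in genes]
        dist = sum((row[g] - c) ** 2 for g, c in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = lab, dist
    return best_label

def cv_accuracy(X, y, genes, k=10):
    """k-fold cross-validated accuracy using interleaved folds."""
    n, correct = len(X), 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        for i in test_idx:
            correct += predict(train_X, train_y, X[i], genes) == y[i]
    return correct / n

def recursive_gene_selection(X, y, k=10, drop_fraction=0.5):
    """Repeatedly drop the lowest-scoring genes; keep the best-scoring subset."""
    genes = list(range(len(X[0])))
    best_genes, best_acc = genes, cv_accuracy(X, y, genes, k)
    while len(genes) > 1:
        scores = gene_scores(X, y, genes)
        keep = max(1, int(len(genes) * (1 - drop_fraction)))
        genes = sorted(sorted(genes, key=scores.get, reverse=True)[:keep])
        acc = cv_accuracy(X, y, genes, k)
        if acc >= best_acc:  # on ties, prefer the smaller subset
            best_genes, best_acc = genes, acc
    return best_genes, best_acc

# Toy data (hypothetical): 20 samples, 8 genes; only genes 0 and 3 differ by class.
y = [i % 2 for i in range(20)]
X = [[((7 * i + 3 * g) % 5 - 2) * 0.1 + (2.0 if g in (0, 3) and y[i] else 0.0)
      for g in range(8)] for i in range(20)]

best_genes, best_acc = recursive_gene_selection(X, y, k=10)
print(best_genes, best_acc)
```

For brevity, this sketch scores genes on the full data set before cross-validating, which can bias the accuracy estimate upward; a more careful version would repeat the gene selection inside each training fold. The folds are independent of one another, which is the kind of work a framework such as Spark can distribute.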

KEYWORDS: Recursive Feature Selection, Gene Selection, Microarray Gene Expression, Parallelized classification, Random Forest.


CITE THIS PAPER AS:
Lokeswari VENKATARAMANA, Shomona Gracia JACOB*, Rajavel RAMADOSS, Parallelized Classification of Cancer Sub-types from Gene Expression Profiles Using Recursive Gene Selection, Studies in Informatics and Control, ISSN 1220-1766, vol. 27(2), pp. 213-222, 2018.

https://doi.org/10.24846/v27i2y201809