Classication and Clustering Using Intelligent Techniques: Application to Microarray Cancer Data

Bai, Anita (2013) Classication and Clustering Using Intelligent Techniques: Application to Microarray Cancer Data. MTech thesis.

[img]PDF
1428Kb

Abstract

Analysis and interpretation of DNA Microarray data is a fundamental task in bioinformatics. Feature Extraction plays a critical role in better performance of the classifier. We address the dimension reduction of DNA features in which relevant features are extracted among thousands of irrelevant ones through dimensionality reduction. This enhances the speed and accuracy of the classifiers. Principal Component Analysis is a technique used for feature extraction which helps to retrieve intrinsic information from high dimensional data in eigen spaces to solve the curse of dimensionality problem. Neural Networks and Support Vector Machine are implemented on reduced data set and their performances are measured in terms of predictive accuracy, specificity, and sensitivity. Next, we propose a Multiobjective Genetic Algorithm-based fuzzy clustering technique using real coded encoding of cluster centers for clustering and classification. This technique is implemented on microarray cancer data to select training data using multiobjective genetic algorithm with non-dominated sorting. The two objective functions for this multiobjective techniques are optimization of cluster compactness as well as separation simultaneously. This approach identifies the solution. Support Vector Machine classifier is further trained by the selected training points which have high confidence value. Then remaining points are classified by trained SVM classifier. Finally, the four clustering label vectors through majority voting ensemble are combined. The performance of the proposed MOGA-SVM, classification and clustering method has been compared to MOGA-BP, SVM, BP. The performance are measured in terms of Silhoutte Index, ARI Index respectively. The experiment were carried on three public domain cancer data sets, viz., Ovarian, Colon and Leukemia cancer.

Item Type:Thesis (MTech)
Uncontrolled Keywords:Cancer Classification; Feature Reduction; Multiobjective genetic algorithm; Neural Network; Principal components; Support Vector Machine
Subjects:Engineering and Technology > Computer and Information Science > Data Mining
Divisions: Engineering and Technology > Department of Computer Science
ID Code:4748
Deposited By:Hemanta Biswal
Deposited On:31 Oct 2013 09:12
Last Modified:20 Dec 2013 11:28
Supervisor(s):Rath, S K

Repository Staff Only: item control page