MapReduce Based Feature Selection and Classification of Microarray Dataset

Rath, Nitish Kumar (2015) MapReduce Based Feature Selection and Classification of Microarray Dataset. BTech thesis.



Gene expression profiling has emerged as an efficient technique for the classification, diagnosis and treatment of various diseases. The data retrieved from a microarray contains the expression values of the genes present in a tissue. The size of such data varies from a few kilobytes to thousands of gigabytes, so fast analysis of microarray datasets is essential. The major drawback of a microarray dataset is the large amount of irrelevant information it contains, which obscures the useful information in the dataset and results in a large number of computations. Selecting the relevant genes is therefore an important step in microarray data analysis. After the required number of features has been retrieved, the dataset is classified. In this project, various methods based on MapReduce are proposed to select the relevant features. After feature selection, the Naïve Bayes Classifier and N-Nearest Neighbor are used to classify the datasets. These algorithms are implemented on the Hadoop framework. A comparative analysis of these methodologies is carried out using microarray data of different sizes.
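
The abstract does not say which feature-selection statistics the thesis uses, so the following is only a minimal sketch of how a MapReduce-style gene ranking could be organised. It runs in a single Python process rather than on Hadoop. The function names (`map_sample`, `reduce_gene`, `select_features`), the scoring rule (gap between class means) and the toy data are all assumptions made for illustration, not the author's method.

```python
from collections import defaultdict

def map_sample(sample):
    """Map step: emit (gene_index, (label, value)) for each gene in one sample."""
    label, values = sample
    for gene, value in enumerate(values):
        yield gene, (label, value)

def reduce_gene(gene, records):
    """Reduce step: score one gene by the gap between its class means.

    This scoring rule is an assumption chosen for simplicity.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for label, value in records:
        sums[label] += value
        counts[label] += 1
    means = [sums[c] / counts[c] for c in sums]
    return gene, max(means) - min(means)

def select_features(samples, k):
    """Run map, shuffle and reduce in-process, then keep the k best genes."""
    grouped = defaultdict(list)  # shuffle: group mapper output by gene
    for sample in samples:
        for gene, record in map_sample(sample):
            grouped[gene].append(record)
    scores = [reduce_gene(g, recs) for g, recs in grouped.items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return [gene for gene, _ in scores[:k]]

# Toy data, made up for illustration: (class label, expression values)
samples = [
    ("tumor", [5.0, 1.0, 3.0]),
    ("tumor", [6.0, 1.2, 3.1]),
    ("normal", [1.0, 1.1, 3.0]),
    ("normal", [2.0, 0.9, 2.9]),
]
selected = select_features(samples, k=1)
```

On a real cluster the map and reduce functions would run as separate distributed tasks and the grouping step would be handled by the framework's shuffle phase; the selected genes would then be passed to the classifier.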

Item Type:Thesis (BTech)
Uncontrolled Keywords:Mapreduce, Distributed Computing, Hadoop, Feature Selection, Microarray
Subjects:Engineering and Technology > Computer and Information Science > Data Mining
Divisions: Engineering and Technology > Department of Computer Science
ID Code:7791
Deposited By:Mr. Sanat Kumar Behera
Deposited On:16 Sep 2016 17:48
Last Modified:16 Sep 2016 17:48
Supervisor(s):Rath, S K
