A Study on Pattern Classification of Bioinformatics Datasets

Jain, Yogesh Kumar (2012) A Study on Pattern Classification of Bioinformatics Datasets. BTech thesis.



Pattern classification is a supervised technique in which patterns are organized into groups that share the same set of properties. Classification draws on techniques from applied mathematics, informatics, statistics, computer science and artificial intelligence to solve the classification problem at the attribute level and map each sample to an output space of two or more classes. The Probabilistic Neural Network (PNN) is an effective neural network in the field of pattern classification. It uses training and testing data samples to build and evaluate a model. However, the network becomes very complex and difficult to handle when there is a large number of training samples. Other approaches, such as the K-Nearest Neighbour (KNN) algorithm, have been implemented to improve performance accuracy and convergence rate. K-Nearest Neighbour is a supervised classification scheme in which a subset selected from the whole dataset is used to classify the samples; a classified subset is then selected and used to classify the training dataset. Its computation cost becomes too high for larger datasets. A genetic algorithm (GA) is therefore also used to design a classifier: it divides the samples into classes by means of boundary lines. After each generation the accuracy of the algorithm is measured, and the process continues until the desired accuracy or the desired number of generations is reached. In this project, a comparative study of the Probabilistic Neural Network, K-Nearest Neighbour and the Genetic Algorithm as classifiers is carried out. The algorithms were tested using instances from the lung cancer dataset, Libra Movement dataset, Parkinson dataset and Iris dataset (taken from the UCI repository and then normalized). The efficiency of the three techniques is compared on the basis of performance accuracy on the test data, convergence time and implementation complexity.
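
As a rough illustration of two of the classifiers named in the abstract, the following is a minimal sketch and not the thesis implementation. It assumes Euclidean distance for KNN and a Gaussian (Parzen window) kernel for the PNN; the function names, the smoothing parameter sigma, the value of k and the toy samples are all hypothetical and are not taken from the thesis or from the UCI datasets.

```python
import math
from collections import Counter, defaultdict


def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def pnn_predict(train, query, sigma=0.5):
    """Return the class with the highest average Gaussian kernel response.

    One kernel is centred on every training sample, which is why the
    network grows with the size of the training set.
    """
    sums = defaultdict(float)
    counts = Counter()
    for features, label in train:
        sq_dist = sum((a - b) ** 2 for a, b in zip(features, query))
        sums[label] += math.exp(-sq_dist / (2 * sigma ** 2))
        counts[label] += 1
    return max(sums, key=lambda label: sums[label] / counts[label])


# Made-up, already normalized 2-D samples (hypothetical, for illustration only).
TRAIN = [
    ((0.10, 0.20), "A"), ((0.20, 0.10), "A"), ((0.15, 0.25), "A"),
    ((0.80, 0.90), "B"), ((0.90, 0.80), "B"), ((0.85, 0.75), "B"),
]

print(knn_predict(TRAIN, (0.2, 0.2)))
print(pnn_predict(TRAIN, (0.8, 0.8)))
```

Both functions scan every training sample for each query, which is consistent with the abstract's remark that cost and complexity grow with the number of training samples.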

Item Type: Thesis (BTech)
Uncontrolled Keywords: KNN, PNN, PCA, GA
Subjects: Engineering and Technology > Computer and Information Science > Data Mining
Divisions: Engineering and Technology > Department of Computer Science
ID Code: 3751
Deposited By: Mr Yogesh Kumar Jain
Deposited On: 06 Jun 2012 11:29
Last Modified: 06 Jun 2012 11:29
Supervisor(s): Rath, S K