Study of clustering algorithms for Gene expression analysis

Guntur, Sunil Babu (2007) Study of clustering algorithms for Gene expression analysis. MTech thesis.


Abstract

Data mining has been described as "the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data". Based on the type of knowledge that is mined, data mining can be classified into different models such as clustering, decision trees, association rules, and sequential patterns and time series. In this thesis, an attempt has been made to study the theoretical background and applications of clustering techniques in data mining, with special emphasis on gene expression analysis in bioinformatics. Bioinformatics is the study of genetic and other biological information using computational and statistical techniques. DNA microarray technology has made it possible to simultaneously monitor the expression levels of thousands of genes during important biological processes and across collections of related samples. This flood of data means that many of the challenges in biology are now challenges in computing. A first step toward addressing this challenge is clustering, which is essential in the data mining process for revealing natural structures and identifying interesting patterns in the underlying data. In this thesis, a few clustering algorithms, namely K-means, hierarchical clustering, Self-Organizing Map (SOM), and Cluster Affinity Search Technique (CAST), are compared with a proposed algorithm called CAST+. The strengths and weaknesses of these algorithms are identified. The proposed algorithm rectifies two of their drawbacks: having to know the number of clusters before clustering, and having to take an affinity threshold as input from the user. Results show that the proposed algorithm is efficient in comparison with the other clustering algorithms mentioned above. The algorithms are compared on the basis of evaluation indices such as homogeneity vs. separation and silhouette width.
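The following sketch makes the affinity-threshold drawback concrete. It implements the baseline CAST procedure of Ben-Dor, Shamir and Yakhini in Python. It does not implement CAST+, whose details are not given in this abstract. The sketch assumes a symmetric similarity matrix with entries in [0, 1] and 1.0 on the diagonal. The function name `cast`, the `max_rounds` cap, lowest-index tie-breaking and the toy matrix are illustrative choices and are not taken from the thesis. The cap exists because CAST's add/remove loop is not guaranteed to converge.

```python
def cast(sim, t, max_rounds=100):
    """Baseline CAST sketch: partition items 0..n-1 using a symmetric
    similarity matrix `sim` (values in [0, 1], 1.0 on the diagonal) and a
    user-supplied affinity threshold `t` in (0, 1)."""
    if not 0 < t < 1:
        raise ValueError("t must lie strictly between 0 and 1")
    n = len(sim)
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        open_cluster = set()
        # affinity[x] = sum of sim[x][y] over all y currently in open_cluster
        affinity = [0.0] * n
        # Assumption: cap the add/remove rounds, since CAST may oscillate.
        for _ in range(max_rounds):
            changed = False
            # ADD: pull in the unassigned item with the highest affinity while
            # that affinity is at least t * |open_cluster| (ties -> lowest index).
            while unassigned:
                u = max(unassigned, key=lambda x: (affinity[x], -x))
                if affinity[u] < t * len(open_cluster):
                    break
                open_cluster.add(u)
                unassigned.remove(u)
                for x in unassigned | open_cluster:
                    affinity[x] += sim[x][u]
                changed = True
            # REMOVE: drop the member with the lowest affinity while that
            # affinity falls below t * |open_cluster|.
            while open_cluster:
                v = min(open_cluster, key=lambda x: (affinity[x], x))
                if affinity[v] >= t * len(open_cluster):
                    break
                open_cluster.remove(v)
                unassigned.add(v)
                for x in unassigned | open_cluster:
                    affinity[x] -= sim[x][v]
                changed = True
            if not changed:
                break
        if not open_cluster:
            raise RuntimeError("empty cluster; check the diagonal of sim")
        clusters.append(sorted(open_cluster))
    return clusters


if __name__ == "__main__":
    # Hypothetical toy data: items 0-2 form one group, items 3-4 another.
    S = [
        [1.0, 0.9, 0.9, 0.1, 0.1],
        [0.9, 1.0, 0.9, 0.1, 0.1],
        [0.9, 0.9, 1.0, 0.1, 0.1],
        [0.1, 0.1, 0.1, 1.0, 0.9],
        [0.1, 0.1, 0.1, 0.9, 1.0],
    ]
    print(cast(S, 0.5))   # [[0, 1, 2], [3, 4]]
    print(cast(S, 0.95))  # [[0], [1], [2], [3], [4]]
```

On this toy matrix, t = 0.5 recovers the two groups, while t = 0.95 places every item in its own cluster. This sensitivity to a hand-picked threshold is one of the drawbacks the abstract attributes to CAST.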

Item Type: Thesis (MTech)
Uncontrolled Keywords: Clustering algorithms, DNA, CAST
Subjects: Engineering and Technology > Computer and Information Science
Divisions: Engineering and Technology > Department of Computer Science
ID Code: 4355
Deposited By: Hemanta Biswal
Deposited On: 11 Jul 2012 17:01
Last Modified: 11 Jul 2012 17:01
Supervisor(s): Rath, S K
