Effective Unsupervised Learning Techniques for Outlier Detection

Abhaya (2023) Effective Unsupervised Learning Techniques for Outlier Detection. PhD thesis.

PDF (Restricted up to 14/06/2027)
Restricted to Repository staff only

2318Kb

Abstract

Unsupervised learning approaches are widely used in outlier detection because they do not require training data for decision making. Over the last several years, unsupervised approaches based on clustering, distance, and density have been observed to be effective for identifying outlier instances. The main objective of clustering-based approaches is to find the structure of clusters, whereas distance-based and density-based approaches focus on finding outliers. However, these approaches are not very effective at finding outliers in datasets of varying density. This thesis addresses the problem of finding outliers in such datasets. An effective and efficient two-phase approach termed Reversed Density Peak for Outlier Detection (RDPOD) is proposed. In the first phase, the widely used K-means clustering technique groups the instances of a dataset into a number of clusters. Characteristics of each instance, such as its density and its relative distance to higher-density points, are then exploited to detect probable outlier instances in each cluster. Finally, genuine outliers are identified from the probable outlier instances using the proposed outlier factors.

Recently, outlier detection using deep learning models, especially the Autoencoder and the Generative Adversarial Network (GAN), has drawn the attention of researchers. Autoencoder-based models identify abnormal instances based on reconstruction error. However, the reconstruction procedure of these models may be contaminated when anomalous instances are present in the dataset, so the effectiveness of the model may deteriorate significantly because of its sensitivity to abnormal instances. To address this issue, a second proposed outlier detection approach combines a Self Organizing Map (SOM) with an Autoencoder. The main aim of this approach is to train the autoencoder only on ‘normal points’. The SOM is used as a clustering technique to identify the probable outliers in each cluster and to exclude them temporarily, so that only ‘normal points’ remain. Finally, the learnt model is applied to the whole dataset to find outlier instances.

Generally, normal samples greatly outnumber abnormal samples in a dataset. Generative Adversarial Networks can be exploited to achieve a balance between normal and abnormal samples. Existing GAN-based outlier detection methods assume a distribution when generating fake samples. However, abnormal samples in the original data may not come from the assumed distribution. To address this issue, this thesis introduces a method named Generative Adversarial Learning for Outlier Detection (GALOD), which generates fake samples based on the identified probable outliers. The proposed model has three phases. The first is a dimensionality reduction (feature extraction) phase, in which an autoencoder is applied to reduce the number of features. In the second phase, spectral clustering is applied and density information is then used to find probable outlier instances, which are treated as noisy instances. Finally, a GAN is applied to each cluster to detect outlier instances. Experimental results on synthetic as well as real-world datasets validate the effectiveness of the proposed outlier detection techniques.
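
The two per-instance characteristics the abstract mentions for RDPOD, density and relative distance to higher-density points, can be illustrated with a small sketch. This is not the thesis's algorithm: the abstract does not define the density estimate, the outlier factors, or how K-means clusters are processed. The cutoff-based density, the function name `density_peak_scores`, and the score `delta / (rho + 1)` are assumptions made purely for illustration, and the example treats all points as one group rather than clustering first.

```python
import numpy as np

def density_peak_scores(points, cutoff):
    """Return (rho, delta, score) for each row of `points`.

    rho   : number of other points closer than `cutoff` (assumed density)
    delta : distance to the nearest point with strictly higher rho;
            for a point of maximal density, its largest distance to any point
    score : delta / (rho + 1), a hypothetical outlier score
    """
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances between all points.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Count neighbours within the cutoff, excluding the point itself.
    rho = (dist < cutoff).sum(axis=1) - 1
    delta = np.empty(len(points))
    for i in range(len(points)):
        higher = rho > rho[i]
        delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()
    score = delta / (rho + 1.0)
    return rho, delta, score

# A dense 5 x 5 grid (spacing 0.5) plus one isolated point at index 25.
axis = np.linspace(0.0, 2.0, 5)
grid = np.array([[x, y] for x in axis for y in axis])
data = np.vstack([grid, [[10.0, 10.0]]])

rho, delta, score = density_peak_scores(data, cutoff=0.75)
most_suspicious = int(np.argmax(score))
print(most_suspicious, rho[most_suspicious], round(delta[most_suspicious], 2))
```

In this toy dataset the isolated point has no neighbours within the cutoff and lies far from every denser point, so it receives the largest score. A point that combines low density with a large distance to denser points is the kind of instance a density-peak style method would flag as a probable outlier.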

Item Type:Thesis (PhD)
Uncontrolled Keywords:Outlier Detection; LOF; LDOF; Nearest Neighbor; Autoencoder; Reverse Nearest Neighbor; Generative Adversarial Network
Subjects:Engineering and Technology > Computer and Information Science > Wireless Local Area Network
Engineering and Technology > Computer and Information Science > Networks
Engineering and Technology > Computer and Information Science > Information Security
Divisions: Engineering and Technology > Department of Computer Science Engineering
ID Code:10544
Deposited By:IR Staff BPCL
Deposited On:13 Jun 2025 17:53
Last Modified:13 Jun 2025 17:53
Supervisor(s):Patra, Bidyut Kumar
