Outlier Detection for Categorical Data

Sharma, Khushboo (2017) Outlier Detection for Categorical Data. MTech thesis.

PDF (Full text is restricted up to 18.01.2020)
Restricted to Repository staff only

345Kb

Abstract

Outlier detection, or anomaly detection, is an important process for identifying instances that behave unexpectedly in a given system. For many years, outlier detection has received significant attention because of its applications in areas such as credit card fraud in the banking sector, illegal access in networking, data analysis in the medical field, and weather prediction. Many techniques have been developed to detect outliers. However, most existing techniques focus on numerical data and cannot be applied directly to categorical data, because it is difficult to define a meaningful similarity measure for categorical data. High-dimensional categorical data also pose significant challenges because of their discreteness. Entropy-related measures can be used to handle this type of data. The concept of entropy is built on a probabilistic description of the data distribution and quantifies the variation or diversity of a discrete variable. For outlier detection, we applied a simple and effective ranking-based algorithm built on entropy and mutual information, and we also analyzed the time complexity of the algorithm. Experimental results on the car evaluation data set and two other data sets demonstrate the effectiveness and efficiency of our algorithm.
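The abstract does not spell out the algorithm, so the following is only a minimal sketch of the two quantities it names, Shannon entropy and mutual information, computed for categorical attributes. The function names `entropy`, `mutual_information`, and `surprisal_scores`, the toy rows, and the frequency-based ranking score are all assumptions made for illustration; the scoring rule in particular is a generic stand-in and should not be read as the ranking algorithm used in the thesis.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """Mutual information I(X; Y) = H(X) + H(Y) - H(X, Y), in bits."""
    joint = list(zip(xs, ys))
    return entropy(xs) + entropy(ys) - entropy(joint)


def surprisal_scores(rows):
    """Hypothetical outlier score (an assumption, not the thesis method):
    for each row, sum -log2 of the relative frequency of each attribute value.
    Rows made of rare values receive higher scores."""
    n = len(rows)
    columns = list(zip(*rows))
    freqs = [Counter(col) for col in columns]
    return [
        sum(-math.log2(freqs[j][value] / n) for j, value in enumerate(row))
        for row in rows
    ]


# Toy categorical data set (made up for illustration).
rows = [
    ("low", "small"),
    ("low", "small"),
    ("low", "small"),
    ("high", "big"),
]

scores = surprisal_scores(rows)
# Rank row indices from most to least outlying under the hypothetical score.
ranking = sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)
```

In this toy data the last row carries the rarest values in both attributes, so it is ranked first. An entropy-based method could weight or combine attributes using `entropy` and `mutual_information` instead of the plain frequency sum shown here.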

Item Type: Thesis (MTech)
Uncontrolled Keywords: Outlier detection; Categorical data; Entropy; Mutual information
Subjects: Engineering and Technology > Computer and Information Science > Information Security
Divisions: Engineering and Technology > Department of Computer Science
ID Code: 8836
Deposited By: Mr. Kshirod Das
Deposited On: 15 Mar 2018 16:46
Last Modified: 15 Mar 2018 16:46
Supervisor(s): Patra, Bidyut Kumar
