Thakur, Pooja Singh (2017) Outlier Detection on Streaming Data using Local Outlier and Clustering Techniques. MTech thesis.
PDF (Full text is restricted up to 18.01.2020). Restricted to Repository staff only. 439Kb
Abstract
In recent years, advances in hardware technology have facilitated new ways of collecting data continuously. In many applications, such as network monitoring, the volume of such data is so large that it may be difficult to store it on disk. Furthermore, even when the data can be stored, the incoming volume may be so large that it is impossible to process any particular record more than once. Therefore, many data mining and database operations, such as classification, clustering, frequent pattern mining and indexing, become significantly more challenging in this context. To tackle this challenge, such data is treated as a sequence of objects called a data stream. A data stream is a huge volume of data arriving as an unbounded sequence, in which recent data objects are typically more important than older ones and should therefore contribute more to the analysis. Outlier detection is a common operation in data mining, and the performance of a learning algorithm is affected by the presence of outliers. Outlier detection, i.e. the detection of data objects with abnormal behavior, has gained importance in recent years because of its varied applications, such as fraud detection, network security and public health. In this work, we present an analysis of different outlier detection schemes on streaming data and propose a hybrid approach to detect outliers. Outlier detection in streaming data means detecting the most exceptional objects among the incoming data. State-of-the-art distance-based anomaly detection methods for data streams have issues in the detection process. We therefore first analyze previous outlier detection methods on streaming data, evaluate them and identify their disadvantages.
Then, we propose to combine two outlier detection techniques: k nearest neighbour, which is based on clustering techniques, and local outlier factor, which is based on density-based techniques. The combination is used to reduce false positives and improve the overall performance of the outlier detection methodology. This is done by removing the dependency on the parameters used in state-of-the-art methods. The proposed outlier detection framework is validated on three different datasets.
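The abstract does not say how the two detectors are combined, so the following is only a minimal sketch of one plausible reading, not the method of the thesis. It assumes a fixed-size sliding window over the stream and flags a new point only when both a k-nearest-neighbour distance test and a local outlier factor (LOF) test agree, which is one simple way to reduce false positives. The class name `HybridStreamDetector`, the window size and both thresholds are hypothetical. The sketch still relies on user-set thresholds, so it does not reproduce the parameter independence the abstract claims.

```python
import math
import statistics
from collections import deque


def knn(points, i, k):
    """Return the k nearest (distance, index) pairs for points[i], brute force."""
    dists = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    return dists[:k]  # ties beyond the k-th neighbour are ignored in this sketch


def window_scores(points, k):
    """Return (k-distance, LOF) lists for every point in the window."""
    neigh = [knn(points, i, k) for i in range(len(points))]
    k_dist = [n[-1][0] for n in neigh]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for n in neigh:
        reach = [max(k_dist[j], d) for d, j in n]
        lrd.append(len(reach) / max(sum(reach), 1e-12))
    # LOF: mean ratio of the neighbours' density to the point's own density.
    lof = [
        sum(lrd[j] for _, j in n) / (len(n) * lrd[i])
        for i, n in enumerate(neigh)
    ]
    return k_dist, lof


class HybridStreamDetector:
    """Hypothetical sliding-window detector: flag only if kNN and LOF agree."""

    def __init__(self, window=50, k=3, knn_ratio=3.0, lof_threshold=1.5):
        self.window = deque(maxlen=window)  # oldest points drop out first
        self.k = k
        self.knn_ratio = knn_ratio          # assumed threshold
        self.lof_threshold = lof_threshold  # assumed threshold

    def update(self, point):
        """Add a point to the window and report whether it looks like an outlier."""
        self.window.append(tuple(point))
        points = list(self.window)
        if len(points) < self.k + 1:
            return False  # not enough neighbours yet
        k_dist, lof = window_scores(points, self.k)
        # Compare the new point's k-distance with the window's typical k-distance.
        typical = max(statistics.median(k_dist), 1e-12)
        far = k_dist[-1] / typical > self.knn_ratio
        sparse = lof[-1] > self.lof_threshold
        return far and sparse
```

Feeding points one at a time through `update` keeps memory bounded by the window size, in line with the single-pass constraint described above. The brute-force neighbour search is quadratic in the window size, so a real system would likely use an index or incremental updates instead.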
Item Type: | Thesis (MTech) |
---|---|
Uncontrolled Keywords: | Streaming data; Outlier Detection; Clustering |
Subjects: | Engineering and Technology > Computer and Information Science > Information Security |
Divisions: | Engineering and Technology > Department of Computer Science |
ID Code: | 8838 |
Deposited By: | Mr. Kshirod Das |
Deposited On: | 15 Mar 2018 16:54 |
Last Modified: | 15 Mar 2018 16:54 |
Supervisor(s): | Patra, Bidyut Kumar |