Determining t in t-closeness using Multiple Sensitive Attributes

Roy, Debaditya (2013) Determining t in t-closeness using Multiple Sensitive Attributes. MTech thesis.

Abstract

Government agencies and other organizations often need to publish microdata, e.g., medical or census data, for research and other purposes. Typically, such data is stored in a table in which each record (row) corresponds to one individual. The attributes of each record can be divided into three categories: (1) attributes that clearly identify individuals, known as explicit identifiers, such as Name, Address, and Social Security Number; (2) attributes whose values, taken together, can potentially identify an individual, known as quasi-identifiers (QI), such as Zip code, Birthdate, and Gender; and (3) attributes that are considered sensitive, such as Disease and Salary, known as Sensitive Attributes (SA). When microdata is released, the sensitive information of individuals must not be disclosed, so the objective is to limit the disclosure risk to an acceptable level while maximizing the utility of the released data. This can be achieved by anonymizing the data before release. Models such as k-anonymity (to prevent linkage attacks), l-diversity (to prevent homogeneity and background knowledge attacks), and t-closeness (to prevent skewness and similarity attacks) have been proposed over the years and are collectively known as Privacy Preserving Data Publishing models. This thesis presents a novel way of determining t and applying t-closeness to multiple sensitive attributes. The only information required beforehand is the partitioning classes of the Sensitive Attribute(s). Since t-closeness anonymization is NP-hard, knowing the value of t in advance greatly reduces the time that would otherwise be spent anonymizing the data with various trial values of t. The rationale behind the proposed measure for determining t is discussed with a proof, and the speedup achieved is also shown.
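To make the role of t concrete, the sketch below measures how far a partitioned table is from the t-closeness requirement, using the Earth Mover's Distance (EMD) from the original t-closeness proposal: half the L1 distance for categorical attributes with equal ground distance, and the cumulative-difference formula for ordered attributes. It handles multiple sensitive attributes by simply taking the maximum distance over every equivalence class and every attribute. That max rule, the function names (`closeness`, `emd_equal`, `emd_ordered`), and the toy data are hypothetical and for illustration only; this is not the thesis's method for determining t, which the abstract does not describe.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]


def distribution(values: Sequence[str], domain: Sequence[str]) -> List[float]:
    """Relative frequency of each domain value, listed in domain order."""
    counts = Counter(values)
    return [counts[v] / len(values) for v in domain]


def emd_equal(p: List[float], q: List[float]) -> float:
    """EMD when every pair of categories is at distance 1: half the L1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def emd_ordered(p: List[float], q: List[float]) -> float:
    """EMD for an ordered domain with ground distance |i - j| / (m - 1)."""
    m = len(p)
    if m < 2:
        return 0.0
    total, running = 0.0, 0.0
    for a, b in zip(p, q):
        running += a - b  # surplus mass that must still be moved past this value
        total += abs(running)
    return total / (m - 1)


def closeness(table: List[Record], classes: List[List[int]],
              sa_specs: Dict[str, Tuple[Sequence[str], bool]]) -> float:
    """Largest EMD between any equivalence class and the whole table, taken
    over all sensitive attributes (hypothetical multi-SA rule). The partition
    satisfies t-closeness for every t >= this value under that rule."""
    worst = 0.0
    for attr, (domain, ordered) in sa_specs.items():
        emd = emd_ordered if ordered else emd_equal
        overall = distribution([row[attr] for row in table], domain)
        for cls in classes:
            local = distribution([table[i][attr] for i in cls], domain)
            worst = max(worst, emd(local, overall))
    return worst


# Hypothetical anonymized table: quasi-identifiers are assumed to be already
# generalized, so only the equivalence classes (row indices) and SAs matter.
table = [
    {"salary": "low", "disease": "flu"},
    {"salary": "mid", "disease": "flu"},
    {"salary": "low", "disease": "cancer"},
    {"salary": "high", "disease": "gastritis"},
    {"salary": "mid", "disease": "cancer"},
    {"salary": "high", "disease": "flu"},
]
classes = [[0, 1, 2], [3, 4, 5]]
sa_specs = {
    "salary": (["low", "mid", "high"], True),            # ordered SA
    "disease": (["flu", "cancer", "gastritis"], False),  # categorical SA
}

print(f"smallest t satisfied: {closeness(table, classes, sa_specs):.4f}")
```

For this toy partition the worst distance is 1/3, contributed by the ordered Salary attribute (Disease only reaches 1/6), so the partition would satisfy t-closeness for any t of at least 1/3 under the max-over-attributes assumption. Taking the maximum is the most conservative way to combine attributes; a weighted or per-attribute threshold would be an equally plausible design choice.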

Item Type:	Thesis (MTech)
Uncontrolled Keywords:	Privacy Preserving Data Mining, Privacy Preserving Data Publishing, t-closeness, Multiple Sensitive Attributes
Subjects:	Engineering and Technology > Computer and Information Science > Data Mining
Divisions:	Engineering and Technology > Department of Computer Science
ID Code:	4842
Deposited By:	Hemanta Biswal
Deposited On:	04 Nov 2013 11:18
Last Modified:	20 Dec 2013 16:10
Supervisor(s):	Jena, S K