Title: Investigating Approaches for Privacy Preserving Distributed K-Means Clustering in Semi-honest and Malicious Adversary Model

Ph.D. Supervisor: Dr. D C Jinwala, Professor, Department of Computer Engineering, S V National Institute of Technology, Surat.

Thesis Abstract:

Data mining tools allow extraction of potentially useful information from a collection of data. Due to the increasing need of distributed databases in business environment, the need for distributed data mining becomes imperative. However, not all the participants in a joint activity of data mining would share their data due to privacy concerns. Hence, incorporating privacy-preserving mechanism into the existing distributed data mining tools becomes imperative. Our focus is on the cryptography based approach of Privacy Preserving Data Mining (PPDM); owing to its higher level of privacy.

As per our observations, there is a scope of improvements in the existing cryptography based approaches in terms of computational and communication overheads. In addition, the security in the existing cryptography based approaches relies on the assumption of non-colluding (trusted) and semi-honest parties. Therefore, we propose algorithms to improve the state-of-the-art in cryptography based approach for PPDM using clustering.

Firstly, we propose privacy preserving clustering algorithms for horizontally and vertically partitioned datasets using Shamir's secret sharing scheme. Our approach is computationally faster than the existing approaches, collusion resistant and avoids the trusted third party.

Secondly, we incorporate the security in the malicious model in our secret sharing based approach by utilizing zero knowledge proof and verifiable secret sharing scheme.

Thirdly, we propose a homomorphic encryption based approach for PPDM that utilizes Elliptic Curve Cryptography (ECC). We show a reduction in communication overheads from O (n2) (in secret sharing based approach) to O(n) (in ECC based approach).

Subsequently, we observe that existing approaches that implement secure add and compare protocol for PPDM based on Yao's garbled circuit incur higher computational and communication cost. Therefore, finally, we propose a novel approach to privacy preserving distributed K-Means clustering using a Fully Homomorphic Encryption(FHE) that improves the cost of Yao's approach.