The AIDA paper introduces an efficient and secure distributed K-Means algorithm

The AIDA project consortium has published a paper at the 19th Symposium on Intelligent Data Analysis (IDA 2021), which took place online from April 26 to 28.

The article is entitled “Efficient Privacy Preserving Distributed K-Means for Non-IID Data” and introduces an efficient and secure distributed K-Means algorithm that is robust to non-IID data. The basic idea of our proposal is that each client runs the K-Means algorithm locally, with a variable number of clusters. The server then applies K-Means again to the resulting local centroids to discover the global centroids. To preserve the clients’ privacy, homomorphic encryption and secure aggregation are used while learning the global centroids. The algorithm is efficient and reduces transmission costs, since only the local centroids are used to find the global centroids. In our experimental evaluation, we demonstrate that our strategy achieves performance similar to that of the centralized version, even when the data is extremely non-IID.
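The two-stage idea described above can be illustrated with a minimal sketch. This is not the implementation from the paper: it leaves out the homomorphic encryption and secure aggregation steps entirely, the function names (`kmeans`, `distributed_kmeans`, `blob`) are hypothetical, and the deterministic farthest-first initialization is an assumption made only to keep the example reproducible. The toy data is deliberately non-IID: two clients each hold a single cluster, and a third holds both.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50):
    """Plain Lloyd's K-Means returning k centroids as tuples.

    Assumption: farthest-first initialization, chosen for reproducibility;
    the paper may initialize differently.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def distributed_kmeans(client_datasets, local_ks, global_k):
    """Each client clusters locally; the server clusters the pooled centroids.

    In this sketch the centroids are pooled in the clear. The encryption and
    secure aggregation mentioned in the text are NOT modelled here.
    """
    local_centroids = []
    for data, k in zip(client_datasets, local_ks):
        local_centroids.extend(kmeans(data, k))  # only centroids leave a client
    return kmeans(local_centroids, global_k)


rng = random.Random(0)


def blob(cx, cy, n):
    """Hypothetical helper: n Gaussian points around (cx, cy)."""
    return [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for _ in range(n)]


# Non-IID split: clients 0 and 1 each see one cluster, client 2 sees both.
clients = [blob(0, 0, 30), blob(10, 10, 30), blob(0, 0, 20) + blob(10, 10, 20)]
global_centroids = sorted(distributed_kmeans(clients, local_ks=[1, 1, 2], global_k=2))
centralized = sorted(kmeans([p for c in clients for p in c], 2))
print("distributed:", global_centroids)
print("centralized:", centralized)
```

In this toy setting the server only ever handles four local centroids instead of the 100 raw points, which is where the reduction in transmission cost comes from. Note that the server step here averages local centroids without weighting them by cluster size, so it is only an approximation of the centralized result.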

The authors of this paper are André Brandão, Ricardo Mendes, and João Vilela, from INESC TEC.