Time: 2025-06-16
LI S Y, XING Y M, XU R. Convolutional autoencoder deep clustering for preserving the local structure of data[J]. Journal of Henan Polytechnic University (Natural Science), doi:10.16186/j.cnki.1673-9787.2025010018.
doi: 10.16186/j.cnki.1673-9787.2025010018
Received: 2025-01-12
Revised: 2025-03-25
Online: 2025-06-16
Convolutional autoencoder deep clustering for preserving the local structure of data (Online)
Li Shunyong1,2, Xing Yuman1, Xu Rui1
(1. School of Mathematics and Statistics, Shanxi University, Taiyuan 030006, Shanxi, China; 2. Key Laboratory of Complex Systems and Data Science of Ministry of Education, Shanxi University, Taiyuan 030006, Shanxi, China)
Abstract: Objectives Deep clustering algorithms increasingly outperform traditional clustering methods. This study proposes a novel deep clustering algorithm that handles high-dimensional data efficiently and explores its underlying manifold structure, thereby improving clustering quality. Methods A convolutional autoencoder deep clustering algorithm for preserving the local structure of data (CADC) is proposed. A convolutional autoencoder learns low-dimensional embedding representations of the original high-dimensional data while preserving the local structure of the data. Manifold learning is then performed on the low-dimensional embeddings, and a Gaussian Mixture Model (GMM) clusters the data on the underlying manifold. Unlike other deep clustering algorithms, CADC requires no additional training of a clustering network, which reduces the algorithm's complexity. In addition, the key parameters of the UMAP dimensionality reduction method (n_neighbors and min_dist) are analyzed: their impact on clustering performance is investigated, and their optimal values are determined experimentally. Results Experiments were conducted on four real-world datasets: MNIST, Fashion-MNIST, USPS, and Pendigits. CADC significantly outperformed traditional clustering algorithms and some existing deep clustering algorithms. Parameter analysis showed that the n_neighbors and min_dist settings in UMAP had a significant impact on clustering performance; the best clustering results were obtained with n_neighbors set to 20 and min_dist set to 0. Conclusions By combining a convolutional autoencoder with local manifold learning, the CADC algorithm effectively improves clustering performance without additional training of a clustering network. It provides a new methodological option for deep clustering and has great potential for clustering complex high-dimensional data.
Key words: deep clustering; convolutional auto-encoder; UMAP manifold; GMM clustering; feature extraction; dimensionality reduction
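The last two stages of the pipeline described in the abstract (manifold learning on the autoencoder embeddings, then GMM clustering) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the convolutional autoencoder has already produced an embedding matrix, and it uses the umap-learn and scikit-learn packages. The function names `make_umap_reducer` and `cluster_embeddings`, the UMAP output dimension `n_components=10`, and the full covariance type are hypothetical choices made for illustration; only `n_neighbors=20` and `min_dist=0` come from the abstract.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def make_umap_reducer(n_components=10, n_neighbors=20, min_dist=0.0, random_state=0):
    """Build a UMAP reducer with the parameter values reported in the abstract.

    n_components=10 is an assumption; the abstract does not state the
    dimension of the manifold embedding.
    """
    import umap  # provided by the umap-learn package; imported lazily

    return umap.UMAP(
        n_components=n_components,
        n_neighbors=n_neighbors,
        min_dist=min_dist,
        random_state=random_state,
    )


def cluster_embeddings(embeddings, n_clusters, reducer=None, random_state=0):
    """Reduce autoencoder embeddings to a manifold and cluster them with a GMM.

    embeddings: array of shape (n_samples, n_features), assumed to come from
                the encoder of a trained convolutional autoencoder.
    reducer:    any object with fit_transform(X); defaults to UMAP.
    Returns (labels, manifold).
    """
    embeddings = np.asarray(embeddings, dtype=float)
    if reducer is None:
        reducer = make_umap_reducer(random_state=random_state)

    # Manifold learning step on the low-dimensional embeddings.
    manifold = reducer.fit_transform(embeddings)

    # GMM clustering on the manifold; no separate clustering network is trained.
    gmm = GaussianMixture(
        n_components=n_clusters,
        covariance_type="full",  # assumption, not stated in the abstract
        random_state=random_state,
    )
    labels = gmm.fit_predict(manifold)
    return labels, manifold
```

For the four datasets named in the abstract, a call such as `cluster_embeddings(Z, n_clusters=10)` would be a natural starting point, where `Z` is a hypothetical name for the encoder output and 10 is the number of classes in those datasets.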