Convolutional autoencoder deep clustering for preserving the local structure of data - Henan Polytechnic University Publishing Center

Convolutional autoencoder deep clustering for preserving the local structure of data

Time: 2025-06-16

LI S Y, XING Y M, XU R. Convolutional autoencoder deep clustering for preserving the local structure of data[J]. Journal of Henan Polytechnic University (Natural Science), doi: 10.16186/j.cnki.1673-9787.2025010018.

doi: 10.16186/j.cnki.1673-9787.2025010018

Received: 2025-01-12

Revised: 2025-03-25

Online: 2025-06-16

Li Shunyong (1, 2), Xing Yuman (1), Xu Rui (1)

(1. School of Mathematics and Statistics, Shanxi University, Taiyuan 030006, Shanxi, China; 2. Key Laboratory of Complex Systems and Data Science of Ministry of Education, Shanxi University, Taiyuan 030006, Shanxi, China)

Abstract:

Objectives: Deep clustering algorithms increasingly outperform traditional clustering methods. This study proposes a new deep clustering algorithm that handles high-dimensional data efficiently and exploits its underlying manifold structure to improve clustering quality.

Methods: This paper proposes a convolutional autoencoder deep clustering algorithm for preserving the local structure of data (CADC). A convolutional autoencoder learns low-dimensional embeddings of the original high-dimensional data while preserving the data's local structure. Manifold learning with UMAP is then performed on these embeddings, and a Gaussian Mixture Model (GMM) clusters the data on the resulting manifold. Unlike other deep clustering algorithms, CADC does not require additional training of a clustering network, which simplifies the algorithm. In addition, the key UMAP parameters n_neighbors and min_dist were analyzed: their impact on clustering performance was investigated and their optimal values were determined experimentally.

Results: Experiments on four real-world datasets (MNIST, Fashion-MNIST, USPS and Pendigits) showed that CADC significantly outperformed traditional clustering algorithms and several existing deep clustering algorithms. The parameter analysis showed that the n_neighbors and min_dist settings in UMAP had a significant impact on clustering performance; the best clustering results were obtained with n_neighbors = 20 and min_dist = 0.

Conclusions: By combining a convolutional autoencoder with local manifold learning, CADC effectively improves clustering performance without additional training of a clustering network. The algorithm offers a new methodological option for deep clustering and shows strong potential for clustering complex high-dimensional data.

Key words: deep clustering; convolutional autoencoder; UMAP manifold; GMM clustering; feature extraction; dimensionality reduction
