ZHANG B B, LI H B, MA Y C, et al. Few-shot action recognition in video method based on second-order spatiotemporal adaptation[J]. Journal of Henan Polytechnic University (Natural Science), 2025, 44(5): 43-51.
DOI:10.16186/j.cnki.1673-9787.2027070013
Received: 2024/07/02
Revised: 2024/09/19
Published: 2025/07/23
Few-shot action recognition in video method based on second-order spatiotemporal adaptation
Zhang Bingbing, Li Haibo, Ma Yuanchen, Zhang Jianxin
School of Computer Science and Engineering, Dalian Minzu University, Dalian 116650, Liaoning, China
Abstract:
Objectives In the field of few-shot video action recognition, existing methods generally struggle to process global spatiotemporal information adequately. They typically rely on large amounts of annotated data to train deep models, and with only a limited number of training samples available, they often fail to effectively capture and exploit the spatiotemporal dynamics in video data.
Methods To address this issue, an innovative second-order spatiotemporal adaptive network comprising a spatiotemporal adaptive module and a covariance aggregation module was proposed to significantly enhance the accuracy and robustness of few-shot learning in video action recognition tasks. The spatiotemporal adaptive module dynamically aggregated local and global spatiotemporal information according to changes in video content, thereby optimizing the extraction of global information. The covariance aggregation module used second-order statistics to strengthen the global spatiotemporal feature representation of videos, providing a more robust global depiction of video content.
Results Extensive experiments were conducted on four mainstream video action recognition benchmark datasets. The proposed method achieved accuracies of 52.2% and 72.4% on the 1-shot and 5-shot tasks of the Something-Something V2 dataset, significantly outperforming the baseline model. Strong performance was also observed on the Kinetics100, UCF101, and HMDB51 datasets, fully validating the method's effectiveness and practicality in few-shot video action recognition.
Conclusions The proposed second-order spatiotemporal adaptive network effectively improved the accuracy and robustness of few-shot video action recognition and demonstrated clear advantages in processing complex spatiotemporal information. This work provided an innovative and efficient solution to critical challenges in spatiotemporal modeling under limited-data scenarios.
Key words: few-shot learning; action recognition in video; spatiotemporal representation learning; temporal modeling; covariance aggregation
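The abstract describes a covariance aggregation module that uses second-order statistics to summarize global spatiotemporal features. The paper's exact formulation is not given in this front matter, so the sketch below shows only the generic idea of second-order (covariance) pooling over per-frame features; the function name `covariance_pool` and the `(T, D)` feature layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def covariance_pool(features: np.ndarray) -> np.ndarray:
    """Generic second-order (covariance) pooling over frame features.

    features: (T, D) array of T per-frame feature vectors of dimension D.
    Returns a (D, D) symmetric covariance matrix capturing pairwise
    channel interactions across the clip -- one plausible reading of a
    "covariance aggregation" step, not the paper's actual module.
    """
    mean = features.mean(axis=0, keepdims=True)            # (1, D) first-order mean
    centered = features - mean                             # remove first-order statistics
    cov = centered.T @ centered / (features.shape[0] - 1)  # (D, D) unbiased covariance
    return cov

# Usage sketch: pool 8 frames of 4-dimensional features into one (4, 4) descriptor.
frames = np.random.default_rng(0).normal(size=(8, 4))
descriptor = covariance_pool(frames)
```

The resulting matrix is symmetric and captures correlations between feature channels, which is what distinguishes such second-order descriptors from plain (first-order) average pooling.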