Temporal hierarchical dictionary guided decoding for online gesture segmentation and recognition
Chen, Haoyu; Liu, Xin; Shi, Jingang; Zhao, Guoying (2020-10-14)
H. Chen, X. Liu, J. Shi and G. Zhao, "Temporal Hierarchical Dictionary Guided Decoding for Online Gesture Segmentation and Recognition," in IEEE Transactions on Image Processing, vol. 29, pp. 9689-9702, 2020, doi: 10.1109/TIP.2020.3028962
© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
https://rightsstatements.org/vocab/InC/1.0/
https://urn.fi/URN:NBN:fi-fe2020120399293
Abstract
Online segmentation and recognition of skeleton-based gestures are challenging. Compared with the offline case, inference in the online setting can rely only on the most recent few frames and always completes before the whole temporal movement has been performed. However, incompletely performed gestures are ambiguous, and their early recognition is prone to falling into local optima. In this work, we address the problem with a temporal hierarchical dictionary that guides the hidden Markov model (HMM) decoding procedure. The intuition is that gestures are ambiguous, with high uncertainty, in their early phases and become discriminative only after certain phases. This uncertainty can naturally be measured by entropy. Thus, we propose a measurement called the “relative entropy map” (REM) to encode this temporal context and guide HMM decoding. Furthermore, we introduce a progressive learning strategy with which neural networks can learn robust recognition of HMM states in an iterative manner. Our method is extensively evaluated on three challenging databases and achieves state-of-the-art results. It is able both to extract discriminative connotations and to reduce the large redundancy in the HMM transition process. We verify that our framework can achieve online recognition of continuous gesture streams even when the gestures are only halfway performed.
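The intuition that entropy measures how ambiguous a gesture still is can be illustrated with a minimal sketch. This is not the REM defined in the paper, whose formulation the abstract does not give. It only computes the plain Shannon entropy of a class posterior. The function name `shannon_entropy` and the two posteriors over four gesture classes are hypothetical values chosen for illustration.

```python
import math


def shannon_entropy(probs):
    """Return the Shannon entropy (in bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


# Hypothetical posteriors over four gesture classes (illustrative values only).
early_phase = [0.25, 0.25, 0.25, 0.25]  # gesture just started: all classes equally likely
late_phase = [0.97, 0.01, 0.01, 0.01]   # gesture mostly performed: one class dominates

h_early = shannon_entropy(early_phase)  # maximal entropy for four classes
h_late = shannon_entropy(late_phase)    # much lower entropy

print(f"early-phase entropy: {h_early:.3f} bits")
print(f"late-phase entropy:  {h_late:.3f} bits")
```

Under these assumed values, the uniform early-phase posterior reaches the maximum of 2 bits for four classes, while the peaked late-phase posterior has much lower entropy, matching the claim that gestures become discriminative only after certain phases.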
Collections
- Open access [34547]