SaTPhys: Sandglass Transformer for Efficient Video-based Remote Physiological Measurement
Chu, Shuyang; Shi, Jingang; Yuan, Mengyao; Li, Xuqi; Jiang, Zhengdong; Zhao, Guoying (2026-04-21)
IEEE
S. Chu, J. Shi, M. Yuan, X. Li, Z. Jiang and G. Zhao, "SaTPhys: Sandglass Transformer for Efficient Video-based Remote Physiological Measurement," in IEEE Transactions on Circuits and Systems for Video Technology, doi: 10.1109/TCSVT.2026.3686300.
https://rightsstatements.org/vocab/InC/1.0/
© 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:oulu-202605063034
Abstract
Transformers have proven effective for video-based remote photoplethysmography (rPPG) measurement. However, their inherently high computational cost limits the use of these methods on resource-constrained devices. This paper presents SaTPhys, a novel sandglass-like aggregating-and-distributing framework for efficient Transformer-based rPPG measurement. SaTPhys begins by clustering physiological tokens that carry redundant spatio-temporal information and ends by recovering the full-length tokens. As a result, fewer cluster centers pass through the intermediate Transformer block, which improves the model’s efficiency. To accomplish this effectively, we manually design a physiological context aggregation (PCA) module that generates representative cluster centers, thereby eliminating spatio-temporal redundancy. We then employ an inter-cluster Transformer (ICT) to model interactions among these cluster centers efficiently on a global scale. Finally, we introduce a physiological context distributing (PCD) module to restore the full-length tokens and distribute the aggregated global information. Furthermore, we develop a frequency modulator (FM) block to enhance the frequency information, thereby improving the periodic fidelity of the estimated rPPG signal. Comprehensive experiments on multiple benchmark datasets show that the proposed method achieves superior performance at low computational cost. For example, compared to the SOTA rPPG method, our method achieves a lower MAE on the VIPL-HR dataset (3.96 bpm vs. 4.32 bpm) at a significantly lower computational cost (3.94 GMACs vs. 12.9 GMACs). The code is available at https://github.com/xjtucsy/SaTPhys.
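The aggregate, interact, distribute pattern described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`aggregate`, `self_attention`, `distribute`, `sandglass_block`), the chunk-mean initialisation of the centers, the soft-assignment rule, the projection-free attention, and the token and cluster counts are all assumptions chosen for illustration. The actual PCA, ICT, PCD, and FM modules are defined in the paper and in the linked repository.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def aggregate(tokens, num_clusters):
    """Hypothetical stand-in for aggregation: soft-assign N tokens to K centers."""
    _, d = tokens.shape
    # Assumption: initial centers are the means of contiguous token chunks.
    chunks = np.array_split(tokens, num_clusters, axis=0)
    init = np.stack([c.mean(axis=0) for c in chunks])        # (K, D)
    assign = softmax(tokens @ init.T / np.sqrt(d), axis=1)   # (N, K), rows sum to 1
    weights = assign / assign.sum(axis=0, keepdims=True)     # columns sum to 1
    centers = weights.T @ tokens                             # (K, D)
    return centers, assign


def self_attention(x):
    """Single-head self-attention with a residual, no learned projections."""
    d = x.shape[1]
    attn = softmax(x @ x.T / np.sqrt(d), axis=1)             # (K, K)
    return x + attn @ x


def distribute(tokens, centers, assign):
    """Hypothetical stand-in for distribution: send center context back to tokens."""
    return tokens + assign @ centers                         # (N, D)


def sandglass_block(tokens, num_clusters):
    """Narrow to K centers, mix them globally, then widen back to N tokens."""
    centers, assign = aggregate(tokens, num_clusters)
    centers = self_attention(centers)
    return distribute(tokens, centers, assign)


rng = np.random.default_rng(0)
tokens = rng.standard_normal((640, 32))   # assumed: 640 tokens of width 32
out = sandglass_block(tokens, num_clusters=16)

# Attention scores are computed between centers only: K*K pairs instead of N*N.
full_pairs = tokens.shape[0] ** 2
cluster_pairs = 16 ** 2
print(out.shape, full_pairs, cluster_pairs)
```

In this sketch the attention score matrix shrinks from N×N to K×K, which is the source of the saving; the aggregation and distribution steps each add a cost proportional to N×K.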
Collections
- Open access [43095]
