PhySU-Net: Long Temporal Context Transformer for rPPG with Self-supervised Pre-training
Savic, Marko; Zhao, Guoying (2024-12-02)
The file will be made publicly available on 02.12.2025.
Publisher: Springer
Published: 02.12.2024
Savic, M., Zhao, G. (2025). PhySU-Net: Long Temporal Context Transformer for rPPG with Self-supervised Pre-training. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15314. Springer, Cham. https://doi.org/10.1007/978-3-031-78341-8_15
https://rightsstatements.org/vocab/InC/1.0/
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG. This is a post-peer-review, pre-copyedit version of an article published in Pattern Recognition. The final authenticated version is available online at: https://doi.org/10.1007/978-3-031-78341-8_15
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:oulu-202412047057
Abstract
Remote photoplethysmography (rPPG) is a promising technology for the contactless measurement of cardiac activity from facial videos. However, current approaches are limited by data scarcity and a lack of robustness to environmental noise. Most recent approaches rely on convolutional networks with limited temporal modeling capability or ignore long temporal context. Purely supervised rPPG methods are also severely limited by scarce data availability. In this work, we propose PhySU-Net, the first long temporal context rPPG transformer network, together with a novel self-supervised pre-training strategy that exploits unlabeled data to improve our model. Our strategy leverages traditional methods and image masking to provide pseudo-labels for physiologically relevant self-supervised pre-training. Our model is tested on three public benchmark datasets (OBF, VIPL-HR and MMSE-HR) and shows state-of-the-art performance in supervised training. Furthermore, we demonstrate that our self-supervised pre-training strategy further improves our model’s performance by leveraging representations learned from unlabeled data. Our code is available at: https://github.com/marukosan93/PhySU-Net.
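The abstract does not spell out the pre-training pipeline, but the idea of deriving pseudo-labels from a traditional rPPG method can be illustrated with a short sketch. The code below is an assumption-laden illustration, not the authors' implementation: it recovers a pulse signal from averaged facial RGB traces with the classic POS algorithm (Wang et al., 2017) and turns it into a pseudo heart-rate label via a spectral peak search. The function names `pos_rppg` and `pseudo_hr_label` and all parameter choices are hypothetical.

```python
# Minimal sketch: pseudo heart-rate labels from unlabeled facial videos,
# assuming per-frame mean RGB traces of shape (T, 3) have already been
# extracted from the face region. Not the PhySU-Net pipeline itself.
import numpy as np

def pos_rppg(rgb_traces: np.ndarray, fps: float, win_sec: float = 1.6) -> np.ndarray:
    """Recover a pulse signal from mean facial RGB traces (POS algorithm)."""
    T = rgb_traces.shape[0]
    L = int(win_sec * fps)                            # sliding-window length
    h = np.zeros(T)
    P = np.array([[0.0, 1.0, -1.0], [-2.0, 1.0, 1.0]])  # POS projection plane
    for t in range(T - L + 1):
        window = rgb_traces[t:t + L]                  # (L, 3)
        norm = window / (window.mean(axis=0) + 1e-8)  # temporal normalization
        s = norm @ P.T                                # (L, 2) projected signals
        alpha = s[:, 0].std() / (s[:, 1].std() + 1e-8)
        hw = s[:, 0] + alpha * s[:, 1]                # tuned combination
        h[t:t + L] += hw - hw.mean()                  # overlap-add
    return h

def pseudo_hr_label(pulse: np.ndarray, fps: float) -> float:
    """Pseudo heart-rate label (bpm): spectral peak within 0.7-4 Hz."""
    freqs = np.fft.rfftfreq(len(pulse), d=1.0 / fps)
    power = np.abs(np.fft.rfft(pulse - pulse.mean())) ** 2
    band = (freqs >= 0.7) & (freqs <= 4.0)
    return 60.0 * freqs[band][np.argmax(power[band])]

# Usage (hypothetical): pulse = pos_rppg(traces, fps=30.0)
#                       hr_bpm = pseudo_hr_label(pulse, fps=30.0)
```

In a pre-training setup along these lines, such pseudo-labels could supervise the model on unlabeled videos before fine-tuning on ground-truth physiological signals, which is the general role the abstract assigns to traditional methods in the proposed strategy.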
Collections
- Open access [38840]