Information-enhanced Network for Noncontact Heart Rate Estimation from Facial Videos
Liu, Lili; Xia, Zhaoqiang; Zhang, Xiaobiao; Peng, Jinye; Feng, Xiaoyi; Zhao, Guoying (2023-08-04)
IEEE
L. Liu, Z. Xia, X. Zhang, J. Peng, X. Feng and G. Zhao, "Information-Enhanced Network for Noncontact Heart Rate Estimation From Facial Videos," in IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 4, pp. 2136-2150, April 2024, doi: 10.1109/TCSVT.2023.3301962.
https://rightsstatements.org/vocab/InC/1.0/
© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:oulu-202403182269
Abstract
Remote photoplethysmography (rPPG) is a vital noncontact way of measuring heart rate (HR), which reflects human physical and mental health and is useful for diagnosing cardiovascular and neurological diseases. Many non-contact HR estimation methods have been proposed in recent years, but the majority rely on a single-modal source of HR information, so noise and insufficient information lead to ineffective and unsatisfactory estimates. This paper proposes a novel information-enhanced network for HR estimation based on multimodal (e.g., RGB and NIR) sources to address these problems. In the network, context and modal difference information are sequentially enhanced from spatiotemporal and modal views to describe HR-aware features accurately, while maximum frequency information is enhanced to inhibit heartbeat noise. Specifically, a context-enhanced video Swin-Transformer (CET) module extracts useful rPPG signal features from facial visible-light and near-infrared videos. A novel modal difference enhanced fusion (MDEF) module then produces a fused rPPG signal, which the frequency-enhanced estimation (FEE) module converts into the corresponding HR value. These three modules are integrated and jointly learned in an end-to-end way, and the multimodal combination provides highly complementary information for estimating the HR value. Experimental and evaluation results on three multimodal datasets show that the proposed model outperforms state-of-the-art methods.
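The final step the abstract describes — turning a fused rPPG signal into an HR value in the frequency domain — can be illustrated with the standard spectral readout: pick the dominant frequency of the signal inside the plausible heartbeat band and convert it to beats per minute. The sketch below is not the paper's learned FEE module; `hr_from_rppg` and its band limits are illustrative assumptions.

```python
import numpy as np

def hr_from_rppg(signal, fps, lo=0.7, hi=4.0):
    """Estimate heart rate (bpm) as the dominant spectral frequency
    of an rPPG signal within a plausible heartbeat band (0.7-4 Hz,
    i.e. roughly 42-240 bpm; the band is an illustrative assumption)."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()                 # remove DC offset
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    power = np.abs(np.fft.rfft(signal)) ** 2
    band = (freqs >= lo) & (freqs <= hi)            # restrict to heartbeat band
    peak = freqs[band][np.argmax(power[band])]      # dominant frequency (Hz)
    return 60.0 * peak                              # convert Hz to bpm

# Synthetic 10 s "pulse" at 1.2 Hz (72 bpm) sampled at 30 fps, plus noise.
np.random.seed(0)
t = np.arange(0, 10, 1.0 / 30)
sig = np.sin(2 * np.pi * 1.2 * t) + 0.1 * np.random.randn(t.size)
print(hr_from_rppg(sig, fps=30))  # ≈ 72 bpm
```

A learned module such as FEE refines this readout, but the band-limited spectral peak remains the conventional baseline for mapping an rPPG waveform to an HR value.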