OuluREPO – Oulun yliopiston julkaisuarkisto / University of Oulu repository

Importance-aware information bottleneck learning paradigm for lip reading

Sheng, Changchong; Liu, Li; Deng, Wanxia; Bai, Liang; Liu, Zhong; Lao, Songyang; Kuang, Gangyao; Pietikäinen, Matti (2022-09-29)

 
Open file
nbnfi-fe20231004138661.pdf (5.342 MB)
nbnfi-fe20231004138661_meta.xml (44.09 KB)
nbnfi-fe20231004138661_solr.xml (33.91 KB)

URL:
https://doi.org/10.1109/TMM.2022.3210761

Publisher: Institute of Electrical and Electronics Engineers
Date: 2022-09-29

C. Sheng et al., "Importance-Aware Information Bottleneck Learning Paradigm for Lip Reading," in IEEE Transactions on Multimedia, vol. 25, pp. 6563-6574, 2023, doi: 10.1109/TMM.2022.3210761

https://rightsstatements.org/vocab/InC/1.0/
© 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi-fe20231004138661
Abstract

Lip reading is the task of decoding text from speakers’ mouth movements. Numerous deep learning-based methods have been proposed to address this task. However, existing deep lip reading models generalize poorly because they overfit the training data. To resolve this issue, we present a novel learning paradigm that aims to improve the interpretability and generalization of lip reading models. Specifically, a Variational Temporal Mask (VTM) module is designed to automatically analyze the importance of frame-level features. Furthermore, prediction consistency constraints between global information and locally important temporal features are introduced to strengthen model generalization. We evaluate the learning paradigm with multiple lip reading baseline models on the LRW and LRW-1000 datasets. Experiments show that the proposed framework significantly improves the generalization performance and interpretability of lip reading models.
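
The abstract does not give the formulation, so the following is only a rough illustrative sketch of the general idea of weighting frame-level features by importance and penalizing disagreement between a "global" prediction and an "important frames" prediction. Everything in it is an assumption made for illustration: the deterministic sigmoid mask stands in for the paper's Variational Temporal Mask (no variational sampling is modeled), the linear classifier and the KL-divergence penalty are hypothetical choices, and none of the function names come from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pool(frames, weights):
    """Weighted average of per-frame feature vectors."""
    total = sum(weights)
    dim = len(frames[0])
    return [sum(w * f[d] for w, f in zip(weights, frames)) / total
            for d in range(dim)]

def classify(feature, class_weights):
    """Hypothetical linear classifier: one weight vector per class."""
    logits = [sum(w * x for w, x in zip(row, feature)) for row in class_weights]
    return softmax(logits)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def consistency_loss(frames, importance_scores, class_weights):
    """Disagreement between the all-frames prediction and the masked one.

    Assumption: the mask is a plain sigmoid of per-frame scores, a
    deterministic stand-in for a learned (variational) temporal mask.
    """
    global_pred = classify(pool(frames, [1.0] * len(frames)), class_weights)
    mask = [sigmoid(s) for s in importance_scores]
    local_pred = classify(pool(frames, mask), class_weights)
    return kl_divergence(global_pred, local_pred)

# Toy input: three frames with two-dimensional features, two classes.
frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
class_weights = [[1.0, -1.0], [-1.0, 1.0]]
uniform = consistency_loss(frames, [0.0, 0.0, 0.0], class_weights)
skewed = consistency_loss(frames, [4.0, -4.0, 0.0], class_weights)
```

In this toy setup, equal importance scores make the masked pooling identical to plain averaging, so the penalty vanishes, while a mask that favors one frame shifts the masked prediction and yields a positive penalty. A real model would learn the scores and the classifier jointly.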
