AGIL-SwinT: Attention-guided inconsistency learning for face forgery detection
Xiong, Wuti; Chen, Haoyu; Zhao, Guoying; Li, Xiaobai (2024-09-12)
Avaa tiedosto
Sisältö avataan julkiseksi: 12.09.2026
Xiong, Wuti
Chen, Haoyu
Zhao, Guoying
Li, Xiaobai
Elsevier
12.09.2024
Xiong, W., Chen, H., Zhao, G., & Li, X. (2024). AGIL-SwinT: Attention-guided inconsistency learning for face forgery detection. Image and Vision Computing, 151, 105274. https://doi.org/10.1016/j.imavis.2024.105274
https://creativecommons.org/licenses/by-nc-nd/4.0/
© 2024. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/
https://creativecommons.org/licenses/by-nc-nd/4.0/
© 2024. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/
https://creativecommons.org/licenses/by-nc-nd/4.0/
Julkaisun pysyvä osoite on
https://urn.fi/URN:NBN:fi:oulu-202503262219
https://urn.fi/URN:NBN:fi:oulu-202503262219
Tiivistelmä
Abstract
Face forgery detection (FFD) plays a vital role in maintaining the security and integrity of various information and media systems. Forgery inconsistency caused by manipulation techniques has been proven to be effective for generalizing to the unseen data domain. However, most existing works rely on pixel-level forgery annotations to learn forgery inconsistency. To address the problem, we propose a novel Swin Transformer-based method, AGIL-SwinT, that can effectively learn forgery inconsistency using only video-level labels. Specifically, we first leverage the Swin Transformer to generate the initial mask for the forgery regions. Then, we introduce an attention-guided inconsistency learning module that uses unsupervised learning to learn inconsistency from attention. The learned inconsistency is used to revise the initial mask for enhancing forgery detection. In addition, we introduce a forgery mask refinement module to obtain reliable inconsistency labels for supervising inconsistency learning and ensuring the mask is aligned with the forgery boundaries. We conduct extensive experiments on multiple FFD benchmarks, including intra-dataset, cross-dataset and cross-manipulation testing. The experimental results demonstrate that our method significantly outperforms existing methods and generalizes well to unseen datasets and manipulation categories. Our code is available at https://github.com/woody-xiong/AGIL-SwinT.
Face forgery detection (FFD) plays a vital role in maintaining the security and integrity of various information and media systems. Forgery inconsistency caused by manipulation techniques has been proven to be effective for generalizing to the unseen data domain. However, most existing works rely on pixel-level forgery annotations to learn forgery inconsistency. To address the problem, we propose a novel Swin Transformer-based method, AGIL-SwinT, that can effectively learn forgery inconsistency using only video-level labels. Specifically, we first leverage the Swin Transformer to generate the initial mask for the forgery regions. Then, we introduce an attention-guided inconsistency learning module that uses unsupervised learning to learn inconsistency from attention. The learned inconsistency is used to revise the initial mask for enhancing forgery detection. In addition, we introduce a forgery mask refinement module to obtain reliable inconsistency labels for supervising inconsistency learning and ensuring the mask is aligned with the forgery boundaries. We conduct extensive experiments on multiple FFD benchmarks, including intra-dataset, cross-dataset and cross-manipulation testing. The experimental results demonstrate that our method significantly outperforms existing methods and generalizes well to unseen datasets and manipulation categories. Our code is available at https://github.com/woody-xiong/AGIL-SwinT.
Kokoelmat
- Avoin saatavuus [38841]