Active Defense Against Voice Conversion Through Generative Adversarial Network
Dong, Shihang; Chen, Beijing; Ma, Kaijie; Zhao, Guoying (2024-02-12)
IEEE
12.02.2024
S. Dong, B. Chen, K. Ma and G. Zhao, "Active Defense Against Voice Conversion Through Generative Adversarial Network," in IEEE Signal Processing Letters, vol. 31, pp. 706-710, 2024, doi: 10.1109/LSP.2024.3365034
https://rightsstatements.org/vocab/InC/1.0/
© 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:oulu-202404262963
Abstract
Active defense is an important approach to countering speech deepfakes that threaten individuals’ privacy, property, and reputation. However, existing works in this field are time-consuming and achieve only moderate defense effectiveness. This letter proposes a Generative Adversarial Network (GAN) framework for adversarial attacks as a defense against malicious voice conversion. The proposed method uses a generator to produce adversarial perturbations and adds them to the mel-spectrogram of the target audio to craft an adversarial example. In addition, to enhance the defense effectiveness, a spectrogram waveform conversion simulation module (SWCSM) is designed to simulate the process of reconstructing the waveform from the adversarial mel-spectrogram example and re-extracting the mel-spectrogram from the reconstructed waveform. Experiments on four state-of-the-art voice conversion models show that our method achieves the best overall performance among five compared methods in both white-box and black-box scenarios in terms of defense effectiveness and generation time.
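The perturbation step described in the abstract (a generator produces a perturbation that is added to the mel-spectrogram) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: `toy_generator` is a hypothetical stand-in that draws bounded random values rather than a trained GAN generator, the bound `epsilon` and the use of `tanh` to enforce it are not taken from the letter, and the SWCSM round trip is not modeled.

```python
import math
import random


def toy_generator(mel, epsilon, seed=0):
    """Hypothetical stand-in for a trained generator (assumption).

    Returns a perturbation with the same shape as `mel`; tanh keeps every
    entry strictly inside (-epsilon, epsilon).
    """
    rng = random.Random(seed)
    return [[epsilon * math.tanh(rng.gauss(0.0, 1.0)) for _ in frame]
            for frame in mel]


def craft_adversarial_mel(mel, epsilon=0.05, seed=0):
    """Add the generated perturbation to the mel-spectrogram, element-wise."""
    delta = toy_generator(mel, epsilon, seed)
    return [[x + d for x, d in zip(frame, d_frame)]
            for frame, d_frame in zip(mel, delta)]


# Toy "mel-spectrogram": 4 frames x 3 mel bins (illustrative values only).
mel = [[0.1 * (t + b) for b in range(3)] for t in range(4)]
adv = craft_adversarial_mel(mel, epsilon=0.05)

# Largest absolute change introduced by the perturbation.
max_change = max(abs(a - x)
                 for adv_frame, mel_frame in zip(adv, mel)
                 for a, x in zip(adv_frame, mel_frame))
```

In this sketch the perturbation budget is the only control on how far the adversarial mel-spectrogram drifts from the original; a real system would train the generator so that the perturbation also degrades the output of the voice conversion model.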