OuluREPO – University of Oulu repository

End-to-end dual-branch network towards synthetic speech detection

Ma, Kaijie; Feng, Yifan; Chen, Beijing; Zhao, Guoying (2023-03-27)

 

URL:
https://doi.org/10.1109/LSP.2023.3262419

Institute of Electrical and Electronics Engineers
27.03.2023

K. Ma, Y. Feng, B. Chen and G. Zhao, "End-to-End Dual-Branch Network Towards Synthetic Speech Detection," in IEEE Signal Processing Letters, vol. 30, pp. 359-363, 2023, doi: 10.1109/LSP.2023.3262419

https://rightsstatements.org/vocab/InC/1.0/
© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi-fe2023061555562
Abstract

Synthetic speech attacks pose a growing threat to automatic speaker verification (ASV) systems, so many synthetic speech detection (SSD) systems have been proposed to help ASV systems resist them. However, existing SSD systems still generalize poorly to attacks generated by unknown synthesis algorithms. This letter proposes an end-to-end ensemble system, the Dual-Branch Network, whose two branches take linear frequency cepstral coefficients (LFCC) and the constant Q transform (CQT) as input, respectively. Four fusion strategies are compared to find the best way of combining the two branches. In addition, multi-task learning and the convolutional block attention module (CBAM) are introduced into the Dual-Branch Network to help it learn the forgery features common to different types of forged speech and to enhance the representation power of the learned features. Experimental results on the ASVspoof 2019 logical access (LA) dataset demonstrate that the proposed system outperforms existing state-of-the-art systems on both the tandem detection cost function (t-DCF) and the equal error rate (EER), and that it generalizes well to unknown forgery types of synthetic speech.
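The abstract reports results as EER scores. As a reading aid, the following is a minimal sketch of how an equal error rate can be computed from detector scores. It assumes that a higher score means "more likely bona fide". The function name `equal_error_rate` and the toy scores are illustrative assumptions and do not come from the letter. It does not reproduce the official ASVspoof scoring tools and does not cover t-DCF.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Return (eer, threshold), assuming higher scores mean 'bona fide'.

    Illustrative sketch only: it scans every observed score as a candidate
    threshold and picks the one where the false acceptance rate and the
    false rejection rate are closest, reporting their mean as the EER.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores (hypothetical, not from the paper): one bona fide trial and
# one spoofed trial fall on the wrong side of the best threshold.
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.6])
print(eer, threshold)  # 0.25 0.6
```

With a finite set of trials the two error rates rarely cross exactly, so production scoring tools usually interpolate between neighbouring thresholds. The sketch takes the closest observed threshold instead, which keeps it short but can differ slightly from officially reported values.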
