The Corpus of Australian and New Zealand Spoken English: A new resource of naturalistic speech transcripts
Coats, Steven (2022-12-30)
Steven Coats. 2022. The Corpus of Australian and New Zealand Spoken English: A new resource of naturalistic speech transcripts. In Proceedings of the 20th Annual Workshop of the Australasian Language Technology Association, pages 1–5, Adelaide, Australia. Australasian Language Technology Association. https://aclanthology.org/2022.alta-1.1.pdf
© 1963–2023 ACL; other materials are copyrighted by their respective copyright holders. Materials prior to 2016 here are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 International License. Permission is granted to make copies for the purposes of teaching and research. Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 International License.
The Corpus of Australian and New Zealand Spoken English (CoANZSE) is a 190-million-word corpus of Automatic Speech Recognition (ASR) transcripts from YouTube channels of local councils and other governmental bodies in 472 locations in Australia and New Zealand. CoANZSE can be used to examine grammar and syntax in Australian and New Zealand spoken English, and because tokens are word-timed and transcripts are linked to videos, it can serve as the starting point for phonetic or multi-modal studies. Two exploratory analyses demonstrate differences between Australia and New Zealand in the relative frequencies of double modals, a rare non-standard syntactic feature, and show that transcripts from Australia and New Zealand can be distinguished on the basis of common lexical items.
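As a rough illustration of what a relative-frequency comparison of double modals involves, the following minimal Python sketch counts adjacent pairs of distinct modal verbs in a transcript and normalises the count per million words. It is not the author's method: the modal inventory (`MODALS`), the simple regex tokeniser, and the function names are assumptions made for illustration. Any real analysis of ASR output would also need manual checking of hits, since pairs such as "will can" may be transcription errors or involve a noun rather than a modal.

```python
import re

# Hypothetical modal inventory, chosen for illustration only.
MODALS = {"might", "may", "could", "can", "should", "would", "will", "must", "shall"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_double_modals(tokens):
    """Count adjacent pairs of two different modals (e.g. 'might could').

    Identical pairs such as 'can can' are skipped, because in speech
    transcripts they are more likely to be repetitions than double modals.
    """
    return sum(
        1
        for first, second in zip(tokens, tokens[1:])
        if first in MODALS and second in MODALS and first != second
    )


def double_modals_per_million(text):
    """Return the double-modal frequency normalised per million words."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return count_double_modals(tokens) * 1_000_000 / len(tokens)


if __name__ == "__main__":
    sample = "we might could do that and you will can go"
    print(count_double_modals(tokenize(sample)))
    print(double_modals_per_million(sample))
```

Applying the same function to the pooled transcripts of each country would give two rates that can be compared directly, because normalising by token count removes the effect of differing subcorpus sizes.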
- Open access