A framework for analysis of speech and chat content in YouTube and Twitch streams
Coats, Steven (2024-08-24)
Université Côte d’Azur
Coats, S. (2024). A framework for analysis of speech and chat content in YouTube and Twitch streams. In C. Poudat, M. Guernut (Eds.), Proceedings of the 11th Conference on Computer-Mediated Communication and Social Media Corpora for the Humanities. Université Côte d’Azur.
https://creativecommons.org/licenses/by/4.0/
This work is licensed under a Creative Commons "Attribution 4.0 International" license.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:oulu-202409065718
Abstract
Online streaming platforms have become important sites of interaction and communication, but relatively little research into streaming platforms has considered the combined discourse of speech transcripts and live chat streams. In this paper we describe a pipeline, implemented as a notebook, that integrates speech transcripts with live chat content to create structured documents from streams recorded on YouTube and Twitch. Built on common streaming protocols and the open-source Python library yt-dlp, the notebook comprises modular script components for downloading and organizing transcripts and live chat, and can additionally retrieve audio, video, and other streamed content. Additional pipeline modules can be used for automatic speech-to-text transcription of the video stream and for incorporating models for specific analytical tasks such as automatic video classification, gesture identification, or facial recognition. The paper demonstrates use of the notebook to output a time-stamped, structured, combined speech/chat HTML file and proposes two possible analyses: consideration of chat density, and zero-shot classification of video content.
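The combination step and the chat-density analysis named in the abstract can be illustrated with a minimal sketch. This is not the notebook's code: the `Event` record, its field names, the 60-second bin width, and the sample data are all assumptions made for illustration, and the download step (which the paper performs with yt-dlp) is omitted.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One time-stamped unit of stream content (hypothetical record layout)."""
    start: float   # seconds from the start of the stream
    source: str    # "speech" or "chat"
    author: str
    text: str


def merge_events(speech, chat):
    """Interleave speech segments and chat messages in timestamp order."""
    return sorted([*speech, *chat], key=lambda e: e.start)


def chat_density(events, bin_seconds=60):
    """Count chat messages per time bin; keys are bin indices."""
    counts = Counter(
        int(e.start // bin_seconds) for e in events if e.source == "chat"
    )
    return dict(sorted(counts.items()))


# Hypothetical sample data, not taken from any real stream.
speech = [
    Event(0.0, "speech", "streamer", "welcome back everyone"),
    Event(65.0, "speech", "streamer", "let's start the next round"),
]
chat = [
    Event(3.2, "chat", "viewer_a", "hi!"),
    Event(4.8, "chat", "viewer_b", "hello"),
    Event(70.1, "chat", "viewer_a", "gl"),
]

merged = merge_events(speech, chat)
density = chat_density(merged)

for event in merged:
    print(f"{event.start:7.1f}s [{event.source}] {event.author}: {event.text}")
print(density)
```

Sorting on the start time alone keeps the sketch simple; a real pipeline would also need to decide how to order events with identical timestamps and how to align chat timestamps with the transcript clock.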