Sentiment analysis for low-resource languages using large language models with zero-shot, N-shot prompting, and LoRA fine-tuning
Polock, Shakib Ibna Shameem (2025-06-09)
© 2025, Shakib Ibna Shameem Polock. This item is protected by copyright and/or related rights. You may use the item in the ways permitted by the copyright and related-rights legislation applicable to your use. For other kinds of use, you need permission from the rights holders.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:oulu-202506094261
Abstract
Large Language Models (LLMs) have shown significant capabilities in natural language processing, enabling new approaches to overcoming the difficulties of low-resource languages. This thesis explores sentiment analysis with LLMs through zero-shot and n-shot prompting and fine-tuning strategies for 12 low-resource African languages as well as Bengali. Low-resource languages are critically underrepresented in computational linguistics, and research on them is needed to ensure digital inclusivity and equitable access to language technologies. The study incorporates back-translation to assess lexicon stability, embedding-based analysis of sentiment discrimination power, and polarity-consistency evaluation of LLM-generated phrases across languages. Fine-tuning techniques, including Low-Rank Adaptation (LoRA), were applied to sentiment classification on the AfriSenti SemEval-2023 shared task. Results indicate that Bengali and Hausa exhibit high stability of sentiment polarity under back-translation, while lexicon discrimination power depends on the embedding model and linguistic context. LLMs showed robust polarity retention across languages, though with varying performance on positive and negative sentiments. Although LLMs perform poorly in zero-shot and n-shot prompted classification, fine-tuning them on language-specific datasets significantly improves sentiment classification, achieving an average F1-score of 72.50% and outperforming both the top state-of-the-art submissions to AfriSenti SemEval-2023 Task 12, Sub-Task A, and the AfriSenti paper baseline. These findings demonstrate the potential of LLMs to address challenges in low-resource-language sentiment analysis, underscoring the importance of tailored approaches and the need for ongoing research to bridge the linguistic resource gap.
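The abstract's key technique, Low-Rank Adaptation (LoRA), can be illustrated with a minimal sketch of the general idea: a frozen pretrained weight matrix W is left untouched, and only a low-rank delta B·A is trained, cutting the number of trainable parameters. All dimensions, the rank, and the scaling factor below are hypothetical illustrations, not the settings used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 8, 4, 2, 16   # rank r << min(d_in, d_out); values are illustrative

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight (never updated)
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection, zero-initialised
                                       # so the adapter starts as a no-op

def lora_forward(x):
    """Adapted layer: W x + (alpha / r) * B A x."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# At initialisation B = 0, so the adapted layer equals the frozen layer.
assert np.allclose(lora_forward(x), W @ x)

# Trainable parameters drop from d_out * d_in to r * (d_in + d_out).
print(d_out * d_in, "->", r * (d_in + d_out))  # 32 -> 24
```

In practice such adapters are injected into the attention projections of a transformer (e.g. via a library like Hugging Face PEFT) rather than applied to a standalone matrix, but the parameter-saving mechanism is the same.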
Collections
- Open access [38618]