Enhancing trustworthiness of Arabic online health information quality evaluation using an enhanced BERT architecture with PCA and ICA feature weighting
Baqraf, Yousef; Keikhosrokiani, Pantea; Cheah, Yu-N (2026-03-06)
Baqraf, Yousef
Keikhosrokiani, Pantea
Cheah, Yu-N
Springer
06.03.2026
Baqraf, Y., Keikhosrokiani, P., & Cheah, Y.-N. (2026). Enhancing trustworthiness of Arabic online health information quality evaluation using an enhanced BERT architecture with PCA and ICA feature weighting. Scientific Reports, 16(1), 12434. https://doi.org/10.1038/s41598-026-43158-8
https://creativecommons.org/licenses/by-nc-nd/4.0/
© The Author(s) 2026. This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:oulu-202603182242
Abstract
Despite the exponential increase in the availability of online health information, its quality remains questionable, which poses a significant challenge. This study addresses the issue by using artificial intelligence techniques, such as deep learning, to evaluate the quality of health information and to mimic human-level evaluation capabilities. The key methodologies included an enhanced version of Arabic BERT for medical data, feature extraction techniques incorporating Principal Component Analysis (PCA) and Independent Component Analysis (ICA), and modified loss functions that use information entropy to improve the model’s certainty and calibration during document classification. The results were encouraging: the proposed PCA-based model achieved higher accuracy than the competing models, reaching 94.7% on the dataset used, which is comparable to reported human-level performance. These findings may contribute to improving the reliability of online health information in Arabic contexts and provide a foundation for future efforts aimed at supporting healthcare decision-making. The methodologies and results presented here offer policymakers and researchers valuable tools to assess and ensure the trustworthiness of online health information.
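The abstract names two ingredients, PCA-based feature extraction and an entropy-modified loss, without giving their formulation. The following is only a minimal illustrative sketch of what such components might look like, not the authors' implementation. The function names (`pca_project`, `entropy_regularized_loss`), the random stand-in embeddings, the two-class setup, and the choice to add the predictive entropy to the cross-entropy with a positive `weight` are all assumptions made for illustration.

```python
import numpy as np


def pca_project(features, n_components):
    """Project rows of `features` onto their top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def entropy_regularized_loss(logits, labels, weight=0.1):
    """Mean cross-entropy plus `weight` times the mean predictive entropy.

    Assumption: a positive weight penalizes uncertain (high-entropy)
    predictions; the paper's actual loss may differ.
    """
    probs = softmax(logits)
    eps = 1e-12  # avoids log(0)
    cross_entropy = -np.log(probs[np.arange(len(labels)), labels] + eps)
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)
    return float((cross_entropy + weight * entropy).mean())


# Hypothetical data: 8 documents, 16-dimensional embeddings, 2 quality classes.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(8, 16))
reduced = pca_project(embeddings, n_components=4)

logits = rng.normal(size=(8, 2))
labels = rng.integers(0, 2, size=8)
loss = entropy_regularized_loss(logits, labels)
```

In this sketch the projected features could be fed to a classifier head, and setting `weight=0` recovers plain cross-entropy, which makes the effect of the entropy term easy to compare.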
Collections
- Open access [42834]

