Beyond Accuracy: An Empirical Study on Unit Testing in Open-source Deep Learning Projects
Wang, Han; Yu, Sijia; Chen, Chunyang; Turhan, Burak; Zhu, Xiaodong (2023-12-20)
Han Wang, Sijia Yu, Chunyang Chen, Burak Turhan, and Xiaodong Zhu. 2024. Beyond Accuracy: An Empirical Study on Unit Testing in Open-source Deep Learning Projects. ACM Trans. Softw. Eng. Methodol. 33, 4, Article 104 (May 2024), 22 pages. https://doi.org/10.1145/3638245
https://rightsstatements.org/vocab/InC/1.0/
© 2023 Copyright held by the owner/author(s). This is the authors' version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ACM Transactions on Software Engineering and Methodology, https://doi.org/10.1145/3638245
Permanent address of the publication:
https://urn.fi/URN:NBN:fi:oulu-202312284031
Abstract
Deep Learning (DL) models have advanced rapidly, with the focus placed on achieving high performance by testing model accuracy and robustness. However, it is unclear whether DL projects, as software systems, are thoroughly tested or functionally correct, even though they need to be treated and tested like other software systems. We therefore conduct an empirical study of unit tests in open-source DL projects, analyzing 9,129 projects from GitHub. We find that: 1) unit-tested DL projects correlate positively with open-source project metrics and have a higher acceptance rate of pull requests, 2) 68% of the sampled DL projects are not unit tested at all, and 3) the layer and utilities (utils) modules of DL models have the most unit tests. Based on these findings and previous research outcomes, we build a taxonomy mapping unit tests to faults in DL projects. We discuss the implications of our findings for developers and researchers and highlight the need for unit testing in open-source DL projects to ensure their reliability and stability. The study contributes to the community by raising awareness of the importance of unit testing in DL projects and encouraging further research in this area.
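To illustrate the kind of layer-level unit test the study counts (it is not taken from the paper), the sketch below shows two small pytest-style checks for a PyTorch linear layer: one verifies the forward-pass output shape, the other verifies that gradients flow through the layer. The specific layer sizes and tensors are illustrative assumptions.

```python
# Minimal sketch of DL layer unit tests (illustrative, not from the paper).
import torch


def test_linear_layer_output_shape():
    # Forward pass should map (batch, in_features) -> (batch, out_features).
    layer = torch.nn.Linear(in_features=8, out_features=4)
    x = torch.randn(2, 8)  # batch of 2 samples
    y = layer(x)
    assert y.shape == (2, 4)


def test_linear_layer_backward_produces_gradients():
    # Backward pass on a dummy scalar loss should populate weight gradients.
    layer = torch.nn.Linear(in_features=8, out_features=4)
    x = torch.randn(2, 8)
    layer(x).sum().backward()
    assert layer.weight.grad is not None
```

Tests like these target individual components (layers, utils) rather than end-to-end model accuracy, which is the distinction the study draws between unit testing and model-level evaluation.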
Collections
- Open access [36548]