Descriptor: C++ Self-Admitted Technical Debt Dataset (CppSATD)
Pham, Phuoc; Sridharan, Murali; Esposito, Matteo; Lenarduzzi, Valentina (2025-06-04)
IEEE
P. Pham, M. Sridharan, M. Esposito and V. Lenarduzzi, "Descriptor: C++ Self-Admitted Technical Debt Dataset (CppSATD)," in IEEE Data Descriptions, doi: 10.1109/IEEEDATA.2025.3576339
https://creativecommons.org/licenses/by/4.0/
© Copyright 2025 IEEE. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:oulu-202506064202
Abstract
In software development, technical debt (TD) refers to suboptimal implementation choices that developers make to meet urgent deadlines or cope with limited resources, and that complicate future maintenance. Self-Admitted Technical Debt (SATD) is a subtype of TD: specific TD instances “openly admitted” by developers, often expressed through source code comments. Previous research on SATD has focused predominantly on the Java programming language, leaving a significant gap in cross-language SATD research. This narrow focus limits the generalizability of existing findings and of SATD detection techniques to other programming languages. Our work addresses this limitation by introducing CppSATD, a dedicated C++ SATD dataset comprising over 531,000 annotated comments and their source code contexts. The dataset can serve as a foundation for future studies that aim to develop SATD detection methods for C++, generalize existing findings to other languages, or contribute novel insights to cross-language SATD research.
Collections
- Open access [38320]