Natural language processing and code smell
Jalonen, Samppa (2024-06-19)
© 2024, Samppa Jalonen. This Item is protected by copyright and/or related rights. You may use the Item in any way permitted by the copyright and related-rights legislation that applies to your use. For other uses you need permission from the rights holders.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:oulu-202406194764
Abstract
Context: The literature offers little understanding of how, and how much, NLP techniques are used to detect code smells and technical debt, and no overview of the field exists.
Objectives: This systematic literature review set out to determine which NLP techniques have been proposed for detecting code smells and technical debt. Specifically, it examines which kinds of technical debt and code smell the literature addresses alongside NLP, what goals NLP is used for, which techniques are employed, which information sources have been used, and whether the literature proposes any tools for detecting code smells and technical debt.
Methods: The review searched three academic databases and found 221 studies, from which 32 were selected.
Results: The review identified which NLP (natural language processing) techniques have been applied to technical debt and code smells, how they have been applied, and which datasets are used. It covered 32 studies in total. Most of the studies addressed self-admitted technical debt, followed by code smells such as feature envy and God class. The use of NLP is closely tied to the textual nature of the data involved. The NLP techniques consisted of pre-processing of textual data and embedding of the data before its use in the proposed detection methods; the most frequently used were tokenization, stop-word removal, and stemming, closely followed by word embedding. The datasets consisted of a few public datasets, which are often used, either as is or with some modification, alongside datasets created specifically for a study. The review also found several different approaches and methods proposed for detecting technical debt and code smells, but only a few proposed tools.
Conclusion: This review found several studies that employ NLP techniques to detect code smells and technical debt, or as part of doing so. It also identified several kinds of code smell and technical debt, several datasets used by the studies, and a few proposed tools.
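As an illustration only of the pre-processing steps named in the Results (tokenization, stop-word removal, stemming), a minimal Python sketch could look as follows. The stop-word list, the suffix rules, the function names, and the sample comment are all assumptions made for this example and are not taken from the reviewed studies; the stemmer is a crude suffix stripper standing in for a real one such as Porter's.

```python
import re

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "the", "this", "is", "to", "of", "and", "it", "be", "we"}

# Crude suffix rules; an assumed stand-in for a real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Strip the first matching suffix if at least three letters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(comment):
    """Tokenize, remove stop words, then stem each remaining token."""
    return [stem(t) for t in remove_stop_words(tokenize(comment))]


if __name__ == "__main__":
    # A made-up source-code comment of the kind that admits technical debt.
    print(preprocess("TODO: this is a hack, needs refactoring later"))
```

The resulting token list is the kind of input that would then be embedded or passed to a classifier; that later stage is outside this sketch.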
Collections
- Open access [34589]