Neural networks for multi-token code completion : a scoping review
Määttä, Samuli (2024-12-10)
© 2024 Samuli Määttä. Unless otherwise stated, reuse is permitted under the Creative Commons Attribution 4.0 International (CC-BY 4.0) license (https://creativecommons.org/licenses/by/4.0/). Reuse is permitted provided that the source is appropriately cited and any changes are indicated. The use or reproduction of parts that are not the property of the author or authors may require permission directly from the respective rights holders.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:oulu-202412107168
Abstract
Context: Code completion is one of the most useful features of modern Integrated Development Environments (IDEs). Like the text autocompletion feature sometimes found in text editors and messaging apps, it can speed up the writing phase of programming. When the completion candidates are presented in a context-aware list format, the feature can also help in discovering half-remembered and unfamiliar pieces of functionality from diverse sources. Recently, tools that blur the line between traditional code completion and code generation have received widespread attention. Not restricted to completing partially written identifiers or predicting only the very next element (token), and not bound by the type or source of prediction, the generated code can introduce completely new variables and piece them together with operators and other language-specific structures to form complete, semantically meaningful units of computation. The majority of these tools are powered by large language models, which calculate outputs based not on rules or reason, but on patterns and probabilities learned from large bodies of text, including source code repositories. This makes them highly unpredictable. The technology is as exciting as it is immature.
Objective: The aim of this scoping review was to examine the state of research around the phenomenon of complex completions that can consist of multiple consecutive code elements. In this review, this phenomenon is called multi-token code completion.
Method: A scoping review was conducted, and the selected studies were classified according to their research types, research methods, modeling methods, completion granularity levels, evaluation metrics, evaluation granularity levels, and modeled programming languages.
Results: Although research on neural network-based and deep learning-based code completion models is active, only a few of the studies analyzed during study selection attempted to push beyond predicting the very next code token, and only a few of the included studies present their findings in a way that would allow direct comparisons. Prediction accuracy is almost always reported as an aggregate score over all types of completions, without controlling for factors such as the length, number, frequency, type, or origin of the completed tokens, or for surrounding contextual factors.
Conclusion: Since most of the research has taken place during the past few years, and since the aims, methods, models, and evaluation metrics of different studies vary considerably, no generalizable findings emerged, nor are any likely to emerge if a systematic review were conducted.
Collections
- Open access [38841]