OuluREPO – Oulun yliopiston julkaisuarkisto / University of Oulu repository

Natural language or not (NLON) : a package for software engineering text analysis pipeline

Mäntylä, Mika V.; Calefato, Fabio; Claes, Maëlick (2018-05-28)

 

URL:
https://doi.org/10.1145/3196398.3196444

Mäntylä, Mika V.
Calefato, Fabio
Claes, Maëlick
Association for Computing Machinery
28.05.2018

Mika V. Mäntylä, Fabio Calefato, and Maëlick Claes. 2018. Natural language or not (NLON): a package for software engineering text analysis pipeline. In Proceedings of the 15th International Conference on Mining Software Repositories (MSR '18). ACM, New York, NY, USA, 387-391. DOI: https://doi.org/10.1145/3196398.3196444

https://rightsstatements.org/vocab/InC/1.0/
© 2018 Copyright held by the owner/author(s). Publication rights licensed to Association for Computing Machinery. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of the 15th International Conference on Mining Software Repositories (MSR '18). DOI: https://doi.org/10.1145/3196398.3196444.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi-fe201901041335
Abstract

[…] textual information to separate natural language from other information, such as log messages, that are often part of the communication in software engineering. We present a simple approach for classifying whether some textual input is natural language or not. Although our NLoN package relies on only 11 language features and character tri-grams, we achieve area under the ROC curve performance between 0.976 and 0.987 on three different data sources, with Lasso regression from Glmnet as our learner and two human raters providing the ground truth. Cross-source prediction performance is lower and fluctuates more, with top ROC performance ranging from 0.913 to 0.980. Compared with prior work, our approach offers similar performance but is considerably more lightweight, making it easier to apply in software engineering text mining pipelines. Our source code and data are provided as an R package for further improvements.
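
The kind of input the abstract describes (a handful of language features plus character tri-grams) can be illustrated with a minimal Python sketch. This is not the NLoN package, which is written in R, and the abstract does not list its 11 features: the feature names below (`ratio_special`, `ends_with_code_char`, and so on) and both example strings are hypothetical stand-ins chosen only to show the idea. The learner itself is not shown.

```python
from collections import Counter


def char_trigrams(text: str) -> Counter:
    """Count overlapping character tri-grams in a lower-cased string."""
    s = text.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def simple_features(text: str) -> dict:
    """Illustrative line-level features (hypothetical, not the paper's 11)."""
    n = len(text)
    special = sum(not c.isalnum() and not c.isspace() for c in text)
    return {
        "n_chars": n,
        "n_words": len(text.split()),
        "ratio_alpha": sum(c.isalpha() for c in text) / n if n else 0.0,
        "ratio_digits": sum(c.isdigit() for c in text) / n if n else 0.0,
        "ratio_special": special / n if n else 0.0,
        # Lines ending in braces or semicolons often look like code.
        "ends_with_code_char": text.rstrip().endswith(("{", "}", ";")),
    }


# Hypothetical inputs: one sentence of prose, one stack-trace-like line.
prose = "I think this patch fixes the crash on startup."
log_line = "at org.foo.Bar.run(Bar.java:42);"

prose_feats = simple_features(prose)
log_feats = simple_features(log_line)
trigram_counts = char_trigrams("the theme")
```

In a full pipeline, vectors like these would be passed to a sparse linear classifier; the abstract names Lasso regression from Glmnet for that role, with the L1 penalty selecting which features and tri-grams carry weight.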
