Comparing techniques for aggregating interrelated replications in software engineering
Santos, Adrian; Juristo, Natalia (2018-10-11)
Adrian Santos and Natalia Juristo. 2018. Comparing techniques for aggregating interrelated replications in software engineering. In Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM '18). ACM, New York, NY, USA, Article 8, 10 pages. DOI: https://doi.org/10.1145/3239235.3239239
© 2018 Association for Computing Machinery. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM '18), https://doi.org/10.1145/3239235.3239239.
Context: Researchers from different groups and institutions are collaborating towards the construction of groups of interrelated replications. Applying unsuitable techniques to aggregate interrelated replications’ results may impact the reliability of joint conclusions.
Objectives: To compare the advantages and disadvantages of the techniques applied to aggregate interrelated replications’ results in Software Engineering (SE).
Method: We conducted a literature review to identify the techniques applied to aggregate interrelated replications’ results in SE. We then analyzed a prototypical group of interrelated replications in SE with the techniques that we identified, and checked whether the advantages and disadvantages of each technique, as reported in mature experimental disciplines such as medicine, materialize in the SE context.
Results: Narrative synthesis and aggregation of p-values do not take advantage of all the information contained in the raw data when providing joint conclusions. Aggregated Data (AD) meta-analysis provides visual summaries of results and allows assessing experiment-level moderators. Individual Participant Data (IPD) meta-analysis allows interpreting results in natural units and assessing both experiment-level and participant-level moderators.
Conclusion: All the information contained in the raw data should be used to provide joint conclusions. AD and IPD, when used in tandem, seem suitable for analyzing groups of interrelated replications in SE.
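As a rough illustration of the contrast drawn in the Results, the sketch below sets aggregation of p-values beside an AD meta-analysis. It makes several assumptions that are not in the abstract. Fisher's method stands in for "aggregation of p-values", and a fixed-effect inverse-variance model stands in for AD meta-analysis; the abstract does not say which variants the authors applied. The function names and all numbers are hypothetical and are not results from the paper. IPD meta-analysis is not shown, since it needs participant-level raw data and a mixed model.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method (assumed variant).

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the joint null hypothesis. For even
    degrees of freedom the survival function has a closed form, so no
    external statistics library is needed.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    return math.exp(-half_stat) * sum(
        half_stat ** i / math.factorial(i) for i in range(k)
    )


def fixed_effect_pool(effects, std_errors):
    """Pool per-experiment effect sizes with inverse-variance weights."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# Hypothetical summary data for three replications (illustrative only).
p_values = [0.04, 0.20, 0.08]
effects = [0.30, 0.10, 0.45]      # e.g. standardized mean differences
std_errors = [0.20, 0.15, 0.25]

joint_p = fisher_combined_p(p_values)
pooled, pooled_se = fixed_effect_pool(effects, std_errors)
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se

print(f"Fisher joint p-value: {joint_p:.4f}")
print(f"AD pooled effect: {pooled:.3f} (95% CI {low:.3f} to {high:.3f})")
```

The combined p-value gives only a yes/no signal about the joint null hypothesis, whereas the pooled estimate also reports the size and precision of the effect. That is one way to read the abstract's point that aggregation of p-values leaves information unused.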
- Open access