Variable Selection and Grouping for Large-scale Data-driven Modelling
Juuso, Esko K. (2022-03-31)
Linköping University Electronic Press
Juuso, E. K. (2022). Variable selection and grouping for large-scale data-driven modelling. The First SIMS EUROSIM Conference on Modelling and Simulation, SIMS EUROSIM 2021, and 62nd International Conference of Scandinavian Simulation Society, SIMS 2021, September 21-23, Virtual Conference, Finland, 38–45. https://doi.org/10.3384/ecp2118538
https://creativecommons.org/licenses/by/4.0/
Copyright (c) 2022 Esko K. Juuso. This work is licensed under a Creative Commons Attribution 4.0 International License.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:oulu-202602171830
Abstract
For large-scale systems, the number of possible variable combinations becomes very large. Variable grouping means finding feasible groups of variables for modelling. Systems can be divided into subsystems, but even then the number of available variables is often too high for practical use with data-based methods. Interactive variable selection and grouping, which compares the performance of alternative models, is a good solution if there are not too many variables. This paper describes possibilities for variable selection in large-scale industrial systems. It classifies variable selection and grouping into four categories: knowledge-based grouping, grouping with data analysis, decomposition, and model-based grouping and selection. The data analysis part consists of correlation analysis and the handling of high-dimensional data with principal components. These originally linear methodologies were extended to nonlinear systems by using the nonlinear scaling approach. Decomposition can be realised with various clustering methods or with learning based on case-based reasoning. Multimodel systems are handled with fuzzy set systems. Numerous studies based on linear multivariate statistical modelling have been reported in the literature. The approaches have been tested in several applications: bioprocesses, continuous brewing, condition monitoring, web break sensitivity analysis and wastewater treatment. Industrial process data, a pilot system and a test rig were used in the analysis. Uncertainty handling is a part of the analysis method: uncertainty is represented with degrees of membership.
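As an illustration of the "grouping with data analysis" category, the following minimal sketch groups variables by pairwise linear correlation. It is not the method of the paper: the function names, the threshold of 0.8, the connected-components grouping rule and the sample measurements are all assumptions chosen for the example, and the nonlinear scaling step mentioned in the abstract is not included.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant signal carries no correlation information
    return cov / sqrt(var_x * var_y)


def group_by_correlation(data, threshold=0.8):
    """Group variable names whose pairwise |r| is at least `threshold`.

    `data` maps a variable name to its list of samples. Groups are the
    connected components of the graph linking strongly correlated pairs.
    The threshold value is an assumption, not taken from the paper.
    """
    parent = {name: name for name in data}

    def find(name):
        # Follow parent links to the group representative (path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(data, 2):
        if abs(pearson(data[a], data[b])) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for name in data:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical process measurements, invented for illustration only.
measurements = {
    "flow": [1.0, 2.0, 3.0, 4.0, 5.0],
    "pressure": [2.1, 3.9, 6.2, 8.0, 9.9],
    "temperature": [5.0, 1.0, 4.0, 2.0, 3.0],
}
groups = group_by_correlation(measurements)
print(groups)
```

With these invented samples, flow and pressure end up in one group and temperature in its own. Each resulting group could then be treated as a candidate set of variables for a submodel, with the threshold tuned to the application.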

