Assessing feature importance for forecasting soil moisture in subarctic regions using gridded historical and forecasted climate data
Saboori, Mojtaba; Ghag, Kedar Surendranath; Panchanathan, Anandharuban; Patro, Epari Ritesh; Haghighi, Ali Torabi (2025-04-27)
Elsevier
27.04.2025
Saboori, M., Ghag, K. S., Panchanathan, A., Patro, E. R., & Haghighi, A. T. (2025). Assessing feature importance for forecasting soil moisture in subarctic regions using gridded historical and forecasted climate data. Geoderma, 458, 117304. https://doi.org/10.1016/j.geoderma.2025.117304.
https://creativecommons.org/licenses/by/4.0/
© 2025 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:oulu-202505163519
Abstract
Continuous monitoring of soil moisture (SM) is essential in precision agriculture for effective irrigation management. However, SM forecasting in subarctic environments remains relatively unexplored. In this study, we forecast SM at a soil depth of 30 cm over a 7-day horizon using a Random Forest (RF) model. Two scenarios were evaluated: (a) relying solely on historical data (HIST), and (b) using forecasted environmental data together with recent SM measurements to predict SM iteratively, feeding each next-day forecast back in alongside the current SM value (FORENV). The input features included daily gridded climate data (air temperature-Tair, relative humidity-RH, wind speed-WS, precipitation-P, and reference evapotranspiration-ET0), soil-vegetation (SV) features (gridded soil temperature-Tsoil and Normalized Difference Vegetation Index-NDVI), and lagged SM values. These data were gathered from six sites under different land covers in a subarctic region (Tyrnava, Finland) over approximately two growing seasons (July or August 2022–September 2023), yielding about 430 daily observations per site. The analysis showed that FORENV outperformed HIST for up to four days into the forecast horizon, highlighting the value of forecasted variables for accuracy at these initial lead times. Performance at longer lead times was more site-dependent and was influenced by the stability of historical SM correlations. Pearson correlation and RF-based stepwise forward feature selection revealed that using only lagged SM data, or combining it with SV features, yielded the most accurate forecasts. For instance, at t + 7 and across all case studies combined, models incorporating LaggedSM_SV achieved the lowest RMSE (0.019 m³ m⁻³) and highest R² (0.67), followed by All_inputs (RMSE: 0.022 m³ m⁻³, R²: 0.61) and LaggedSM (RMSE: 0.025 m³ m⁻³, R²: 0.46).
Daily P and RH exhibited consistently low correlations with subsurface SM, likely because near-saturated soil conditions at many subarctic sites buffer infiltration and reduce the immediate sensitivity of SM to these variables. Overall, our results demonstrate that robust SM forecasts can be achieved even with limited data, making this approach particularly valuable in subarctic regions with near-saturated soil conditions and in other areas where climate and soil-vegetation data may be sparse.
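The iterative scheme and the two skill scores named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`iterative_forecast`, `rmse`, `r2`), the `one_step_model` callable and the toy model in the usage lines are hypothetical, and R² is assumed here to mean the coefficient of determination, which the abstract does not specify. Any fitted regressor, such as a Random Forest, could stand in for `one_step_model`.

```python
from math import sqrt
from typing import Callable, List, Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


def r2(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination (assumed meaning of R2 here)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def iterative_forecast(
    one_step_model: Callable[[float, float], float],
    last_observed_sm: float,
    forecast_env: Sequence[float],
    horizon: int = 7,
) -> List[float]:
    """Roll a one-day-ahead model forward over `horizon` days.

    Each step combines the most recent SM value (observed on day 0,
    predicted afterwards) with that day's forecasted environmental
    input, in the spirit of the FORENV scenario.
    """
    predictions = []
    sm = last_observed_sm
    for day in range(horizon):
        sm = one_step_model(sm, forecast_env[day])
        predictions.append(sm)
    return predictions


# Hypothetical toy model: SM decays slightly and responds to one driver.
toy_model = lambda sm, env: 0.95 * sm + 0.01 * env
week = iterative_forecast(toy_model, 0.30, [1.0] * 7)
```

Because each predicted value becomes the next step's input, errors can accumulate with lead time, which is consistent with the abstract's observation that the benefit of forecasted inputs is clearest at the shortest lead times.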
Collections
- Open access [38320]