Multi-modal adaptive sensor fusion for UAS odometry: a deep learning approach with hybrid temporal modeling
Rathnayaka, Kavinda (2025-06-16)
© 2025 Kavinda Rathnayaka. Unless otherwise stated, reuse is permitted under the Creative Commons Attribution 4.0 International (CC-BY 4.0) licence (https://creativecommons.org/licenses/by/4.0/). Reuse is allowed provided the source is properly credited and any changes are indicated. The use or reproduction of elements that are not the property of the author(s) may require permission directly from the respective rights holders.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:oulu-202506164510
Abstract
Accurate motion estimation is critical for autonomous unmanned aerial systems (UAS) in global navigation satellite system (GNSS) denied environments. This thesis presents a deep learning framework for six degrees of freedom (6-DOF) pose estimation by integrating RGB cameras, depth sensors, and inertial measurement units (IMU) through adaptive multi-modal sensor fusion.
The architecture comprises two core components: a visual-inertial-depth fusion layer and a hybrid temporal module. The fusion layer achieves adaptive sensor fusion through cross-modal attention and learned importance weighting, dynamically prioritizing the most informative sensors for pose prediction. Poses are predicted by a hybrid temporal modeling unit that combines self-attention mechanisms with long short-term memory (LSTM) networks. The system is trained and evaluated on the Mid-Air dataset, which spans diverse environmental conditions, including sunny, cloudy, foggy, and sunset (low-light) scenarios, to ensure resilience against varying weather. Experimental evaluation demonstrates competitive performance, with an absolute trajectory error (ATE) of 0.2032 m, a translational relative pose error (RPE_trans) of 0.0821 m, and a rotational relative pose error (RPE_rot) of 0.0116 rad. The proposed multi-modal approach achieves an 85% improvement in ATE over single-modal baselines. The adaptive fusion remains functional under sensor degradation, with performance degrading by 74% in foggy conditions. The hybrid temporal modeling outperforms LSTM-only and attention-only variants, indicating strong potential for real-world UAS deployment.
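As a rough illustration of the two components described above, the sketch below shows one way the adaptive fusion (cross-modal attention with learned importance weights) and the hybrid temporal module (self-attention followed by an LSTM) could be wired together in PyTorch. All layer choices, feature dimensions, and the use of PyTorch itself are assumptions made for illustration; this is not the thesis implementation.

```python
# Minimal sketch, assuming PyTorch and hypothetical 256-dimensional
# per-modality features; names and sizes are illustrative only.
import torch
import torch.nn as nn


class AdaptiveFusion(nn.Module):
    """Fuse RGB, depth and IMU features via cross-modal attention
    and learned per-modality importance weights."""

    def __init__(self, dim=256, n_heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        # One importance logit per modality (RGB, depth, IMU).
        self.importance = nn.Linear(3 * dim, 3)

    def forward(self, rgb, depth, imu):
        # Treat the three modalities as a length-3 token sequence: (B, 3, dim).
        tokens = torch.stack([rgb, depth, imu], dim=1)
        attended, _ = self.cross_attn(tokens, tokens, tokens)
        # Importance weights conditioned on all (attended) modalities.
        weights = torch.softmax(self.importance(attended.flatten(1)), dim=-1)
        fused = (weights.unsqueeze(-1) * attended).sum(dim=1)  # (B, dim)
        return fused, weights


class HybridTemporal(nn.Module):
    """Self-attention over the fused feature sequence followed by an LSTM,
    regressing a 6-DOF pose per time step."""

    def __init__(self, dim=256, n_heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.pose_head = nn.Linear(dim, 6)  # 3 translation + 3 rotation

    def forward(self, seq):  # seq: (B, T, dim)
        attended, _ = self.self_attn(seq, seq, seq)
        hidden, _ = self.lstm(attended)
        return self.pose_head(hidden)  # (B, T, 6)


# Example: fuse per-frame features, then model the sequence temporally.
B, T, D = 2, 8, 256
rgb, depth, imu = (torch.randn(B * T, D) for _ in range(3))
fused, w = AdaptiveFusion(D)(rgb, depth, imu)
poses = HybridTemporal(D)(fused.view(B, T, D))
print(poses.shape)  # torch.Size([2, 8, 6])
```

In this sketch the learned importance weights act as a soft gate over modalities, which is one way a network can down-weight a degraded sensor (e.g. RGB in fog) without discarding it entirely.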
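For reference, the trajectory-error metrics quoted above are commonly defined as follows. This is a sketch of the widely used formulation from the TUM RGB-D benchmark (Sturm et al.); the thesis may adopt a slightly different alignment or averaging convention.

```latex
% Estimated poses P_1..P_n and ground-truth poses Q_1..Q_n in SE(3);
% S is a rigid-body alignment of the estimate to the ground truth,
% Delta is the frame offset for relative errors (conventions assumed).
\[
  F_i = Q_i^{-1} \, S \, P_i, \qquad
  \mathrm{ATE} = \Bigl( \tfrac{1}{n} \sum_{i=1}^{n}
      \lVert \operatorname{trans}(F_i) \rVert^2 \Bigr)^{1/2}
\]
\[
  E_i = \bigl( Q_i^{-1} Q_{i+\Delta} \bigr)^{-1}
        \bigl( P_i^{-1} P_{i+\Delta} \bigr), \qquad
  \mathrm{RPE}_{\mathrm{trans}} = \Bigl( \tfrac{1}{m} \sum_{i=1}^{m}
      \lVert \operatorname{trans}(E_i) \rVert^2 \Bigr)^{1/2}, \qquad
  \mathrm{RPE}_{\mathrm{rot}} = \tfrac{1}{m} \sum_{i=1}^{m}
      \angle\bigl( \operatorname{rot}(E_i) \bigr)
\]
```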
Collections
- Open access [38865]