YOLO-SBA: A Multi-Scale and Complex Background Aware Framework for Remote Sensing Target Detection
Yuan, Yifei; Wei, Yingmei; Zhou, Xiaoyan; Guo, Yanming; Chen, Jiangming; Jiang, Tingshuai (2025-06-09)
MDPI
Yuan, Y., Wei, Y., Zhou, X., Guo, Y., Chen, J., & Jiang, T. (2025). YOLO-SBA: A Multi-Scale and Complex Background Aware Framework for Remote Sensing Target Detection. Remote Sensing, 17(12), 1989. https://doi.org/10.3390/rs17121989
https://creativecommons.org/licenses/by/4.0/
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:oulu-202507015038
Abstract
Remote sensing target detection faces significant challenges in handling multi-scale targets, and the high similarity in color and shape between targets and backgrounds in complex scenes further complicates the detection task. To address these challenges, we propose a multi-Scale and complex Background Aware network for remote sensing target detection, named YOLO-SBA. YOLO-SBA first processes the input through the Multi-Branch Attention Feature Fusion Module (MBAFF) to extract global contextual dependencies and local detail features. It then fuses these features efficiently with the Bilateral Attention Feature Mixer (BAFM), enhancing the saliency of multi-scale target features to handle variations in target scale. Next, the Gated Multi-scale Attention Pyramid (GMAP) performs channel–spatial dual reconstruction and gated fusion encoding on the multi-scale feature maps, which enhances target features while finely suppressing spectral redundancy. Additionally, to prevent the loss of effective information extracted by key modules during inference, we improve the downsampling method with Asymmetric Dynamic Downsampling (ADDown), maximizing the retention of image detail. YOLO-SBA achieves the best performance on the DIOR, DOTA, and RSOD datasets. On the DIOR dataset, YOLO-SBA improves mAP by 16.6% and single-category detection AP by 0.8–23.8% compared to the existing state-of-the-art algorithm.