SheafAlign: A Sheaf-Theoretic Framework for Decentralized Multimodal Alignment
Ghalkha, Abdulmomen; Tian, Zhuojun; Issaid, Chaouki Ben; Bennis, Mehdi (2026-02-09)
IEEE
09.02.2026
A. Ghalkha, Z. Tian, C. B. Issaid and M. Bennis, "SheafAlign: A Sheaf-Theoretic Framework for Decentralized Multimodal Alignment," in IEEE Communications Letters, vol. 30, pp. 1175-1179, 2026, doi: 10.1109/LCOMM.2026.3663044
https://creativecommons.org/licenses/by/4.0/
© 2026 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:oulu-202602201895
Abstract
Conventional multimodal alignment methods assume mutual redundancy across all modalities, an assumption that fails in real-world distributed scenarios. We propose SheafAlign, a sheaf-theoretic framework for decentralized multimodal alignment that replaces single-space alignment with multiple comparison spaces. This approach models pairwise modality relations through sheaf structures and leverages decentralized contrastive learning-based objectives for training. SheafAlign overcomes the limitations of prior methods by not requiring mutual redundancy among all modalities, preserving both shared and unique information. Experiments on multimodal sensing datasets show superior zero-shot generalization, cross-modal alignment, and robustness to missing modalities, with 34% lower communication cost than state-of-the-art baselines.
Collections
- Open access [43406]

