2026

SwiftMSeg: lightweight multi-scale local–global context modeling with transformer for medical image segmentation

Jahid Hasan Rony, Md Shakhawat Hossain & Fazlul Hasan Siddiqui

Scientific Reports (Nature)

DOI: https://doi.org/10.1038/s41598-026-56845-3

Abstract

Accurate medical image segmentation requires both fine boundary localization and robust contextual understanding, which are often difficult to achieve simultaneously, particularly in lightweight architectures. In this paper, we propose SwiftMSeg, a lightweight encoder–decoder framework that integrates a convolutional encoder, a transformer-based local–global–local module, and a hierarchical multi-scale decoder. The proposed framework addresses the boundary–context challenge by combining progressive multi-scale refinement for fine boundary separation with global context modeling through long-range dependency aggregation. Extensive evaluations on publicly available colonoscopy, pathology, ultrasound, and magnetic resonance imaging (MRI) datasets demonstrated the capability of SwiftMSeg to accurately segment diverse anatomical structures, ranging from tiny nuclei to polyps and large tumor regions. The model further demonstrated moderate domain-independent generalization on an external dataset, achieving Dice scores of 0.896 (colonoscopy), 0.860 (pathology), 0.850 (ultrasound), and 0.870 (MRI), outperforming most baseline methods. In addition, it achieved improved boundary localization with lower Hausdorff distance (e.g., 16.43 in MRI and 33.89 in ultrasound) and reduced average symmetric surface distance, indicating more precise and stable segmentation. Statistical analysis further confirmed that the improvements of SwiftMSeg are statistically significant, with large effect sizes across modalities, as validated by both paired t-tests and Wilcoxon tests. Despite its strong performance, SwiftMSeg remains highly efficient, requiring only 4.48M parameters and 0.940 giga floating-point operations (GFLOPs), reducing computational cost by approximately 53× compared to the U-Net-based baselines (standard U-Net: 31M parameters and 50 GFLOPs), while maintaining high segmentation accuracy.
These results highlight the effectiveness of SwiftMSeg as a practical and scalable solution for real-world medical image segmentation across diverse modalities.
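
For readers less familiar with the quantities quoted in the abstract, the following minimal Python sketch shows how a Dice score is conventionally computed for a pair of binary masks, and how a compute reduction factor follows from the reported GFLOPs and parameter counts. It is an illustrative sketch only, not the authors' evaluation code: the function names `dice_score` and `reduction_factor` and the toy masks are assumptions introduced here, and only the numeric inputs (4.48M, 0.940, 31M, 50) come from the abstract.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as nested lists of 0/1.

    Hypothetical helper for illustration; not taken from the SwiftMSeg code.
    """
    flat_pred = [int(v) for row in pred for v in row]
    flat_target = [int(v) for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(flat_pred, flat_target))
    total = sum(flat_pred) + sum(flat_target)
    # Two empty masks agree perfectly; define Dice as 1.0 in that case.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


def reduction_factor(baseline, proposed):
    """How many times smaller the proposed cost is than the baseline cost."""
    return baseline / proposed


# Toy 2x3 masks (made up for this example): 2 overlapping foreground pixels,
# 3 foreground pixels in each mask, so Dice = 2*2 / (3+3).
pred = [[0, 1, 1], [0, 1, 0]]
target = [[0, 1, 0], [0, 1, 1]]
print(f"Dice: {dice_score(pred, target):.3f}")

# Figures quoted in the abstract: standard U-Net vs. SwiftMSeg.
print(f"GFLOPs reduction: {reduction_factor(50, 0.940):.1f}x")
print(f"Parameter reduction: {reduction_factor(31, 4.48):.1f}x")
```

Under these figures the GFLOPs ratio comes out at roughly 53, matching the reduction stated in the abstract, while the parameter count shrinks by a factor of about 7.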

Citation

Jahid Hasan Rony, Md Shakhawat Hossain & Fazlul Hasan Siddiqui. "SwiftMSeg: lightweight multi-scale local–global context modeling with transformer for medical image segmentation." Scientific Reports (Nature) (2026).

BibTeX

@article{pub50_2026,
  title={SwiftMSeg: lightweight multi-scale local–global context modeling with transformer for medical image segmentation},
  author={Jahid Hasan Rony and Md Shakhawat Hossain and Fazlul Hasan Siddiqui},
  journal={Scientific Reports (Nature)},
  year={2026},
  doi={10.1038/s41598-026-56845-3}
}