Performance Improvement Strategy for Few-shot Semantic Segmentation Assisted by Large Language Models
Abstract
Few-shot semantic segmentation is a computer vision technique used to segment objects in images at the pixel level from only a small number of labeled samples. Traditional methods have limited generalization capability, are sensitive to background interference, are susceptible to class imbalance, and provide inadequate feature representation. This study explored the use of Large Language Models (LLMs) for few-shot semantic segmentation tasks. Leveraging the strong semantic understanding and feature extraction capabilities of ChatGLM (Chat Generative Language Model), Named Entity Recognition (NER) was performed on image context information. The NER results were then used to enhance few-shot semantic segmentation, improving its ability to process and understand image semantics. Experimental results showed significant improvements in the mean Intersection over Union (mIoU) evaluation metric when the few-shot semantic segmentation model was assisted by LLMs. The LLM-assisted few-shot semantic segmentation model achieved higher mIoU scores on five validation sets than the SETR (Segmentation Transformer) model and Mask R-CNN (Mask Region-based Convolutional Neural Network), indicating that the large language model effectively improves the accuracy and generalization ability of few-shot semantic segmentation.
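The mIoU metric cited above averages the per-class Intersection over Union between predicted and ground-truth label maps. The following is a minimal illustrative sketch of that computation, assuming integer per-pixel class labels; the function name mean_iou and the toy arrays are hypothetical and are not taken from the paper's evaluation code.

import numpy as np

def mean_iou(pred, target, num_classes):
    """Average per-class IoU between two integer label maps of equal shape."""
    ious = []
    for c in range(num_classes):
        pred_c = (pred == c)
        target_c = (target == c)
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # skip classes absent from both prediction and ground truth
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious)) if ious else 0.0

# Toy example: 2x3 label maps with 3 classes
pred = np.array([[0, 1, 1], [2, 2, 0]])
gt = np.array([[0, 1, 2], [2, 2, 0]])
print(mean_iou(pred, gt, num_classes=3))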
Keywords: few-shot semantic segmentation, large language models, named entity recognition, mean intersection over union, feature extraction
Cite As
X. Han, S. Zhang, Y. Li, "Performance Improvement Strategy for Few-shot Semantic Segmentation Assisted by Large Language Models", Engineering Intelligent Systems, vol. 33, no. 3, pp. 329-338, 2025.