Conventional encoder-based neural architecture search (NAS) methods typically encode candidate architectures as graphs based on their information flow and operations: each node represents an operation that extracts features at some layer between input and output (e.g., convolution or pooling), and each edge represents the flow of features between operations. While intuitive, such graph-based embeddings primarily capture topological features and lack high-level semantic representations of the candidate architecture, which limits the robustness and generalization of encoder-based NAS. This issue is evident in several phenomena, such as the inability of typical NAS methods to interpret previously unseen operations (e.g., the more recent self-attention) and their limited capacity to benefit from joint training across multiple search spaces. To mitigate these limitations, we propose the Contrastive Learnable Unifying Encoder for NAS (CLUE-NAS), a novel framework that leverages the text encoder of Contrastive Language-Image Pre-training (CLIP) to generate context embeddings enriched with high-level semantics and integrates them with graph-based embeddings through contrastive learning. CLUE-NAS further emulates the behavior of human experts by employing a coarse-to-fine strategy to enhance predictive performance. Experiments on NASBench-101, NASBench-201, and NASBench-301 show that CLUE-NAS not only demonstrates strong generalization to unseen operations but also benefits substantially from joint training across multiple search spaces, achieving performance on par with or better than state-of-the-art NAS methods.
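To make the contrastive integration concrete, the following is a minimal sketch, assuming (hypothetically) that each candidate architecture yields a graph-based embedding from a GNN encoder and a context embedding from CLIP's text encoder applied to a textual description of the architecture. The symmetric InfoNCE objective below is the standard CLIP-style formulation; the exact loss used by CLUE-NAS may differ.

```python
# Sketch of CLIP-style contrastive alignment between graph embeddings and
# text-derived context embeddings. The encoders themselves are stubbed out
# with random tensors; only the alignment objective is illustrated.
import torch
import torch.nn.functional as F


def contrastive_alignment_loss(graph_emb: torch.Tensor,
                               text_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of matched (graph, text) pairs."""
    g = F.normalize(graph_emb, dim=-1)   # (B, D) graph-based embeddings
    t = F.normalize(text_emb, dim=-1)    # (B, D) CLIP text embeddings
    logits = g @ t.T / temperature       # (B, B) pairwise similarities
    targets = torch.arange(g.size(0), device=g.device)
    # Matched (graph_i, text_i) pairs are positives; all other pairs
    # in the batch act as negatives, pulling the two embedding spaces
    # into a shared, semantically aligned space.
    loss_g2t = F.cross_entropy(logits, targets)
    loss_t2g = F.cross_entropy(logits.T, targets)
    return (loss_g2t + loss_t2g) / 2


if __name__ == "__main__":
    B, D = 8, 512  # batch of 8 architectures, 512-dim embeddings (assumed)
    graph_emb = torch.randn(B, D)  # stand-in for GNN encoder outputs
    text_emb = torch.randn(B, D)   # stand-in for CLIP text-encoder outputs
    print(contrastive_alignment_loss(graph_emb, text_emb).item())
```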