Scene text spotting helps to locate regions of interest in images, since the text in pictures often provides abundant information. Many existing schemes adopt a segmentation-based methodology, which classifies each pixel as a specific type, usually text or background, and then clusters the text pixels into the regions to be detected. The major advantages of such pixel prediction include ease of implementation, good performance, and flexibility in application. However, text in natural scenes varies in size, shape, and color, so correctly separating words in such schemes remains a challenging issue. This research investigates the use of the distance to the boundary to assist in partitioning text pixels and thereby achieve more accurate scene text spotting. The proposed scheme can be used to extract single characters, words, text lines, or objects with similar textures. It is also applicable to detecting text bounded by rectangles, quadrilaterals, or boxes of arbitrary shapes. In addition, the labeling process is simpler than that of other methods. The issues of network architecture, class imbalance, and post-processing are discussed. The experimental results demonstrate the feasibility of the proposed design, which can help to improve segmentation-based scene text spotting approaches.
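The core idea — a distance-to-boundary map whose values dip to zero at instance borders, so that thresholding separates words even when their union forms a single blob — can be sketched as follows. This is a minimal illustration under our own assumptions (the function names, the ideal "predicted" map, and the toy masks are all hypothetical), not the thesis's actual implementation:

```python
# Sketch: separating adjacent text instances with a distance-to-boundary map.
# Assumption: the network regresses, for every text pixel, its distance to the
# boundary of its own instance; here we fake a perfect prediction from
# ground-truth instance masks to show why thresholding recovers instances.
import numpy as np
from scipy.ndimage import distance_transform_edt, label

def distance_label(instance_masks):
    """Build a distance-to-boundary target: each pixel stores the distance
    to the boundary of its own text instance (0 on the background)."""
    target = np.zeros(instance_masks[0].shape, dtype=float)
    for m in instance_masks:
        # Euclidean distance of every in-instance pixel to the nearest
        # out-of-instance pixel; background pixels stay 0.
        target = np.maximum(target, distance_transform_edt(m))
    return target

# Two adjacent words: their union is one connected blob, but the per-instance
# distance map falls to ~0 along the shared border.
a = np.zeros((7, 15), dtype=bool); a[1:6, 1:7] = True    # first word
b = np.zeros((7, 15), dtype=bool); b[1:6, 7:14] = True   # second word, touching
pred = distance_label([a, b])        # stand-in for the network's prediction
cores, n = label(pred > 1.0)         # keep only pixels well inside an instance
print(n)                             # → 2: the touching words are separated
```

A plain binary text/background mask would merge the two words into one component here; the distance map keeps them apart, which is the property the proposed scheme exploits. In practice the thresholded cores would still be expanded back to full word boundaries during post-processing.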