A Multi-Label Classification Study on Applying Natural Language Processing to ISO 27001 Regulatory Compliance Texts

Please use this permanent URL to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/98243

Title:	A Multi-Label Classification Study on Applying Natural Language Processing to ISO 27001 Regulatory Compliance Texts (original title: 應用自然語言處理於 ISO 27001 法規遵循文本的多標籤分類研究)
Author:	Wang, Chih-Chun (王致鈞)
Contributor:	Department of Information Management
Keywords:	Natural Language Processing; Multi-Label Classification; Regulatory Compliance; ISO 27001; Machine Learning; Deep Learning
Date:	2025-07-07
Upload time:	2025-10-17 12:32:01 (UTC+8)
Publisher:	National Central University
Abstract:	As digital transformation advances, enterprises place growing emphasis on information security to safeguard data assets, sustain customer trust, and protect brand reputation. To meet increasingly stringent security standards and compliance requirements, organizations must manage large volumes of complex regulatory documents and map them accurately, yet manual review and tagging are time-consuming and error-prone. Natural language processing (NLP) for legal text classification has therefore emerged as a viable way to improve compliance efficiency and accuracy. Existing work, however, mostly addresses single-label or multi-class tasks and rarely considers ISO 27001 controls, while data imbalance and scarce samples further limit classification performance. To address these gaps, this study investigates ISO 27001 multi-label text classification in the financial and legal domains, aiming to help compliance professionals automatically, quickly, and accurately map large volumes of regulatory documents to multiple information-security controls and thereby speed up document processing. Documents from U.S. financial supervisory agencies were collected and annotated by domain experts according to ISO 27001 controls; combinations of feature extraction, classification, and data-balancing techniques were then compared to identify the most suitable strategy for the multi-label setting. Without data balancing, Word2Vec features with a Random Forest (RF) classifier performed best, with a Micro-F1 of 0.694 and an Exact Match Ratio (EMR) of 0.578. With data balancing, combining Word2Vec with either the Synthetic Minority Oversampling Technique or Random Oversampling together with RF raised Micro-F1 by 3.0 percentage points and EMR by 0.4 percentage points. Per-control evaluation showed that the “Supply-Chain Relationships” control attained the highest F1 score (0.824), whereas the “Organization of Information Security” control had a recall of only 0.171, indicating that this category is harder to predict. Overall, this study identifies an effective combination of methods for ISO 27001 multi-label classification of financial regulatory texts, improving classification accuracy and overall performance, and offers a practical reference for enterprises in information-security auditing, regulatory compliance, and document governance.
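The abstract reports Micro-F1 and Exact Match Ratio (EMR) without defining them. The sketch below shows how the two metrics are commonly computed for multi-label predictions, assuming labels are encoded as binary indicator rows (one column per control). The function names and the toy data are illustrative assumptions and are not taken from the thesis.

```python
from typing import Sequence


def micro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over every (document, label) cell."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def exact_match_ratio(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Share of documents whose predicted label set equals the true label set."""
    if not y_true:
        return 0.0
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)


# Toy data (hypothetical): 4 documents, 3 controls.
Y_TRUE = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
Y_PRED = [[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

if __name__ == "__main__":
    print(f"Micro-F1: {micro_f1(Y_TRUE, Y_PRED):.3f}")
    print(f"EMR:      {exact_match_ratio(Y_TRUE, Y_PRED):.3f}")
```

EMR is the stricter of the two: a document counts only when every control is predicted correctly, which is consistent with the reported EMR (0.578) being lower than the reported Micro-F1 (0.694).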
Appears in collections:	[Graduate Institute of Information Management] Master's and Doctoral Theses

Files in this item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	90

All items in NCUIR are protected by original copyright.