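This study collects Twitter content using cybersecurity-related keywords, fine-tunes BERT (Bidirectional Encoder Representations from Transformers) on it, and uses named entity tags to identify proper nouns such as hardware and software vendors, system names, versions, and threats. The identified entities are compared with the existing computing environment, and warning messages are sent to users or administrators, to achieve real-time detection and alerting.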
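This study also compares its approach with methods proposed by other researchers. The experimental results show that the BERT model adopted here outperforms the CNN+BiLSTM machine learning method proposed by several researchers: precision, recall, and F1 score all reach 96% or higher, and the model can use context to correctly identify words that are not in the training set, achieving correct labeling and real-time warning.;Continuously raising awareness of cybersecurity threats and establishing preventive measures are important steps in ensuring an enterprise's cybersecurity.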
Cybersecurity experts in an enterprise must be able to detect the newest vulnerabilities and threats in the virtual environment. Identifying and collecting this information depends on the range of sources the experts follow and on the efficiency of the personnel; the data is received passively and the process is time-consuming.
With the development of social media and open-source intelligence, platforms such as Twitter bring instant cybersecurity updates and concerns to the public; their immediacy and volume of posts are expected to compensate for limited sources and limited handling efficiency.
This research aims to notify users and managers so that they can take early response measures, by collecting cybersecurity information on Twitter, using machine learning to identify the related software or hardware entities, and comparing them with the current virtual environment.
This research collects Twitter content using cybersecurity keywords and processes it with BERT (Bidirectional Encoder Representations from Transformers) for named entity recognition to identify vendors, software, versions, and relevant terms. The results are then compared with the existing environment, and warning messages are sent to users and managers, to achieve real-time detection and warning.
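The comparison step described here can be illustrated with a minimal sketch. It assumes the fine-tuned model has already produced one BIO tag per token. The tag names (`VENDOR`, `PRODUCT`, `VERSION`, `THREAT`), the inventory layout, the helper functions, and the sample tweet are all hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Tuple


def extract_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) spans."""
    entities: List[Tuple[str, str]] = []
    current_label = None
    current_tokens: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity that was open.
            if current_label is not None:
                entities.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O" tag or an inconsistent "I-" tag: close the open entity.
            if current_label is not None:
                entities.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_label is not None:
        entities.append((current_label, " ".join(current_tokens)))
    return entities


def match_inventory(
    entities: List[Tuple[str, str]], inventory: List[Dict[str, str]]
) -> List[Dict[str, str]]:
    """Return inventory entries whose vendor and product both appear in the entities."""
    vendors = {text.lower() for label, text in entities if label == "VENDOR"}
    products = {text.lower() for label, text in entities if label == "PRODUCT"}
    return [
        item
        for item in inventory
        if item["vendor"].lower() in vendors and item["product"].lower() in products
    ]


# Invented example tweet and tags, standing in for real model output.
tokens = ["Critical", "RCE", "in", "Apache", "Log4j", "2.14.1", "exploited", "in", "the", "wild"]
tags = ["O", "B-THREAT", "O", "B-VENDOR", "B-PRODUCT", "B-VERSION", "O", "O", "O", "O"]

# Hypothetical asset inventory describing the existing environment.
inventory = [
    {"host": "web-01", "vendor": "Apache", "product": "Log4j", "version": "2.14.1"},
    {"host": "db-01", "vendor": "Oracle", "product": "MySQL", "version": "8.0.28"},
]

entities = extract_entities(tokens, tags)
alerts = match_inventory(entities, inventory)
for item in alerts:
    print(f"ALERT {item['host']}: {item['vendor']} {item['product']} mentioned in a tweet")
```

Matching on vendor and product only is a deliberate simplification; a fuller system would presumably also compare the recognized version against the installed one.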
In this research, the F1 score is 0.96, which is superior to CNN+BiLSTM, and BERT can use context to correctly identify words that are not in the training set, achieving correct identification and immediate warning.
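For reference, the reported metrics follow the standard definitions below, where TP, FP, and FN denote true positives, false positives, and false negatives. Whether the thesis counts these per token or per entity is not stated here, so that choice is left open.

```latex
\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]
```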