Abstract: | Information today comes from many sources and in disorganized forms. As hardware and software computing power has improved and related application technologies have developed, extracting and analyzing large volumes of data has become much easier, and data analysis has drawn growing attention as a field of research. To handle this unstructured information, statistical methods and algorithms (such as the Latent Dirichlet Allocation model and text sentiment analysis used in this paper) quantify text, converting it into meaningful numbers that can serve as reference information or as data for decision-making. A text model is built with Latent Dirichlet Allocation to extract the proportion of each underlying topic. Text sentiment analysis is then applied to this topic information, and the results are classified by polarity to determine whether the viewpoint expressed in the text is positive, negative, or neutral. The resulting classifications can serve different purposes depending on the domain of the articles. Applied to the financial market, for example, they can indicate current financial trends or the general level of optimism about the overall environment or specific industries. 
In investing, information disparity is a persistent obstacle. By feeding financial news into the model, this work aims to produce organized, basic information that can help investors strengthen their judgment or make predictions. This paper uses financial news as the experimental subject. A web crawler extracts the content of a large number of news articles, and the organized data is then processed with Latent Dirichlet Allocation modeling and text sentiment analysis to extract information from news headlines and bodies. Each item is assigned a sentiment level and thereby quantified, and the results are finally compared with the financial market composite index for the same period to verify whether this method is suitable for financial analysis. |