Abstract: | Natural language processing is advancing rapidly and has found diverse applications, such as sentiment interpretation, bankruptcy prediction, and even stock price prediction from Twitter posts; stock price prediction in particular is an active research focus. Earlier studies that predict stock prices from news mostly relied on sentiment analysis or TF-IDF, and more recent work has used word vectors for pre-processing. This raises the question of whether sentence vectors, which strengthen the contextual relationships within a text, offer an advantage as a pre-processing step. This study pre-processes European and American financial news with both sentence vectors and word vectors, trains different neural network models on each representation, and compares the resulting prediction accuracy. It further trains models on single news sources and compares them with models trained on mixed sources. It also re-examines, using these different techniques, the earlier finding that headlines predict better than article content. Finally, word-vector pipelines usually remove commonly used words, whereas sentence vectors depend on word order and grammatical meaning and so normally skip this step; the study tests whether this still holds for the prediction models and how removing common words affects them. The experimental results show that, combined with a CNN, sentence vectors perform slightly better than word vectors. News sources affect prediction through their spread and reach: mixed news sources are more useful for long-term prediction, while widely read news websites perform better for short-term prediction.
In the comparison between headlines and content, models trained on news content perform better, which differs from the findings of previous studies. Finally, removing commonly used words before computing sentence vectors improves training efficiency but slightly lowers prediction accuracy. |