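Stock price prediction is an important research topic in finance, economics, and information technology, but it is difficult to predict accurately because stock prices are affected by many factors. Many past studies have therefore used key indicators derived from historical stock prices, or time-series forecasting algorithms, to predict price rises and falls. In recent years, some studies have also applied text mining to social media posts or financial news, combined with machine learning and deep learning techniques, to improve predictive performance. Although existing studies have compared traditional text feature representations, few have comprehensively compared newer natural language processing techniques with common traditional techniques for stock price prediction, and few have examined different sources of financial news. This study therefore uses financial news to determine which of these text representation techniques performs best for stock price prediction and how each technique affects machine learning and deep learning classifiers. It also examines whether different news sources affect the prediction results, and further investigates how different proportions of training data affect predictive performance. The experimental results show that the combination with the best AUC is CNN + Word2vec, with most results falling roughly between 0.53 and 0.56. For Apple, news from Reuters yields better performance, suggesting that this source better reflects the company's stock price movements; for Bank of America, The Motley Fool performs best, which shows that The Motley Fool is also a good news source for stock price prediction. The results further show that companies whose average stock price changes in recent years were smaller achieve better performance across the different news sources than companies whose average stock price changes were larger. Regarding training data ratios, AUC increases as the proportion of training data increases, with the best performance at a training ratio of 70% or 50%, which corresponds to four to six years of collected data.

Stock prediction has long been regarded as an interesting and important research problem in finance, economics, information technology, and related fields. Accurately predicting stock prices is difficult because many factors affect them. In the past, many studies predicted stock prices using technical indicators and time-series forecasting algorithms. In recent years, some studies have used financial news to predict stock trends through text mining and machine learning techniques. Although many different text feature representation methods have been used for stock prediction, there has been no comprehensive study comparing different kinds of text mining techniques. Therefore, a major research objective of this thesis is to develop effective prediction models with different text representations and compare their performance. The impacts of using different news sources and different ratios of training data on the prediction models are also examined. The experimental results demonstrate that combining a CNN deep learning model with Word2vec text representations achieves the best results, and most of the average AUC results fall between 0.53 and 0.56.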
In addition, news articles collected from The Motley Fool and Reuters are better choices than those from CNBC for predicting stock trends. The results also show that a company with smaller stock price changes yields better prediction performance than a company with larger stock price changes. We also find that higher training data ratios generally produce higher prediction performance. In particular, using either 70% or 50% of the eight years of data for training enables the prediction models to reach relatively high performance.
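The training-ratio and AUC results above can be illustrated with a minimal sketch. It assumes, without confirmation from the abstract, that the eight years of samples are split chronologically (oldest data for training, newest for testing), so a 50% ratio gives four years of training data and a 70% ratio gives about 5.6 years. The helper names `chronological_split` and `auc` are hypothetical. AUC is computed here as the probability that a randomly chosen "price up" sample scores higher than a randomly chosen "price down" sample, with ties counting as half. An AUC of 0.5 corresponds to random guessing, so the reported 0.53 to 0.56 is only modestly above chance.

```python
def chronological_split(samples, train_ratio):
    """Hypothetical helper: split (timestamp, features, label) samples by time.

    The earliest `train_ratio` fraction goes to training and the rest to
    testing. A chronological split is an assumption, not the thesis's
    confirmed procedure.
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    ordered = sorted(samples, key=lambda s: s[0])  # oldest first
    cut = int(len(ordered) * train_ratio)
    return ordered[:cut], ordered[cut:]


def auc(labels, scores):
    """Hypothetical helper: ROC AUC via pairwise comparison.

    Returns the fraction of (positive, negative) pairs in which the positive
    sample (label 1, price up) has the higher score; ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical data: one sample per year over an eight-year span,
    # keyed by year index 0..7. Real data would have one row per trading day.
    samples = [(year, [0.0], year % 2) for year in range(8)]
    for ratio in (0.5, 0.7):
        train, test = chronological_split(samples, ratio)
        print(ratio, len(train), "training years,", len(test), "test years")
    # Two of the four positive/negative pairs are not ordered correctly
    # only once, so this toy example scores 3 of 4 pairs correctly.
    print(auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))  # prints 0.75
```

With yearly granularity the 70% split rounds down to five training years; with daily samples the cut would fall close to 5.6 years.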