Automatic Analysis and Summarization System for Hotel Reviews (旅館評論自動分析與歸納系統)

Name

Wei-Ting Lai (賴威廷)

Department

Department of Computer Science and Information Engineering (資訊工程學系)

Thesis title

Automatic Analysis and Summarization System for Hotel Reviews
(旅館評論自動分析與歸納系統)

Files

Browse the thesis in the system (open access after 2025-7-14)

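Abstract (translated from the Chinese)

With the rapid growth of the Internet, anyone can easily leave a review online to express an opinion or feeling. These reviews are important data, but querying and tallying such a large volume of data purely by hand is clearly difficult and inefficient. Natural language processing, by contrast, can quickly reveal people's specific opinions about products, services, and so on. This thesis therefore implements an automatic analysis and summarization system for hotel reviews that can effectively replace the manual process of reading reviews to find information, saving time.
In this system, we trained a sentence boundary detection model. It addresses the grammatical and punctuation errors common in online reviews, which methods based on statistical models or grammatical rules struggle to handle, and it improves sentence segmentation. We also trained sentiment analysis and keyword extraction models to find the strengths and weaknesses of a hotel that the most people mention. User reviews are categorized by keyword, and a clustering algorithm then consolidates related keywords. Finally, the results are presented on a web page so that users can query them conveniently.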
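The abstract does not say how the sentence boundary detection model is formulated. As a minimal sketch, assume it is a token-level tagger that marks the last token of each sentence. The function name `split_by_boundary_labels` and the hard-coded labels below are hypothetical stand-ins for a trained model's output. The sketch shows only how such labels would turn an unpunctuated review into sentences.

```python
from typing import List


def split_by_boundary_labels(tokens: List[str], labels: List[int]) -> List[str]:
    """Group tokens into sentences; labels[i] == 1 marks the last token of a sentence.

    The labels are assumed to come from a trained boundary tagger (not shown here).
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    sentences: List[str] = []
    current: List[str] = []
    for token, is_end in zip(tokens, labels):
        current.append(token)
        if is_end:
            sentences.append(" ".join(current))
            current = []
    if current:
        # Trailing tokens with no predicted boundary still form a sentence.
        sentences.append(" ".join(current))
    return sentences


# A review written without punctuation, plus hypothetical per-token predictions.
tokens = "the room was clean staff were rude breakfast was great".split()
labels = [0, 0, 0, 1, 0, 0, 1, 0, 0, 1]
sentences = split_by_boundary_labels(tokens, labels)
print(sentences)
```

Because the split depends only on predicted labels, it works even when the reviewer omitted every period, which is the case that rule-based splitters handle poorly.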

Abstract (English)

With the rapid development of the Internet, it is easy for anyone to leave a comment on the web to express their opinions or feelings. These comments are important data, but manually querying and tallying such large data sets is difficult and inefficient. Natural language processing makes it possible to quickly learn people's specific opinions on products, services, and so on. This paper therefore implements an automatic analysis and summarization system for hotel reviews, which saves the time otherwise spent manually reading comments to look for information.
In the automatic analysis and summarization system, we train a sentence boundary detection model to cope with the grammatical and punctuation errors commonly found in online comments, which are difficult for statistical models or grammatical rules to handle. By training sentiment analysis and keyword extraction models, we identify the most frequently mentioned strengths and weaknesses of hotels. The comments are categorized by keyword, and a clustering algorithm is used to group related keywords. Finally, the results are presented on a web page where users can conveniently query them.
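The aggregation step described above, counting which keywords appear most often in positive and negative sentences, can be sketched as follows. This is an illustration under stated assumptions, not the thesis implementation. The `summarize` function, the `(sentence, sentiment, keywords)` tuple layout, and the sample predictions are all hypothetical, standing in for the outputs of the trained sentiment and keyword models. The keyword clustering step is omitted.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# (sentence, sentiment label, extracted keywords); assumed model outputs.
Prediction = Tuple[str, str, List[str]]


def summarize(predictions: List[Prediction], top_n: int = 3):
    """Count keyword mentions per sentiment and index sentences by keyword."""
    counts: Dict[str, Counter] = {"positive": Counter(), "negative": Counter()}
    examples: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for sentence, sentiment, keywords in predictions:
        if sentiment not in counts:
            continue  # ignore neutral or unknown labels
        for keyword in set(keywords):  # count each keyword once per sentence
            counts[sentiment][keyword] += 1
            examples[(sentiment, keyword)].append(sentence)
    top = {label: counter.most_common(top_n) for label, counter in counts.items()}
    return top, dict(examples)


predictions = [
    ("The staff were friendly", "positive", ["staff"]),
    ("Great location near the station", "positive", ["location"]),
    ("Friendly staff and clean room", "positive", ["staff", "room"]),
    ("The room was noisy", "negative", ["room"]),
    ("Noisy room and slow wifi", "negative", ["room", "wifi"]),
]
top, examples = summarize(predictions)
print(top["positive"][0], top["negative"][0])
```

Keeping the example sentences indexed by `(sentiment, keyword)` is what would let a web page show the supporting reviews when a user selects a keyword.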

Keywords (Chinese)

★ 深度學習 (deep learning)
★ 句子邊界檢測 (sentence boundary detection)
★ 文字分析 (text analysis)

Keywords (English)

★ Deep learning
★ Sentence Boundary Detection
★ Text Analysis

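Table of contents

Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Background
1.3 Research Objectives
1.4 Thesis Organization
Chapter 2 Background and Related Work
2.1 Word Embedding
2.2 Dimensionality Reduction
2.2.1 Principal Component Analysis
2.2.2 Post-Processing Algorithm
2.2.3 Dimensionality Reduction for Word Embeddings
2.3 Related Work on Sentence Boundary Detection
2.4 Related Work on Sentiment Analysis
2.5 Related Work on Keyword Extraction
2.6 Related Work on Language Models
2.6.1 ELMo
2.6.2 BERT
Chapter 3 System Description and Analysis Methods
3.1 System Architecture Overview
3.2 Sentence Boundary Detection
3.3 Sentiment Analysis
3.4 Keyword Extraction
3.5 Keyword Merging
3.6 Keyword Clustering
3.7 User Interface
Chapter 4 Experimental Results and Analysis
4.1 Test Data
4.2 Evaluation Metrics
4.3 Experimental Results and Analysis
4.3.1 Experiment 1: Sentiment Analysis Results of Different Models
4.3.2 Experiment 2: Sentence Boundary Detection Results of Different Models
4.3.3 Experiment 3: Keyword Extraction Results of Different Models
4.3.4 Experiment 4: Effect of Different Dimensions on Keyword Clustering
4.3.5 Experiment 5: Time Evaluation
Chapter 5 Conclusions and Future Work
References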

Advisor

Hsu-Yung Cheng (鄭旭詠)

Approval date

2020-7-21
