Applying Data Visualization to Topic Detection and Tracking on Social Media Platforms


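Name

侯貫中 (Kuan-Chung Hou)

Department

Department of Information Management

Thesis Title

Applying Data Visualization to Topic Detection and Tracking on Social Media Platforms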



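This electronic thesis is authorized for immediate open access.
Once open access takes effect, the electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for academic research purposes.
Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.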

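Abstract (Chinese)

With the rise of social media, users readily express positions, comment on viewpoints, and share posts on these platforms in many forms. Because social media emphasizes the immediate spread of information, posts arrive as a continuous stream, and it has become a major challenge for users to quickly identify, within this volume of information, the currently popular topics and the events that users care about. Topic Detection and Tracking (TDT) applied to social media has therefore become a popular research area. Traditional TDT research mainly targets highly structured articles such as news articles. This study uses Facebook as its research platform and performs topic detection and tracking on short posts from public fan pages.

The goal of this study is to let users grasp the events under a topic more quickly. Using data visualization, it examines the designed architecture against the five capabilities that a topic detection and tracking system should have (story segmentation, first story detection, cluster detection, tracking, and story link detection), discusses real news examples, and explains the commercial uses. The system workflow is divided into three stages. Data collection and extraction: posts from public fan pages are retrieved through the Facebook Graph API and mapped to specific topics by keyword matching. Data analysis: Incremental TF-DF extracts the core feature terms of each post while avoiding excessively high term dimensionality; k-medoids document clustering, combined with an algorithm that adaptively determines the number of clusters, then groups posts automatically to identify events. Data presentation: cluster analysis and data visualization techniques present the analysis results at scale.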
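The feature-extraction step of the data analysis stage can be illustrated with a minimal sketch. The abstract names the technique "Incremental TF-DF" but does not give its formula, so the code below assumes a conventional smoothed TF-IDF weighting whose document frequencies are updated as each post arrives. The class name `IncrementalTfidf`, its methods, and the cap `k` are hypothetical, and posts are assumed to be already segmented into terms.

```python
import math
from collections import Counter


class IncrementalTfidf:
    """Hypothetical helper: document frequencies updated one post at a time."""

    def __init__(self):
        self.n_docs = 0            # number of posts seen so far
        self.doc_freq = Counter()  # term -> number of posts containing it

    def add_document(self, tokens):
        """Register one tokenized post; each term counts once per post."""
        self.n_docs += 1
        self.doc_freq.update(set(tokens))

    def top_terms(self, tokens, k=5):
        """Return the k highest-weighted terms of a post.

        Keeping only k terms per post is one way to limit term dimensionality.
        The smoothed IDF used here is an assumption, not the thesis's formula.
        """
        tf = Counter(tokens)
        total = sum(tf.values())
        scores = {}
        for term, count in tf.items():
            idf = math.log((1 + self.n_docs) / (1 + self.doc_freq[term])) + 1.0
            scores[term] = (count / total) * idf
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [term for term, _ in ranked[:k]]


model = IncrementalTfidf()
posts = [
    ["election", "candidate", "debate"],
    ["election", "poll", "result"],
    ["typhoon", "warning", "election"],
]
for post in posts:
    model.add_document(post)

print(model.top_terms(["typhoon", "warning", "election"], k=2))
# → ['typhoon', 'warning']
```

Because "election" appears in every post, its weight drops below that of the rarer terms, so the two terms kept for the third post are the ones that distinguish it.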

Abstract (English)

With the rise of social media, people are more willing to declare their positions, comment, and share others' posts on these platforms. Social media emphasizes the immediacy of information, so posts are generated as a constant stream. As a result, it is a difficult challenge for users to learn the hot topics and the events that interest other users. In particular, Topic Detection and Tracking (TDT) has become a popular research topic in social media. Traditional TDT research has mainly focused on highly structured articles, e.g., news articles. This research takes Facebook as the research platform and applies Topic Detection and Tracking to short-text posts on public fan pages.

The primary purpose of this research is to let users grasp the events within a topic through data visualization, organized around the five major TDT tasks: story segmentation, first story detection, topic tracking, topic detection, and link detection. These capabilities are then examined with news examples, and their commercial uses are explained. The research divides the system procedure into three stages. The first is data collection and extraction, which retrieves post information from public fan pages through the Facebook Graph API and maps the posts to specific topics through keyword matching. The second is data analysis, which extracts keywords from the posts with Incremental TF-DF while avoiding excessive term dimensionality, and then applies k-medoids document clustering together with an algorithm that determines the number of clusters automatically, so that events are distinguished without manual grouping. The third is data visualization, which uses cluster analysis and data visualization techniques to present the analysis results at a large scale.
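The clustering step can be sketched as follows. The abstract does not say how the number of clusters is chosen, so this example substitutes a common stand-in: it runs k-medoids for several values of k and keeps the one with the highest mean silhouette score. The Jaccard distance over term sets, the farthest-point initialization, and all function names are assumptions made for illustration, not the thesis's method.

```python
def jaccard_distance(a, b):
    """Distance between two term sets (assumed measure, for illustration)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def assign(dist, medoids):
    """Label each point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix."""
    n = len(dist)
    # Deterministic start: most central point, then farthest-point picks.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        medoids.append(max((i for i in range(n) if i not in medoids),
                           key=lambda i: min(dist[i][m] for m in medoids)))
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            # Keep the old medoid if a cluster ended up empty.
            updated.append(min(members, key=lambda i: sum(dist[i][j] for j in members))
                           if members else medoids[c])
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(dist, medoids)


def silhouette(dist, labels):
    """Mean silhouette score; singleton clusters contribute 0."""
    clusters = {}
    for i, c in enumerate(labels):
        clusters.setdefault(c, []).append(i)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for i, c in enumerate(labels):
        own = clusters[c]
        if len(own) == 1:
            continue
        a = sum(dist[i][j] for j in own) / (len(own) - 1)
        b = min(sum(dist[i][j] for j in other) / len(other)
                for oc, other in clusters.items() if oc != c)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)


def cluster_auto(dist, k_max):
    """Try k = 2..k_max and return (score, medoids, labels) for the best k.

    Returns None when there are fewer than three points.
    """
    best = None
    for k in range(2, min(k_max, len(dist) - 1) + 1):
        medoids, labels = k_medoids(dist, k)
        score = silhouette(dist, labels)
        if best is None or score > best[0]:
            best = (score, medoids, labels)
    return best


posts = [
    {"election", "poll"}, {"election", "poll", "vote"}, {"election", "vote"},
    {"typhoon", "rain"}, {"typhoon", "rain", "wind"}, {"typhoon", "wind"},
]
dist = [[jaccard_distance(p, q) for q in posts] for p in posts]
score, medoids, labels = cluster_auto(dist, k_max=4)
print(len(medoids), labels)
```

In this toy input the two groups of posts share no terms, so the silhouette criterion settles on two clusters, one per event. Working from a precomputed distance matrix means any other document similarity measure can be swapped in without changing the clustering code.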

Keywords

★ Topic Detection and Tracking
★ Data visualization
★ Chinese natural language processing
★ Facebook
★ TF-IDF
★ k-medoids

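Table of Contents

Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
1. Introduction
1-1 Research Background
1-2 Research Motivation
1-3 Research Objectives
2. Related Work
2-1 Topic Detection and Tracking
2-2 Processing Short-Text Stories
2-2-1 Document-Pivot Methods
2-2-2 Feature-Pivot Methods
2-2-3 Probabilistic Topic Models
2-3 OpView Social Media Monitoring Platform
2-3-1 Keyword Storm Chart
3. System Architecture
3-1 System Concept and Workflow
3-2 Data Collection and Extraction
3-2-1 Post Scoring
3-2-2 Event Processing
3-3 Data Analysis
3-3-1 Jieba Chinese Word Segmentation
3-3-2 Document Feature Extraction
3-3-3 Semantic Similarity of Terms
3-3-4 Document Similarity
3-3-5 k-medoids Clustering
3-4 Data Presentation
3-4-1 Cluster Keyword Labeling
3-4-2 Data Visualization
4. Experimental Results and Discussion
4-1 Evaluation Method
4-2 Dataset
4-3 Term Threshold for Feature Selection
4-4 Synonym Filtering Parameter
4-5 Automatic Topic Clustering Parameter
4-6 Experiment 1: System Parameter Configuration
4-7 Experiment 2: System Execution Efficiency Comparison
4-8 Experiment 3: Effect of the Word2Vec Corpus on System Performance
5. Conclusions and Future Work
5-1 Conclusions
5-2 Research Limitations
5-3 Future Work
References
English References
Chinese References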


Advisor

林熙禎 (She-Jen Lin)

Approval Date

2017-7-21
