Abstract
With the growth of information technology and the popularity of smartphones, the volume of electronic documents inside and outside companies will continue to increase exponentially. IDC forecasts that we will be generating 40 ZB of data, and it also states that unstructured information may account for more than 80% of all data in organizations. Text analysis tools have therefore become must-have tools for enterprises seeking insights for informed decision-making and other processes.
Today, an increasing amount of information is held in unstructured and semi-structured formats, and the volume that organizations manage (together with the additional information they would like to include) continues to grow and diversify. The primary problem in managing all of this unstructured and semi-structured text is that there are no standard rules for writing text in a form a computer can understand. First, this paper extracts keywords and their frequencies from classified documents. Second, it calculates the similarity between sample documents and model documents using cosine similarity. Finally, it validates the clustering based on the highest similarity.
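The cosine similarity in the second step, and the choice of the most similar model document in the final step, can be written as follows. This assumes each document is represented as a vector of keyword frequencies over a shared vocabulary of n terms; the abstract does not state the exact weighting, so raw frequencies are an assumption here.

```latex
\cos(\mathbf{s}, \mathbf{m}_k)
  = \frac{\mathbf{s} \cdot \mathbf{m}_k}{\lVert \mathbf{s} \rVert \, \lVert \mathbf{m}_k \rVert}
  = \frac{\sum_{i=1}^{n} s_i \, m_{k,i}}
         {\sqrt{\sum_{i=1}^{n} s_i^{2}} \, \sqrt{\sum_{i=1}^{n} m_{k,i}^{2}}},
\qquad
\hat{k} = \operatorname*{arg\,max}_{k} \, \cos(\mathbf{s}, \mathbf{m}_k)
```

Here s is the frequency vector of a sample document, m_k is that of the k-th model document, and k̂ is the model with the highest similarity. Because frequencies are non-negative, the score lies between 0 and 1. For example, s = (2, 1, 0) and m_k = (1, 1, 1) give 3 / (√5 · √3) = 3/√15 ≈ 0.775.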
In my case, the task is converting external specifications into our product part numbers, which is a crucial and critical process. Because the whole process is performed manually, mistakes occur frequently and the work is time-consuming. To address this problem, I used the cosine similarity algorithm to compute the similarity between specifications and product part numbers. Salespeople then used the similarity to convert a specification into a product part number quickly. In this scenario, I developed a text mining system prototype that derives patterns from three different specifications and then computes cosine similarity on randomly sampled cases; the part number with the highest similarity is taken as the result, and this achieved 100% accuracy. Text mining can thus solve high-value information comparison problems and reduce heavy workloads and operational risks for the sales team.
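A minimal Python sketch of the specification-to-part-number lookup and its random-sample accuracy check follows. It is not the thesis prototype: it assumes each part number is described by one model document (a short keyword string), tokenizes on whitespace instead of the Chinese word segmentation a real system would need, and uses raw term frequencies. The part numbers, keywords, and function names (`best_part_number`, `sampled_accuracy`) are hypothetical.

```python
import math
import random
from collections import Counter


def term_frequencies(text: str) -> Counter:
    """Count lower-cased whitespace tokens (a stand-in for real keyword extraction)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def best_part_number(spec: str, models: dict[str, str]) -> tuple[str, float]:
    """Return the part number whose model document is most similar to the specification."""
    spec_tf = term_frequencies(spec)
    scores = {pn: cosine_similarity(spec_tf, term_frequencies(doc)) for pn, doc in models.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def sampled_accuracy(labelled: list[tuple[str, str]], models: dict[str, str],
                     k: int, seed: int = 0) -> float:
    """Draw k labelled specifications at random; return the share mapped to the correct part number."""
    rng = random.Random(seed)
    sample = rng.sample(labelled, k)
    hits = sum(best_part_number(spec, models)[0] == expected for spec, expected in sample)
    return hits / k


# Hypothetical model documents: one keyword profile per product part number.
MODELS = {
    "PN-A100": "aluminum housing 12v connector waterproof",
    "PN-B200": "steel bracket 24v cable shielded",
    "PN-C300": "plastic cover 5v usb sensor",
}

# Hypothetical labelled specifications: (specification text, correct part number).
LABELLED = [
    ("waterproof aluminum housing with 12v connector", "PN-A100"),
    ("shielded cable and steel bracket rated 24v", "PN-B200"),
    ("usb sensor in plastic cover 5v", "PN-C300"),
    ("12v waterproof connector", "PN-A100"),
]

if __name__ == "__main__":
    print(best_part_number("need a 24v shielded cable", MODELS))
    print(sampled_accuracy(LABELLED, MODELS, k=3))
```

With these toy inputs every sampled specification maps to its correct part number, which says nothing about real specifications; there, tokenization of mixed Chinese and English text and a weighting scheme such as TF-IDF would strongly affect the results.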