Simplifying the Customer Inquiry Process Using Text Mining and Cosine Similarity

DC field	Value	Language
DC.contributor	In-service Master's Program, Graduate Institute of Industrial Management	zh_TW
DC.creator	臧自強	zh_TW
DC.creator	Tsang, Tzu-Chiang	en_US
dc.date.accessioned	2019-07-10T07:39:07Z
dc.date.available	2019-07-10T07:39:07Z
dc.date.issued	2019
dc.identifier.uri	http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=106456009
dc.contributor.department	In-service Master's Program, Graduate Institute of Industrial Management	zh_TW
DC.description	國立中央大學	zh_TW
DC.description	National Central University	en_US
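dc.description.abstract	With the worldwide surge in computing equipment and handheld devices and the growing reach of internet access in emerging markets, the volume of electronic documents inside and outside every organization is growing exponentially. According to a report from IDC (the International Data Corporation), an estimated 40 ZB of data will be generated annually by 2020. IDC further states that almost 80% of the data in an organization is textual. The IDC reports indicate that the mining of "unstructured, textual data", that is, text mining, still has considerable room for application and development, and the Massachusetts Institute of Technology has named natural language processing and text mining among the important technologies of the coming decade. Textual data in organizations is written in natural human language, and its content is diverse, complex, and idiosyncratic. Classifying such data manually is uneconomical and difficult and, more importantly, lacks any generally accepted standard. This study therefore proposes extracting keywords and term frequencies from already classified documents, using cosine similarity to compute the similarity between a query document and the text mining model, and finally using that similarity to help staff classify documents correctly and work more efficiently. The study focuses on the sales department's handling of customer inquiries, where the most laborious step is converting customer requirement specifications into product part numbers, a conversion currently performed by hand. Because the work relies on manual effort, specifications can be converted into the wrong part numbers, and efficiency is poor. To address these problems, text mining and cosine similarity are used to obtain the similarity between customer requirement specifications and product part numbers, and sales staff then use that similarity to complete the conversion quickly. The test data set was provided by the case company for testing and validation. Using the system prototype developed in this study, text mining was performed on three groups of customer requirement specifications; customer requirement specifications were then randomly sampled from the test data set and compared by cosine similarity, with the part number of highest similarity taken as the conversion result. All sampled specifications were converted to the correct part numbers. After trial use and discussion, users judged the system to be highly worth adopting and confirmed that it can improve both the accuracy of manual classification and the efficiency of the customer inquiry process.	zh_TW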
dc.description.abstract	With the growth of information technology and the popularity of smartphones, electronic documents inside and outside companies will continue to increase exponentially. IDC forecasts that 40 ZB of data will be generated each year by 2020. It also states that unstructured information may account for more than 80% of all data in organizations. A new generation of text analysis tools has emerged as a must-have for enterprises seeking insights for informed decision making and other processes. Today, the amount of information that organizations hold in unstructured and semi-structured formats (along with the additional information they would like to include) continues to grow and diversify. The primary problem in managing unstructured and semi-structured text data is that there are no standard rules for writing text so that a computer can understand it. First, this paper extracts keywords and word frequencies from classified documents. Second, it calculates the similarity between sample and model documents using cosine similarity. Finally, it classifies each sample according to its highest similarity. In the case studied, the crucial process is converting external customer specifications into our product part numbers. Because the whole process is manual, mistakes can occur and the work is time-consuming. To address the problem, I used the cosine similarity algorithm to compute the similarity between specifications and product part numbers. Salespeople then used the similarity to convert specifications into product part numbers rapidly. I developed a text mining system prototype to derive patterns from three different sets of specifications and then computed cosine similarity on random samples; the part number with the highest similarity was taken as the conversion result, and the conversion was 100% accurate. Text mining can solve high-value information comparison problems and mitigate the heavy workload and operational risks of the sales team.	en_US
DC.subject	文字探勘	zh_TW
DC.subject	文字權重	zh_TW
DC.subject	餘弦相似度	zh_TW
DC.subject	客戶詢價	zh_TW
DC.subject	text mining	en_US
DC.subject	term weight	en_US
DC.subject	cosine similarity	en_US
DC.subject	customer inquiry	en_US
DC.title	Simplifying the Customer Inquiry Process Using Text Mining and Cosine Similarity	zh_TW
dc.language.iso	zh-TW	zh-TW
DC.type	博碩士論文	zh_TW
DC.type	thesis	en_US
DC.publisher	National Central University	en_US
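
The abstract outlines three steps: extract term frequencies from already classified specifications, compute the cosine similarity between a new specification and each part-number model, and take the part number with the highest similarity. The following is a minimal sketch of that idea, not the thesis's actual system. The part numbers, the specification texts, the simple whitespace-and-punctuation tokenizer, and the use of raw term counts as weights are all assumptions made for illustration; the record does not say how the thesis tokenizes text or weights terms.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Lowercase the text and count each alphanumeric token (assumed tokenizer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def build_models(classified_specs):
    """Merge the term counts of every specification filed under a part number."""
    models = {}
    for part_number, spec_text in classified_specs:
        models.setdefault(part_number, Counter()).update(term_frequencies(spec_text))
    return models


def rank_part_numbers(query_text, models):
    """Return (part_number, similarity) pairs, most similar first."""
    query = term_frequencies(query_text)
    scores = [(pn, cosine_similarity(query, model)) for pn, model in models.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical, already classified specifications (not data from the thesis).
classified_specs = [
    ("PN-A100", "industrial panel pc 15 inch touch screen fanless"),
    ("PN-A100", "15 inch fanless panel pc with touch screen"),
    ("PN-B200", "embedded box pc dual lan wide temperature"),
    ("PN-C300", "rackmount server 2u redundant power supply"),
]

models = build_models(classified_specs)
ranking = rank_part_numbers("fanless touch panel pc 15 inch", models)
best_part_number, best_score = ranking[0]

for part_number, score in ranking:
    print(f"{part_number}\t{score:.3f}")
```

In this sketch the ranked list, rather than only the top match, is returned so that a salesperson could review near ties before committing to a part number.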

Thesis 106456009: full metadata record