
    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/80269

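    Title: Using Text Mining and Cosine Similarity to Simplify the Customer Inquiry Process (運用文字探勘及餘弦相似度簡化客戶詢價流程)
    Authors: 臧自強;Tsang, Tzu-Chiang
    Contributors: Graduate Institute of Industrial Management, In-Service Master's Program
    Keywords: text mining;term weight;cosine similarity;customer inquiry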
    Date: 2019-07-10
    Issue Date: 2019-09-03 12:27:40 (UTC+8)
    Publisher: National Central University
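    Abstract: With the worldwide surge in computer equipment and handheld devices, and with network access becoming ever more common in emerging markets, the electronic documents inside and outside every organization are growing geometrically. According to a report by IDC (the International Data Corporation), an estimated 40 ZB of data will be generated each year by 2020. IDC further notes that almost 80% of the data in an organization is text data. The IDC report shows that mining "unstructured, text-type data", that is, text mining, still has considerable room for application and development; the Massachusetts Institute of Technology has even named natural language processing and text mining among the important technologies of the coming decade.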
    With the growth of information technology and the popularity of smartphones, electronic documents inside and outside companies will continue to increase exponentially. IDC forecasts that 40 ZB of data will be generated each year by 2020, and states that unstructured information might account for more than 80% of all data in organizations. Text analysis tools have therefore emerged as must-have tools for enterprises that want to gain insight for informed decision making and other processes.
    Today, an increasing amount of information is held in unstructured and semi-structured formats, and the information that organizations manage (and the additional information that they would like to include) continues to grow and diversify. The primary problem in managing all of this unstructured and semi-structured text data is that there are no standard rules for writing text so that a computer can understand it. First, this paper extracts keywords and word frequencies from classified documents. Second, it calculates the similarity between sample and model documents using cosine similarity. Finally, it groups the documents according to the highest similarity and checks the validity of the result.
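The first two steps described above can be sketched in Python. This is a minimal illustration, not the thesis's implementation: it assumes raw term counts as weights (the thesis keywords mention term weighting, but the scheme is not stated here), and the function names `term_frequencies` and `cosine_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Lowercase the text, split it into alphanumeric tokens, and count each token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors (Counters)."""
    # Only terms present in both documents contribute to the dot product.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        # Assumption: an empty document is treated as similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


sample = term_frequencies("fanless panel pc")
model = term_frequencies("fanless box pc")
score = cosine_similarity(sample, model)
```

Because term counts are never negative, the score lies between 0 (no shared terms) and 1 (identical term proportions), so a higher score means a closer match.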
    In the case studied here, the crucial and critical process is converting customers' specifications into our product part numbers. Because the whole process is manual, mistakes occur frequently and the work is time-consuming. To address the problem, I used the cosine similarity algorithm to compute the similarity between specifications and product part numbers, so that a salesperson can use the similarity to convert a specification into a product part number rapidly. I developed a text mining system prototype that derives patterns from three different specifications and then computes cosine similarity on random samples; the candidate with the highest similarity is taken as the product part number, and the result was 100% accuracy. Text mining can solve high-value information comparison problems and reduce heavy workloads and operational risk for the sales team.
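The matching step described above, picking the part number whose description is most similar to a customer's specification, might look like the following sketch. Everything specific in it is an assumption for illustration: the part numbers, the catalog descriptions, and the helper names `vectorize`, `cosine`, and `best_part_number` are hypothetical, and raw term counts stand in for whatever term weights the prototype used.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn text into a sparse vector of lowercase alphanumeric token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two Counters; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical catalog: part number -> reference specification text.
CATALOG = {
    "PN-1001": "industrial panel pc 15 inch touch screen fanless",
    "PN-2002": "embedded box pc fanless dual lan",
    "PN-3003": "rack server 2u dual cpu redundant power",
}


def best_part_number(spec, catalog):
    """Return (part_number, score) for the catalog entry most similar to spec."""
    if not catalog:
        raise ValueError("catalog must contain at least one part number")
    spec_vec = vectorize(spec)
    scored = [(cosine(spec_vec, vectorize(desc)), pn) for pn, desc in catalog.items()]
    score, part_number = max(scored)  # highest similarity wins
    return part_number, score


part_number, score = best_part_number("fanless 15 inch touch panel", CATALOG)
```

Returning the score alongside the part number lets a salesperson see how strong the match is; a low top score could be routed to manual review rather than accepted automatically.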
    Appears in Collections: [Graduate Institute of Industrial Management, In-Service Master's Program] Theses and Dissertations
