NCU Institutional Repository (theses and dissertations, past exam papers, journal articles, and research projects available for download): Item 987654321/78711


    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/78711


    Title: Attribute Oriented Induction Methods with Minimal Induction Cost
    Author: 陳彥良
    Contributor: Department of Information Management, National Central University
    Keywords: Attribute-Oriented Induction; Clustering; Genetic Algorithm (GA); Greedy Algorithm; Generalized Table
    Date: 2018-12-19
    Uploaded: 2018-12-20 13:44:21 (UTC+8)
    Publisher: Ministry of Science and Technology
    Abstract: Attribute-oriented induction (AOI) aims to find the generalized characteristics of data in a relational table. Previous AOI methods can produce a generalized table that describes the characteristics of the original relation, but they share a drawback: they lack a formal definition for measuring the induction effect of their results, so they cannot guarantee that the generalized tables they find achieve the best induction effect. This study therefore formally defines how to measure the induction cost of a generalized tuple and, on that basis, proposes three methods for finding generalized tables with low induction cost. The first method applies the traditional framework of agglomerative clustering algorithms. Its advantage is that each time it selects two generalized tuples to combine, it chooses the pair with the minimum induction cost, so it can quickly find generalized tables with lower induction cost than those found by traditional AOI methods. The second method applies a traditional genetic algorithm (GA). GA-based algorithms are usually slower because of their costly evolution process, but the proposed method is expected to obtain generalized tables whose induction cost is very close to the optimal solution, because crossover and mutation in each generation keep moving the solution toward the optimum. The third method rests on the observation that data is composed of information and noise: if the noise can be removed while the information is kept, the result is a better induction effect. Accordingly, the third method is a greedy method that finds the generalized table with minimal induction cost under the condition that at most x% of the data may be discarded as noise.
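    Relation: Science and Technology Policy Research and Information Center, National Applied Research Laboratories
    Appears in Collections: [Department of Information Management] Research Projects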
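
The first method in the abstract (agglomerative merging guided by induction cost) can be illustrated with a minimal sketch. The record does not give the project's actual cost definition, so everything here is an assumption made for illustration: the concept hierarchy `PARENT`, the function names, and in particular `cost`, which simply counts how many hierarchy levels each original value must climb to reach its generalized value. The merge rule used is "merge the pair whose union adds the least cost", which is one plausible reading of the abstract, not the author's algorithm.

```python
from itertools import combinations

# Hypothetical concept hierarchy (child -> parent); "ANY" is the root.
PARENT = {
    "Taipei": "North", "Taoyuan": "North",
    "Tainan": "South", "Kaohsiung": "South",
    "North": "ANY", "South": "ANY",
    "MIS": "Business", "Finance": "Business",
    "CS": "Engineering", "EE": "Engineering",
    "Business": "ANY", "Engineering": "ANY",
}

def path_to_root(value):
    """Return [value, parent, grandparent, ..., root]."""
    path = [value]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def lca(a, b):
    """Lowest common ancestor of two values in the hierarchy."""
    seen = set(path_to_root(a))
    for node in path_to_root(b):
        if node in seen:
            return node
    raise ValueError(f"no common ancestor for {a!r} and {b!r}")

def generalize(rows):
    """Generalized tuple covering all rows, attribute by attribute."""
    general = list(rows[0])
    for row in rows[1:]:
        general = [lca(g, v) for g, v in zip(general, row)]
    return tuple(general)

def cost(rows):
    """Assumed induction cost: total levels climbed by all original values."""
    general = generalize(rows)
    return sum(
        path_to_root(value).index(target)
        for row in rows
        for value, target in zip(row, general)
    )

def agglomerate(rows, k):
    """Merge clusters until k generalized tuples remain."""
    clusters = [[row] for row in rows]
    while len(clusters) > k:
        # Pick the pair whose merge increases the total cost the least.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cost(clusters[p[0]] + clusters[p[1]])
            - cost(clusters[p[0]]) - cost(clusters[p[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    # Each result is (generalized tuple, number of original rows covered).
    return [(generalize(c), len(c)) for c in clusters]

rows = [("Taipei", "MIS"), ("Taoyuan", "MIS"), ("Tainan", "CS"), ("Kaohsiung", "EE")]
table = agglomerate(rows, 2)
print(table)
```

With these four rows and `k = 2`, the two northern MIS rows are merged first because that merge is the cheapest under the assumed cost, and the two southern rows are merged next. A different cost definition would change which pairs are merged.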

    Files in this item:

    File        Description  Size  Format  Views
    index.html               0Kb   HTML    257


    All items in NCUIR are protected by their original copyright.
