Data Induction Based on Attribute-Oriented Induction under Cost Constraints


Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/77549

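Title:	Data Induction Based on Attribute-Oriented Induction under Cost Constraints
Author:	余美儒;YU, MEI-RU
Contributor:	Department of Information Management
Keywords:	Attribute-oriented induction;Agglomerative hierarchical clustering;Cost constraint;Noise filtering;Information mining
Date:	2018-07-02
Upload time:	2018-08-31 14:48:05 (UTC+8)
Publisher:	National Central University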
Abstract:	As technology advances, data is increasingly valued, and many researchers have turned to data mining in the hope of uncovering the value hidden in large databases. Attribute-Oriented Induction (AOI), first proposed in 1990, is an induction-based data analysis technique and one of the important data mining methods. It mines knowledge by generalizing each attribute of a relational database, climbing concept trees that are built from the user's background knowledge. AOI has three main problems. First, when summarizing large amounts of data it mainly constrains the number of tuples in the final result, so there is no guarantee that every generalized tuple retains a given degree of specificity. Second, although the user can set this threshold on the number of output tuples, the setting affects the specificity of the result, so an appropriate value is hard to choose. Third, large data sets contain a great deal of noise, yet AOI has no noise-filtering capability; it mixes information and noise during generalization, which makes the result vague. This study therefore introduces the concept of cost: the detail lost when an attribute is generalized is quantified as a cost, and a cost constraint ensures that every generalized tuple retains a certain degree of specificity. Building on agglomerative hierarchical clustering, we propose two algorithms that differ in how they select the data to generalize (Minimum cost and Random). We evaluate both under different cost constraints and with no limit on the number of output tuples, measuring the specificity of the generalized result and how well it represents the original data. We find that one of our methods outperforms traditional AOI and provides more useful information: the study addresses the coarse results and lack of noise filtering of traditional AOI, producing a table that is more specific and that distinguishes information from noise in the original data.
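The cost-constrained idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: the abstract does not define the cost function, so the code assumes that cost is the total number of concept-tree levels climbed across all attributes, and it only covers a greedy minimum-cost selection (not the Random variant). The concept trees, attribute names, rows, and function names are all hypothetical.

```python
from itertools import combinations

# Hypothetical concept trees (child -> parent), one per attribute.
ATTRS = ("city", "major")
CONCEPT_TREES = {
    "city": {"Taipei": "North", "Taoyuan": "North", "Kaohsiung": "South",
             "Tainan": "South", "North": "ANY", "South": "ANY"},
    "major": {"MIS": "Management", "Finance": "Management", "CS": "Engineering",
              "EE": "Engineering", "Management": "ANY", "Engineering": "ANY"},
}

def ancestors(tree, value):
    """Return [value, parent, grandparent, ...] up to the root."""
    path = [value]
    while path[-1] in tree:
        path.append(tree[path[-1]])
    return path

def generalize(tree, a, b):
    """Lowest common ancestor of a and b, plus levels climbed from each."""
    path_a, path_b = ancestors(tree, a), ancestors(tree, b)
    for up_a, node in enumerate(path_a):
        if node in path_b:
            return node, up_a, path_b.index(node)
    raise ValueError("values share no ancestor")

def merge(a, b):
    """Generalize two (possibly already generalized) tuples into one."""
    values, heights = [], []
    for attr, va, vb, ha, hb in zip(ATTRS, a["values"], b["values"],
                                    a["heights"], b["heights"]):
        node, up_a, up_b = generalize(CONCEPT_TREES[attr], va, vb)
        values.append(node)
        # Levels lost relative to the original leaf values (assumed cost unit).
        heights.append(max(ha + up_a, hb + up_b))
    return {"values": tuple(values), "heights": tuple(heights),
            "count": a["count"] + b["count"]}

def cost(cluster):
    """Assumed cost: total levels climbed over all attributes."""
    return sum(cluster["heights"])

def induce(rows, budget):
    """Agglomeratively merge the cheapest pair until the budget would be exceeded."""
    clusters = [{"values": tuple(r), "heights": (0,) * len(r), "count": 1}
                for r in rows]
    while len(clusters) > 1:
        merged, i, j = min(
            ((merge(clusters[i], clusters[j]), i, j)
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda t: cost(t[0]),
        )
        if cost(merged) > budget:
            break  # every remaining merge would lose too much detail
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters

ROWS = [("Taipei", "MIS"), ("Taoyuan", "MIS"), ("Taipei", "MIS"),
        ("Kaohsiung", "CS"), ("Tainan", "EE")]
```

With these assumed inputs, `induce(ROWS, 1)` merges the three northern MIS rows into `("North", "MIS")` and leaves the two southern rows untouched, while `induce(ROWS, 2)` also merges those into `("South", "Engineering")`. The number of output tuples is never fixed in advance; it falls out of the cost budget. Rows that remain unmerged with a low count could be candidates for noise, though how the thesis actually separates noise is not stated in the abstract.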
Appears in Collections:	[Graduate Institute of Information Management] Theses and Dissertations


