OLAP Report Mining: Applying Data Mining Techniques to Discover Similarity-Based Knowledge from Collections of OLAP Reports

Name

Ming-Zhong Li (李明忠)

Department

Department of Information Management

Thesis Title

OLAP Report Mining: Finding Similarity-based Knowledge from OLAP Reports via Data Mining Techniques

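Related Theses

★ Knowledge management and related system design for improving the execution performance of data warehouse databases
★ A study of the interaction of withdrawal patterns among fraud money mules based on association rule mining
★ Management and application of a word-of-mouth rating platform for blogs
★ Implementation of a virtual currency trading platform
★ Design of a deniable authentication protocol applicable to multiple devices
★ Design and construction of a multidimensional analysis platform for trading-program optimization
★ A study of performance-curve comparison and clustering mechanisms for multi-product, multi-strategy program trading
★ A study of backtesting and performance evaluation of stock selection strategies integrating quantile regression analysis and single-factor stock selection
★ Integrating learning management systems based on workflows and portlets to support course organization
★ A study of building an e-hub system with customization support for second-tier manufacturers using service-oriented technologies
★ Process-centric analysis of portlet reusability
★ A study of building a balanced scorecard information system with data warehouse technology: the case of a human resources scorecard at a consumer electronics manufacturer
★ Automated product platform management and application
★ An agent-based information system to help automate the new product development process
★ Integrated coaching guidance for developing framework-based projects
★ An integrated knowledge management system supporting new product development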

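This electronic thesis is licensed for immediate open access.
Open-access full texts are licensed to users solely for academic research, limited to personal, non-commercial searching, reading, and printing.
Please comply with the relevant provisions of the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, so as not to violate the law.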

Abstract (Chinese)

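Online analytical processing (OLAP) is a data analysis solution now widely used in industry. It transforms the operational data collected in a data warehouse into OLAP reports and applies multidimensional analysis to help enterprises discover latent operational problems or market opportunities. However, while enterprises rapidly and automatically generate and accumulate OLAP reports, analysts can only search for latent knowledge manually, relying on personal knowledge and experience to sift blindly through countless reports.
Data mining is the most promising solution to this problem. It is a mature technology for which many algorithms have been developed across a range of problem domains to automatically discover interesting knowledge patterns from data. However, this research found that existing methods mainly mine knowledge from "data", and data mining methods that take "reports" as the subject of analysis are still lacking.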
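To address this gap, this research proposes a data mining solution called OLAP report mining, which takes OLAP reports as its subject and mines latent knowledge from collections of OLAP reports. Targeting report-analysis needs based on comparative similarity, the research applies traditional multidimensional scaling, cluster analysis, and outlier analysis to propose three methods, OLAP_MDS, OLAP_CLU, and OLAP_OUT, each of which discovers the corresponding knowledge patterns from a collection of OLAP reports. The work includes: (1) defining the comparability relationship between OLAP reports; (2) designing a measure suited to the similarity of OLAP reports; (3) explaining how to apply traditional data mining methods to mine knowledge from collections of OLAP reports; and (4) describing two knowledge presentation methods, "individual presentation" and "integrated presentation", and when each is appropriate.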
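The abstract names cluster analysis as the basis of OLAP_CLU but does not specify the algorithm. As a minimal sketch, assuming a precomputed symmetric n-by-n matrix of report distances, the code below groups reports with single-linkage agglomerative clustering cut at a distance threshold. The function name `single_linkage_clusters`, the `threshold` parameter, and the sample matrix are hypothetical illustrations, not the thesis's own method.

```python
from typing import Dict, List


def single_linkage_clusters(dist: List[List[float]], threshold: float) -> List[List[int]]:
    """Group items whose single-linkage distance is at most `threshold`.

    `dist` is a symmetric n-by-n matrix of report distances (hypothetical input).
    Returns clusters as ascending lists of item indices, ordered by first index.
    """
    n = len(dist)
    parent = list(range(n))  # union-find forest, one root per cluster

    def find(x: int) -> int:
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge every pair of reports that lie within the threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)

    # Collect members under their root; indices are appended in ascending order.
    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Four hypothetical reports: 0 and 1 are close, 2 and 3 are close.
d = [
    [0.0, 2.0, 9.0, 10.0],
    [2.0, 0.0, 8.0, 9.0],
    [9.0, 8.0, 0.0, 1.0],
    [10.0, 9.0, 1.0, 0.0],
]
print(single_linkage_clusters(d, threshold=3.0))  # [[0, 1], [2, 3]]
```

Cutting a single-linkage hierarchy at a fixed threshold is equivalent to taking the connected components of the graph that links every pair of reports within that distance, which is why a union-find structure suffices here; other linkage criteria, such as complete or average linkage, would need the full agglomerative merge loop.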
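The research validates the feasibility of OLAP report mining through two experiments. The first, modeled on experimental methods from cognitive science, validates the proposed measure of similarity between OLAP reports; its results support using the Euclidean distance formula to represent the degree of similarity between OLAP reports. The second applies the three proposed methods to a well-known OLAP sample database (Foodmart 2000) to verify their applicability. This experiment also shows that all three proposed methods can discover useful knowledge from a collection of OLAP reports.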
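The abstract reports that Euclidean distance suitably represents the similarity between OLAP reports. The sketch below assumes that each report is flattened into a mapping from a (row member, column member) cell to a numeric measure, and that two reports are comparable when they contain exactly the same cells; both the representation and this comparability rule are assumptions, and the thesis's own definitions may differ. It also reads "average report distance", a heading in the table of contents, as the mean distance from one report to all the others, and ranks reports by it as a simple outlier signal; that interpretation is likewise hypothetical, as are all function names and sample data.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical flattening: a report maps (row member, column member) -> measure.
Report = Dict[Tuple[str, str], float]


def comparable(a: Report, b: Report) -> bool:
    """Assumed rule: two reports are comparable if they have exactly the same cells."""
    return a.keys() == b.keys()


def report_distance(a: Report, b: Report) -> float:
    """Euclidean distance between two comparable reports, cell by cell."""
    if not comparable(a, b):
        raise ValueError("reports are not comparable")
    return math.sqrt(sum((a[cell] - b[cell]) ** 2 for cell in a))


def average_report_distance(reports: List[Report], i: int) -> float:
    """Mean distance from report i to every other report (assumed definition)."""
    others = [report_distance(reports[i], r) for j, r in enumerate(reports) if j != i]
    return sum(others) / len(others)


def rank_by_average_distance(reports: List[Report]) -> List[int]:
    """Report indices ordered from most to least isolated."""
    scores = [average_report_distance(reports, i) for i in range(len(reports))]
    return sorted(range(len(reports)), key=lambda i: scores[i], reverse=True)


# Three hypothetical monthly sales reports over the same two cells.
jan = {("Drink", "USA"): 10.0, ("Food", "USA"): 20.0}
feb = {("Drink", "USA"): 13.0, ("Food", "USA"): 24.0}
mar = {("Drink", "USA"): 40.0, ("Food", "USA"): 60.0}
reports = [jan, feb, mar]

print(report_distance(jan, feb))            # 5.0
print(average_report_distance(reports, 2))  # 47.5
print(rank_by_average_distance(reports))    # [2, 0, 1]
```

In this toy collection the March report sits far from the other two, so it has the largest average distance and ranks first as a candidate outlier.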

Abstract (English)

On-Line Analytical Processing (OLAP) is a common solution that modern enterprises use to generate, monitor, share, and administer their analysis reports. When daily, weekly, or monthly reports are generated or published by OLAP operators, all analysis of the report contents is left to the report readers. To discover hidden rules, similar reports, or trends within a potentially huge number of reports, readers can rely only on manual inspection.
Data mining is a well-developed field for finding knowledge hidden in the data itself. However, few techniques focus on finding knowledge with OLAP reports as the primary data source.
Therefore, this research proposes an approach for mining knowledge from OLAP reports, called OLAP report mining. The thesis proposes three methods, OLAP_MDS, OLAP_CLU, and OLAP_OUT, which apply traditional multidimensional scaling, clustering, and outlier analysis, respectively, to OLAP reports. The work includes (1) defining the comparability relationship between two OLAP reports, (2) designing a similarity measurement for OLAP reports, (3) explaining how to apply traditional data mining methods to find knowledge in OLAP reports, and (4) providing individual and integrative knowledge presentation methods.
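OLAP_MDS applies multidimensional scaling to OLAP reports. A minimal sketch of classical (Torgerson) metric MDS with NumPy follows: it takes an n-by-n matrix of report distances and returns k-dimensional coordinates whose pairwise Euclidean distances approximate the input, so that similar reports plot close together. The choice of classical MDS is an assumption; the thesis may use another variant, such as Kruskal's nonmetric MDS. The helper names and the sample points are hypothetical.

```python
import numpy as np


def classical_mds(dist: np.ndarray, k: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS: n-by-n distance matrix -> n-by-k coordinates."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering  # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(gram)   # eigh returns ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:k]       # indices of the k largest
    lam = np.clip(eigvals[top], 0.0, None)    # negative parts carry no Euclidean meaning
    return eigvecs[:, top] * np.sqrt(lam)


def pairwise_distances(points: np.ndarray) -> np.ndarray:
    """Euclidean distance between every pair of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=-1)


# Hypothetical check: four reports whose distances match the corners of a 3-by-4 rectangle.
pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
D = pairwise_distances(pts)
coords = classical_mds(D, k=2)
print(np.allclose(pairwise_distances(coords), D))  # True
```

When the input distances are themselves Euclidean, as with the report distance described in the abstract, the double-centered matrix is positive semidefinite, and the embedding reproduces the distances exactly once k reaches its rank (up to rotation and reflection); clipping negative eigenvalues matters only for non-Euclidean input.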
Two experiments were conducted to verify the solution. The first, based on methods from cognitive science, validates the proposed definition of the semantic distance between two OLAP reports, and its results support the rationale behind that definition. The second applies the proposed methods to the well-known OLAP sample database Foodmart 2000 to verify their applicability. The results confirm that all the proposed methods can effectively and efficiently find and represent similarity-based knowledge in OLAP reports.

Keywords (Chinese)

★ OLAP report mining
★ OLAP mining
★ data mining
★ OLAP report
★ online analytical processing
★ OLAP

Keywords (English)

★ data mining
★ OLAP report
★ OLAP
★ OLAP report mining
★ OLAM
★ OLAP mining

Table of Contents

Chinese Abstract
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1. Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objective
1.4 Limitations and Assumptions of the Research
1.5 Organization of the Dissertation
Chapter 2. Literature Survey
2.1 On-Line Analytical Mining (OLAM)
2.2 The Techniques for Finding Similarity-based Knowledge
2.2.1 Multidimensional Scaling
2.2.2 Clustering Analysis
2.2.3 Outlier Analysis
2.2.4 Comparison of MDS, Clustering and Outlier Analysis
Chapter 3. OLAP Report Mining
3.1 Data Source: Data Cube and OLAP Report
3.2 Data Selection and Data Preprocessing
3.3 Similarity Measurements among OLAP Reports
3.3.1 Report Distance of Two OLAP Reports
3.3.2 Average Report Distance of an OLAP Report
3.4 OLAP Report Mining
3.4.1 OLAP_MDS Method: Applying MDS to OLAP Reports
3.4.2 OLAP_CLU Method: Applying Clustering to OLAP Reports
3.4.3 OLAP_OUT Method: Applying Outlier Analysis to OLAP Reports
Chapter 4. Validation and Experiment
4.1 Validation of the Proposed Distance Function Between Two OLAP Reports
4.1.1 Experimental Design
4.1.2 Experimental Procedure
4.1.3 Experimental Results
4.2 Applying Methods to Foodmart 2000
4.2.1 Experimental Design
4.2.2 Experimental Result of OLAP_MDS Method
4.2.3 Experimental Result of OLAP_CLU Method
4.2.4 Experimental Result of OLAP_OUT Method
4.2.5 Summary
Chapter 5. Conclusion
References

Advisor

Kevin Chihcheng Hsu (許智誠)

Approval Date

2010-10-10