博碩士論文 91426003 詳細資訊


姓名 古傑煥(Chieh-Huan Ku)  查詢紙本館藏   畢業系所 工業管理研究所
論文名稱 在查詢為隨機需求與回覆時間被限制的環境下同時決定資料倉儲所需儲存的資料與資料維護策略
(Simultaneous Determination of View Selection and Update Policy with Stochastic Query and Response Time Constraints)
檔案 [Endnote RIS 格式]    [Bibtex 格式]    至系統瀏覽論文 ( 永不開放)
摘要(中) 建構資料倉儲系統是為了更有效率的回覆各種不同的績效指標。決定哪些資料要儲存在資料倉儲之中,是必須在滿足各種不同的限制下,使整個系統的績效指標運算成本與維護成本最小化。資料維護策略決定了什麼時候要去更新資料倉儲內的資料。在過去的相關文獻資料當中,分別去研究哪些資料要事先經過一些運算後存放在資料倉儲當中以及資料倉儲的資料維護策略。但在實際情況下,這兩個決策是會相互影響的。因此,在查詢的需求遵從卜瓦松分配以及每一次的查詢會有一定的機率在限制時間內被回覆的狀況下,我們同時去決定哪些資料要存放在資料倉儲以及這些資料的維護策略。
在我們的研究當中,我們提出了一個數學模型,在已知哪些資料被存放在資料倉儲中的情況下,來決定最佳的資料維護策略。在這個數學模型當中,我們所採用的資料維護策略與過去研究所採用的資料更新頻率的差異點在於資料更新頻率所考慮的是資料在根源系統當中資料的更新頻率,而資料維護策略是去決定資料倉儲系統什麼時候要去更新儲存在內的資料。我們所提出的模型也考量了在現實情況下,查詢的需求為隨機的現象。除此之外,我們利用等候線理論當中的M/G/1模型去描述平均的系統查詢回覆時間並加以限制。更進一步地,我們發展了一個分為兩階段運算的貪進演算法來決定哪一些資料要事先被儲存在資料倉儲中。
在應用方面,我們將所提出的數學模型以及貪進演算法應用在不同的案例當中。我們利用數量分析的方法去探討各種不同的限制與系統參數對決定哪些資料要儲存在資料倉儲當中所造成的影響。除此之外,我們也設計了一些相關的實驗去評估窮舉演算法與貪進演算法求解所花的時間以及所決定存放在資料倉儲的資料的差異。
摘要(英) Data warehouse is built up to reply queries efficiently. The view selection is to select a set of views to materialize under constraints, when minimizing the total of query processing cost and view maintenance cost. The update policy decides when to refresh the data in a data warehouse. Previous researches dealt with these two problems independently, however under the real situation, they are correlated with each other. Therefore, we simultaneously determine view selection and update policy in designing a data warehouse when the arrival of query follows a Poisson process and the response time of query is within a given threshold with a desired probability.
In this research, we propose a mathematical model for determining optimal update policy when the set of materialized views are known. In the model, we adopt view maintenance policy for view update frequency, which does not change with the selected views in the former researches. Our model also incorporates the stochastic phenomenon to reflect the uncertain demand of query which is common in the real life. The mean system response time constrained by a specified time is formulated by a M/G/1 model. Furthermore, we develop a two-phase greedy algorithm for searching a better set of views to materialize.
As to application, we consider different special cases to implement the mathematical model and the greedy algorithm. A computational analysis is conducted to explore the impact of different constraints and system parameters on view selection. In addition, we also design some experiments to evaluate the difference of view selection and solution running time between the greedy algorithm and exhaustive algorithm.
關鍵字(中) ★ 再生理論
★ 資料倉儲
★ 資料維護策略
★ 資料倉儲建置
★ 卜瓦松分配
★ 等候線
★ 隨機模式
關鍵字(英) ★ View selection
★ View maintenance policy
★ Poisson process
★ Renewal theory
★ M/G/1
★ AMPL/MINOS
★ MVPP
★ Data warehouse configuration
論文目次 Contents
Abstract......................................................................I
Contents.....................................................................II
List of Figures..............................................................IV
List of Tables.............................................................VIII
Chapter 1 Introduction........................................................1
1.1 Research background and motivation........................................1
1.2 Problem description.......................................................3
1.3 Research objectives.......................................................4
1.4 Research methodology and procedure........................................4
Chapter 2 Literature Review...................................................7
2.1 Frameworks for view selection.............................................7
2.2 Cost model...............................................................11
2.3 View maintenance.........................................................12
2.4 Algorithm................................................................13
Chapter 3 Mathematical Model and Greedy Algorithm............................16
3.1 Definitions and assumptions..............................................16
3.1.1 Definitions............................................................16
3.1.2 Assumptions............................................................17
3.2 Notations and mathematical model.........................................18
3.2.1 Notations..............................................................18
3.2.2 Mathematical model.....................................................19
3.2.2.1 Modeling of view maintenance policy..................................22
3.2.2.2 Modeling of response time constraint.................................30
3.3 Greedy algorithm.........................................................34
3.4 An example for greedy algorithm..........................................37
Chapter 4 Model Application..................................................41
4.1 Implementation of the model: two special cases...........................41
4.2 Validation and evaluation................................................44
4.2.1 Validation.............................................................44
4.2.2 Evaluation.............................................................49
4.3 Remarks for computational analysis.......................................52
Chapter 5 Conclusion.........................................................53
5.1 Research Contribution....................................................53
5.2 Further Research.........................................................53
Reference....................................................................55
Appendix A. Mathematical model in AMPL format................................58
Appendix B. Greedy algorithm in AMPL format..................................63
Appendix C. Additional Cases.................................................67
List of Figures
Fig.1.1 Query answering path..................................................1
Fig.1.2 Research procedure....................................................6
Fig.2.1 Lattice (The eight TPC-D views).......................................8
Fig.2.2 (a) An expression A-DAG...............................................8
Fig.2.2 (b) An expression AO-DAG..............................................8
Fig.2.3 Multiple view processing plan........................................10
Fig.3.1 Generation of viewsets...............................................17
Fig.3.2 Hint Graph...........................................................19
Fig.3.3 Framework of case 1..................................................20
Fig.3.4 The relational matrix between queries and viewsets for case 1........20
Fig.3.5 The relational matrix between viewsets and views for case 1..........21
Fig.3.6 The relational matrix between views and update operations for case 1.21
Fig.3.7 The relational matrix between viewsets and operations for case 1.....21
Fig.3.8 Query replying procedure.............................................24
Fig.3.9 Complete view selection model........................................33
Fig.3.10 Complete view selection model (cont.)...............................34
Fig.3.11 Algorithm (1).......................................................35
Fig.3.12 Algorithm (2).......................................................36
Fig.3.13 Example for greedy algorithm........................................37
Fig.3.14 The relational matrix between queries and viewsets for the example
in Fig.3.13.........................................................38
Fig.3.15 The relational matrix between viewsets and views for the example
in Fig.3.13.........................................................38
Fig.3.16 The relational matrix between views and update operations for the
example in Fig.3.13.................................................38
Fig.3.17 The relational matrix between viewsets and operations for the example
in Fig.3.13.........................................................38
Fig.4.1 Case 1...............................................................43
Fig.4.2 Case 2...............................................................44
Fig.4.3 The effect of different response time constraints on final view
selection............................................................45
Fig.4.4 The effect of different storage space constraints on final view
selection............................................................46
Fig.4.5 The effect of different currency constraints on final view
selection............................................................48
Fig.4.6 The effect of different probabilities that the response time is
within the response time constraint on final view selection..........49
Fig.4.7 The improvement of solution running time by greedy algorithm under
different number of viewsets.........................................51
Fig.C.1 The relational matrix between queries and viewsets for case 2........67
Fig.C.2 The relational matrix between viewsets and views for case 2..........67
Fig.C.3 The relational matrix between views and update operations for case 2.68
Fig.C.4 The relational matrix between viewsets and operations for case 2.....68
Fig.C.5 Case 3...............................................................69
Fig.C.6 The relational matrix between queries and viewsets for case 3........69
Fig.C.7 The relational matrix between viewsets and views for case 3..........70
Fig.C.8 The relational matrix between views and update operations for case 3.70
Fig.C.9 The relational matrix between viewsets and operations for case 3.....71
Fig.C.10 Case 4..............................................................72
Fig.C.11 The relational matrix between queries and viewsets for case 4.......72
Fig.C.12 The relational matrix between viewsets and views for case 4.........73
Fig.C.13 The relational matrix between views and update operations for
case 4..............................................................73
Fig.C.14 The relational matrix between viewsets and operations for case 4....74
Fig.C.15 Case 5..............................................................75
Fig.C.16 The relational matrix between queries and viewsets for case 5.......75
Fig.C.17 The relational matrix between viewsets and views for case 5.........76
Fig.C.18 The relational matrix between views and update operations for
case 5..............................................................76
Fig.C.19 The relational matrix between viewsets and operations for case 5....77
Fig.C.20 Case 6..............................................................78
Fig.C.21 The relational matrix between queries and viewsets for case 6.......78
Fig.C.22 The relational matrix between viewsets and views for case 6.........79
Fig.C.23 The relational matrix between views and update operations for
case 6..............................................................79
Fig.C.24 The relational matrix between viewsets and operations for
case 6..............................................................80
Fig.C.25 Case 7..............................................................81
Fig.C.26 The relational matrix between queries and viewsets for case 7.......82
Fig.C.27 The relational matrix between viewsets and views for case 7.........83
Fig.C.28 The relational matrix between views and update operations for
case 7..............................................................84
Fig.C.29 The relational matrix between viewsets and operations for
case 7..............................................................85
List of Tables
Table 2.1 Comparisons of literatures in framework............................10
Table 2.2 Comparisons of literatures in cost model...........................12
Table 2.3 Comparisons of literatures in algorithm............................15
Table 3.1 Modification of cost and Temp_SET_2 in the first loop..............39
Table 3.2 Modification of cost and Temp_SET_2 in the second loop.............40
Table 4.1 The effect of different response time constraints on final view
selection..........................................................45
Table 4.2 The effect of different storage space constraints on final view
selection..........................................................46
Table 4.3 Reasonable range of storage space constraint.......................47
Table 4.4 The effect of different currency constraints on final view
selection..........................................................47
Table 4.5 The effect of different probabilities that the response time is
within the response time constraint on final view selection........49
Table 4.6 Quality evaluation for model and algorithm in case 1...............50
Table 4.7 Quality evaluation for model and algorithm in case 2...............50
Table 4.8 Applicability evaluation for model and algorithm...................51
參考文獻 Reference
Allen, A. O., Probability, Statistics, and Queuing Theory with computer science applications, Academic Press, 1978
Baralis, E., S. Paraboschi, E. Teniente, “Materialized view selection in a multidimensional database”, 23rd VLDB Conference, pp. 156-165, 1997
Chaudhuri, S., U. Dayal, “An Overview of Data Warehousing and OLAP Technology”, http://www.cs.sfu.ca/CourseCentral/459/han/papers/chaudhuri97.pdf
Chen, J. W., “Determine the Planning Strategy for BOM Items in the Context of MTO Environment: Model and Application”, Master thesis, National Central University, 2003
Colby, L. S., A. Kawaguchi, D. F. Lieuwen, I. S. Mumick, K. A. Ross, “Supporting Multiple View Maintenance Policies”, ACM SIGMOD Conference, pp. 405-416, May 1997.
Fourer, R., D. M. Gay, B. W. Kernighan, AMPL: A Modeling Language for Mathematical Programming, The Scientific Press Series, 1993.
Fourer, R., D. M. Gay, B. W. Kernighan, AMPL: A Modeling Language for Mathematical Programming with AMPL Plus Student Edition for Microsoft Windows, Duxbury Press, 1997.
Gupta, A., J. A. Blakeley, “Using partial information to update materialized views” Information Systems Vol. 20, No. 8 pp. 641-662, 1995
Gupta, H., V. Harinarayan, A. Rajaraman, J. D. Ullman, “Index Selection for OLAP”, In Proc. Of the Intl. Conf. on Data Engineering, 1997
Gupta, H. “Selections of views to materialize in a data warehouse”, in ICDT, 1997
Gupta, H., I. S. Mumick, “Selections of views to materialize under a maintenance cost constraint”, Proceedings of the International Conference on Data Engineering, 1998
Harinarayan, V., A. Rajaraman, J. D.Ullman, “Implementing data cubes efficiently”, ACM Sigmod, June 1996
Horng, J. T., C. W. Chen, “A mechanism for view consistency in a data warehousing system”, The journal of systems and software (56), pp. 23-37, 2001
Horng, J. T., Y. J. Chang, B. J. Lin, C. Y. Kao, “Materialized View Selection Using Genetic Algorithms in a Data Warehouse System”, IEEE, pp. 2221-2227, 1999
Inmon, W. H., Building the Data Warehouse, John Wiley & Sons, Inc., 3rd edition
Kuno, H. A., E. A. Rundensteiner, ”Materialized object–oriented views in multiview”, IEEE, 1995
Kuno, H. A., E. A. Rundensteiner, “Using object oriented principles to optimize update propagation to materialized views”, IEEE, pp. 310-317, 1996
Kuno, H. A., E. A. Rundensteiner, “Incremental maintenance of materialized views in multiview: Strategies and performance evaluation”, IEEE Transactions on Knowledge and Data Engineering, Vol. 10, No.5, pp. 768-792, September/October 1998
Martin, J., System Analysis for Data Transmission, Prentice-Hall, Englewood Cliffs, New Jersey, 1972
Mistry, H., P. Roy, S. Sudarshan, K. Ramamritham, “Materialized view selection and maintenance using multi-query optimization”, http://www.cse.iitb.ac.in/~krithi/papers/ sigmod2001_viewmaint.pdf, 2000
Patel, J. K., C. B. Read, Handbook of the Normal Distribution, Marcel Dekker, 1982
Rajagopalan, S., “Make to Order or Make to Stock: Model and Application”, Management Science Vol. 48, No.2, pp. 241-256, February 2002.
Ross, S. M., Introduction to Probability Models, 8th edition, Academic Press, 2003
Salem, K., K. S. Beyer, R. Cochrane, B.G. Lindsay, “How To Roll a Join: Asynchronous Incremental View Maintenance”, SIGMOD Conference, pp. 129-140,2000..
Segev, A., W. Fang, “Currency based updates to distributed materialized views”, IEEE, pp.512-520, 1990
Segev, A., W. Fang, “Optimal update policies for distributed materialized views”, Management Science Vol.37, No7, pp. 851-870, July 1991
Soutyrina, E., F. Fotouhi, “Optimal View Selection for Multidimensional Database Systems”, IEEE, pp. 45-52, 1997
Theodoratos, D., T. Sellis, “Data warehouse configuration”, Proceedings of the 23rd VLDB Conference, pp. 126-135, 1997
Theodoratos, D., T. Sellis, “Designing data warehouses”, Elsevier, pp. 279-301, June 1999
Theodoratos, D., M. Bouzeghoub, “Data Currency Quality Satisfaction in the Design of Data Warehouse”, International Journal of Cooperative Information Systems Vol.10, No.3 pp. 299-326, 2001
Yang, J., K. Karlapalem, Q. Li, “A framework for designing materialized views in data warehousing environment”, Technical Report HKUST-CS96-35, October 1996
Yang, J., K. Karlapalem, Q. Li, “Algorithms for Materialized View Design in Data Warehouse Environment”, Proceedings of the 23rd VLDB Conference, pp. 136-145, 1997
Zhang, C., X. Yao and J. Yang, “Evolving Materialized Views in Data Warehouse”, IEEE, pp. 823- 829, 1999
Zhang, C., X. Yao, Senior Member, IEEE, J. Yang, “An Evolutionary Approach to Materialized Views Selection in a Data Warehouse Environment”, IEEE Transactions on Systems, Man, and Cybernetics-Part C: Application and Reviews, Vol. 31, No. 3, pp. 282-294, August 2001
Zhuge, Y., G. M. Hector, J. Hammer, J. Widom, “View Maintenance in a Warehousing Environment”, Technical Report, Stanford University, 1995
指導教授 沈國基(Gwo-Ji Sheen) 審核日期 2004-6-28
推文 facebook   plurk   twitter   funp   google   live   udn   HD   myshare   reddit   netvibes   friend   youpush   delicious   baidu   

若有論文相關問題,請聯絡國立中央大學圖書館推廣服務組 TEL:(03)422-7151轉57407,或E-mail聯絡