Tuning GSP Parameters with a Genetic Algorithm


Name

Wei-ci Jheng (鄭洧奇)

Department

Department of Business Administration

Thesis title

以基因演算法探討 GSP 參數之研究
(Tuning GSP parameters with GA)


Files

Full text: never to be released

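Abstract (Chinese)

In data mining, association rules reveal which products are purchased together when customers shop. Scholars have used this property to develop market basket analysis, which helps businesses formulate sales strategies.
As is well known, data change constantly: when new data are generated, old data are replaced. Time therefore becomes a very important attribute in a database, and the mining tool that arose to handle it is called generalized sequential patterns (GSP).
GSP uses the timestamp attribute to find product combinations that follow sequential patterns. However, the GSP parameters are entered by the user, and poorly set parameters may make the results unstable from one run to the next. This study combines a parameter repository with GSP and a genetic algorithm, evolving the parameters iteratively to find suitable values so that the results become increasingly stable.
The experiment validates the approach on a medium-sized supermarket and finds that the parameters found by the proposed method are clearly better than randomly set parameters.

Keywords: sequential pattern mining, GSP, genetic algorithm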
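
The abstract does not list which GSP parameters are tuned. As a minimal sketch, assuming the two gap constraints from the general GSP formulation (a minimum and a maximum time gap between consecutive pattern elements), the following Python shows how the support of a sequential pattern depends on those settings. The names `contains`, `support`, `min_gap`, `max_gap`, and the toy purchase data are illustrative assumptions, not taken from the thesis, and the sliding-window constraint is left out for brevity.

```python
from typing import FrozenSet, List, Optional, Sequence, Tuple

# One customer sequence: (timestamp in days, items bought), sorted by timestamp.
Transaction = Tuple[int, FrozenSet[str]]


def contains(seq: Sequence[Transaction], pattern: Sequence[FrozenSet[str]],
             min_gap: int, max_gap: int) -> bool:
    """Return True if `pattern` occurs in `seq` with every consecutive
    time gap satisfying min_gap < gap <= max_gap (assumed constraints)."""

    def search(p_idx: int, start: int, prev_time: Optional[int]) -> bool:
        if p_idx == len(pattern):
            return True  # every pattern element has been matched
        for i in range(start, len(seq)):
            t, items = seq[i]
            if prev_time is not None:
                gap = t - prev_time
                if gap <= min_gap:
                    continue  # too close to the previous element
                if gap > max_gap:
                    break  # sorted by time, so later ones are even farther
            # Backtracking: try this transaction, fall through if it fails later.
            if pattern[p_idx] <= items and search(p_idx + 1, i + 1, t):
                return True
        return False

    return search(0, 0, None)


def support(db: List[Sequence[Transaction]], pattern: Sequence[FrozenSet[str]],
            min_gap: int, max_gap: int) -> float:
    """Fraction of customer sequences that contain the pattern."""
    hits = sum(contains(s, pattern, min_gap, max_gap) for s in db)
    return hits / len(db)


# Toy data (hypothetical): four customers, timestamps in days.
db = [
    [(1, frozenset({"milk"})), (3, frozenset({"bread"})), (10, frozenset({"butter"}))],
    [(1, frozenset({"milk", "eggs"})), (2, frozenset({"bread", "butter"}))],
    [(1, frozenset({"bread"})), (5, frozenset({"milk"}))],
    [(2, frozenset({"milk"})), (20, frozenset({"bread"}))],
]
pattern = [frozenset({"milk"}), frozenset({"bread"})]

for min_gap, max_gap in [(0, 7), (0, 30), (1, 7)]:
    print(min_gap, max_gap, support(db, pattern, min_gap, max_gap))
```

On this toy data the same pattern gets a different support under each of the three settings, which illustrates why hand-picked parameters can change what GSP reports.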

Abstract (English)

Tuning GSP parameters with GA

ABSTRACT
In data mining, association rules show which products are purchased together when
customers buy products. Scholars have used this property to develop market basket
analysis, which helps businesses formulate marketing strategies.
Data change all the time: when new data are generated, old data are replaced. Time
therefore becomes a very important attribute in a database, and a data mining method
that exploits it, called generalized sequential patterns (GSP), has been proposed.
GSP uses time stamps to find product combinations that follow sequential patterns.
However, the GSP parameters are user-defined, and the results may be unstable when the
parameters are set poorly. This study tunes the parameters by combining GSP with a
genetic algorithm (GA), improving the results iteratively to find appropriate parameters.
In the experiment, we verify the results on a medium-sized supermarket and find that
the parameters found by the proposed method are significantly better than randomly
set parameters.

Keywords: Sequential pattern mining, GSP, GA
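
The table of contents names selection, crossover, and mutation as the steps that produce new chromosomes. A minimal sketch of such a loop, assuming a two-gene chromosome and a stand-in fitness function, might look like the following. The chromosome layout `(min_gap, max_gap)`, the search bounds, the elitism step, and `toy_fitness` are all assumptions made for illustration. The thesis's actual fitness function and parameter repository are not described in this record, so in a real run the fitness would score the output of a GSP run instead.

```python
import random
from typing import Callable, List, Tuple

Params = Tuple[int, ...]      # hypothetical chromosome: (min_gap, max_gap) in days
BOUNDS = ((0, 7), (1, 30))    # assumed search range for each gene


def toy_fitness(p: Params) -> float:
    """Stand-in score that peaks at (2, 14); always positive."""
    return 1.0 / (1.0 + abs(p[0] - 2) + abs(p[1] - 14))


def random_params(rng: random.Random) -> Params:
    return tuple(rng.randint(lo, hi) for lo, hi in BOUNDS)


def roulette(pop: List[Params], scores: List[float], rng: random.Random) -> Params:
    """Roulette-wheel selection: pick one parent with probability
    proportional to its fitness."""
    return rng.choices(pop, weights=scores, k=1)[0]


def crossover(a: Params, b: Params, rng: random.Random) -> Params:
    """One-point crossover; with two genes the only cut is between them."""
    return (a[0], b[1]) if rng.random() < 0.5 else (b[0], a[1])


def mutate(p: Params, rng: random.Random, rate: float = 0.2) -> Params:
    """Redraw each gene within its bounds with probability `rate`."""
    genes = list(p)
    for i, (lo, hi) in enumerate(BOUNDS):
        if rng.random() < rate:
            genes[i] = rng.randint(lo, hi)
    return tuple(genes)


def evolve(fitness: Callable[[Params], float], pop_size: int = 20,
           generations: int = 30, seed: int = 0) -> Tuple[Params, List[float]]:
    rng = random.Random(seed)
    pop = [random_params(rng) for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        scores = [fitness(p) for p in pop]
        best = max(pop, key=fitness)
        history.append(fitness(best))
        children = [best]  # elitism: carry the best chromosome over unchanged
        while len(children) < pop_size:
            child = crossover(roulette(pop, scores, rng),
                              roulette(pop, scores, rng), rng)
            children.append(mutate(child, rng))
        pop = children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = evolve(toy_fitness)
print("best parameters:", best, "fitness:", history[-1])
```

Because the best chromosome is copied into the next generation, the best fitness never decreases between generations, which is one simple way to obtain steadily improving parameter sets.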

Keywords (Chinese)

★ Sequential pattern mining (序列模式挖掘)
★ GSP (GSP法)
★ Genetic algorithm (基因演算法)

Keywords (English)

★ Sequential pattern mining
★ GSP
★ GA

Table of contents

Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Literature Review
2-1 Time-series data
2-2 Sequential pattern mining
2-3 Genetic algorithm
Chapter 3 Method Framework
3-1 Research framework
3-2 Data types
3-3 Generating parameter combinations
3-4 GSP
3-5 Computing the fitness function
3-6 Generating new chromosomes
3-6-1 Selection
3-6-2 Crossover
3-6-3 Mutation
3-7 Identifying better-performing feasible solutions
3-8 Producing the computed results
3-9 Setting up the parameter repository
Chapter 4 Analysis and Results
4-1 Data description and processing
4-2 Experimental design
4-2-1 Parameter configuration
4-2-2 Parameter repository setup
4-2-3 Evaluation of results
4-2-4 Experimental parameter design
4-3 Experimental results
4-3-1 Overall experimental results
4-3-2 Testing the resulting parameter combinations
4-3-3 Results for different products
Chapter 5 Conclusions and Recommendations
5-1 Conclusions
5-2 Recommendations and directions for future research
References
Appendix 1 Products associated with other products


Advisor

Ping-Yu Hsu (許秉瑜)

Approval date

2014-7-16
