Implementing Continuous Q-learning with a Fuzzy System Based on the Self-Organizing Feature Map

Name

陳律宇 (Lu-Yu Chen)

Department

Department of Computer Science and Information Engineering

Thesis Title

Implementing Continuous Q-learning with a Fuzzy System Based on the Self-Organizing Feature Map
(A SOM-based Fuzzy Systems Q-learning in Continuous State and Action Space)

Related Theses

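★ A Q-learning-based swarm intelligence algorithm and its applications
★ Development of a rehabilitation system for children with developmental delays
★ Comparing teacher assessment and peer assessment from the perspective of cognitive style: from English writing to game making
★ A prediction model for diabetic nephropathy based on laboratory test values
★ Design of a remote-sensing image classifier based on a fuzzy neural network architecture
★ A hybrid clustering algorithm
★ Development of assistive devices for people with disabilities
★ A study of fingerprint classifiers
★ A study of backlit image compensation and color reduction
★ Application of neural networks to case selection for business income tax
★ A new online learning system and its application to tax case selection
★ An eye-tracking system and its application to human-computer interfaces
★ Data visualization combining swarm intelligence and self-organizing maps
★ Development of a pupil-tracking system for human-computer interfaces for people with disabilities
★ An online-learning neuro-fuzzy system based on an artificial immune system and its applications
★ Application of genetic algorithms to speech voiceprint unmixing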

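Access rights for this electronic thesis: open access, effective immediately.
The open-access full text is licensed to users only for personal, non-commercial searching, reading, and printing for academic research purposes.
Please comply with the relevant provisions of the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, so as to avoid violating the law.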

Abstract (Chinese)

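Reinforcement learning is a process in which the learner, while interacting with its environment and without a supervisor providing complete instructions, discovers on its own which action to take in each state to obtain the maximum reward. Q-learning is a common reinforcement learning method: by building a look-up table that holds a Q-value for every state-action pair, it can handle problems with small, discrete state and action spaces without difficulty. When a problem has a large number of states and actions, however, the look-up table becomes enormous, so building a table entry for every state-action pair is no longer feasible. This thesis proposes a fuzzy system based on the Self-Organizing Feature Map network (SOM network) to implement Q-learning, and uses this approach to design control systems. To speed up training, the thesis combines task decomposition with an automatic task decomposition mechanism to handle complex tasks. Robot simulation experiments demonstrate the effectiveness of the approach.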
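The following minimal sketch makes the look-up-table limitation concrete. It applies tabular Q-learning with the standard Watkins update to a hypothetical five-state corridor task, then counts how many table cells a quantized state space would need. The corridor task, the learning rate, the discount factor, the exploration rate, and the 8-sensor / 10-level / 5-action robot are illustrative assumptions, not settings taken from the thesis.

```python
import random
from collections import defaultdict


def epsilon_greedy(q_table, state, actions, epsilon, rng):
    """Explore with probability epsilon; otherwise take a greedy action,
    breaking ties at random so an all-zero table still explores."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    values = [q_table[(state, a)] for a in actions]
    best = max(values)
    return rng.choice([a for a, v in zip(actions, values) if v == best])


def q_update(q_table, s, a, r, s_next, done, actions, alpha, gamma):
    """Tabular Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    future = 0.0 if done else max(q_table[(s_next, b)] for b in actions)
    q_table[(s, a)] += alpha * (r + gamma * future - q_table[(s, a)])


def train_chain(n_states=5, episodes=2000, alpha=0.1, gamma=0.9,
                epsilon=0.1, seed=0, max_steps=100):
    """Hypothetical toy task: walk along states 0..n_states-1; reaching the
    last state gives reward 1 and ends the episode."""
    rng = random.Random(seed)
    actions = [-1, +1]
    q_table = defaultdict(float)            # the look-up table: (state, action) -> Q
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = epsilon_greedy(q_table, s, actions, epsilon, rng)
            s_next = min(max(s + a, 0), goal)   # walls at both ends
            done = s_next == goal
            r = 1.0 if done else 0.0
            q_update(q_table, s, a, r, s_next, done, actions, alpha, gamma)
            s = s_next
            if done:
                break
    return q_table, actions


def table_size(levels_per_variable, n_state_variables, n_actions):
    """Cells needed if every state variable is quantized into the same number of levels."""
    return levels_per_variable ** n_state_variables * n_actions


q_table, actions = train_chain()
policy = {s: max(actions, key=lambda a: q_table[(s, a)]) for s in range(4)}
print(policy)                       # greedy action for each non-terminal state
print(table_size(10, 8, 5))         # → 500000000
```

Every (state, action) pair gets its own cell. That works for the five-state corridor, but the hypothetical 8-sensor robot already needs 500,000,000 cells, and no finite table can cover truly continuous sensor readings.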

Abstract (English)

In reinforcement learning, there is no supervisor to judge the action chosen at each step; learning proceeds through trial and error while interacting with a dynamic environment. Q-learning is a popular approach to reinforcement learning. It is widely applied to problems with discrete states and actions and is usually implemented with a look-up table in which each entry corresponds to one state-action pair. However, the look-up-table implementation of Q-learning fails in problems with continuous state and action spaces, because exhaustively enumerating all state-action pairs is impossible. This thesis proposes an implementation of Q-learning that uses SOM-based fuzzy systems to solve problems with continuous state and action spaces. Simulations in which a robot is trained to complete two different tasks demonstrate the effectiveness of the proposed approach. Because reinforcement learning is usually a slow process, the thesis also proposes a hybrid approach that combines the advantages of hierarchical learning and progressive learning to decompose a complex task into simple elementary tasks, thereby accelerating the learning procedure.
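The abstract says the look-up table is replaced by SOM-based fuzzy systems, but this page does not give the equations of the thesis's method. The sketch below is therefore a generic illustration, not the author's SOMFUS-Q. It combines two ingredients:

- A plain Kohonen SOM with a 1-D lattice, which places rule centres in the continuous state space.
- Fuzzy Q-learning in the style of Glorennec and Jouffe. Each rule keeps Q-values for a few candidate actions, and the output action is the membership-weighted sum of each rule's pick, so the action itself is continuous.

The Gaussian memberships, the width, the candidate action set, the number of units, and the 1-D "drive x toward 0" task are all hypothetical choices.

```python
import numpy as np


def train_som(data, n_units=9, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Plain Kohonen SOM on a 1-D lattice: pull the best-matching unit and its
    lattice neighbours toward each sample, shrinking learning rate and radius."""
    rng = np.random.default_rng(seed)
    weights = data[rng.choice(len(data), n_units, replace=False)].astype(float)
    grid = np.arange(n_units)
    sigma0 = sigma0 or n_units / 2.0
    total, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = t / total
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
            h = np.exp(-((grid - bmu) ** 2) / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            t += 1
    return weights


class SomFuzzyQ:
    """Fuzzy Q-learning whose rule centres are SOM prototypes (illustrative only)."""

    def __init__(self, centers, candidate_actions, width, alpha=0.1, gamma=0.9,
                 epsilon=0.1, seed=0):
        self.centers = np.asarray(centers, dtype=float)
        self.actions = np.asarray(candidate_actions, dtype=float)
        self.q = np.zeros((len(self.centers), len(self.actions)))  # q[rule, candidate]
        self.width, self.alpha, self.gamma, self.epsilon = width, alpha, gamma, epsilon
        self.rng = np.random.default_rng(seed)

    def memberships(self, state):
        """Normalized Gaussian firing strengths (shifted by the minimum distance
        to avoid underflow); they sum to 1."""
        d2 = np.sum((self.centers - state) ** 2, axis=1)
        w = np.exp(-(d2 - d2.min()) / (2.0 * self.width ** 2))
        return w / w.sum()

    def act(self, state):
        """Each rule picks a candidate (epsilon-greedy); the continuous action is
        the membership-weighted sum of the picks, and so is Q(s, a)."""
        phi = self.memberships(state)
        greedy = np.argmax(self.q, axis=1)
        explore = self.rng.random(len(phi)) < self.epsilon
        random_pick = self.rng.integers(len(self.actions), size=len(phi))
        chosen = np.where(explore, random_pick, greedy)
        action = float(phi @ self.actions[chosen])
        q_sa = float(phi @ self.q[np.arange(len(phi)), chosen])
        return action, chosen, phi, q_sa

    def value(self, state):
        """V(s) = sum_i phi_i(s) * max_j q[i, j]."""
        return float(self.memberships(state) @ self.q.max(axis=1))

    def update(self, chosen, phi, q_sa, reward, next_state, done):
        """TD step: each rule's chosen q-value moves by alpha * delta * phi_i."""
        target = reward + (0.0 if done else self.gamma * self.value(next_state))
        delta = target - q_sa
        self.q[np.arange(len(phi)), chosen] += self.alpha * delta * phi
        return delta


# Hypothetical 1-D task: push x in [-1, 1] into the band |x| < 0.05.
rng = np.random.default_rng(1)
states = rng.uniform(-1.0, 1.0, size=(500, 1))        # sampled continuous states
centers = train_som(states, n_units=7)
agent = SomFuzzyQ(centers, candidate_actions=[-0.2, 0.0, 0.2], width=0.25)

for episode in range(300):
    x = rng.uniform(-1.0, 1.0, size=1)
    for step in range(30):
        a, chosen, phi, q_sa = agent.act(x)
        x_next = np.clip(x + a, -1.0, 1.0)
        done = abs(x_next[0]) < 0.05
        reward = 1.0 if done else -0.05
        agent.update(chosen, phi, q_sa, reward, x_next, done)
        x = x_next
        if done:
            break
```

In this sketch the stored values number units × candidate actions (7 × 3 here). That count grows with the size of the map rather than exponentially with the number of state variables. The SOM also concentrates rule centres where states are actually sampled.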

Keywords (Chinese)

★ task decomposition
★ continuous Q-learning
★ reinforcement learning
★ self-organizing feature map

Keywords (English)

★ continuous Q-learning
★ task decomposition
★ self-organizing feature map
★ reinforcement learning

Table of Contents

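Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2 Reinforcement Learning and Q-learning
2.1 Reinforcement Learning
2.2 Markov Decision Processes
2.3 Value Functions
2.4 Q-learning
2.4.1 Introduction
2.4.2 Experimental Results
2.5 Drawbacks of Q-learning
2.5.1 Large State and Action Spaces
2.5.2 Continuous States and Actions
2.6 Related Work
2.6.1 QCON
2.6.2 CMAC-based Q-learning
2.6.3 Q-KOHON
2.7 Conclusion
Chapter 3 SOM-based Fuzzy Systems
3.1 Fuzzy Systems
3.1.1 Fuzzy Control
3.1.2 Fuzzy Neural Networks
3.1.3 Performance and Computational Cost
3.2 The Self-Organizing Feature Map Algorithm
3.2.1 Introduction
3.2.2 Algorithm Procedure
3.2.3 Initialization of the SOM Algorithm
3.3 SOM-based Fuzzy Systems
3.3.1 Basic Architecture
3.3.2 Training Method
Chapter 4 Research Methods and Procedures
4.1 SOMFUS-Q
4.1.1 Basic Architecture
4.1.2 Training Procedure
4.1.3 Generalization Ability
4.2 Task Decomposition
4.3 Automatic Task Decomposition
4.3.1 Introduction to Adaptive Resonance Theory
4.3.2 The Automatic Task Decomposition Mechanism
Chapter 5 Robot Navigation Simulation Experiments
5.1 Experimental Setup
5.1.1 Simulation Environment
5.1.2 The Robot
5.2 Training Basic Behaviors
5.2.1 Basic Behavior 1: Wall Following
5.2.2 Basic Behavior 2: Light Seeking
5.2.3 Basic Behavior 3: Moving to a Specified Position
5.3 Task Decomposition Experiments
5.3.1 Finding the Charger
5.3.2 Cargo Collection
5.4 Automatic Task Decomposition Experiment
5.5 Analysis of Experimental Results
5.5.1 Advantages and Disadvantages of Task Decomposition
5.5.2 Advantages and Disadvantages of Automatic Task Decomposition
Chapter 6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
References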

Advisor

蘇木春 (Mu-Chun Su)

Approval Date

2006-7-21
