以作者查詢圖書館館藏 、以作者查詢臺灣博碩士 、以作者查詢全國書目 、勘誤回報 、線上人數:24 、訪客IP:3.143.22.23
姓名 洪孟嶢(Meng-Yao Hong) 查詢紙本館藏 畢業系所 電機工程學系 論文名稱 記憶體式快速傅立葉轉換運算架構的普遍化不定資料分布存取方案
(A Generalized Inconstant-distribution Access Scheme for Memory-based FFT Architecture)相關論文 檔案 [Endnote RIS 格式] [Bibtex 格式] [相關文章] [文章引用] [完整記錄] [館藏目錄] [檢視] [下載]
- 本電子論文使用權限為同意立即開放。
- 已達開放權限電子全文僅授權使用者為學術研究之目的,進行個人非營利性質之檢索、閱讀、列印。
- 請遵守中華民國著作權法之相關規定,切勿任意重製、散佈、改作、轉貼、播送,以免觸法。
摘要(中) 實現 Cooley-Tukey 演算法的硬體運算架構主要分為管線式 (Pipelined) 與記憶體式 (Memory-based) 兩個方向,在吞吐量 (Throughput) 許可的前提下,低硬體成本的記憶體式運算架構會是優先的選擇,而記憶體存取方案 (Memory access scheme) 則為此運算架構的核心,並在多層面影響硬體行為的特徵,故成為本論文的研究對象。過去的研究從固定資料分布 (Constant data distribution) 開始發展,對有序輸入的資料建立資料指標 (Data index) 作為分布資料的依據,同時對所有記憶體模組進行交換排序就成為此類研究成果的特徵;而本研究則嘗試開發不定資料分布 (Inconstant data distribution) 的方案,並從研究方法開始探討。
相較於過去研究,本研究嘗試從 SFG (Signal flow graph) 取得分布資料的資訊,並提出一個可適用於任意 Radix-(2^k) FFT 演算法的存取方案,而所配合提出的列指標 (Row index) 產生器,其硬體面積隨字組長度 (Word length) 呈線性增長 (Linear growth),適用於長點數 FFT 運算,又其傳遞延遲不隨演算法所採用的基數而變化,則將更適用於高基數 FFT 演算法。此外,本研究進一步將所提出的存取方案,應用於低記憶體面積的 Ping-pong Cache-memory 運算架構中,以驗證其適用範圍,並對記憶單元所遭遇的非理想效應進行抑制,首先抑制讀取與寫入對彼此時序的相依性,其次抑制連續 FFT 運算對彼此資料分布的相依性。
所提出的存取方案並非基於有序的資料指標,故運作過程中無須對記憶體模組進行交換排序,而是針對每個記憶體模組直接產生列指標,乃是對控制邏輯的進一步簡化;又基於所提出存取方案的資料分布特性,於實際運算架構中抑制連續 FFT 運算的資料分布相依性,而得以再次進一步簡化控制邏輯。總結本研究的核心成果,首先在於針對不定資料分布的研究提出方法,其次又基於所提出的存取方案得以在兩個層面簡化控制邏輯,說明所能帶來的新發展。
摘要(英) Under the condition of guaranteeing sufficient throughput rate, memory-based architecture with low area overhead would be preferable for executing Cooley-Tukey algorithm, thus memory access scheme is the object of this study while it is an essential issue of memory-based architecture. Data distribution is a primary consideration for access scheme, and a criterion of constant distribution is established in previous work, which means the mapping between data sequence and memory address is fixed throughout entire compute procedure. In contrast, this work attempts to develop an access scheme under a criterion of inconstant distribution and begins with developing a novel modeling method.
This work proposes an access scheme for arbitrary power-of-two radix FFT algorithm and a corresponding row index generator which is highly hardware efficient. The area overhead of generator grows linearly with word length of row index, thus it is suitable for long FFT length. On the other hand, the propagation delay of generator remains constant for arbitrary power-of-two radix, thus it is suitable for high radix FFT algorithm. Furthermore, the proposed access scheme is applied to Ping-pong Cache-memory architecture for evaluating its feasibility. At the same time, issues of eliminating non-ideal effects encountered by memorial unit are also discussed. Compared to previous work, this work extracts sufficient information of distributing data from signal flow graph, SFG, rather than ordered data indexes, hence permutation of memory modules for parallel accessing is not required; besides, data distribution dependency between FFT computations is eliminated. Thus complexity of control logic is reduced in two aspects.
In conclusion, the primary achievement of this work is proposing a novel modeling method for inconstant distribution and reducing complexity of control logic to demonstrate possible evolution which could be brought by inconstant distribution.
關鍵字(中) ★ 快速傅立葉轉換
★ 存取方案
★ 資料分布
★ 複雜度降低關鍵字(英) ★ FFT
★ Fast Fourier transform
★ Access scheme
★ Data distribution
★ Complexity reducing論文目次 摘要 i
Abstract ii
誌謝 iii
目錄 v
圖目錄 viii
表目錄 x
一、 導言 1
1-1 研究背景 1
1-1-1 數學運算的離散系統實現 1
1-1-2 離散系統的數位系統硬體實現 2
1-2 研究方向 5
1-2-1 DFT 點數降解的數學推導 5
1-2-2 Cooley-Tukey 演算法的硬體實現性 7
1-3 研究目標與論文架構 9
二、 快速傅立葉轉換運算架構 10
2-1 概論 10
2-2 管線式 (Pipelined) 運算架構 11
2-2-1 單一路徑回授線路 11
2-2-2 多重路徑交換線路 12
2-2-3 比較與總結 13
2-3 記憶體式 (Memory-based) 運算架構 14
2-3-1 單向資料路徑 14
2-3-2 雙向資料路徑 15
2-3-3 比較與總結 16
三、 記憶體存取方案 17
3-1 概論 17
3-2 固定資料分布存取方案 19
3-2-1 普遍化的 Modulo-r addition 方法 21
3-2-2 普遍化的 Bit-wise modulo-2 addition 方法與不同研究的比較 24
3-3 不定資料分布存取方案 27
3-4 所提出的存取方案 30
3-4-1 所提出的 SFG 建模方法 30
3-4-2 所提出存取方案的行為 32
3-4-3 硬體實現 35
3-5 硬體效率驗證 37
3-6 總結 38
四、 Run-time Length-variable FFT 運算核心 40
4-1 概論 40
4-2 初步設計 42
4-2-1 設計規範 42
4-2-2 硬體規劃設計 44
4-3 記憶單元的設計 47
4-3-1 初步設計 47
4-3-2 Cache 的設計 51
4-3-3 宏記憶體模組的設計 52
4-4 控制單元的設計 57
4-4-1 時序分析 57
4-4-2 運作流程規劃設計 59
4-5 FPGA 驗證 65
4-5-1 整合測試規劃 65
4-5-2 驗證結果 67
4-6 總結 69
五、 結論 70
參考文獻 72參考文獻 [1] I. J. Good, “The interaction algorithm and practical Fourier analysis,” J. Royal Stat. Soc., ser. B, vol. 20, no. 2, pp. 361–372, 1958.
[2] J. W. Cooley and J. Tukey, “An algorithm for machine calculation of complex fourier series,” in Math. Comput., vol. 19, pp. 297–301, Apr. 1965.
[3] S. Bouguezel, M. O. Ahmad, and M. N. S. Swam, “Improved radix-4 and radix-8 FFT algorithms,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2004, pp. III 561–564.
[4] S. He and M. Torkelson, ”A New approach to Pipeline FFT processor,” in Proc. Int. Parallel Processing Symp., 1996, pp. 766–770.
[5] S. He and M. Torkelson, “Designing pipeline FFT processor for OFDM (de)modulation,” in Proc. URSI Int. Symp. Signals Syst. Elect., 1998, pp. 257–262.
[6] B. G. Jo and M. H. Sunwoo, ”New continuous-flow mixed-radix (CFMR) FFT processor using novel in-place strategy,” IEEE Tran. on Circuits Syst. I, Reg. Papers, vol. 52, no. 5, pp. 911–919, May 2005.
[7] S. He and M. Torkelson, ”Design and implementation of a 1024-point pipeline FFT processor,”in Proc. IEEE Custom Integrated Circuits Conf., May 1998, pp. 131–134.
[8] L. R. Rabiner and B. Gold. Theory and Application of Digital Signal Processing, NJ: Prentice-Hall, 1975.
[9] B. M. Baas, “A low power, high performance 1024-point FFT Processor,” IEEE J. Solid-State Circuits, vol. 34, no. 3, pp. 380–387, Mar 1999.
[10] Y. Chen, Y.-W. Lin, and C.-Y. Lee, “A block scaling FFT/IFFT processor for WiMAX applications,” in Proc. IEEE Asian Solid-State Circuits Conf. (A-SSCC), Nov. 2006, pp. 203–206.
[11] M. C. Pease, “Organization of large scale Fourier processors,” IEEE Trans. Circuits Syst. vol. 16, pp. 474–482, July 1969.
[12] D. Cohen, “Simplified control of FFT hardware,” IEEE Trans. Acoust., Speech Signal Processing, Vol. ASSP-24, pp. 577-579, Dec. 1976.
[13] L. G. Johnson, “Conflict free memory addressing for dedicated FFT hardware,” IEEE Trans. Circuits Syst. II, vol. 39, pp. 312–316, May 1992.
[14] C. P. Hung and S. G. Chen, “Design of an efficient variablelength FFT processor,” IEEE Int. Symp. on Circuits Syst., vol. 2, pp. 833-836, May 2004.
[15] J. H. Takala, T. S. Jarvinen, and H. T. Sorokin, “Conflict-free parallel memory access scheme for FFT processors,” IEEE Int. Symp. on Circuits Syst., vol. 4, pp. 524-527, May 2003.
[16] H. T. Sorokin and J. H. Takala, "Conflict-free parallel accessscheme for mixed-radix FFT supporting I/O permutations," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., pp.1709 -1712, 2011.
[17] X. Xiao, E. Oruklu, and J. Saniie, “An efficient FFT engine with reduced addressing logic,” IEEE Trans. Circuits Syst. II, vol. 55, no 11, pp. 1149–1153, November 2008.
[18] X. Xiao, E. Oruklu, and J. Saniie, “Fast memory addressing scheme for radix-4 FFT implementation,” IEEE Int. Conf. Electro/Information Tech., June 2009, pp. 437–440.
[19] “IEEE standard for broadband over power line networks: Medium access control and physical layer specifications,” IEEE Std 1901-2010, pp. 1-1586, Dec. 2010.
指導教授 薛木添(Muh-Tian Shiue) 審核日期 2015-7-28 推文 facebook plurk twitter funp google live udn HD myshare reddit netvibes friend youpush delicious baidu 網路書籤 Google bookmarks del.icio.us hemidemi myshare