記憶體式快速傅立葉轉換運算架構的普遍化不定資料分布存取方案

以作者查詢圖書館館藏

、以作者查詢臺灣博碩士

、以作者查詢全國書目

、勘誤回報

、線上人數：24

、訪客IP：3.143.22.23

姓名

洪孟嶢(Meng-Yao Hong) 查詢紙本館藏

畢業系所

電機工程學系

論文名稱

記憶體式快速傅立葉轉換運算架構的普遍化不定資料分布存取方案
(A Generalized Inconstant-distribution Access Scheme for Memory-based FFT Architecture)

相關論文

★ 應用於2.5G/5GBASE-T乙太網路傳收機之高成本效益迴音消除器	★ 應用於IEEE 802.3bp車用乙太網路之硬決定與軟決定里德所羅門解碼器架構與電路設計
★ 適用於 10GBASE-T 及 IEEE 802.3bz 之高速低密度同位元檢查碼解碼器設計與實現	★ 基於蛙跳演算法及穩定性準則之高成本效益迴音消除器設計
★ 運用改良型混合蛙跳演算法設計之近端串音干擾消除器	★ 運用改良粒子群最佳化演算法之近端串擾消除器電路設計
★ 應用於多兆元網速乙太網路接收機類比迴音消除器之最小均方演算法電路設計	★ 應用於數位視頻廣播系統之頻率合成器及3.1Ghz寬頻壓控震盪器
★ 地面數位電視廣播基頻接收器之載波同步設計	★ 適用於通訊系統之參數化數位訊號處理器核心
★ 以正交分頻多工系統之同步的高效能內插法技術	★ 正交分頻多工通訊中之盲目頻域等化器
★ 兆元位元率之平行化可適性決策回饋等化器設計與實作	★ 應用於數位視頻廣播系統中之自動增益放大器及接受端濾波器設計
★ OFDM Symbol Boundary Detection and Carrier Synchronization in DVB-T Baseband Receiver Design	★ 適用於億元位元率混合光纖與銅線之電信乙太接取網路技術系統之盲目等化器和時序同步電路設計

檔案

[Endnote RIS 格式]

[Bibtex 格式]

[相關文章]

[文章引用]

[完整記錄]

[館藏目錄]

[檢視]

[下載]

本電子論文使用權限為同意立即開放。
已達開放權限電子全文僅授權使用者為學術研究之目的，進行個人非營利性質之檢索、閱讀、列印。
請遵守中華民國著作權法之相關規定，切勿任意重製、散佈、改作、轉貼、播送，以免觸法。

摘要(中)

實現 Cooley-Tukey 演算法的硬體運算架構主要分為管線式 (Pipelined) 與記憶體式 (Memory-based) 兩個方向，在吞吐量 (Throughput) 許可的前提下，低硬體成本的記憶體式運算架構會是優先的選擇，而記憶體存取方案 (Memory access scheme) 則為此運算架構的核心，並在多層面影響硬體行為的特徵，故成為本論文的研究對象。過去的研究從固定資料分布 (Constant data distribution) 開始發展，對有序輸入的資料建立資料指標 (Data index) 作為分布資料的依據，同時對所有記憶體模組進行交換排序就成為此類研究成果的特徵；而本研究則嘗試開發不定資料分布 (Inconstant data distribution) 的方案，並從研究方法開始探討。

　　相較於過去研究，本研究嘗試從 SFG (Signal flow graph) 取得分布資料的資訊，並提出一個可適用於任意 Radix-(2^k) FFT 演算法的存取方案，而所配合提出的列指標 (Row index) 產生器，其硬體面積隨字組長度 (Word length) 呈線性增長 (Linear growth)，適用於長點數 FFT 運算，又其傳遞延遲不隨演算法所採用的基數而變化，則將更適用於高基數 FFT 演算法。此外，本研究進一步將所提出的存取方案，應用於低記憶體面積的 Ping-pong Cache-memory 運算架構中，以驗證其適用範圍，並對記憶單元所遭遇的非理想效應進行抑制，首先抑制讀取與寫入對彼此時序的相依性，其次抑制連續 FFT 運算對彼此資料分布的相依性。

　　所提出的存取方案並非基於有序的資料指標，故運作過程中無須對記憶體模組進行交換排序，而是針對每個記憶體模組直接產生列指標，乃是對控制邏輯的進一步簡化；又基於所提出存取方案的資料分布特性，於實際運算架構中抑制連續 FFT 運算的資料分布相依性，而得以再次進一步簡化控制邏輯。總結本研究的核心成果，首先在於針對不定資料分布的研究提出方法，其次又基於所提出的存取方案得以在兩個層面簡化控制邏輯，說明所能帶來的新發展。

摘要(英)

Under the condition of guaranteeing sufficient throughput rate, memory-based architecture with low area overhead would be preferable for executing Cooley-Tukey algorithm, thus memory access scheme is the object of this study while it is an essential issue of memory-based architecture. Data distribution is a primary consideration for access scheme, and a criterion of constant distribution is established in previous work, which means the mapping between data sequence and memory address is fixed throughout entire compute procedure. In contrast, this work attempts to develop an access scheme under a criterion of inconstant distribution and begins with developing a novel modeling method.

This work proposes an access scheme for arbitrary power-of-two radix FFT algorithm and a corresponding row index generator which is highly hardware efficient. The area overhead of generator grows linearly with word length of row index, thus it is suitable for long FFT length. On the other hand, the propagation delay of generator remains constant for arbitrary power-of-two radix, thus it is suitable for high radix FFT algorithm. Furthermore, the proposed access scheme is applied to Ping-pong Cache-memory architecture for evaluating its feasibility. At the same time, issues of eliminating non-ideal effects encountered by memorial unit are also discussed. Compared to previous work, this work extracts sufficient information of distributing data from signal flow graph, SFG, rather than ordered data indexes, hence permutation of memory modules for parallel accessing is not required; besides, data distribution dependency between FFT computations is eliminated. Thus complexity of control logic is reduced in two aspects.

In conclusion, the primary achievement of this work is proposing a novel modeling method for inconstant distribution and reducing complexity of control logic to demonstrate possible evolution which could be brought by inconstant distribution.

關鍵字(中)

★ 快速傅立葉轉換
★ 存取方案
★ 資料分布
★ 複雜度降低

關鍵字(英)

★ FFT
★ Fast Fourier transform
★ Access scheme
★ Data distribution
★ Complexity reducing

論文目次

摘要 i
Abstract ii
誌謝 iii
目錄 v
圖目錄 viii
表目錄 x
一、導言 1
1-1 研究背景 1
1-1-1 數學運算的離散系統實現 1
1-1-2 離散系統的數位系統硬體實現 2
1-2 研究方向 5
1-2-1 DFT 點數降解的數學推導 5
1-2-2 Cooley-Tukey 演算法的硬體實現性 7
1-3 研究目標與論文架構 9
二、快速傅立葉轉換運算架構 10
2-1 概論 10
2-2 管線式 (Pipelined) 運算架構 11
2-2-1 單一路徑回授線路 11
2-2-2 多重路徑交換線路 12
2-2-3 比較與總結 13
2-3 記憶體式 (Memory-based) 運算架構 14
2-3-1 單向資料路徑 14
2-3-2 雙向資料路徑 15
2-3-3 比較與總結 16
三、記憶體存取方案 17
3-1 概論 17
3-2 固定資料分布存取方案 19
3-2-1 普遍化的 Modulo-r addition 方法 21
3-2-2 普遍化的 Bit-wise modulo-2 addition 方法與不同研究的比較 24
3-3 不定資料分布存取方案 27
3-4 所提出的存取方案 30
3-4-1 所提出的 SFG 建模方法 30
3-4-2 所提出存取方案的行為 32
3-4-3 硬體實現 35
3-5 硬體效率驗證 37
3-6 總結 38
四、 Run-time Length-variable FFT 運算核心 40
4-1 概論 40
4-2 初步設計 42
4-2-1 設計規範 42
4-2-2 硬體規劃設計 44
4-3 記憶單元的設計 47
4-3-1 初步設計 47
4-3-2 Cache 的設計 51
4-3-3 宏記憶體模組的設計 52
4-4 控制單元的設計 57
4-4-1 時序分析 57
4-4-2 運作流程規劃設計 59
4-5 FPGA 驗證 65
4-5-1 整合測試規劃 65
4-5-2 驗證結果 67
4-6 總結 69
五、結論 70
參考文獻 72

參考文獻

[1] I. J. Good, “The interaction algorithm and practical Fourier analysis,” J. Royal Stat. Soc., ser. B, vol. 20, no. 2, pp. 361–372, 1958.
[2] J. W. Cooley and J. Tukey, “An algorithm for machine calculation of complex fourier series,” in Math. Comput., vol. 19, pp. 297–301, Apr. 1965.
[3] S. Bouguezel, M. O. Ahmad, and M. N. S. Swam, “Improved radix-4 and radix-8 FFT algorithms,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2004, pp. III 561–564.
[4] S. He and M. Torkelson, ”A New approach to Pipeline FFT processor,” in Proc. Int. Parallel Processing Symp., 1996, pp. 766–770.
[5] S. He and M. Torkelson, “Designing pipeline FFT processor for OFDM (de)modulation,” in Proc. URSI Int. Symp. Signals Syst. Elect., 1998, pp. 257–262.
[6] B. G. Jo and M. H. Sunwoo, ”New continuous-flow mixed-radix (CFMR) FFT processor using novel in-place strategy,” IEEE Tran. on Circuits Syst. I, Reg. Papers, vol. 52, no. 5, pp. 911–919, May 2005.
[7] S. He and M. Torkelson, ”Design and implementation of a 1024-point pipeline FFT processor,”in Proc. IEEE Custom Integrated Circuits Conf., May 1998, pp. 131–134.
[8] L. R. Rabiner and B. Gold. Theory and Application of Digital Signal Processing, NJ: Prentice-Hall, 1975.
[9] B. M. Baas, “A low power, high performance 1024-point FFT Processor,” IEEE J. Solid-State Circuits, vol. 34, no. 3, pp. 380–387, Mar 1999.
[10] Y. Chen, Y.-W. Lin, and C.-Y. Lee, “A block scaling FFT/IFFT processor for WiMAX applications,” in Proc. IEEE Asian Solid-State Circuits Conf. (A-SSCC), Nov. 2006, pp. 203–206.
[11] M. C. Pease, “Organization of large scale Fourier processors,” IEEE Trans. Circuits Syst. vol. 16, pp. 474–482, July 1969.
[12] D. Cohen, “Simplified control of FFT hardware,” IEEE Trans. Acoust., Speech Signal Processing, Vol. ASSP-24, pp. 577-579, Dec. 1976.
[13] L. G. Johnson, “Conflict free memory addressing for dedicated FFT hardware,” IEEE Trans. Circuits Syst. II, vol. 39, pp. 312–316, May 1992.
[14] C. P. Hung and S. G. Chen, “Design of an efficient variablelength FFT processor,” IEEE Int. Symp. on Circuits Syst., vol. 2, pp. 833-836, May 2004.
[15] J. H. Takala, T. S. Jarvinen, and H. T. Sorokin, “Conflict-free parallel memory access scheme for FFT processors,” IEEE Int. Symp. on Circuits Syst., vol. 4, pp. 524-527, May 2003.
[16] H. T. Sorokin and J. H. Takala, "Conflict-free parallel accessscheme for mixed-radix FFT supporting I/O permutations," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., pp.1709 -1712, 2011.
[17] X. Xiao, E. Oruklu, and J. Saniie, “An efficient FFT engine with reduced addressing logic,” IEEE Trans. Circuits Syst. II, vol. 55, no 11, pp. 1149–1153, November 2008.
[18] X. Xiao, E. Oruklu, and J. Saniie, “Fast memory addressing scheme for radix-4 FFT implementation,” IEEE Int. Conf. Electro/Information Tech., June 2009, pp. 437–440.
[19] “IEEE standard for broadband over power line networks: Medium access control and physical layer specifications,” IEEE Std 1901-2010, pp. 1-1586, Dec. 2010.

指導教授

薛木添(Muh-Tian Shiue)

審核日期

2015-7-28

推文