在叢集電腦上矩陣相乘與資料分割的效能評估

以作者查詢圖書館館藏

、以作者查詢臺灣博碩士

、以作者查詢全國書目

、勘誤回報

、線上人數：38

、訪客IP：18.118.193.232

姓名

楊訓裕(Hsun-Yu Yang) 查詢紙本館藏

畢業系所

資訊工程學系

論文名稱

在叢集電腦上矩陣相乘與資料分割的效能評估
(Analyzing Data Distribution and Locality for Matrix Multiplication on a PC Cluster)

相關論文

★ 無線行動隨建即連網路上之廣播與繞徑問題	★ 熱門電影的高效能廣播演算法
★ 無線行動隨建即連網路上之媒體存取問題	★ 使用功率調整來增加多節點封包無線網路
★ 在無線行動隨建即用網路下Geocast 之設計與實做經驗	★ 一個適用於熱門隨選視訊服務之快速排程廣播策略
★ 應用數位浮水印技術於影像之智慧財產權保護與認證	★ 在寬頻分碼多重擷取技術上分配及再分配多重正交可變展頻係數碼
★ 無線行動隨建即連網路上之廣播排程協定	★ 在無線行動隨建即連網路下支援即時多媒體傳送的媒介存取協定
★ 以樹狀結構為基礎的Scatternet建構協定	★ 在無線感應器網路中具有省電機制並且採用對角線路徑的方向性擴散
★ 隨意型無線網路上一個具有能量保存的GRID繞徑協定	★ 在無線感應器網路中具有省電機制的傳輸協定
★ 隨意型無線網路上一個具有能量保存以及平衡的繞徑協定	★ 環形藍芽網路：一個藍芽通訊網路的新拓樸及其繞徑協定

檔案

[Endnote RIS 格式]

[Bibtex 格式]

[相關文章]

[文章引用]

[完整記錄]

[館藏目錄]

[檢視]

[下載]

本電子論文使用權限為同意立即開放。
已達開放權限電子全文僅授權使用者為學術研究之目的，進行個人非營利性質之檢索、閱讀、列印。
請遵守中華民國著作權法之相關規定，切勿任意重製、散佈、改作、轉貼、播送，以免觸法。

摘要(中)

矩陣相乘不論在科學上和工程上都是一個基本的應用程式。而叢集電腦是一種新型態的高速計算電腦平台。在此論文中，我們提出一個在叢集電腦上有效率的利用資料分割與資料區域性減少矩陣相乘計算時間的演算法。在叢集電腦上計算矩陣相乘需要考慮兩個影響效能的因素。第一是如何分割資料給處理器減少處理器之間的通訊，第二是利用資料區域性的特性減少處理器計算時快取記憶體與記憶體之間的資料搬移。在本論文中，我們考慮這兩個因素來減少計算矩陣的時間。在資料分割方面，我們提出了兩個方法來分析並減少處理器之間的傳數資料量。此外，我們也提出了如何利用資料區域性的特性來加快矩陣相乘的效能。由數據結果得知，我們的演算法比原始的矩陣相乘演算法節省了40%到50%的執行時間。我們也針對在叢集電腦上計算與通訊之間的關係做分析與評量。

摘要(英)

Parallel computer is an important architecture to calculate scienti c and engineer-
ing problems. PC cluster is a new computing platform of parallel computer. Low
cost and high computation capabilities are the characteristics of PC clusters. This
thesis evaluates the performance of matrix multiplication on a PC cluster. Matrix
multiplication on a PC cluster should consider two important factors. One is to
distribute data to processors to reduce interprocessor communication and the other
is to optimize cache utilization to reduce data movements between cache and main
memory. As a result, the thesis takes these two factors into consideration and pro-
poses two communication-free data distribution schemes: totally duplicate scheme
and partially duplicate scheme to totally eliminate interprocessor communication for
matrix multiplication. Moreover, the e ect of cache size on matrix multiplication
is analyzed as well. Experimental results also show the e ciency of the proposed
methods.

關鍵字(中)

★ 資料區域性
★ 資料分割
★ 快取記憶體
★ 矩陣相乘
★ 叢集電腦

關鍵字(英)

★ data distribtuion
★ data locality
★ PC cluster
★ matrix multiplication
★ cache

論文目次

1 Introduction 1
2 Preliminaries 6
2.1 Matrix Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Data Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3 Cache Placement and Data Locality . . . . . . . . . . . . . . . . . . . 9
2.4 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3 Matrix Multiplication on a PC Cluster 17
3.1 Totally Duplicate Scheme . . . . . . . . . . . . . . . . . . . . . . . . 18
3.2 Partially Duplicate Scheme . . . . . . . . . . . . . . . . . . . . . . .. 21
3.3 Analyses . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.4 Matrix Multiplication with Data Locality . . . . . . . . . . . . . . . . 28
4 Performance Analyzes 37
5 Conclusions 42

參考文獻

[1] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative
Approach. Morgan Kaufmann, Inc., second ed., 1996.
[2] G. C. Fox, S. W. Otto, and A. J. G. Hey, Matrix algorithms on a hypercube
I: Matrix multiplication," Parallel Computing, pp. 17{31, Apr. 1987.
[3] G. H. Golub and C. F. V. Loan, Matrix Computations. Johns Hopkins Univer-
sity Press, second ed., 1989.
[4] C. T. Ho, S. L. Johnsson, and A. Edelman, Matrix multiplication on hyper-
cubes using full bandwidth and constant storage," in Proceedings of the 6th
Distributed Memory Computing Conference, May 1991.
[5] S. Coleman and K. S. Mckinley, Tile size selection using cache organization
and data layout," in Proceedings of the SIGPLAN Conference on Programming
Language Design and Implementation, 1995.
[6] M. Kandemir, J. Ramanujam, and A. Choudhary, Improving cache locality
by a combination of loop and data transformations," IEEE Transactions on
Computers, vol. 48, pp. 159{167, Feb. 1999.
[7] G. Rivera and C.-W. Tseng, Data transformations for eliminating con
ict
misses," in Proceedings of the SIGPLAN '98 Conference on Programming Lan-
guage Design and Implementation, (Montreal,Canada), June 1998.
[8] M. Wolfe, High Performance Compilers for Parallel Computing. Addison-
Wesley Publishing Company, Inc., 1996.
[9] J. Barbosa, J. Tavares, and A. J. Padilha, Linear algebra algorithms in a
heterogeneous cluster of personal computers," in Proceedings of the 9th Hetero-
geneous Computing Workshop, 2000.
[10] O. Beaumont, V. Boudet, F. Rastello, and Y. Robert, Matrix-matrix multi-
plication on heterogeneous platforms," in Proceedings of the International Con-
ference on Parallel Processing, pp. 289{298, 2000.
[11] O. Beaumont, V. Boudet, A. Legrand, F. Rastello, and Y. Robert, Hetero-
geneous matrix-matrix multiplication or partitioning a square into rectangles:
Np-completeness and approximation algorithms," in Proceedings of the Ninth
Euromicro Workshop on Parallel and Distributed Processing, pp. 298{305, 2001.
[12] O. Beaumont, V. Boudet, F. Rastello, and Y. Robert, Matrix multiplication
on heterogeneous platforms," IEEE Transactions on Parallel and Distributed
Systems, vol. 12, pp. 1033{1051, Oct. 2001.
[13] O. Beaumont, V. Boudet, A. Petitet, F. Rastello, and Y. Robert, A proposal
for a heterogeneous cluster scalapack (dense linear solvers)," IEEE Transactions
on Computers, vol. 50, pp. 1052{1070, Oct. 2001.
[14] R. Larson and B. H. Edwards, Elementary Linear Algebra. Houghton Mi in
Company, fourth ed., 2000.
[15] K. Hwang, Advanced Computer Architecture: Parallelism, Scalability, Pro-
grammability. McGraw-Hill, Inc., 1993.
[16] The Intel Pentium III 256 KB Processor: Product Overview. Intel homepage:
http://developer.intel.com/design/pentiumiii/prodbref/index.htm.
[17] The Intel Pentium 4 Processor: Product Overview. Intel homepage:
http://developer.intel.com/design/Pentium4/prodbref/.
[18] M. E. Wolf and M. S. Lam, A data locality optimizing algorithm," in Proceed-
ings of the ACM SIGPLAN '91 Conference on Programming Language Design
and Implementation, (Toronto, Ontario, Canada), pp. 30{44, June 1991.
[19] PVM: Parallel Virtual Machine. available in
http://www.epm.ornl.gov/pvm/pvm home.html.
[20] P. L. Springer, PVM support for clusters," in Proceedings of the 2001 IEEE
International Conference on Cluster Computing (CLUSTER'01), pp. 183{186,
2001.
[21] T.-S. Chen and J.-P. Sheu, Communication-free data allocation techniques for
parallelizing compilers on multicomputers," IEEE Transactions on Parallel and
Distributed Systems, vol. 5, pp. 924{938, Sept. 1994.
[22] C.-H. Huang and P. Sadayappan, Communication-free hyperplane partitioning
of nested loops," Journal of Parallel and Distributed Computing, vol. 19, pp. 90{
102, 1993.
[23] A. W. Lim and M. S. Lam, Communication-free parallelization via a ne trans-
formations," in Proceedings of the 7th Workshop on Languages and Compilers
for Parallel Computing, Aug. 1994.
[24] J. Ramanujam and P. Sadayappan, Compile-time techniques for data distri-
bution in distributed memory machines," IEEE Transactions on Parallel and
Distributed Systems, vol. 2, pp. 472{482, Oct. 1991.
[25] K.-P. Shih, J.-P. Sheu, and C.-H. Huang, Statement-level communication-free
partitioning techniques for parallelizing compilers," in Proceedings of the 9th
Workshop on Languages and Compilers for Parallel Computing, Aug. 1996.
[26] K.-P. Shih, C.-H. Huang, and J.-P. Sheu, Communication-free partitioning
of nested loops," in Compiler Optimizations for Scalable Parallel Systems:
Languages, Compilation Techniques, and Run Time Systems (S. Pande and
D. P. Agrawal, eds.), vol. 1808 of Lecture Notes in Computer Science, Springer-
Verlag, 2001.
[27] R. L. Graham, D. E. Knuth, and O. Patashnik, Concrete Mathematics: A
Fundation for Computer Science. Addison-Wesley Publishing Company, 1989.

指導教授

許健平(Jang-Ping Sheu)

審核日期

2003-7-2

推文