References
[1] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative
Approach. Morgan Kaufmann, Inc., second ed., 1996.
[2] G. C. Fox, S. W. Otto, and A. J. G. Hey, "Matrix algorithms on a hypercube
I: Matrix multiplication," Parallel Computing, pp. 17–31, Apr. 1987.
[3] G. H. Golub and C. F. Van Loan, Matrix Computations. Johns Hopkins
University Press, second ed., 1989.
[4] C. T. Ho, S. L. Johnsson, and A. Edelman, "Matrix multiplication on
hypercubes using full bandwidth and constant storage," in Proceedings of the 6th
Distributed Memory Computing Conference, May 1991.
[5] S. Coleman and K. S. McKinley, "Tile size selection using cache organization
and data layout," in Proceedings of the SIGPLAN Conference on Programming
Language Design and Implementation, 1995.
[6] M. Kandemir, J. Ramanujam, and A. Choudhary, "Improving cache locality
by a combination of loop and data transformations," IEEE Transactions on
Computers, vol. 48, pp. 159–167, Feb. 1999.
[7] G. Rivera and C.-W. Tseng, "Data transformations for eliminating conflict
misses," in Proceedings of the SIGPLAN '98 Conference on Programming
Language Design and Implementation, (Montreal, Canada), June 1998.
[8] M. Wolfe, High Performance Compilers for Parallel Computing. Addison-
Wesley Publishing Company, Inc., 1996.
[9] J. Barbosa, J. Tavares, and A. J. Padilha, "Linear algebra algorithms in a
heterogeneous cluster of personal computers," in Proceedings of the 9th
Heterogeneous Computing Workshop, 2000.
[10] O. Beaumont, V. Boudet, F. Rastello, and Y. Robert, "Matrix-matrix
multiplication on heterogeneous platforms," in Proceedings of the International
Conference on Parallel Processing, pp. 289–298, 2000.
[11] O. Beaumont, V. Boudet, A. Legrand, F. Rastello, and Y. Robert,
"Heterogeneous matrix-matrix multiplication or partitioning a square into rectangles:
NP-completeness and approximation algorithms," in Proceedings of the Ninth
Euromicro Workshop on Parallel and Distributed Processing, pp. 298–305, 2001.
[12] O. Beaumont, V. Boudet, F. Rastello, and Y. Robert, "Matrix multiplication
on heterogeneous platforms," IEEE Transactions on Parallel and Distributed
Systems, vol. 12, pp. 1033–1051, Oct. 2001.
[13] O. Beaumont, V. Boudet, A. Petitet, F. Rastello, and Y. Robert, "A proposal
for a heterogeneous cluster ScaLAPACK (dense linear solvers)," IEEE Transactions
on Computers, vol. 50, pp. 1052–1070, Oct. 2001.
[14] R. Larson and B. H. Edwards, Elementary Linear Algebra. Houghton Mifflin
Company, fourth ed., 2000.
[15] K. Hwang, Advanced Computer Architecture: Parallelism, Scalability, Pro-
grammability. McGraw-Hill, Inc., 1993.
[16] The Intel Pentium III 256 KB Processor: Product Overview. Intel homepage:
http://developer.intel.com/design/pentiumiii/prodbref/index.htm.
[17] The Intel Pentium 4 Processor: Product Overview. Intel homepage:
http://developer.intel.com/design/Pentium4/prodbref/.
[18] M. E. Wolf and M. S. Lam, "A data locality optimizing algorithm," in
Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design
and Implementation, (Toronto, Ontario, Canada), pp. 30–44, June 1991.
[19] PVM: Parallel Virtual Machine. Available at
http://www.epm.ornl.gov/pvm/pvm_home.html.
[20] P. L. Springer, "PVM support for clusters," in Proceedings of the 2001 IEEE
International Conference on Cluster Computing (CLUSTER'01), pp. 183–186,
2001.
[21] T.-S. Chen and J.-P. Sheu, "Communication-free data allocation techniques for
parallelizing compilers on multicomputers," IEEE Transactions on Parallel and
Distributed Systems, vol. 5, pp. 924–938, Sept. 1994.
[22] C.-H. Huang and P. Sadayappan, "Communication-free hyperplane partitioning
of nested loops," Journal of Parallel and Distributed Computing, vol. 19,
pp. 90–102, 1993.
[23] A. W. Lim and M. S. Lam, "Communication-free parallelization via affine
transformations," in Proceedings of the 7th Workshop on Languages and Compilers
for Parallel Computing, Aug. 1994.
[24] J. Ramanujam and P. Sadayappan, "Compile-time techniques for data
distribution in distributed memory machines," IEEE Transactions on Parallel and
Distributed Systems, vol. 2, pp. 472–482, Oct. 1991.
[25] K.-P. Shih, J.-P. Sheu, and C.-H. Huang, "Statement-level communication-free
partitioning techniques for parallelizing compilers," in Proceedings of the 9th
Workshop on Languages and Compilers for Parallel Computing, Aug. 1996.
[26] K.-P. Shih, C.-H. Huang, and J.-P. Sheu, "Communication-free partitioning
of nested loops," in Compiler Optimizations for Scalable Parallel Systems:
Languages, Compilation Techniques, and Run Time Systems (S. Pande and
D. P. Agrawal, eds.), vol. 1808 of Lecture Notes in Computer Science,
Springer-Verlag, 2001.
[27] R. L. Graham, D. E. Knuth, and O. Patashnik, Concrete Mathematics: A
Foundation for Computer Science. Addison-Wesley Publishing Company, 1989.