exp(x)の並列計算に関する一考察 白石 啓一(香川高等専門学校 通信ネットワーク工学科) 今井 慈郎(香川大学 工学部) BACKGROUNDS Processors : Clock up --> Multi-core Multi-core processors are not expensive. Core i7, Cell, GPU, etc. PC clusters, Super Computers and Cloud Computing are a kind of parallel computing. Using multi-core processors effectively is important. Teaching materials for parallel computing And… My research area : Computer Algebra System, e-Learning and Instructional Design CONTENTS Parallel computation of exp(x) Embarassingly parallel computation and Masterworker paradigm How to compute/parallelize exp(x) Algorithms Experiments Discussions Conclusion EMBARASSINGLY PARALLEL COMPUTATIONS Embarassingly parallel computations - no dependency or communication exists between parallel tasks Master-worker paradigm is suitable. Tasks Results Computing/Processing NUMERICAL APPROXIMATION OF exp(x) Worker 1 Worker 2 Worker 3 Worker M Equation (3) will be divided to M groups of terms and allocated them to each worker process. N 1 L M NUMERICAL APPROXIMATION OF exp(x) In these group, the last terms of the former groups are appeared in the coefficient of the following other groups, e.g. xL-1/(L-1)! is appeared in all groups. PARALLEL COMPUTATION OF exp(x) (MASTER) PARALLEL COMPUTATION OF exp(x) (WORKER) TEST-BED OF EXPERIMENTS PC Processor Intel(R) Core(TM) i7 860 2.80GHz 4cores (8HT) Memory 3GB OS FreeBSD/i386 8.0-RELEASE Computer Algebra System Risa/Asir 20070806 Number of Workers 1~8 Compute with multiple precision integer/rational numbers (very slow) ELAPSED TIME (exp(1), N=1000) Communicaton time is needed. Sequential Parallel Number of workers increases. Number of digits of return value decreses. MAXIMUM NUMBER OF DIGITS OF NUMERATOR/DENOMINATOR OF RETURN VALUE FROM WORKERS Number of worker processes Maximum number of digits of numerator/denominator of return value 1 2565 2 1437 3 969 4 735 5 594 6 492 7 425 8 375 If the number of digits is twice, it would needs 4 times multiplication. DISCUSSIONS From the viewpoint of numerical computation, a computation with N=1000 isn’t needed because one with about N=10 makes the results sufficient. There are some numerical approach, e.g. C standard library’s exp(x), Stirling's approximation for n!. For teaching material, this approach would be good because the tasks allocated each worker processes have some dependencies. It is more difficult than to compute the circle ratio. CONCLUSIONS Parallel computation of exp(x) is illustrated. As number of worker processes increase, the completion time decreases. As number of digits of numerator/denominator of return value from worker processes decrease, the completion time decreases. Future works Evaluation the problem as teaching material exp(x) by C STANDARD LIBRARY SEQUENTIAL COMPUTATION OF exp(x) NUMERICAL APPROXIMATION OF THE CIRCLE RATIO 1 4 dx 2 1 x 0 4 N 1 1 N i 0 1 ((i 0.5) / N ) 2 4 99 1 100 i 0 1 ((i 0.5) / 100) 2 99 4 49 1 1 2 2 100 i 0 1 ((i 0.5) / 100) i 50 1 ((i 0.5) / 100) N=100, # of workers is 2. Tasks are independent each other. They can be allocated to 2 workers. TEST-BED OF EXPERIMENTS PC PLAYSTATION3 Xeon X3210 2.13GHz Cell B.E. 3.2GHz 4cores PPE, 7SPEs SPE Local Store 256KB EIB 307.2GB/s Memory 3GB 256MB OS FreeBSD/amd64 7.1RELEASE Yellow Dog Linux 6.2 CAS, Library Risa/Asir 20070806 libspe-2.2.80-132 Processor ELAPSED TIME (N=50,000,000) Elapsed Time(Risa/Asir) # of workers 1 2 3 4 Elapsed time[s] 37.902 19.627 13.787 11.056 Speedup ratio* 1 1.931 2.749 3.428 Elapsed Time(SPE Library) # of workers 1 2 3 4 5 6 Elapsed time[s] 6.037 3.023 2.019 1.518 1.217 1.017 Speedup ratio* 1 1.997 2.990 3.978 4.959 5.934 * Speedup ratio is the ratio of elapsed time with 1 worker to one with n workers.
© Copyright 2019 ExploreDoc