### Parallel programming based on master

```exp(x)の並列計算に関する一考察

BACKGROUNDS

Processors : Clock up --> Multi-core
Multi-core processors are not expensive.
 Core i7, Cell, GPU, etc.

PC clusters, Super Computers and Cloud
Computing are a kind of parallel computing.
 Using multi-core processors effectively is
important.
 Teaching materials for parallel computing
And…
 My research area : Computer Algebra System,
e-Learning and Instructional Design

CONTENTS
Parallel computation of exp(x)
 Embarassingly parallel computation and Masterworker paradigm
 How to compute/parallelize exp(x)
 Algorithms
 Experiments
 Discussions
 Conclusion

EMBARASSINGLY PARALLEL
COMPUTATIONS
Embarassingly parallel computations - no
dependency or communication exists between

Results
Computing/Processing
NUMERICAL APPROXIMATION OF
exp(x)
Worker 1
Worker 2
Worker 3
Worker M
Equation (3) will be divided to M groups of terms and allocated them to
each worker process.
 N  1
L

 M 
NUMERICAL APPROXIMATION OF
exp(x)
In these group, the last terms of the former groups are appeared in the
coefficient of the following other groups, e.g. xL-1/(L-1)! is appeared in all
groups.
PARALLEL COMPUTATION OF
exp(x) (MASTER)
PARALLEL COMPUTATION OF
exp(x) (WORKER)
TEST-BED OF EXPERIMENTS
PC
Processor
Intel(R) Core(TM) i7 860 2.80GHz
4cores (8HT)
Memory
3GB
OS
FreeBSD/i386 8.0-RELEASE
Computer Algebra
System
Risa/Asir 20070806
Number of Workers
1~8
Compute with multiple precision
integer/rational numbers (very slow)
ELAPSED TIME (exp(1), N=1000)
Communicaton
time is needed.
Sequential
Parallel
Number of
workers
increases.
Number of
digits of
return value
decreses.
MAXIMUM NUMBER OF DIGITS OF
NUMERATOR/DENOMINATOR OF
RETURN VALUE FROM WORKERS
Number of worker
processes
Maximum number of digits of
numerator/denominator of return value
1
2565
2
1437
3
969
4
735
5
594
6
492
7
425
8
375
If the number of digits is twice, it would needs 4
times multiplication.
DISCUSSIONS
From the viewpoint of numerical computation, a
computation with N=1000 isn’t needed because
one with about N=10 makes the results sufficient.
 There are some numerical approach, e.g. C
standard library’s exp(x), Stirling's
approximation for n!.
 For teaching material, this approach would be
good because the tasks allocated each worker
processes have some dependencies. It is more
difficult than to compute the circle ratio.

CONCLUSIONS
Parallel computation of exp(x) is illustrated.
 As number of worker processes increase, the
completion time decreases.
 As number of digits of numerator/denominator of
return value from worker processes decrease, the
completion time decreases.
 Future works


Evaluation the problem as teaching material
exp(x) by C STANDARD LIBRARY
SEQUENTIAL COMPUTATION OF
exp(x)
NUMERICAL APPROXIMATION OF
THE CIRCLE RATIO
1
4
dx
2
1 x
0
 
4 N 1
1
 
N i 0 1  ((i  0.5) / N ) 2
4 99
1


100 i 0 1  ((i  0.5) / 100) 2
99

4  49
1
1
 



2
2 
100  i 0 1  ((i  0.5) / 100) i 50 1  ((i  0.5) / 100) 
N=100, # of workers is 2.
They can be allocated to 2 workers.
TEST-BED OF EXPERIMENTS
PC
PLAYSTATION3
Xeon X3210 2.13GHz
Cell B.E. 3.2GHz
4cores
PPE, 7SPEs
SPE Local Store 256KB
EIB 307.2GB/s
Memory
3GB
256MB
OS
FreeBSD/amd64 7.1RELEASE
Yellow Dog Linux 6.2
CAS,
Library
Risa/Asir 20070806
libspe-2.2.80-132
Processor
ELAPSED TIME (N=50,000,000)

Elapsed Time(Risa/Asir)
# of workers
1
2
3
4
Elapsed time[s]
37.902
19.627
13.787
11.056
Speedup ratio*
1
1.931
2.749
3.428

Elapsed Time(SPE Library)
# of workers
1
2
3
4
5
6
Elapsed time[s]
6.037
3.023
2.019
1.518
1.217
1.017
Speedup ratio*
1
1.997
2.990
3.978
4.959
5.934
* Speedup ratio is the ratio of elapsed time with 1
worker to one with n workers.
```