parallel processing - Huge slowdown using OpenMP
I am trying to test the speed of a small piece of code, as follows:
    for(i=0;i<imgdim;i++)
    {
        x[0][i] = z[i] - u1[i] * rhoinv;
        x[1][i] = z[i] - u2[i] * rhoinv;
        x[2][i] = z[i] - u3[i] * rhoinv;
    }
The number of outer iterations is around 200, and imgdim is 1000000. The total time spent in this piece of code is around 2 seconds, and the whole program takes about 15 seconds. I then used OpenMP to parallelize this piece of code like this:
    omp_set_num_threads(max_threads);
    #pragma omp parallel shared(x,z,u1,u2,u3,imgdim,rhoinv) private(i)
    {
        #pragma omp for schedule(dynamic)
        for(i=0;i<imgdim;i++)
        {
            x[0][i] = z[i] - u1[i] * rhoinv;
            x[1][i] = z[i] - u2[i] * rhoinv;
            x[2][i] = z[i] - u3[i] * rhoinv;
        }
    }
max_threads is 8. The small piece of code now needs around 11 seconds, and the entire program takes around 27 seconds. The unusual thing is that the time drops to 6 seconds if I change max_threads to 1, which is still much longer than the sequential code.
This has cost me a lot of time and I cannot find the problem. I would appreciate it if you could help me with it.
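schedule(dynamic) introduces a huge run-time overhead. It should only be used for loops where each iteration can take a different amount of time, so that the improved load balancing justifies the overhead. Without a chunk size, dynamic scheduling hands out iterations one at a time, so for every one of the 1000000 iterations a thread has to go back to the OpenMP runtime for more work, while the iteration itself does only a few arithmetic operations. That is also why the loop is slower than the sequential version even with a single thread. For regular loops like yours, dynamic scheduling is overkill: it introduces unnecessary overhead and slows down the computation.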
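To see this overhead on a loop shaped like the one in the question, here is a minimal benchmark sketch. It assumes the arrays hold doubles; the names now, set_schedule, kernel and compare_schedules are hypothetical. Instead of writing the loop three times, it uses schedule(runtime) and selects the schedule with omp_set_schedule(). The schedule(dynamic, 4096) case is an extra comparison that the answer does not mention, included to show that larger chunks reduce the per-iteration cost of dynamic scheduling. No particular timings are claimed; they depend on the machine and thread count.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock time with OpenMP; CPU time via clock() otherwise. */
static double now(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Selects the schedule used by schedule(runtime) loops.
 * A chunk size below 1 means "use the default chunk size". */
static void set_schedule(int dynamic, int chunk)
{
#ifdef _OPENMP
    omp_set_schedule(dynamic ? omp_sched_dynamic : omp_sched_static, chunk);
#else
    (void)dynamic; /* serial build: the loop runs on one thread anyway */
    (void)chunk;
#endif
}

/* The loop from the question; the schedule is chosen at run time. */
static void kernel(double *x[3], const double *z, const double *u1,
                   const double *u2, const double *u3, double rhoinv, long n)
{
    #pragma omp parallel for schedule(runtime)
    for (long i = 0; i < n; i++) {
        x[0][i] = z[i] - u1[i] * rhoinv;
        x[1][i] = z[i] - u2[i] * rhoinv;
        x[2][i] = z[i] - u3[i] * rhoinv;
    }
}

/* Times `reps` calls of the kernel under three schedules, checks that
 * every element was written correctly, and prints the timings.
 * Returns 0 on success, -1 if allocation fails. */
int compare_schedules(long n, int reps)
{
    double *buf = malloc(sizeof *buf * 7 * (size_t)n);
    if (buf == NULL)
        return -1;
    double *z = buf, *u1 = buf + n, *u2 = buf + 2 * n, *u3 = buf + 3 * n;
    double *x[3] = { buf + 4 * n, buf + 5 * n, buf + 6 * n };
    const double rhoinv = 0.25;  /* values chosen so all results are exact */
    for (long i = 0; i < n; i++) {
        z[i] = (double)i;
        u1[i] = 1.0;
        u2[i] = 2.0;
        u3[i] = 3.0;
    }

    const struct { const char *label; int dynamic; int chunk; } runs[] = {
        { "schedule(dynamic)      ", 1, 1 },    /* default chunk size is 1 */
        { "schedule(dynamic, 4096)", 1, 4096 },
        { "schedule(static)       ", 0, 0 },    /* one block per thread */
    };
    for (size_t k = 0; k < sizeof runs / sizeof runs[0]; k++) {
        memset(x[0], 0, sizeof *buf * 3 * (size_t)n);  /* clear x[0..2] */
        set_schedule(runs[k].dynamic, runs[k].chunk);
        double t0 = now();
        for (int r = 0; r < reps; r++)
            kernel(x, z, u1, u2, u3, rhoinv, n);
        double elapsed = now() - t0;
        for (long i = 0; i < n; i++) {   /* the schedule never changes results */
            assert(x[0][i] == (double)i - 0.25);
            assert(x[1][i] == (double)i - 0.5);
            assert(x[2][i] == (double)i - 0.75);
        }
        printf("%s: %.3f s for %d calls\n", runs[k].label, elapsed, reps);
    }
    free(buf);
    return 0;
}
```

Calling compare_schedules(1000000, 200) from a program built with gcc -O2 -fopenmp mirrors the sizes in the question; the thread count can be set with the OMP_NUM_THREADS environment variable.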
Change the schedule type to static:
    #pragma omp parallel for schedule(static)
    for(i=0;i<imgdim;i++)
    {
        x[0][i] = z[i] - u1[i] * rhoinv;
        x[1][i] = z[i] - u2[i] * rhoinv;
        x[2][i] = z[i] - u3[i] * rhoinv;
    }
(Note: variables declared in outer scopes are shared by default, and the loop variable of a parallel for construct is implicitly private, so the shared and private clauses are not needed.)
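The corrected loop can be wrapped in a self-contained function as a minimal sketch. It assumes the arrays hold doubles and that x is an array of three row pointers; the name update_x is hypothetical. The loop variable is declared outside the loop, as in the question, to show that the parallel for construct still makes it private without any clause.

```c
/* Hypothetical helper mirroring the loop from the question:
 * x[k][i] = z[i] - u_k[i] * rhoinv for k = 0, 1, 2. */
void update_x(double *x[3], const double *z, const double *u1,
              const double *u2, const double *u3, double rhoinv,
              long imgdim)
{
    long i;
    /* Combined "parallel for" with a static schedule: the iterations are
     * split into roughly equal contiguous blocks, at most one per thread.
     * x, z, u1, u2, u3, rhoinv and imgdim are shared by default;
     * the loop variable i is made private automatically. */
    #pragma omp parallel for schedule(static)
    for (i = 0; i < imgdim; i++) {
        x[0][i] = z[i] - u1[i] * rhoinv;
        x[1][i] = z[i] - u2[i] * rhoinv;
        x[2][i] = z[i] - u3[i] * rhoinv;
    }
}
```

When built with gcc -O2 -fopenmp, the thread count can still be set with omp_set_num_threads(max_threads) as in the question, or with OMP_NUM_THREADS. Without -fopenmp the pragma is ignored and the loop simply runs serially.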