Wednesday, 15 January 2014

parallel processing - Huge slowdown using OpenMP



I am trying to test the speed of a small piece of code, as follows:

for (i = 0; i < imgdim; i++) {
    x[0][i] = z[i] - u1[i] * rhoinv;
    x[1][i] = z[i] - u2[i] * rhoinv;
    x[2][i] = z[i] - u3[i] * rhoinv;
}

The loop is executed about 200 times, and imgdim is 1000000. In total this piece of code takes around 2 seconds, and the whole program takes 15 seconds. I then used OpenMP to parallelize the loop like this:

omp_set_num_threads(max_threads);
#pragma omp parallel shared(x,z,u1,u2,u3,imgdim,rhoinv) private(i)
{
    #pragma omp for schedule(dynamic)
    for (i = 0; i < imgdim; i++) {
        x[0][i] = z[i] - u1[i] * rhoinv;
        x[1][i] = z[i] - u2[i] * rhoinv;
        x[2][i] = z[i] - u3[i] * rhoinv;
    }
}

max_threads is 8. With this change the small piece of code needs around 11 seconds, and the entire program takes around 27 seconds. The strange thing is that the time drops to 6 seconds if I change max_threads to 1, which is still much longer than the sequential code.

This has cost me a lot of time and I cannot find the problem. I would appreciate it if anyone could help me with this.

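schedule(dynamic) introduces a huge run-time overhead. With its default chunk size of 1, every single iteration is handed out to a thread separately at run time. It should only be used for loops where each iteration can take a different amount of time, so that the improved load balancing justifies the overhead. For regular loops like yours, dynamic scheduling is overkill: it introduces unnecessary overhead, which slows down the computation.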
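To put rough numbers on that, the sketch below counts how many chunks the runtime has to hand out for the sizes mentioned in the question (1000000 iterations, about 200 repetitions, 8 threads). The helper names dynamic_chunks and static_chunks are hypothetical and only do the arithmetic. How expensive one chunk request actually is depends on the OpenMP runtime, so this shows the count of scheduling events, not a timing.

```c
#include <assert.h>

/* Chunks handed out for one loop of n iterations under
 * schedule(dynamic, chunk). The default chunk size is 1. */
long dynamic_chunks(long n, long chunk)
{
    assert(chunk > 0);
    return (n + chunk - 1) / chunk;   /* ceiling division */
}

/* schedule(static) without a chunk size splits the iterations into
 * roughly equal blocks, at most one block per thread. */
long static_chunks(long n, long threads)
{
    assert(threads > 0);
    return n < threads ? n : threads;
}
```

For the question's sizes, dynamic_chunks(1000000, 1) * 200 gives 200000000 chunk requests at run time, while static_chunks(1000000, 8) * 200 gives 1600 blocks, all of which can be assigned without any per-iteration bookkeeping.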

Change the schedule type to static:

#pragma omp parallel for schedule(static)
for (i = 0; i < imgdim; i++) {
    x[0][i] = z[i] - u1[i] * rhoinv;
    x[1][i] = z[i] - u2[i] * rhoinv;
    x[2][i] = z[i] - u3[i] * rhoinv;
}

(Note: variables declared in outer scopes are shared by default, and the control variable of the parallel loop is implicitly private.)
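As a minimal sketch of the static-schedule version as a compilable function, the code below puts the loop next to a serial reference so the results can be compared. The function names update_serial and update_parallel, the double element type, and the parameter layout are assumptions for illustration; the question does not show the declarations. Build with the compiler's OpenMP flag (for example -fopenmp with GCC); without it the pragma is ignored and the loop simply runs serially.

```c
#include <assert.h>

/* Serial reference: the loop from the question. */
void update_serial(double *x[3], const double *z, const double *u1,
                   const double *u2, const double *u3,
                   double rhoinv, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        x[0][i] = z[i] - u1[i] * rhoinv;
        x[1][i] = z[i] - u2[i] * rhoinv;
        x[2][i] = z[i] - u3[i] * rhoinv;
    }
}

/* Same loop with a combined parallel-for and a static schedule. */
void update_parallel(double *x[3], const double *z, const double *u1,
                     const double *u2, const double *u3,
                     double rhoinv, int n)
{
    assert(n >= 0);
    /* x, z, u1, u2, u3, rhoinv and n are declared outside the
     * construct, so they are shared by default. The loop variable i
     * is private to each thread. No shared/private clauses needed. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        x[0][i] = z[i] - u1[i] * rhoinv;
        x[1][i] = z[i] - u2[i] * rhoinv;
        x[2][i] = z[i] - u3[i] * rhoinv;
    }
}
```

Each iteration writes only to index i of the three output rows, so the iterations are independent and the parallel version should produce the same values as the serial one.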

parallel-processing openmp
