C - CUDA kernel and 2D arrays - how does it work?
I'm writing an image rotation method. It takes two matrices and a rotation angle, rotates the original matrix by that many degrees, and stores the result in the rotated matrix. I have the following "normal" code (for the CPU, taken from this site: http://sinepost.wordpress.com/2012/07/24/image-rotation/), and it works as it should:
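static void rotateimage(unsigned char original[raw_height][raw_width], unsigned char rotated[raw_height][raw_width], int degrees)
{
    /* x indexes rows and y indexes columns, so each center comes from the matching dimension */
    double centerx = raw_height/2;
    double centery = raw_width/2;
    for(int x = 0; x < raw_height; x++)
    {
        for(int y = 0; y < raw_width; y++)
        {
            /* calculatedirection, calculatemagnitude, calculatex and calculatey are
               presumably the polar-coordinate helpers from the linked article (not shown) */
            double dir = calculatedirection(x-centerx, y-centery);
            double mag = calculatemagnitude(x-centerx, y-centery);
            dir -= degrees;
            int origx = (int)(centerx + calculatex(dir, mag));
            int origy = (int)(centery + calculatey(dir, mag));
            /* copy only when the source pixel lies inside the image */
            if (origx >= 0 && origx < raw_height && origy >= 0 && origy < raw_width)
            {
                rotated[x][y] = original[origx][origy];
            }
        }
    }
}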
I tried to port this code to CUDA. Here is my version:
#define raw_width (1600*3)
#define raw_height 1200

unsigned char *dev_original_image;
unsigned char *dev_rotated_image;

__global__ void rotatepicture(unsigned char *original, unsigned char *rotated, int degrees)
{
    int x = threadIdx.x + blockDim.x * blockIdx.x;
    int y = threadIdx.y + blockDim.y * blockIdx.y;
    int offset_rotated = x + y * blockDim.x * gridDim.x;

    double centerx = 2400.0;
    double centery = 600.0;

    double dir = (atan2(y-centery, x-centerx))*180/3.14159265;
    double mag = sqrt((x-centerx)*(x-centerx) + (y-centery)*(y-centery));
    dir = dir - degrees;

    int origx = (int)(centerx + cos((dir*3.14159265/180)) * mag);
    int origy = (int)(centery + sin((dir*3.14159265/180)) * mag);
    int offset_original = origx + origy * blockDim.x * gridDim.x;

    if(offset_original > 0 && offset_original < raw_height*raw_width)
        *(rotated + offset_rotated) = *(original + offset_original);
}
But it doesn't give me the same result as the CPU version. I think the problem is in how I pass the arguments to the CUDA kernel: I'm passing them as 2D arrays. Is that OK? Can someone explain this to me? Here is my kernel configuration and call:
dim3 blockpergrid(450,400,1);
dim3 threadspergrid(8,4,1);

cudaMalloc((void**)&dev_original_image, sizeof(unsigned char)*raw_height*raw_width);
cudaMalloc((void**)&dev_rotated_image, sizeof(unsigned char)*raw_height*raw_width);
cudaMemcpy(dev_original_image, raw_image2d, sizeof(unsigned char)*raw_height*raw_width, cudaMemcpyHostToDevice);
cudaMemcpy(dev_rotated_image, raw_image2d_rotated, sizeof(unsigned char)*raw_height*raw_width, cudaMemcpyHostToDevice);

rotatepicture<<<blockpergrid,threadspergrid>>>(dev_original_image, dev_rotated_image, deg);
Thanks for any advice!
Note: I modified the code and it works better, but it's still not correct.
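On the question of passing 2D arrays: if raw_image2d is declared as a plain array such as unsigned char raw_image2d[raw_height][raw_width] (an assumption, since the post doesn't show its declaration), it is one contiguous block in row-major order, so copying raw_height*raw_width bytes with cudaMemcpy and receiving it as a flat unsigned char* is fine. An array of row pointers (unsigned char**) would not be contiguous and could not be copied this way. What matters is that the kernel computes the flat offset as row * width + col with the real image width. A small host-side sketch of that mapping, using a hypothetical 3 x 4 image:

```c
#include <stddef.h>
#include <string.h>

/* Tiny hypothetical image size, standing in for raw_height x raw_width */
#define IMG_H 3
#define IMG_W 4

/* Copy a contiguous 2D array into a flat buffer: the same byte-for-byte
   copy that cudaMemcpy performs for a contiguous raw_image2d */
static void flatten(unsigned char img[IMG_H][IMG_W], unsigned char *flat)
{
    memcpy(flat, img, sizeof(unsigned char) * IMG_H * IMG_W);
}

/* Row-major offset of (row, col): the row stride is the image width */
static size_t flat_index(size_t row, size_t col)
{
    return row * IMG_W + col;
}
```

After flatten, flat[flat_index(r, c)] holds the same value as img[r][c] for every pixel, which is exactly the correspondence the kernel relies on.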
Here is the solution, for others lurking in these waters. This is the corrected kernel:
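__global__ void rotatepicture(unsigned char *original, unsigned char *rotated, int degrees)
{
    // x is the column (0..raw_width-1), y is the row (0..raw_height-1)
    int x = threadIdx.x + blockDim.x * blockIdx.x;
    int y = threadIdx.y + blockDim.y * blockIdx.y;
    if (x >= raw_width || y >= raw_height)
        return; // skip threads outside the image if the grid is larger than it

    // row-major layout: the row stride is the image width
    int offset_rotated = x + y * raw_width;

    // pivot at the center of the 4800 x 1200 image
    double centerx = 2400.0;
    double centery = 600.0;

    // atan2 arguments and sin/cos are swapped relative to the question's kernel;
    // this mirrors the CPU version, whose first index (x) is the row
    double dir = (atan2(x-centerx, y-centery))*180/3.14159265;
    double mag = sqrt((x-centerx)*(x-centerx) + (y-centery)*(y-centery));
    dir = dir - degrees;

    int origx = (int)(centerx + sin((dir*3.14159265/180)) * mag);
    int origy = (int)(centery + cos((dir*3.14159265/180)) * mag);
    int offset_original = origx + origy * raw_width;

    // check the source coordinates, not the flat offset, so that an
    // out-of-range column cannot wrap around into a neighbouring row
    if(origx >= 0 && origx < raw_width && origy >= 0 && origy < raw_height)
        *(rotated + offset_rotated) = *(original + offset_original);
}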
I also changed the kernel launch dimensions (to accommodate the 1600*3 width and 1200 height):
dim3 blockpergrid(600,300,1);
dim3 threadspergrid(8,4,1);
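The arithmetic behind this change: a launch covers gridDim.x * blockDim.x columns and gridDim.y * blockDim.y rows. The question's launch covered 450 * 8 = 3600 columns by 400 * 4 = 1600 rows, so blockDim.x * gridDim.x (used as the row stride) was 3600 instead of 4800, and the kernel effectively treated the buffer as a 3600 x 1600 image. Both layouts hold 5,760,000 pixels, which is why nothing crashed even though the result was wrong. The new launch covers 600 * 8 = 4800 by 300 * 4 = 1200, exactly the image. A common way to derive these numbers instead of hardcoding them is ceiling division; blocks_for is a hypothetical helper:

```c
/* Number of blocks needed to cover n items with block_size threads each
   (ceiling division). Hypothetical helper, not from the post. */
static unsigned int blocks_for(unsigned int n, unsigned int block_size)
{
    return (n + block_size - 1) / block_size;
}
```

On the host this would be used as dim3 blockpergrid(blocks_for(raw_width, 8), blocks_for(raw_height, 4), 1);. When the image size is not a multiple of the block size, the grid overshoots, so the kernel must also return early for threads with x >= raw_width or y >= raw_height.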
So it now works the same way as the CPU version above, but runs on the GPU. Enjoy!
Tags: c, cuda, nvidia