I started looking into graphics card programming, especially Nvidia CUDA. After getting the SDK to compile under Fedora 12 (so far only possible with compat-gcc), I could test the first examples. It is quite impressive how fast some of them run and how nice they look.

To get familiar with the framework, I wrote a simple memcpy program that copies from host memory to device memory and back. The results are quite nice:

Host to Device: 2149.465255 Mbyte/s
Device to Host: 1609.497326 Mbyte/s

In detail, the program allocates pinned memory on the host side and global memory on the device side, and the measurement covers only the time of the copy routine.
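For readers who want to try this themselves, here is a minimal sketch of how such a measurement could look. It is not the program I used for the numbers above: the 64 MiB transfer size, the use of CUDA events for timing, and the helper names `mbytes_per_second` and `time_copy` are assumptions made for illustration, and 1 Mbyte is taken as 2^20 bytes.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: convert a byte count and elapsed milliseconds to Mbyte/s.
// Assumption: 1 Mbyte = 2^20 bytes.
static double mbytes_per_second(size_t bytes, float ms) {
    return (bytes / (1024.0 * 1024.0)) / (ms / 1000.0);
}

// Hypothetical helper: time a single cudaMemcpy with CUDA events.
// Only the copy itself lies between the two event records.
static float time_copy(void *dst, const void *src, size_t bytes,
                       cudaMemcpyKind kind) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    cudaMemcpy(dst, src, bytes, kind);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const size_t bytes = 64 * 1024 * 1024;  // assumed transfer size
    void *host = NULL;
    void *dev = NULL;

    // Pinned (page-locked) host memory and global device memory.
    if (cudaMallocHost(&host, bytes) != cudaSuccess ||
        cudaMalloc(&dev, bytes) != cudaSuccess) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    memset(host, 0, bytes);

    float h2d_ms = time_copy(dev, host, bytes, cudaMemcpyHostToDevice);
    float d2h_ms = time_copy(host, dev, bytes, cudaMemcpyDeviceToHost);

    printf("Host to Device: %f Mbyte/s\n", mbytes_per_second(bytes, h2d_ms));
    printf("Device to Host: %f Mbyte/s\n", mbytes_per_second(bytes, d2h_ms));

    cudaFree(dev);
    cudaFreeHost(host);
    return 0;
}
```

Swapping `cudaMallocHost` for a plain `malloc` should show the effect of pageable memory, which typically transfers more slowly.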

The next simple experiment was to launch a so-called kernel (a program that runs on the GPU). For the experiment I copied an initialized memory region to global device memory, performed a floating-point multiplication, and copied the result back into host memory. For comparison, I did the same multiplication on the host CPU on an equally sized region of memory.

GPU w/out memcpy: 1.796128 ms
Host CPU        : 3.556544 ms
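The experiment can be sketched roughly as follows. Again, this is an illustration rather than my original code: the array size (2^20 floats), the factor 2.0, the block size of 256 threads, and the names `scale` and `scale_host` are all assumptions. The GPU time is taken with CUDA events around the kernel launch only, so the memcpy calls are excluded; the CPU time uses `clock()`, which is fairly coarse.

```cuda
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <cuda_runtime.h>

// Kernel: each thread multiplies one array element by a constant factor.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// The same multiplication on the host CPU, for comparison.
static void scale_host(float *data, float factor, int n) {
    for (int i = 0; i < n; ++i) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;                    // assumed element count
    const size_t bytes = n * sizeof(float);
    const float factor = 2.0f;                // assumed multiplier

    float *h_gpu = (float *)malloc(bytes);    // input/output of the GPU run
    float *h_cpu = (float *)malloc(bytes);    // input/output of the CPU run
    for (int i = 0; i < n; ++i) h_gpu[i] = h_cpu[i] = (float)i;

    float *d = NULL;
    cudaMalloc((void **)&d, bytes);
    cudaMemcpy(d, h_gpu, bytes, cudaMemcpyHostToDevice);

    // Time only the kernel, not the copies.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);
    scale<<<(n + 255) / 256, 256>>>(d, factor, n);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    float gpu_ms = 0.0f;
    cudaEventElapsedTime(&gpu_ms, start, stop);

    cudaMemcpy(h_gpu, d, bytes, cudaMemcpyDeviceToHost);

    // Same work on the host CPU.
    clock_t t0 = clock();
    scale_host(h_cpu, factor, n);
    clock_t t1 = clock();
    double cpu_ms = 1000.0 * (double)(t1 - t0) / CLOCKS_PER_SEC;

    // Sanity check: both results should agree.
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h_gpu[i] != h_cpu[i]) ++mismatches;

    printf("GPU w/out memcpy: %f ms\n", gpu_ms);
    printf("Host CPU        : %f ms\n", cpu_ms);
    printf("Mismatches      : %d\n", mismatches);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    free(h_gpu);
    free(h_cpu);
    return 0;
}
```

Note that the first kernel launch in a process can include some one-time setup cost, so running the kernel once before the timed launch may give a fairer number.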

Hardware used: Lenovo T61p, Nvidia Quadro FX 570M.
