Pedro Leite

words – an empty shell, a lack of meaning

Something is wrong…

My OpenStereo CUDA plugin did not get the huge performance improvement I expected. Worse, with a 512×512 texture, its best frame rate was around 20 fps when compiled in release mode and running on a GeForce 8800 GTX. Even the plugin that uses shaders and render-to-texture outperformed the CUDA plugin (around 300 fps on a GeForce 7900 GTX!!!).

Perhaps that implementation isn’t the best one… I think memory access could be improved by using shared memory, but this still needs to be analyzed.
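One possible shared-memory change, sketched under my own assumptions and not yet measured: the fusion kernel from the "OpenStereo Plugin using CUDA" post makes every thread read the same two 3×3 matrices from global memory, 18 reads per thread. Each block can load them into shared memory once instead. The names fuse_images_shared and fuse_pixel are hypothetical. Note also that with interleaved RGB floats, neighbouring threads read addresses 3 floats apart, which the GeForce 8 series may not coalesce; that could matter as much as the matrix reads.

```cuda
#include <cassert>
#include <cuda_runtime.h>

#define BLOCK_SIZE 16

// One output RGB pixel: out = ml * left + mr * right, where ml and mr are
// row-major 3x3 matrices. __host__ __device__ so it can also be checked on the CPU.
__host__ __device__ inline void fuse_pixel(const float* ml, const float* mr,
                                           const float* left, const float* right,
                                           float* out) {
    for (int i = 0; i < 3; i++) {          // output channel (matrix row)
        float c = 0.0f;
        for (int j = 0; j < 3; j++) {      // input channel (matrix column)
            c += ml[3 * i + j] * left[j];
            c += mr[3 * i + j] * right[j];
        }
        out[i] = c;
    }
}

// Hypothetical variant of fuse_images that stages both matrices in shared memory.
// Assumes at least 9 threads per block (e.g. BLOCK_SIZE x BLOCK_SIZE).
__global__ void fuse_images_shared(const float* left_tex, const float* right_tex,
                                   float* stereo_tex, const float* matrix_left,
                                   const float* matrix_right, int width, int height) {
    __shared__ float ml[9];
    __shared__ float mr[9];

    // the first 9 threads of the block copy the matrices once
    int tid = threadIdx.y * blockDim.x + threadIdx.x;
    if (tid < 9) {
        ml[tid] = matrix_left[tid];
        mr[tid] = matrix_right[tid];
    }
    // every thread must reach this barrier, so it comes before the bounds check
    __syncthreads();

    int ix = blockIdx.x * blockDim.x + threadIdx.x;
    int iy = blockIdx.y * blockDim.y + threadIdx.y;
    if (ix >= width || iy >= height)
        return;

    int k = (iy * width + ix) * 3;         // first component of this pixel
    fuse_pixel(ml, mr, left_tex + k, right_tex + k, stereo_tex + k);
}
```

The early return of the original kernel has to move below `__syncthreads()`, otherwise threads of partial edge blocks would skip the barrier.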

October 31, 2007 Posted by | cuda, development, master degree, nvidia, opengl, openstereo

OpenStereo Plugin using CUDA

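Of course I wouldn’t leave OpenStereo out of my CUDA studies. I’ve implemented an OpenStereo plugin that uses CUDA for the fusion process, which computes each output RGB pixel as a 3×3 matrix times the left pixel plus another 3×3 matrix times the right pixel. Below is the preliminary kernel code: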

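#include <stdlib.h>
#include <stdio.h>
#include <math.h>

#define BLOCK_SIZE 16

// Fuses a left and a right RGB image (3 floats per pixel) into one stereo image:
// stereo = matrix_left * left + matrix_right * right, with row-major 3x3 matrices.
// One thread per pixel; blocks must be BLOCK_SIZE x BLOCK_SIZE threads.
__global__ void fuse_images(float* left_tex, float* right_tex, float* stereo_tex,
                            float* matrix_left, float* matrix_right, int width, int height) {
    // global pixel coordinates (__mul24 is a fast 24-bit multiply on the GeForce 8 series)
    int ix = __mul24(blockIdx.x, BLOCK_SIZE) + threadIdx.x;
    int iy = __mul24(blockIdx.y, BLOCK_SIZE) + threadIdx.y;

    // threads of partial edge blocks fall outside the image
    if (ix >= width || iy >= height)
        return;

    // index of this pixel's first (red) component; the same for every channel
    int k = (ix + __mul24(iy, width)) * 3;

    // rows of the color matrices = output channels
    for (int i = 0; i < 3; i++) {
        float c = 0.0f;
        // columns = input channels
        for (int j = 0; j < 3; j++) {
            c += matrix_left[j + __mul24(3, i)] * left_tex[k + j];
            c += matrix_right[j + __mul24(3, i)] * right_tex[k + j];
        }
        stereo_tex[k + i] = c;
    }
}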

That function runs on the device (GPU), while the host side handles the OpenGL work and launches the kernel. The kernel is called like this:

// executes the kernel
fuse_images<<< grid, threads >>>(cuda_left_tex, cuda_right_tex, cuda_stereo_tex,
    getCudaMatrixFromMethod(method, isLeft), getCudaMatrixFromMethod(method, !isLeft),
    width, height);
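The call above leaves out how `grid` and `threads` are built and how the data reaches the device. Here is a minimal host-side sketch, assuming the left and right images are already in host memory as interleaved RGB floats; the real plugin gets them from OpenGL, which this sketch does not show. `fuse_kernel`, `make_grid` and `run_fusion` are hypothetical names.

```cuda
#include <cassert>
#include <cuda_runtime.h>

#define BLOCK_SIZE 16

// Same per-pixel fusion as fuse_images: out = ml * left + mr * right.
__global__ void fuse_kernel(const float* left, const float* right, float* out,
                            const float* ml, const float* mr, int width, int height) {
    int ix = blockIdx.x * BLOCK_SIZE + threadIdx.x;
    int iy = blockIdx.y * BLOCK_SIZE + threadIdx.y;
    if (ix >= width || iy >= height)
        return;
    int k = (iy * width + ix) * 3;
    for (int i = 0; i < 3; i++) {
        float c = 0.0f;
        for (int j = 0; j < 3; j++)
            c += ml[3 * i + j] * left[k + j] + mr[3 * i + j] * right[k + j];
        out[k + i] = c;
    }
}

// Enough BLOCK_SIZE x BLOCK_SIZE blocks to cover the image (rounded up);
// the bounds check in the kernel masks the extra threads.
dim3 make_grid(int width, int height) {
    assert(width > 0 && height > 0);
    return dim3((width + BLOCK_SIZE - 1) / BLOCK_SIZE,
                (height + BLOCK_SIZE - 1) / BLOCK_SIZE);
}

// Copies both images and both matrices to the device, runs the kernel and
// copies the fused image back. Returns the first CUDA error encountered.
cudaError_t run_fusion(const float* h_left, const float* h_right, float* h_out,
                       const float* h_ml, const float* h_mr, int width, int height) {
    size_t img_bytes = (size_t)width * height * 3 * sizeof(float);
    size_t mat_bytes = 9 * sizeof(float);
    float *d_left = 0, *d_right = 0, *d_out = 0, *d_ml = 0, *d_mr = 0;

    cudaError_t err = cudaMalloc((void**)&d_left, img_bytes);
    if (err == cudaSuccess) err = cudaMalloc((void**)&d_right, img_bytes);
    if (err == cudaSuccess) err = cudaMalloc((void**)&d_out, img_bytes);
    if (err == cudaSuccess) err = cudaMalloc((void**)&d_ml, mat_bytes);
    if (err == cudaSuccess) err = cudaMalloc((void**)&d_mr, mat_bytes);
    if (err == cudaSuccess) err = cudaMemcpy(d_left, h_left, img_bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(d_right, h_right, img_bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(d_ml, h_ml, mat_bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(d_mr, h_mr, mat_bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 threads(BLOCK_SIZE, BLOCK_SIZE);
        fuse_kernel<<<make_grid(width, height), threads>>>(d_left, d_right, d_out,
                                                           d_ml, d_mr, width, height);
        err = cudaGetLastError();   // reports launch errors
    }
    // a device-to-host cudaMemcpy waits for the kernel to finish
    if (err == cudaSuccess) err = cudaMemcpy(h_out, d_out, img_bytes, cudaMemcpyDeviceToHost);

    // cudaFree on a null pointer does nothing, so cleanup is unconditional
    cudaFree(d_left); cudaFree(d_right); cudaFree(d_out);
    cudaFree(d_ml); cudaFree(d_mr);
    return err;
}
```

For a 512×512 image, `make_grid` gives a 32×32 grid of 16×16 blocks. If the plugin moves textures between OpenGL and CUDA every frame, those transfers are worth timing separately from the kernel itself.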

I still need to test this plugin on a real CUDA device; in emulation mode it is already working almost perfectly.
P.S.: Yeah, posting source code in WordPress sucks =/
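When the plugin runs on real hardware, one way to check it is to compare the device output against a plain CPU version of the same fusion. The sketch below is my own assumption of such a check; `fuse_images_cpu` and `count_mismatches` are hypothetical helpers, and the tolerance has to be picked by hand.

```cuda
#include <cassert>
#include <cmath>
#include <cstddef>

// CPU reference for the fusion step: out = ml * left + mr * right for every
// interleaved RGB pixel, with row-major 3x3 matrices.
void fuse_images_cpu(const float* left, const float* right, float* out,
                     const float* ml, const float* mr, int width, int height) {
    for (int p = 0; p < width * height; p++) {
        int k = 3 * p;                      // first component of pixel p
        for (int i = 0; i < 3; i++) {
            float c = 0.0f;
            for (int j = 0; j < 3; j++)
                c += ml[3 * i + j] * left[k + j] + mr[3 * i + j] * right[k + j];
            out[k + i] = c;
        }
    }
}

// Number of components whose absolute difference exceeds tol. A tolerance is
// needed because the GPU compiler may fuse a multiply and an add, so results
// need not round exactly like the CPU's.
int count_mismatches(const float* a, const float* b, size_t n, float tol) {
    int bad = 0;
    for (size_t i = 0; i < n; i++)
        if (std::fabs(a[i] - b[i]) > tol)
            bad++;
    return bad;
}
```

After copying the kernel's result back to the host, `count_mismatches(gpu_out, cpu_out, width * height * 3, tol)` should return 0 for a suitably small `tol`.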

October 30, 2007 Posted by | cuda, development, open source, opengl, openstereo, stereoscopy

Multi-Massive Parallel Computing

And here is the first step of my Master’s thesis: studying CUDA. In short, CUDA (Compute Unified Device Architecture) is a GPGPU language and runtime for the new GeForce 8 Series and later GPUs. It’s a completely new paradigm, where programs written for such devices must have a parallel nature. So it isn’t as general purpose as it claims to be ;)

However, this whole paradigm will soon give me some headaches. Just look at the figure below:

CUDA Computing Paradigm

(Figure taken from Nvidia CUDA: preview)

As you can “clearly” see, the host is the CPU side and the device is the GPU side. A kernel is a program designed to run in parallel. It executes on a grid, which is divided into blocks, which are in turn made of threads. Blocks and threads run logically in parallel, while a group of threads called a warp runs physically in parallel.
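To make the grid/block/thread/warp hierarchy concrete, here is a small sketch of how a thread can work out where it sits. The helpers `linear_thread_id` and `warp_id` and the kernel `who_am_i` are hypothetical; the sketch relies on threads being grouped into warps by consecutive linear ids inside a block, with `x` varying fastest, and on the built-in `warpSize` (32 on the GeForce 8 series).

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Position of a thread inside its block, flattened with x varying fastest.
__host__ __device__ inline int linear_thread_id(int tx, int ty, int block_dim_x) {
    return ty * block_dim_x + tx;
}

// Consecutive linear ids are packed into warps of warp_size threads.
__host__ __device__ inline int warp_id(int linear_tid, int warp_size) {
    return linear_tid / warp_size;
}

// Each thread writes, for the pixel it owns, the warp (within its block) it belongs to.
__global__ void who_am_i(int* warp_of_pixel, int width, int height) {
    int ix = blockIdx.x * blockDim.x + threadIdx.x;   // global column
    int iy = blockIdx.y * blockDim.y + threadIdx.y;   // global row
    if (ix >= width || iy >= height)
        return;
    int tid = linear_thread_id(threadIdx.x, threadIdx.y, blockDim.x);
    warp_of_pixel[iy * width + ix] = warp_id(tid, warpSize);
}
```

With 16×16 blocks, each block has 256 threads, i.e. 8 warps of 32, and each warp covers two consecutive rows of the block; a 512×512 image then needs a 32×32 grid of such blocks.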

More will come as I learn things ;)

October 29, 2007 Posted by | cuda, grvm, master degree, nvidia, screenshot