Multi-Massive Parallel Computing
And here is the first step of my Master’s thesis: studying CUDA. In short, CUDA (Compute Unified Device Architecture) is a GPGPU language and runtime for the new GeForce 8 Series and later GPUs. It’s a completely new paradigm, where programs written for such devices must be parallel in nature. So it isn’t as general purpose as it claims to be ;)
However, this whole paradigm will soon give me some headaches. Just look at the figure below:

(Figure taken from Nvidia CUDA: preview)
As you can “clearly” see, the host is the CPU side and the device is the GPU side. A kernel is a program designed to run in parallel. Execution takes place on a grid, which is divided into blocks, which in turn are composed of threads. Blocks and threads run logically in parallel, while a set of threads, called a warp, runs physically in parallel.
More will come as I learn things ;)
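To make the hierarchy a bit more concrete, here is a small host-side sketch in plain C++. It is not real CUDA code, so it runs without a GPU; it only mimics how a one-dimensional grid maps a block index and a thread index to a global thread index and to a warp inside the block. The names LaunchShape, globalThreadIndex, warpOfThread, warpsPerBlock and totalThreads are made up for illustration. The warp size of 32 is the value used by NVIDIA hardware, and in an actual kernel the index formula would read blockIdx.x * blockDim.x + threadIdx.x.

```cpp
// Hypothetical description of a 1D kernel launch:
// how many blocks are in the grid and how many threads are in each block.
struct LaunchShape {
    int blocksPerGrid;
    int threadsPerBlock;
};

// Warp size on NVIDIA GPUs (assumed fixed here for the sketch).
constexpr int kWarpSize = 32;

// Global index of a thread, using the same arithmetic a kernel would use.
inline int globalThreadIndex(const LaunchShape& s, int block, int thread) {
    return block * s.threadsPerBlock + thread;
}

// Warp (within its block) that a given thread belongs to.
inline int warpOfThread(int thread) {
    return thread / kWarpSize;
}

// Number of warps needed to cover one block, rounding up.
inline int warpsPerBlock(const LaunchShape& s) {
    return (s.threadsPerBlock + kWarpSize - 1) / kWarpSize;
}

// Total number of logical threads in the whole grid.
inline int totalThreads(const LaunchShape& s) {
    return s.blocksPerGrid * s.threadsPerBlock;
}
```

For example, with a grid of 4 blocks of 128 threads each, thread 5 of block 2 gets global index 261 and sits in warp 0 of its block; each block holds 4 warps and the grid holds 512 threads in total.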




Hi, thanks a lot for your explanation of parallel computing. But I don’t understand the part “Blocks and threads run logically in parallel, while a set of threads, called warp, run physically in parallel.” What do you mean by running logically and physically?
Hi, physically means that if I have 16 stream processors, 16 threads are in fact running in parallel. On the other hand, logically means that the whole block is treated as running in parallel, i.e., threads within a block can share data with each other, but only 16 threads will be running at exactly the same time (the scheduler is responsible for managing which warps run at a given time).
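Here is a rough sketch of that idea in plain C++ (again, not CUDA itself). It splits the threads of one block into groups that could run at the same instant on a given number of processors. The function name scheduleRounds and the strictly sequential ordering are assumptions made for illustration; the real scheduler works on warps and interleaves warps from several blocks, so treat this as a simplified model only.

```cpp
#include <vector>

// Simplified, hypothetical model of "logical" vs "physical" parallelism:
// all threadsInBlock threads belong to the same block (logically parallel),
// but only `lanes` of them execute at the same instant (physically parallel).
// Returns one vector of thread indices per round of execution.
std::vector<std::vector<int>> scheduleRounds(int threadsInBlock, int lanes) {
    std::vector<std::vector<int>> rounds;
    if (lanes <= 0) {
        return rounds;  // nothing can run without at least one processor
    }
    for (int first = 0; first < threadsInBlock; first += lanes) {
        std::vector<int> group;
        // The last group may be smaller if the block size is not a multiple of lanes.
        for (int t = first; t < first + lanes && t < threadsInBlock; ++t) {
            group.push_back(t);
        }
        rounds.push_back(group);
    }
    return rounds;
}
```

With a block of 64 threads and 16 processors, this gives 4 rounds of 16 threads each: the block as a whole is logically parallel, but only one group is physically running at any moment.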
I hope that you’ve got the big picture! =)