CS344: Parallel Computing


Parallel Computing



Here is a simple CUDA program:
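A minimal sketch of such a program, assuming a kernel in which a single thread adds two integers. The names add and addOnGpu are illustrative and not taken from the course; error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: runs on the GPU. A single thread adds two integers
// and stores the sum in device memory.
__global__ void add(int a, int b, int *result) {
    *result = a + b;
}

// Hypothetical host helper: computes a + b on the device and
// copies the result back to the host.
int addOnGpu(int a, int b) {
    int *d_result = nullptr;
    int h_result = 0;

    cudaMalloc(&d_result, sizeof(int));
    add<<<1, 1>>>(a, b, d_result);  // 1 block of 1 thread
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(&h_result, d_result, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_result);
    return h_result;
}

int main() {
    printf("2 + 7 = %d\n", addOnGpu(2, 7));
    return 0;
}
```

The d_ and h_ prefixes mark device and host pointers, a common convention that helps avoid dereferencing a device pointer on the host.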

To compile it on Linux with nvcc (assuming the source is saved as hello.cu): nvcc hello.cu -o hello

Or on Windows with Visual Studio:

Creating a new CUDA project is the easiest way to get started. To enable CUDA in an existing project instead:


  • Right-click the project
  • Choose Build Dependencies
  • Choose Build Customizations
  • Check the box for the installed CUDA version
  • Right-click each .cu file, open Properties, and set its Item Type to CUDA C/C++
  • Add cudart.lib to Linker->Input->Additional Dependencies for all configurations and all platforms


Another simple program from Udacity:
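A sketch along the lines of the course's squaring example, assuming 64 floats handled by one block of 64 threads. squareOnGpu is a hypothetical host helper, and the single-block launch only works while the input fits in one block (at most 1024 threads on current GPUs).

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread squares exactly one element, selected by its thread index.
__global__ void square(float *d_out, const float *d_in) {
    int idx = threadIdx.x;
    float f = d_in[idx];
    d_out[idx] = f * f;
}

// Hypothetical host helper: copy in, launch one block, copy out.
// Assumes h_in.size() fits in a single block.
std::vector<float> squareOnGpu(const std::vector<float> &h_in) {
    const int n = static_cast<int>(h_in.size());
    const size_t bytes = n * sizeof(float);
    std::vector<float> h_out(n);

    float *d_in = nullptr;
    float *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);

    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
    square<<<1, n>>>(d_out, d_in);  // 1 block of n threads
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}

int main() {
    std::vector<float> h_in(64);
    for (int i = 0; i < 64; ++i) h_in[i] = static_cast<float>(i);

    std::vector<float> h_out = squareOnGpu(h_in);
    for (int i = 0; i < 64; ++i) {
        printf("%.0f%c", h_out[i], (i % 4 == 3) ? '\n' : '\t');
    }
    return 0;
}
```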


KERNEL <<< GRID OF BLOCKS, BLOCK OF THREADS, shared memory per block in bytes >>> (…)

Efficient: each thread knows its threadIdx (index of the thread within its block, e.g. threadIdx.x), blockDim (size of a block), blockIdx (index of the block within the grid) and gridDim (size of the grid).
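These built-in variables are usually combined into one global index per thread. A minimal sketch, assuming a 1D grid of 1D blocks; writeGlobalIndex and globalIndices are illustrative names.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread computes its global index across the whole grid and
// writes it out. The guard handles the last, partially filled block.
__global__ void writeGlobalIndex(int *d_out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        d_out[i] = i;
    }
}

// Hypothetical host helper: launches enough blocks to cover n elements.
std::vector<int> globalIndices(int n, int threadsPerBlock) {
    std::vector<int> h_out(n);
    int *d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(int));

    // Round up so that blocks * threadsPerBlock >= n.
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;

    // <<<grid of blocks, block of threads, dynamic shared memory bytes>>>
    // The third parameter is 0 because this kernel uses no shared memory.
    writeGlobalIndex<<<blocks, threadsPerBlock, 0>>>(d_out, n);

    cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}

int main() {
    std::vector<int> idx = globalIndices(1000, 128);  // 8 blocks of 128 threads
    printf("first = %d, last = %d\n", idx.front(), idx.back());
    return 0;
}
```

With 1000 elements and 128 threads per block, the launch uses 8 blocks (1024 threads), so the last 24 threads fail the bounds check and do nothing.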



  • Set of elements to process [64 floats]
  • Function to run on each element [“square”]


Map (elements, function)

GPUs are good at map:

  • GPUs have many parallel processors
  • GPUs optimize for throughput

Gather, scatter, stencil, transpose.
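Of these four communication patterns, transpose is the simplest to show in full. A minimal sketch, assuming a row-major float matrix and one thread per element; transposeOnGpu and the 16x16 block size are illustrative choices, not from the course.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Transpose: the element at (row, col) of the input moves to
// (col, row) of the output. Both matrices are stored row-major.
__global__ void transpose(const float *d_in, float *d_out, int rows, int cols) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < rows && col < cols) {
        d_out[col * rows + row] = d_in[row * cols + col];
    }
}

// Hypothetical host helper: returns the cols x rows transpose of h_in.
std::vector<float> transposeOnGpu(const std::vector<float> &h_in, int rows, int cols) {
    const size_t bytes = h_in.size() * sizeof(float);
    std::vector<float> h_out(h_in.size());

    float *d_in = nullptr;
    float *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((cols + block.x - 1) / block.x, (rows + block.y - 1) / block.y);
    transpose<<<grid, block>>>(d_in, d_out, rows, cols);

    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}

int main() {
    // 2 x 3 input:  1 2 3
    //               4 5 6
    std::vector<float> m = {1, 2, 3, 4, 5, 6};
    std::vector<float> t = transposeOnGpu(m, 2, 3);
    for (int r = 0; r < 3; ++r) {
        printf("%.0f %.0f\n", t[r * 2], t[r * 2 + 1]);
    }
    return 0;
}
```

For comparison: a gather has each thread read from several computed input locations, a scatter has each thread write to computed output locations, and a stencil reads a fixed neighborhood around each element.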

Synchronization, Barrier
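A barrier makes every thread in a block wait until all of them have reached the same point. A minimal sketch, assuming one block of 128 threads that shifts an array one position to the left in shared memory; shiftLeft and shiftLeftOnGpu are illustrative names.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int N = 128;  // threads per block, also the array length

__global__ void shiftLeft(int *d_out) {
    __shared__ int s[N];
    int i = threadIdx.x;

    s[i] = i;
    __syncthreads();  // barrier: all writes to s are done before anyone reads

    // Read the right-hand neighbor; the last thread keeps its own value.
    int next = (i < N - 1) ? s[i + 1] : s[i];
    __syncthreads();  // barrier: all reads are done before anyone overwrites

    s[i] = next;
    // Each thread reads back only its own slot, so no further barrier is needed.
    d_out[i] = s[i];
}

// Hypothetical host helper: runs the kernel and returns the shifted array.
std::vector<int> shiftLeftOnGpu() {
    std::vector<int> h_out(N);
    int *d_out = nullptr;
    cudaMalloc(&d_out, N * sizeof(int));

    shiftLeft<<<1, N>>>(d_out);

    cudaMemcpy(h_out.data(), d_out, N * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}

int main() {
    std::vector<int> out = shiftLeftOnGpu();
    printf("out[0] = %d, out[%d] = %d\n", out[0], N - 1, out[N - 1]);
    return 0;
}
```

Without the second barrier, a thread could overwrite s[i] before its left-hand neighbor has read it, which is a race condition. Note that __syncthreads() only synchronizes threads within one block, not across blocks.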

SMs, blocks, threads



Image blur
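Blurring is a stencil operation: each output pixel is a weighted average of the input pixels around it. A minimal sketch, assuming a grayscale float image, a 3x3 box filter with equal weights, and clamping at the borders; these are simplifying assumptions, and boxBlur and blurOnGpu are illustrative names.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// 3x3 box blur: one thread per output pixel. Neighbors that fall
// outside the image are clamped to the nearest border pixel.
__global__ void boxBlur(const float *d_in, float *d_out, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float sum = 0.0f;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            int nx = min(max(x + dx, 0), width - 1);
            int ny = min(max(y + dy, 0), height - 1);
            sum += d_in[ny * width + nx];
        }
    }
    d_out[y * width + x] = sum / 9.0f;
}

// Hypothetical host helper: blurs a row-major width x height image.
std::vector<float> blurOnGpu(const std::vector<float> &h_in, int width, int height) {
    const size_t bytes = h_in.size() * sizeof(float);
    std::vector<float> h_out(h_in.size());

    float *d_in = nullptr;
    float *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    boxBlur<<<grid, block>>>(d_in, d_out, width, height);

    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}

int main() {
    // 3 x 3 image with a single bright pixel in the center.
    std::vector<float> img = {0, 0, 0,
                              0, 9, 0,
                              0, 0, 0};
    std::vector<float> out = blurOnGpu(img, 3, 3);
    for (int y = 0; y < 3; ++y) {
        printf("%.2f %.2f %.2f\n", out[y * 3], out[y * 3 + 1], out[y * 3 + 2]);
    }
    return 0;
}
```

The kernel reads from d_in and writes to a separate d_out, so no thread ever sees a partially blurred neighbor.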

