CS344: Parallel Computing

Introduction

Here is a simple CUDA program:
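A minimal sketch of what such a program can look like, assuming a GPU that supports device-side printf. The kernel name hello, the file name hello.cu, and the launch sizes are illustrative choices, not taken from the course material.

```cuda
// hello.cu (hypothetical file name)
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: runs on the GPU, one instance per thread.
__global__ void hello()
{
    printf("Hello from block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

int main()
{
    // Launch 2 blocks of 4 threads each (sizes chosen arbitrarily).
    hello<<<2, 4>>>();

    // Report launch-configuration errors, if any.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        // Kernel launches are asynchronous: wait for the kernel to finish
        // so that its output is flushed before the program exits.
        err = cudaDeviceSynchronize();
    }
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

The order in which the blocks print is not guaranteed, so the eight lines may appear in a different order from run to run.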

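To compile it on Linux, run NVIDIA's nvcc compiler on the .cu source file, for example: nvcc hello.cu -o hello (the file names here are placeholders).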

Or on Windows with Visual Studio:

Creating a new CUDA project is the easiest way to get started.

Alternatively, enable CUDA in an existing project:

  • Right-click the project
  • Choose Build Dependencies
  • Choose Build Customizations
  • Check the box for the CUDA version
  • Right-click each .cu file, open Properties, and set its item type to CUDA C/C++
  • Add cudart.lib to Linker -> Input for all configurations and all platforms

 

Another simple program from Udacity:
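The following is a sketch along the lines of the course's squaring example (64 floats, each squared by its own thread). It is a reconstruction, not the original listing. The h_ and d_ prefixes for host and device pointers are a naming convention assumed here, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread squares one element.
__global__ void square(float *d_out, const float *d_in)
{
    int idx = threadIdx.x;   // single block, so the thread index is the element index
    float f = d_in[idx];
    d_out[idx] = f * f;
}

int main()
{
    const int ARRAY_SIZE = 64;
    const size_t ARRAY_BYTES = ARRAY_SIZE * sizeof(float);

    // Input array on the host: 0, 1, 2, ..., 63.
    float h_in[ARRAY_SIZE];
    float h_out[ARRAY_SIZE];
    for (int i = 0; i < ARRAY_SIZE; ++i) {
        h_in[i] = float(i);
    }

    // Allocate GPU memory and copy the input to the device.
    float *d_in = nullptr;
    float *d_out = nullptr;
    cudaMalloc(&d_in, ARRAY_BYTES);
    cudaMalloc(&d_out, ARRAY_BYTES);
    cudaMemcpy(d_in, h_in, ARRAY_BYTES, cudaMemcpyHostToDevice);

    // One block of 64 threads.
    square<<<1, ARRAY_SIZE>>>(d_out, d_in);

    // Copy the result back; this call waits for the kernel to finish.
    cudaMemcpy(h_out, d_out, ARRAY_BYTES, cudaMemcpyDeviceToHost);

    for (int i = 0; i < ARRAY_SIZE; ++i) {
        printf("%f%c", h_out[i], (i % 4 == 3) ? '\n' : '\t');
    }

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```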

 

Kernel launch syntax: KERNEL<<<grid of blocks, block of threads, shared memory per block in bytes>>>(…)

Efficient: each thread knows its threadIdx (thread within block, e.g. threadIdx.x), blockDim (size of a block), blockIdx (block within grid), and gridDim (size of grid).
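As a sketch of how these built-in variables are typically combined, the kernel below computes a global index for each thread across several blocks. The kernel name write_global_index and the sizes (100 elements, 32 threads per block) are assumptions made for illustration, and error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread derives a unique global index from its block and thread index.
__global__ void write_global_index(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {             // the last block may have threads past the end
        out[i] = i;
    }
}

int main()
{
    const int n = 100;
    const int threadsPerBlock = 32;
    // Round up so that every element is covered.
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;

    int *d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(int));

    // Third launch parameter: dynamic shared memory per block, in bytes (none here).
    write_global_index<<<blocks, threadsPerBlock, 0>>>(d_out, n);

    int h_out[n];
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    // Verify on the host that element i holds the value i.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (h_out[i] != i) ++mismatches;
    }
    printf("%d blocks x %d threads, mismatches: %d\n",
           blocks, threadsPerBlock, mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

The bounds check matters whenever the element count is not a multiple of the block size, as is the case here.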

 

Map

  • Set of elements to process [64 floats]
  • Function to run on each element [“square”]

 

Map (elements, function)
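One way to express map (elements, function) directly is a kernel templated on the function to apply. This is a sketch under assumptions: map_kernel and the Square functor are hypothetical names, and the sizes are arbitrary.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// The "function" of the map, written as a device-callable functor.
struct Square {
    __device__ float operator()(float x) const { return x * x; }
};

// Generic map: out[i] = f(in[i]) for every element, one thread per element.
template <typename F>
__global__ void map_kernel(const float *in, float *out, int n, F f)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = f(in[i]);
    }
}

int main()
{
    const int n = 1000;
    const size_t bytes = n * sizeof(float);
    std::vector<float> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = 0.5f * i;

    float *d_in = nullptr;
    float *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    map_kernel<<<blocks, threads>>>(d_in, d_out, n, Square());

    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    // Compare against the same computation on the CPU.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (std::fabs(h_out[i] - h_in[i] * h_in[i]) > 1e-3f) ++mismatches;
    }
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

Because no thread depends on any other thread's result, the elements can be processed in any order and on any number of processors.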

GPUs are good at map:

  • GPUs have many parallel processors
  • GPUs optimize for throughput
 

Other communication patterns: gather, scatter, stencil, transpose.
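To illustrate one of these patterns, here is a sketch of a naive matrix transpose, where each thread reads one element and writes it to the transposed position. The kernel name, the matrix size, and the 16x16 block shape are assumptions for illustration. This version makes no attempt at optimizing memory access.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// in is rows x cols (row-major); out is cols x rows (row-major).
__global__ void transpose_naive(const float *in, float *out, int rows, int cols)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < rows && col < cols) {
        out[col * rows + row] = in[row * cols + col];
    }
}

int main()
{
    const int rows = 48, cols = 80;
    const size_t bytes = size_t(rows) * cols * sizeof(float);
    std::vector<float> h_in(rows * cols), h_out(rows * cols);
    for (int i = 0; i < rows * cols; ++i) h_in[i] = float(i);

    float *d_in = nullptr;
    float *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    // 2D launch: x covers columns, y covers rows.
    dim3 block(16, 16);
    dim3 grid((cols + block.x - 1) / block.x, (rows + block.y - 1) / block.y);
    transpose_naive<<<grid, block>>>(d_in, d_out, rows, cols);

    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    // Check that out[c][r] == in[r][c] for every element.
    int mismatches = 0;
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            if (h_out[c * rows + r] != h_in[r * cols + c]) ++mismatches;
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```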

Synchronization, Barrier
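A barrier makes every thread in a block wait until all of them have reached the same point. The sketch below reverses an array inside one block using shared memory, with __syncthreads() separating the write phase from the read phase. The kernel name and the array size are illustrative assumptions. It also uses the third launch parameter to size the shared array.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Reverse the first n elements of data using one block and shared memory.
__global__ void reverse_in_block(int *data, int n)
{
    extern __shared__ int tile[];   // sized by the third launch parameter
    int t = threadIdx.x;

    if (t < n) tile[t] = data[t];   // phase 1: every thread stores its element

    __syncthreads();                // barrier: all stores are done before any load

    if (t < n) data[t] = tile[n - 1 - t];  // phase 2: read another thread's element
}

int main()
{
    const int n = 64;
    int h_data[n];
    for (int i = 0; i < n; ++i) h_data[i] = i;

    int *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(int));
    cudaMemcpy(d_data, h_data, n * sizeof(int), cudaMemcpyHostToDevice);

    // One block of n threads, with n * sizeof(int) bytes of shared memory.
    reverse_in_block<<<1, n, n * sizeof(int)>>>(d_data, n);

    cudaMemcpy(h_data, d_data, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_data);

    // Element i should now hold n - 1 - i.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (h_data[i] != n - 1 - i) ++mismatches;
    }
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

Without the barrier, a thread could read a slot of tile that its owner has not written yet. Note that __syncthreads() only synchronizes threads within a single block, not across blocks.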

SMs, blocks, threads
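Blocks are scheduled onto streaming multiprocessors (SMs), and all threads of a block run on the same SM. As a sketch, the limits of a particular GPU can be queried at run time with cudaGetDeviceProperties. Device 0 is assumed here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    const int device = 0;   // assumption: query the first GPU in the system
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("Device:                  %s\n", prop.name);
    printf("SM count:                %d\n", prop.multiProcessorCount);
    printf("Max threads per block:   %d\n", prop.maxThreadsPerBlock);
    printf("Max threads per SM:      %d\n", prop.maxThreadsPerMultiProcessor);
    printf("Warp size:               %d\n", prop.warpSize);
    printf("Shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
    return 0;
}
```

These numbers bound the launch configuration: a block cannot have more threads than the per-block maximum, and the shared memory requested per block cannot exceed the reported limit.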

 

 

Image blur
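Blurring is a stencil operation: each output pixel is computed from a small neighborhood of input pixels. The sketch below is a simplified stand-in, not the course assignment: it applies a 3x3 box filter to a single-channel float image, and at the borders it averages only the neighbors that lie inside the image. The kernel name, image size, and test pattern are assumptions.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// 3x3 box blur on a grayscale image stored row-major as floats.
__global__ void box_blur(const float *in, float *out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float sum = 0.0f;
    int count = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            int nx = x + dx;
            int ny = y + dy;
            // Skip neighbors that fall outside the image.
            if (nx >= 0 && nx < width && ny >= 0 && ny < height) {
                sum += in[ny * width + nx];
                ++count;
            }
        }
    }
    out[y * width + x] = sum / count;
}

int main()
{
    const int width = 8, height = 8;
    const size_t bytes = size_t(width) * height * sizeof(float);

    // Test pattern: black image with one bright pixel at (4, 4).
    std::vector<float> h_in(width * height, 0.0f), h_out(width * height);
    h_in[4 * width + 4] = 9.0f;

    float *d_in = nullptr;
    float *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    box_blur<<<grid, block>>>(d_in, d_out, width, height);

    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    // Print the blurred image; the bright pixel is spread over its 3x3 neighborhood.
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) printf("%4.1f ", h_out[y * width + x]);
        printf("\n");
    }
    return 0;
}
```

Each thread only reads from the input and writes its own output pixel, so no synchronization between threads is needed.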

 
