posted in: Graduate, PostMS | 0

CMSC 740 Fall 2017 is taught by an awesome teacher Prof. Matthias Zwicker

Overall, this class is a fantastic introduction to advanced graphics, ray tracing pipelines, as well as recent advances in deep learning.

# Projects

The course projects are based on Nori, an educational ray tracer.

## Octrees and Ray Traversal

To build an Octree, one could use a stack to mimic the DFS procedure. For non-leaf node, store its 8 children as pointers (8 bytes * 8 = 64 bytes), for leaf node, store a a vector list of N triangle indices (4N bytes).

For leaf node, traverse the triangle sets; for non-leaf node, traverse the inner nodes.

To improve the naive implementation, one could sort the triangles, then conduct the intersection.

OpenMP Notes

## Monte Carlo Sampling

### Sample Warping and corresponding validation

The uniform circle warping is widely used for lots of purposes, so I put it here for reference.

Code removed to prevent plagiarism.

# Notes

## Acceleration structures,

### Bounding Volume Hierarchies (BVHs)

• Partition space into non-overlapping cells
• Each cell stores reference to all objects that overlap it

### Spatial subdivision

• Binary space partitioning (BSP) trees
• Recursively divide space into two parts using
• dividing planes (with arbitrary position, orientation)
• k-d-trees, nearest neighbor search

### Light Transport Effects

• Quantify spatial energy distribution of light, in a way that is compatible with the geometrical optics model
• Physical measurement of electromagnetic energy, Unit: Watts
• Assume light consists of photons with
• Position x
• Direction of motion w
• Wavelength lambda
• Each photo has an energy of hv
• H Planck’s constant
• V = 1 / lambda frequency
• Measuring energy means “counting photons”
• Energy per time per wavelength per solid angle per area
• Means differetial area perpendicular to $\omega$
• Power carried along a ray is measured in radiance
• Radiance along ray is constant (in vacuum)
• In practice: assume steady state, measure at discrete wavelengths R, G, B
• Radiance L: power per solid angle per area, vector of 3 values for R, G, B
• Irradiance: Power per unit area
• Integration of radiance over hemisphere
• Irradiance E: power per area perpendicular to the surface normal
• Radiance L: power per area perpendicular to the incident ray \omega
• When we do ray tracing, we use radiance for rays
• Integration in spherical coordinates
• Radiant intensity I in all direction, has a total power

### Photometry

• Perceptual measurement of perceived brightness
• Lumen

## BRDF (bidirectional reflectance distribution function), or BSDF (bidirectional scattering distribution function)

For every pair of light and viewing directions, BRDF gives fraction of transmitted light

Diffuse reflection: equally distributed reflections

Glossy reflection: majorly reflected to mirror-direction, quickly fall off in a cone

Differential irradiance comes from a cone of directions around incident direction:

• BRDF cannot do volume scattering, because it relies on the surface normal. It’s a definition depending on the surface normal.
• BRDF cannot do diffraction nor polarization.
• BRDFs are wavelength dependent, four-dimensional vectors

Helmoltz reciprocity: can exchange role of light source and observer.

Representations

1. As a table of values over incident and outgoing directions
2. Analytic BRDF models
1. Represent BRDFs as analytic functions

Gonio reflectometer by Cornell University is a device to measure and tabulate BRDFs, a robot with light source and camera, measures reflection for each light/ camera direction

Analytic BRDFs

• Diffuse reflection: reflected radiance is equal over all directions
• Physical model: Lambert’s cosine law
• Diffuse BRDF is just a constant
• Specular reflection and refraction
• Dielectrics (Glass, plastic, diamond, water, air) and conductors (metals)

Index of refraction

• Air 1.00029, Water: 1.33 Acrylic glass: 1.49

Fermat’s principle

• The actual path between two points taken by a beam of light is the one which is traversed in the least time.

Snell’s law

• Ratio of sines of angles of incidence and refraction is equal to opposite ratio of indices of refraction
• $\frac{\sin \theta_1}{\sin \theta_2} = \frac{n_2}{n_1}$

# Torrance-Sparrow model

Vector form to obtain refracted direction r

Fresnel equations describe fraction of intensity of light that is reflected and refracted

Derived by solving the Maxwell equations

Takes into account polarization of light

Different versions for dielectrics and conductors

Full equations for dielectrics see

In graphics, often use Schlick’s approximation

Refraction coefficient is

OK for both dielectrics and conductors

Specular BRDF

Micro facet models for glossy reflection

• Assume surface consists of randomly oriented small mirrors (micro facets)
• Statistical distribution of micro facet orientations determines appearance, glossiness / shininess of reflection

Torrance-Sparrow model

• Physically-based model for specular highlight due to micro facet distribution
• Parameter: probability distribution over micro facet orientations
• Half-vector between
• D usually function of dot product of
• Distribution of microfacet normals determines surface roughness
• Sharp or smooth specular highlight

Rough surface

Reflection equation

So far: directional form of reflection equation

Change of integration variables

Monte Carlo integration

General technique for numerical integration

Hemisphere has two dimensions

• Easy to implement
• Robust, can integrate almost any function
• Efficient for high dimensional integrals
• Variance (noise)
• Slow convergence

Probability concepts

• Random variable X
• Cumulative distribution function (CDP) P
• Probability density function (PDF  is the derivative of CDP
• The standard deviation  of the estimator
• The expected error converges with
• Important sampling
• With general probability density
• Convergence still slow
• Observe:  is proportional to
• Idea: choose f and p to be as similar as possible to reduce variance!

## Area Lights

Given same number of samples, surface form of reflection equation looks better

Illumination from environment

In real world, at each point in scene light arrives from all directions

Assume for now: incoming radiance from all directions is given

For example, environment maps

Store “omni-directional” incoming radiance as images

Each pixel corresponds to radiance from a certain direction

Later: Global illumination

Illumination from environment

Assumption for environment maps

Independent of point x, source of illumination is “infinitely distant”

Given by environment

## Global Illumination

The Rendering Equation

Convergence, because of energy conservation (each bounce absorbs some light)

Reflected radiance appears as unknown on both sides of equation

Solution: find radiance such that reflection equation is satisfied simultaneously at each point

and direction

Based on approximate physical model (geometric optics)

Cannot model Wave effects such as polarization and diffraction

So far: reflection equation

Givenincident light  over hemisphere  and BRDF  at point x,

what is reflected light , which is a radiance distribution: reflected radiance at each point

And in direction

For example, incident radiance given by environment map

Direct illumination from area and point lights, but not multiple bounces.

Incident light at one point (a) depends on reflected light at other point (b)

Direct only vs Direct & indirect

Think of both reflected and incident radiance as *unknown*

Reflection equation expresses equilibrium between incident and reflected radiance

Radiance doesn’t change along the ray in vacuum

Light sources are represented by known function

is emitted radiance at each surface point x in each direction

Monte Carlo Path Tracing

Naïve approach

• Recursive ray tracing as for mirror reflection, but shoot many rays in each step
• Distribute rays over hemisphere at each step
• Problem: exponential explosion of number of rays
• Exponentially more longer paths than shorter paths
• But longer paths contribute less to image than shorter ones

Smarter

• Compute radiance as “integral over all light paths that connect light sources and eye”
• Paths have different lengths (number of bounces)
• Con formulate one integral for all paths of each length
• Approach
• Sum over all path lengths
• Integral of radiance transported along all paths of given length
• Use Monte Carlo integration

Path tracing

• Want to compute radiance through each pixel in image
• Sum of radiance over all light paths connecting light and camera
• Light paths have different lengths

Native approach: average of both techniques

• Variance is bounded by larger variance of two techniques

Multiple importance Sampling

• Weighted average, with weights that reduce variance in optimal way
• N sampling techniques

This is an unbiased estimator of integral, as before!

Weights for provable variance reduction

• Balance heuristics
• Power heuristics
• Details, proofs

Multiple importance sampling

Naïve sampling

Optimal sampling

N-rooks (Latin hypercube) sampling

Advantage over jittered grid: can generate any number n of samples, not restricted to n x m grid

Quasi-Monte Carlo

Based on low-discrepency instead of pseudo-random samples

Low-discrepancy sequences have more uniform distribution of points, less clumping

Verices Position

UV

Normals

• 1500×1500
• 750×750
• 1500×1500
• Classical Foveated Rendering
• Speedups
• Nvidia foveated rendering
• Shift
• Logmap foveated rendering
• F
• Directly working in the logpolar rendering
• F
• Find a big model

1. Deferred shading is slower than what Nvidia is doing
2. Likely to get some returns from the original log-polar
1. Cut problem
2. Use 2 or three cuts

Hierarchical Z-Buffer Visibility

http://dl.acm.org/citation.cfm?id=166147

Can show theoretically that convergence improves

Before: Radiance along straight rays through volume is constant

Now

Participating media: volumetric media influence radiance along the rays

Light interaction with participating media

Absorption

• Absorption coefficients
• Material property
• Related to absorption cross section
• Rate of absorption, unites are 1 / meter
• “Probability per unit length that photon is absorbed”

Transmittance is “fraction of light transported over a certain distance along a ray”

Optical thickness

Transmittance

Phase functions: examples

Rayleigh scattering: scattering from particles smaller than wavelength of light

For example, molecules in atmosphere

Detail see, e.g., wikipedia

Mie (Lorenz-Mie) theory: scattering by spherical particles

Henyey-Greenstein phase function is an empirical phase function , and is useful for many phenomena (water, clouds, skin, stone)

Average phase angle g, asymmetry parameter

Ray marching is an approximation of integral

It has a limited number of samples, and uniformly distributed locations along the ray, could render using brute force Monte Carlo path tracing

Very expensive

Idea: describe scattering properties of material using an extension of BRDFs

So far: BRDFs, light scatters exactly at the same point where it hits the surface

Subsurface scattering: light enters the material, bounces around, leaves at a different place

BSSRDF: bidirectional surface scattering reflectance distribution function

## How to acquire a 3D point clouds

• 3D point cloud acquisition
• Point cloud alignment
• Active methods
• Interfere with reconstructed object
• Mechanically rotate the object to acquire the full geometry
• Radiometrically (active illumination) project laser line
• Accurate
• Slow, need to sweep object with laser
• Need calibrated light source-camera pair
• Doesn’t work if object doesn’t reflect light source
• Passive methods
• Sensing only
• Image sensors, cameras
• Replace laser by second camera
• Known baseline
• Fast, video input, produce real-time depth video
• No active illumination required, just cameras
• Can be accurate with high resolution cameras
• Hard to make stereo matching work rebustly in general settings
• Assume accurate camera calibration

• Triangulation
• How to calculate the height of a sea island
• Disparity inversely related to depth
• This is due to projective projection
• Z = fB / d
• Multiview reconstruction
• Structure from motion, photogrammetry
• Given few correspondences in images (features)
• Compute location and orientation of all cameras
• 3D location of features
• Time of flight (active)
• LIDAR, TOF cameras, Kinect one (v2)
• Send pulse, derive distance based on round-trip time of pulse
• LIDAR (light detection and raging)
• Scan scene point-by-point
• Up to 100’000 points / sec
• Long distances possible
• Long distances, kilometers, slow, relatively accurate
• Measure complete image at once
• Modulate illumination with radio frequency signal and measure phase shift
• Point cloud alignment ICP
• Most scanning algorithms acquire objects piece by piece (e.g. different viewpoints)
• Problem: align pieces to complete full geometry
• Two problems:
• Correspondence: which points in data should be aligned to which points on model?
• Given correspondence, how to rotate / translate to best align parts?
• ICP (Iterative closest point)
• Find initial correspondences using nearest neighbor
• Brute force
• For each point in data, traverse all points in model to find closed point
• Complexity quadratic in number of points
• More efficient
• Use spatial data structure for points in model to efficiently query nearest neighbors
• For example, kd-trees or Octrees
• While distances between corresponding points too large
• Estimate rotation / translation
• Transform using rotation / translation
• Update correspondences using nearest neighbor
• Alignment
• Goal: given correspondences, find rigid transformation that minimizes distances between corresponding points in data and model
• Exact alignment may not be possible because of noise
• Many algorithms, here two basic ones
• Minimizing point-to-point distances
• Points p, in data, p’ in model, 3×1 vectors
• Rotation matrix R in R 3×3
• Translation vector t in  R 3×1
• Find the best rotation R, T that minimize squared point-to-point distances
• Challenges
• Error is linear function of R and t
• But R needs to be constrained to be rotation matrix
• Orthonormal
• Optimization problem is non-linear least squares problem
• Translation
• Can show that aligned point clouds have same centroid
• For the best rotation rotation R, translation t
• Define coordinates q, relative to the centroid
• Find SVD of H
• Point to plane distances
• Material from low
• Assume known surface normals
• Measure distances to tangent planes
• Infinitesimal rotation, generators of rotation
• Omit higher order (quadratic) terms
• Small rotation of vector (x, y, z)

## Linear Least Squares Problem

• Combinational approach
• Voronoi diagram
• Function fitting
• Implicit function in 3D
• SSD: Smooth signed distance function
• Reconstruction

## Generative models

• Morphable face models
• Space of human body shapes
• Shape are texture vectors

# Deep Learning

## Concepts

• Layers
• Tensor
• Activation function
• Logistic sigmoid function: $frac{1}{1+e^{-x}}$
• The function is differentiable, monotonic
• unit step
• sign
• linear
• piece-wise linear
• hyperbolic tangent
• ReLU (Rectified Linear Unit)
• ReLU will make the output of a part of neurons 0, which will cause the sparseness of the network;
Disadvantages: Some neurons may never be activated, causing the corresponding parameters to never be updated.
• The leak helps to increase the range of the ReLU function. Usually, the value of a is 0.01 or so.
• Feature map – the output of one filter applied to the previous layer
• Convolutional layer (convolutional neural networks, CNN)
• Convolutional layers apply a convolution operation to the input, passing the result to the next layer. The convolution emulates the response of an individual neuron to visual stimuli.
• Fully connected layer
• Neurons in a fully connected layer have connections to all activations in the previous layer, as seen in regular neural networks. Their activations can hence be computed with a matrix multiplication followed by a bias offset.
• Degree of freedom is very small
• 3×3, 5×5
• Loss functions
• The loss function computes the error for a single training example.
• The cost function is the average of the loss functions of the entire training set.
• L1 (Mean Absolute Error)
• $$C = C_0 + \frac{\lambda}{n} \sum_w |w|$$
• cannot compute derivative at 0
• L2 (Mean Squared Error)
• $$C = C_0 + \frac{\lambda}{2n} \sum_w w^2$$
• smaller w means complexity of the model is lower
• Cross-entropy loss, or log loss (soft-max), measures the performance of a classification model whose output is a probability value between 0 and 1. Cross-entropy loss increases as the predicted probability diverges from the actual label. So predicting a probability of .012 when the actual observation label is 1 would be bad and result in a high loss value. A perfect model would have a log loss of 0.
• use ONLY ONE or SUBSET of training sample from your training set to do the update for a parameter in a particular iteration
•

• SGD: suitable for large sample size, requires small memory; but each step may not be toward the optimal solution direction;
• Newton method: fast convergence; but there are strict requirements on the objective function, there must be a continuous one. The second-order partial derivative has a large amount of calculation.
• Pros: converges much faster compared to GD
• Cons: the error function is not as well minimized as in the case of GD
• Batch size (64, 128, or 256)
• Size of subset as an approximation of the whole databaseIt, which impacts the time efficiency of training and the noisiness of the gradient estimate. Updating the parameters using all training data is not efficient. Updating by one single sample (online updating) is noisy if the sample is not a good representation of the whole data.

• Batch normalization

• It is always great if the layer get data that is always normalized(mean 0 variance 1).
• We just normalize each batch with its running mean and variance.
• Batch normalization potentially helps in two ways: faster learning and higher overall accuracy. The improved method also allows you to use a higher learning rate, potentially providing another boost in speed.
• Automatic differentiation
• Automatic differentiation is a general technique that numerically evaluates the derivative of a function
• Error back-propagation
• Compute the gradients of a loss function using the chain rule
• Proceed ‘backwards’ with respect to the computations performed to compute the loss itself
• Feed-forward and recurrent neural network
• Feed-forward
• The connections between the units do not form a cycle
• Recurrent Neural Network
• Sequence of words and images, unsegmented, connected handwriting recognition, or speech recognition
• Autoencoder / autoassociator / Diabolo network
• Used for unsupervised learning of efficient coding, earn a representation (encoding) for a set of data, typically for the purpose of dimensionality reduction, widely used for learning generative models of data.
• Encoder / Decoder
• The encoder encodes the input sequence to an internal representation called ‘context vector’ which is used by the decoder to generate the output sequence.
• The lengths of input and output sequences can be different, as there is no explicit one on one relation between the input and output sequences.
• An Encoder/Decoder is not an autoencoder
• Reinforcement learning
• Reinforcement learning differs from standard supervised learning in that correct input/output pairs are never presented, nor sub-optimal actions explicitly corrected. Instead the focus is on on-line performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge).
• Reinforcement learning is called approximate dynamic programming, or neuro-dynamic programming
• Limitations
• Firstly, it is often too memory expensive to store values of each state, since the problems can be pretty complex. Solving this involves looking into value approximation techniques, such as Decision Trees or Neural Networks. There are many consequence of introducing these imperfect value estimations, and research tries to minimise their impact on the quality of the solution.
• Moreover, problems are also generally very modular; similar behaviours reappear often, and modularity can be introduced to avoid learning everything all over again. Hierarchical approaches are common-place for this, but doing this automatically is proving a challenge. Finally, due to limited perception, it is often impossible to fully determine the current state. This also affects the performance of the algorithm, and much work has been done to compensate this Perceptual Aliasing.
• Skip connections
•
• Residual network
• u-net
• Classification
• Object detection: YOLO, R-CNN, Fast R-CNN, Faster R-CNN, Mask R-CNN, SSD, etc.
• The YOLO algorithm directly uses a convolutional neural network to output the location of the object and its associated category. It is an end-to-end system, so the detection speed is extremely fast and can meet real-time requirements.
• Regression
• RNN
• Siamese Network
• Siamese networks consists of two identical neural networks, each taking one of the two input images.
• The last layers of the two networks are then fed to a contrastive loss function , which calculates the similarity between the two images.
• The contrastive loss function is a Distance-based Loss function (as opposed to prediction error-based Loss functions like Logistic loss or Hinge loss used in Classification).
• Same network shared the same weights
• Suppose you have cross-domain data
•

# How to prevent overfitting?

• Early stopping
• No improvement in N epoches
• 1 epoch
• Go through the entire training set once
• Mini-batch
• How much data do you put in one batch?
• Extending / Augmenting the datasets
• Acquire more
• Copy data and add noise
• Resampling
• Estimate the distribution of the data and generate
• Regulation / Loss Function
• L1 (Mean Absolute Error)
• $$C = C_0 + \frac{\lambda}{n} \sum_w |w|$$
• cannot compute derivative at 0
• L2 (Mean Squared Error)
• $$C = C_0 + \frac{\lambda}{2n} \sum_w w^2$$
• smaller w means complexity of the model is lower
• Cross-entropy loss, or log loss (soft-max), measures the performance of a classification model whose output is a probability value between 0 and 1. Cross-entropy loss increases as the predicted probability diverges from the actual label. So predicting a probability of .012 when the actual observation label is 1 would be bad and result in a high loss value. A perfect model would have a log loss of 0.
• Dropout (trick)
• Prevent overfitting by removing neurons randomly from the hidden layers.
•

## Denoising Monte Carlo Renderings

Kernel-Predicting Convolutional Networks for Denoising Monte Carlo Renderings

Specular has large dynamic range

• Network
• 8 hidden layers, 100 kernels of 5×5 in each layer
• L1 loss
• Training data
• 600 frames from movie “Finding Dori”
• Noise: 32 spp
• Reference: 1024 spp
• Test data
• 25 frames from other movies

In terms of timing, for an HD image of 1920×1080, our network takes about 12 seconds to evaluate and output a full denoised image

### Denoising animations

Frame-by-frame denoising leads to temporal flickering in animations

Need to filter in 3D space-time

How to implement this with CNNs?

Chaitanya et al. Interactive Reconstruction of Monte Carlo Image Sequences using a Recurrent Denoising Autoencoder, SIGGRAPH 2017

Input to network

• RGB colros
• G-buffer with view-space shading normals, depth, material roughness
• Divided by diffuse albedo (remove texture complexity from input)

Loss

• Spatial: L1, L1 on spatial gradients
• Temporal: L1 on temporal gradients
• Back propagation through time
• Replicate feed forward parts to unroll recurrent connections
• Gradients are noisy in SGD
• Compute moving average over SGD steps
• feed-forward network

Input with 1spp

• 54.9ms for 720p on Titan X
• Path tracer from 70ms to 260ms for 1spp

## What else could we learn to improve rendering efficiency?

• Superresolution
• Foveated rendering
• Antiliasing
• Multiple Importance Sampling
• Antialiasing for light fields
• Learning Light Transport the Reinforced Way, Dahm and Keller
• Use Q learning to learn importance sampling incrementally during path tracing

# Deep Learning: The Future of Real-Time Rendering

## Reference

1. How to prevent overfitting. https://blog.csdn.net/heyongluoyao8/article/details/49429629