CMSC 740 (Fall 2017) is taught by an awesome teacher, Prof. Matthias Zwicker.
Overall, this class is a fantastic introduction to advanced graphics and ray tracing pipelines, as well as recent advances in deep learning.
Projects
The course projects are based on Nori, an educational ray tracer.
Octrees and Ray Traversal
To build an octree, one can use a stack to mimic the DFS procedure. For a non-leaf node, store its 8 children as pointers (8 bytes × 8 = 64 bytes); for a leaf node, store a vector of N triangle indices (4N bytes).
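A minimal sketch of the node layout described above (illustrative types and names, not the exact assignment code; a fixed array of 8 raw pointers would match the 64-byte figure, a vector is used here so that child.empty() works as in the traversal code below):

struct Node {
    std::vector<Node*> child;          // 8 child pointers for interior nodes, empty for leaves
    int depth = 0;
    explicit Node(int d = 0) : depth(d) {}
};

struct LeafNode : public Node {
    std::vector<uint32_t> m_triangles; // indices into the mesh's triangle list (4 bytes each)
    explicit LeafNode(const std::vector<uint32_t> &tris) : m_triangles(tris) {}
};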
Node* build(BoundingBox3f &_box, TriIndexList &_triangles) {
    while (!m_stack.empty()) m_stack.pop();
    if (_triangles.size() < MIN_NUM_TRIANGLES)
        return new LeafNode(_triangles);
    BuildStackItem rootItem(_box, _triangles, nullptr, 8);
    Node* root = new Node();
    rootItem.node = root;
    m_stack.push(rootItem);
    m_total_nodes = 1;
    while (!m_stack.empty()) {
        auto item = &m_stack.top();
        if (item->triangles.empty() || item->splitted) {
            m_stack.pop();
            continue;
        }
        if (item->triangles.size() < MIN_NUM_TRIANGLES ||
            (item->parent && item->parent->depth > OCTREE_DEPTH_LIMIT)) {
            ++m_total_nodes;
            ++m_leaf_nodes;
            item->node = new LeafNode(item->triangles);
            if (item->parent != nullptr)
                item->parent->child[item->child_id] = item->node;
            m_stack.pop();
            continue;
        }
        calcSubBoxes(item->box, item->boxes);
        // assume that a triangle overlaps with an octree cell
        // if the bounding box of the triangle's vertices overlaps with the cell
        for (const auto &triIndex : item->triangles) {
            const auto &triBox = m_mesh->getBoundingBox(triIndex);
            for (int i = 0; i < 8; ++i)
                if (item->boxes[i].overlaps(triBox))
                    item->sets[i].push_back(triIndex);
        }
        ++m_total_nodes;
        if (item->node == nullptr)
            item->node = new Node(item->parent->depth + 1);
        ++m_interior_nodes;
        if (item->parent != nullptr)
            item->parent->child[item->child_id] = item->node;
        item->splitted = true;
        for (int i = 0; i < 8; ++i) {
            m_stack.push(BuildStackItem(item->boxes[i], item->sets[i], item->node, i));
        }
    }
    return root;
}
For a leaf node, traverse its triangle list; for a non-leaf node, recurse into the child nodes.
void rayIntersect(Ray3f &_ray, Intersection &_its, bool _shadowRay,
                  bool &foundIntersection, uint32_t &resIndex) {
    rayIntersectNode(m_root, m_box, _ray, _its, _shadowRay, foundIntersection, resIndex);
}

void rayIntersectNode(Node* _node, BoundingBox3f _box, Ray3f &_ray, Intersection &_its,
                      bool _shadowRay, bool &foundIntersection, uint32_t &resIndex) {
    if (_node->child.empty()) {
        LeafNode* leaf = (LeafNode*)_node;
        #pragma omp parallel for
        for (int i = 0; i < leaf->m_triangles.size(); ++i) {
            const auto &triIndex = leaf->m_triangles[i];
            float u, v, t;
            if (m_mesh->rayIntersect(triIndex, _ray, u, v, t)) {
                /* An intersection was found! Can terminate
                   immediately if this is a shadow ray query */
                if (_shadowRay)
                    return;
                _ray.maxt = _its.t = t;
                _its.uv = Point2f(u, v);
                _its.mesh = m_mesh;
                resIndex = triIndex;
                foundIntersection = true;
            }
        }
        return;
    }
    // ...
}
To improve the naive implementation, one can sort the child cells by the distance at which the ray enters them, then visit them in that order.
int intersects[8];
float dummy;
vector<pair<float, int>> nearT(8);
#pragma omp parallel for
for (int i = 0; i < 8; ++i) {
    nearT[i].second = i;
    intersects[i] = (_node->child[i] != nullptr &&
                     boxes[i].rayIntersect(_ray, nearT[i].first, dummy));
    if (!intersects[i])
        nearT[i].first = INFINITY;
}
sort(nearT.begin(), nearT.end(),
     [](const pair<float, int> &left, const pair<float, int> &right) {
         return left.first < right.first;
     });
#pragma omp parallel for
for (int i = 0; i < 8; ++i) {
    int id = nearT[i].second;
    if (intersects[id] && nearT[i].first < _ray.maxt)
        rayIntersectNode(_node->child[id], boxes[id], _ray, _its, _shadowRay,
                         foundIntersection, resIndex);
}
OpenMP Notes
#pragma omp parallel for

Modify the CMakeLists.txt:

find_package(OpenMP)
if (OPENMP_FOUND)
    set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
    set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_EXE_LINKER_FLAGS}")
endif()
Monte Carlo Sampling
Sample Warping and corresponding validation
The uniform disk warping is widely used for many purposes, so I put it here for reference.
Point2f Warp::squareToUniformDisk(const Point2f &sample) {
    auto two_pi_y = 2 * M_PI * sample.y();
    auto sqrt_x = sqrt(sample.x());
    return Point2f(cos(two_pi_y) * sqrt_x, sin(two_pi_y) * sqrt_x);
}

float Warp::squareToUniformDiskPdf(const Point2f &p) {
    return (p.x() * p.x() + p.y() * p.y() < 1.0f) ? 1.0f / M_PI : 0.0f;
}
Vector3f Warp::squareToUniformSphere(const Point2f &sample) {
    float z = 1.0f - 2.0f * sample.x();
    float r = sqrt(std::max(0.0f, 1.0f - z * z));
    float two_pi_y = 2.0f * M_PI * sample.y();
    float sin_phi = sin(two_pi_y);
    float cos_phi = cos(two_pi_y);
    return Vector3f(r * cos_phi, r * sin_phi, z);
}

float Warp::squareToUniformSpherePdf(const Vector3f &v) {
    return (abs(v.x() * v.x() + v.y() * v.y() + v.z() * v.z() - 1.0f) < 1e-6) ? INV_FOURPI : 0.0f;
}

Vector3f Warp::squareToUniformHemisphere(const Point2f &sample) {
    float z = sample.x();
    float two_pi_y = 2.0f * M_PI * sample.y();
    float r = sqrt(1.0f - z * z);
    float sin_phi = sin(two_pi_y);
    float cos_phi = cos(two_pi_y);
    return Vector3f(r * cos_phi, r * sin_phi, z);
}
Code removed to prevent plagiarism.
Hierarchical Sample Warping
Point Lights
Ambient Occlusion
Area Light
Distributed Ray Tracing
Microfacet BRDF
Path Tracer
Notes
Acceleration structures
Uniform grid
Object subdivision
Bounding Volume Hierarchies (BVHs)
 Partition space into non-overlapping cells
 Each cell stores a reference to all objects that overlap it
Spatial subdivision
 Binary space partitioning (BSP) trees
 Recursively divide space into two parts using dividing planes (with arbitrary position, orientation)
 kd-trees, nearest-neighbor search
Light Transport Effects
 Fluorescence
 Reflection
 Thin film interference: https://www.shadertoy.com/view/XddXRj
 Diffraction: https://www.shadertoy.com/view/4t23zR
 Refraction
 Dispersion
 Rayleigh Scattering: https://www.shadertoy.com/view/MdXSzX (Subsurface Scattering)
 Polarization
Radiometry
 Quantify spatial energy distribution of light, in a way that is compatible with the geometrical optics model
 Physical measurement of electromagnetic energy, Unit: Watts
 Assume light consists of photons with
 Position x
 Direction of motion w
 Wavelength lambda
 Each photon has an energy of $h\nu$
 $h$: Planck's constant
 $\nu = c / \lambda$: frequency
 Measuring energy means “counting photons”
 Spectral Radiance
 Energy per time per wavelength per solid angle per area
 Means differential area perpendicular to $\omega$
 Power carried along a ray is measured in radiance
 Radiance along ray is constant (in vacuum)
 In practice: assume steady state, measure at discrete wavelengths R, G, B
 Radiance L: power per solid angle per area, vector of 3 values for R, G, B
 Irradiance: power per unit area
 Integration of radiance over the hemisphere (see the integral after this list)
 Irradiance E: power per area perpendicular to the surface normal
 Radiance L: power per area perpendicular to the incident ray $\omega$
 When we do ray tracing, we use radiance for rays
 Integration in spherical coordinates
 Radiant intensity I in all direction, has a total power
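The irradiance mentioned above, written as a hemispherical integral of incident radiance (standard radiometry notation, $\theta$ being the angle between $\omega$ and the surface normal):

$$ E(x) = \int_{H^2} L_i(x, \omega) \cos\theta \, d\omega $$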
Photometry
 Perceptual measurement of perceived brightness
 Lumen
BRDF (bidirectional reflectance distribution function), or BSDF (bidirectional scattering distribution function)
For every pair of light and viewing directions, the BRDF gives the fraction of reflected light (the BSDF also covers transmitted light)
Diffuse reflection: equally distributed reflections
Glossy reflection: mostly reflected around the mirror direction, falling off quickly within a cone
BRDF is fraction of reflected radiance over differential irradiance, instead of outgoing irradiance over incoming radiance
Differential irradiance comes from a cone of directions around incident direction:
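In standard notation, the differential irradiance from a cone of directions around the incident direction $\omega_i$, and the resulting BRDF definition, are:

$$ dE(x, \omega_i) = L_i(x, \omega_i) \cos\theta_i \, d\omega_i, \qquad f_r(x, \omega_i, \omega_o) = \frac{dL_o(x, \omega_o)}{dE(x, \omega_i)} $$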
SDF: https://www.shadertoy.com/view/XlKSDR
 BRDF cannot describe volume scattering, because its definition depends on the surface normal.
 BRDF cannot model diffraction or polarization.
 BRDFs are wavelength-dependent, four-dimensional functions
Helmholtz reciprocity: one can exchange the roles of light source and observer.
Representations
 As a table of values over incident and outgoing directions
 Analytic BRDF models
 Represent BRDFs as analytic functions
The gonioreflectometer at Cornell University is a device to measure and tabulate BRDFs: a robot with a light source and a camera that measures reflection for each light/camera direction.
Analytic BRDFs
 Diffuse reflection: reflected radiance is equal over all directions
 Physical model: Lambert’s cosine law
 Diffuse BRDF is just a constant (see the formula after this list)
 Specular reflection and refraction
 Dielectrics (Glass, plastic, diamond, water, air) and conductors (metals)
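The diffuse (Lambertian) BRDF referenced above, in terms of the albedo $\rho$ (standard result):

$$ f_r = \frac{\rho}{\pi} $$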
Index of refraction
 Air: 1.00029, water: 1.33, acrylic glass: 1.49
Fermat’s principle
 The actual path between two points taken by a beam of light is the one which is traversed in the least time.
Snell’s law
 Ratio of sines of angles of incidence and refraction is equal to opposite ratio of indices of refraction
 $\frac{\sin \theta_1}{\sin \theta_2} = \frac{n_2}{n_1}$
Torrance-Sparrow model
Vector form to obtain refracted direction r
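One common vector form (a standard derivation, not necessarily the lecture's exact notation): with incident direction $\mathbf{d}$ pointing toward the surface, normal $\mathbf{n}$, $\eta = n_1 / n_2$, and $\cos\theta_1 = -\mathbf{d}\cdot\mathbf{n}$,

$$ \mathbf{r} = \eta\,\mathbf{d} + \left(\eta\cos\theta_1 - \cos\theta_2\right)\mathbf{n}, \qquad \cos\theta_2 = \sqrt{1 - \eta^2\left(1 - \cos^2\theta_1\right)} $$

Total internal reflection occurs when the expression under the square root becomes negative.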
Fresnel equations describe fraction of intensity of light that is reflected and refracted
Derived by solving the Maxwell equations
Takes into account polarization of light
Different versions for dielectrics and conductors
Full equations for dielectrics see
In graphics, often use Schlick's approximation
The reflection coefficient is approximated as a function of the angle of incidence (see the formula below)
OK for both dielectrics and conductors
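Schlick's approximation of the Fresnel reflection coefficient (standard form; $F_0$ is the reflectance at normal incidence):

$$ F(\theta) \approx F_0 + (1 - F_0)(1 - \cos\theta)^5, \qquad F_0 = \left(\frac{n_1 - n_2}{n_1 + n_2}\right)^2 $$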
Specular BRDF
Micro facet models for glossy reflection
 Assume surface consists of randomly oriented small mirrors (micro facets)
 Statistical distribution of micro facet orientations determines appearance, glossiness / shininess of reflection
Torrance-Sparrow model
 Physically-based model for specular highlights due to the microfacet distribution
 Parameter: probability distribution over micro facet orientations
 Half-vector between incident and outgoing directions
 D is usually a function of the dot product between the half-vector and the normal
 Distribution of microfacet normals determines surface roughness
 Sharp or smooth specular highlight
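The Torrance-Sparrow (microfacet) specular BRDF in its usual form, with microfacet distribution $D$, masking-shadowing term $G$, and Fresnel term $F$ (standard formulation):

$$ f_r(\omega_i, \omega_o) = \frac{D(\mathbf{h})\, G(\omega_i, \omega_o)\, F(\omega_i)}{4\cos\theta_i \cos\theta_o} $$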
Masking and shadowing
Rough surface
Reflection equation
So far: directional form of reflection equation
Change of integration variables
Monte Carlo integration
General technique for numerical integration
Hemisphere has two dimensions
 Advantages
 Easy to implement
 Robust, can integrate almost any function
 Efficient for high dimensional integrals
 Disadvantages
 Variance (noise)
 Slow convergence
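The basic Monte Carlo estimator discussed above: for an integral over a domain of measure $V$, with $N$ uniformly distributed samples $X_k$,

$$ \int f(x)\,dx \approx \frac{V}{N}\sum_{k=1}^{N} f(X_k) $$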
Probability concepts
 Random variable X
 Cumulative distribution function (CDF) P
 Probability density function (PDF) p is the derivative of the CDF
 The standard deviation of the estimator
 The expected error converges as $O(1/\sqrt{N})$
 Importance sampling
 With general probability density
 Convergence still slow
 Observe: is proportional to
 Idea: choose f and p to be as similar as possible to reduce variance!
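With importance sampling, samples $X_k$ are drawn from a general density $p$ and the estimator becomes

$$ \langle I \rangle = \frac{1}{N}\sum_{k=1}^{N}\frac{f(X_k)}{p(X_k)}, \qquad X_k \sim p, $$

so the variance vanishes when $p$ is exactly proportional to $f$, which is why one chooses them to be as similar as possible.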
Area Lights
Given the same number of samples, the surface form of the reflection equation looks better
Illumination from environment
In real world, at each point in scene light arrives from all directions
Assume for now: incoming radiance from all directions is given
For example, environment maps
Store “omnidirectional” incoming radiance as images
Each pixel corresponds to radiance from a certain direction
Later: Global illumination
Illumination from environment
Assumption for environment maps
Independent of point x, source of illumination is “infinitely distant”
Given by environment
Global Illumination
The Rendering Equation
Convergence, because of energy conservation (each bounce absorbs some light)
Reflected radiance appears as unknown on both sides of equation
Solution: find radiance such that reflection equation is satisfied simultaneously at each point
and direction
Based on approximate physical model (geometric optics)
Cannot model wave effects such as polarization and diffraction
So far: reflection equation
Given incident light over the hemisphere and the BRDF at a point x, what is the reflected light? It is a radiance distribution: reflected radiance at each point and in each direction
For example, incident radiance given by environment map
Direct illumination from area and point lights, but not multiple bounces.
Incident light at one point (a) depends on reflected light at other point (b)
Direct only vs Direct & indirect
Think of both reflected and incident radiance as *unknown*
Reflection equation expresses equilibrium between incident and reflected radiance
Radiance doesn’t change along the ray in vacuum
Light sources are represented by a known function:
the emitted radiance at each surface point x in each direction
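The rendering equation that these notes build up to, in its hemispherical form (standard notation; $L_e$ is the emitted radiance):

$$ L_o(x, \omega_o) = L_e(x, \omega_o) + \int_{H^2} f_r(x, \omega_i, \omega_o)\, L_i(x, \omega_i)\, \cos\theta_i \, d\omega_i $$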
Monte Carlo Path Tracing
Naïve approach
 Recursive ray tracing as for mirror reflection, but shoot many rays in each step
 Distribute rays over hemisphere at each step
 Problem: exponential explosion of number of rays
 Exponentially more long paths than short paths
 But longer paths contribute less to image than shorter ones
Smarter
 Compute radiance as “integral over all light paths that connect light sources and eye”
 Paths have different lengths (number of bounces)
 Can formulate one integral for all paths of each length
 Approach
 Sum over all path lengths
 Integral of radiance transported along all paths of given length
 Use Monte Carlo integration
 Sum over all path lengths
Path tracing
 Want to compute radiance through each pixel in image
 Sum of radiance over all light paths connecting light and camera
 Light paths have different lengths
Naïve approach: average of both techniques
 Variance is bounded by larger variance of two techniques
Multiple Importance Sampling
 Weighted average, with weights that reduce variance in optimal way
 N sampling techniques
This is an unbiased estimator of integral, as before!
Weights for provable variance reduction
 Balance heuristics
 Power heuristics
 Details, proofs
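The balance and power heuristics referenced above (standard forms, with $n_i$ samples drawn from technique $i$ with density $p_i$):

$$ w_i(x) = \frac{n_i\, p_i(x)}{\sum_k n_k\, p_k(x)} \quad \text{(balance)}, \qquad w_i(x) = \frac{\left(n_i\, p_i(x)\right)^{\beta}}{\sum_k \left(n_k\, p_k(x)\right)^{\beta}} \quad \text{(power, typically } \beta = 2\text{)} $$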
Multiple importance sampling
Naïve sampling
Optimal sampling
N-rooks (Latin hypercube) sampling
Advantage over jittered grid: can generate any number n of samples, not restricted to n x m grid
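A minimal sketch of N-rooks sample generation (my own illustration, not course code): stratify both dimensions into n cells and use a random permutation so that each row and each column contains exactly one jittered sample.

#include <algorithm>
#include <random>
#include <utility>
#include <vector>

std::vector<std::pair<float, float>> latinHypercube(int n, std::mt19937 &rng) {
    std::uniform_real_distribution<float> jitter(0.0f, 1.0f);
    std::vector<int> perm(n);
    for (int i = 0; i < n; ++i) perm[i] = i;
    std::shuffle(perm.begin(), perm.end(), rng);          // random column for each row
    std::vector<std::pair<float, float>> samples(n);
    for (int i = 0; i < n; ++i) {
        samples[i].first  = (i + jitter(rng)) / n;        // stratified in x
        samples[i].second = (perm[i] + jitter(rng)) / n;  // stratified in y
    }
    return samples;
}

Unlike a jittered n × m grid, this works for any sample count n.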
Quasi-Monte Carlo
Based on low-discrepancy instead of pseudorandom samples
Low-discrepancy sequences have a more uniform distribution of points, less clumping
Vertices: Position
UV
Normals
 Deferred shading
 1500×1500
 750×750
 1500×1500
 Classical Foveated Rendering
 Speedups
 Nvidia foveated rendering
 Shift
 Log-map foveated rendering
 F
 Directly working in the log-polar rendering
 F
 Find a big model
 Deferred shading is slower than what Nvidia is doing
 Likely to get some returns from the original log-polar
 Cut problem
 Use two or three cuts
Hierarchical ZBuffer Visibility
http://dl.acm.org/citation.cfm?id=166147
Can show theoretically that convergence improves
Before: Radiance along straight rays through volume is constant
Now
Participating media: volumetric media influence radiance along the rays
Light interaction with participating media
Absorption
 Absorption coefficients
 Material property
 Related to absorption cross section
 Rate of absorption, units are 1 / meter
 “Probability per unit length that photon is absorbed”
Transmittance is “fraction of light transported over a certain distance along a ray”
Optical thickness
Transmittance
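Written out (standard definitions): for a ray segment of length $s$ through a medium with absorption/extinction coefficient $\sigma$, the optical thickness and transmittance are

$$ \tau(s) = \int_0^s \sigma(x(t))\, dt, \qquad T(s) = e^{-\tau(s)} $$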
Phase functions: examples
Rayleigh scattering: scattering from particles smaller than wavelength of light
For example, molecules in atmosphere
Detail see, e.g., wikipedia
Mie (Lorenz-Mie) theory: scattering by spherical particles
The Henyey-Greenstein phase function is an empirical phase function, useful for many phenomena (water, clouds, skin, stone)
Asymmetry parameter g: average cosine of the phase angle
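The Henyey-Greenstein phase function (standard form):

$$ p_{HG}(\theta) = \frac{1}{4\pi}\,\frac{1 - g^2}{\left(1 + g^2 - 2g\cos\theta\right)^{3/2}} $$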
Ray marching approximates this integral with a limited number of samples at uniformly distributed locations along the ray.
One could also render participating media with brute-force Monte Carlo path tracing,
but that is very expensive.
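A minimal ray-marching sketch for the transmittance integral (my own illustration; sigma_t is an assumed lookup of the medium's extinction coefficient, and Ray3f's operator() returns the point at parameter t as in Nori):

#include <cmath>

// Assumed helper (hypothetical): extinction coefficient of the medium at a point.
float sigma_t(const Point3f &p);

float transmittance(const Ray3f &ray, float tMin, float tMax, int numSamples) {
    float dt = (tMax - tMin) / numSamples;
    float opticalThickness = 0.0f;
    for (int i = 0; i < numSamples; ++i) {
        float t = tMin + (i + 0.5f) * dt;          // midpoint of each uniform segment
        opticalThickness += sigma_t(ray(t)) * dt;  // accumulate optical thickness
    }
    return std::exp(-opticalThickness);            // transmittance = exp(-tau)
}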
Idea: describe scattering properties of material using an extension of BRDFs
So far: BRDFs, light scatters exactly at the same point where it hits the surface
Subsurface scattering: light enters the material, bounces around, leaves at a different place
BSSRDF: bidirectional surface scattering reflectance distribution function
How to acquire 3D point clouds
 3D point cloud acquisition
 Point cloud alignment
 Active methods
 Interfere with reconstructed object
 Mechanically rotate the object to acquire the full geometry
 Radiometrically (active illumination) project laser line
 Accurate
 Slow, need to sweep object with laser
 Need calibrated light sourcecamera pair
 Doesn’t work if object doesn’t reflect light source
 Passive methods
 Sensing only
 Image sensors, cameras
 Replace laser by second camera
 Known baseline
 Fast, video input, produce realtime depth video
 No active illumination required, just cameras
 Can be accurate with high resolution cameras
 Hard to make stereo matching work robustly in general settings
 Assume accurate camera calibration
 Triangulation
 How to calculate the height of a sea island
 Disparity inversely related to depth
 This is due to perspective projection
 Z = fB / d
 Multiview reconstruction
 Structure from motion, photogrammetry
 Given few correspondences in images (features)
 Compute location and orientation of all cameras
 3D location of features
 Time of flight (active)
 Similar principle as radar
 LIDAR, TOF cameras, Kinect one (v2)
 Send pulse, derive distance based on roundtrip time of pulse
 LIDAR (light detection and ranging)
 Use visible light pulse instead of radio frequency pulse
 Scan scene pointbypoint
 Up to 100’000 points / sec
 Long distances possible
 Long distances, kilometers, slow, relatively accurate
 Measure complete image at once
 Modulate illumination with radio frequency signal and measure phase shift
 Point cloud alignment ICP
 Most scanning algorithms acquire objects piece by piece (e.g. different viewpoints)
 Problem: align pieces to complete full geometry
 Two problems:
 Correspondence: which points in data should be aligned to which points on model?
 Given correspondence, how to rotate / translate to best align parts?
 ICP (Iterative closest point)
 Find initial correspondences using nearest neighbor
 Brute force
 For each point in data, traverse all points in model to find the closest point
 Complexity quadratic in number of points
 More efficient
 Use spatial data structure for points in model to efficiently query nearest neighbors
 For example, kdtrees or Octrees
 Brute force
 While distances between corresponding points too large
 Estimate rotation / translation
 Transform using rotation / translation
 Update correspondences using nearest neighbor
 Alignment
 Goal: given correspondences, find rigid transformation that minimizes distances between corresponding points in data and model
 Exact alignment may not be possible because of noise
 Many algorithms, here two basic ones
 Minimizing point-to-point distances
 Points p in data, p' in model, 3×1 vectors
 Rotation matrix R in R^{3×3}
 Translation vector t in R^{3×1}
 Find the best rotation R and translation t that minimize squared point-to-point distances (see the objective after this list)
 Challenges
 Error is linear function of R and t
 But R needs to be constrained to be rotation matrix
 Orthonormal
 Optimization problem is nonlinear least squares problem
 Translation
 Can show that aligned point clouds have same centroid
 For the best rotation R, translation t
 Define coordinates q, relative to the centroid
 Find SVD of H
 Can show that aligned point clouds have same centroid
 Minimizing point-to-point distances
 Point-to-plane distances
 Material from Low
 Assume known surface normals
 Measure distances to tangent planes
 Infinitesimal rotation, generators of rotation
 Omit higher order (quadratic) terms
 Small rotation of vector (x, y, z)
 Find initial correspondences using nearest neighbor
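The point-to-point alignment objective referenced above, together with the standard SVD-based solution (my summary of the classic Kabsch/Horn approach, not the lecture's exact notation):

$$ \min_{R,\,t} \sum_i \left\lVert R\,p_i + t - p'_i \right\rVert^2 $$

Subtract the centroids $\bar{p}$, $\bar{p}'$ from both point sets to get centered points $q_i$, $q'_i$; form $H = \sum_i q_i\,(q'_i)^{T}$; compute the SVD $H = U\Sigma V^{T}$; then $R = V U^{T}$ (flipping the sign of the last column of $V$ if $\det R < 0$) and $t = \bar{p}' - R\,\bar{p}$.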
Linear Least Squares Problem
 Combinatorial approach
 Voronoi diagram
 Function fitting
 Implicit function in 3D
 SSD: Smooth signed distance function
 Reconstruction
Generative models
 Morphable face models
 Space of human body shapes
 Shape and texture vectors
Deep Learning
Concepts
 Layers
 Tensor
 Activation function
 https://www.shadertoy.com/view/4lccRB
 Logistic sigmoid function: $\frac{1}{1+e^{-x}}$
 The function is differentiable, monotonic
 unit step
 sign
 linear
 piecewise linear
 hyperbolic tangent
 ReLU (Rectified Linear Unit)
 Advantages: piecewise linearity, large gradient, fast convergence
 ReLU sets the output of a portion of the neurons to 0, which makes the network sparse;
Disadvantages: some neurons may never be activated, causing the corresponding parameters to never be updated.
 Leaky ReLU: the leak extends the range of the ReLU function; the slope a is usually around 0.01.
 Feature map – the output of one filter applied to the previous layer
 Convolutional layer (convolutional neural networks, CNN)
 Convolutional layers apply a convolution operation to the input, passing the result to the next layer. The convolution emulates the response of an individual neuron to visual stimuli.
 Fully connected layer
 Neurons in a fully connected layer have connections to all activations in the previous layer, as seen in regular neural networks. Their activations can hence be computed with a matrix multiplication followed by a bias offset.
 Degree of freedom is very small
 3×3, 5×5
 Loss functions
 The loss function computes the error for a single training example.
 The cost function is the average of the loss functions of the entire training set.
 L1 (Mean Absolute Error)
 $$ C = C_0 + \frac{\lambda}{n} \sum_w |w| $$
 cannot compute derivative at 0
 L2 (Mean Squared Error)
 $$ C = C_0 + \frac{\lambda}{2n} \sum_w w^2 $$
 smaller w means complexity of the model is lower
 Crossentropy loss, or log loss (softmax), measures the performance of a classification model whose output is a probability value between 0 and 1. Crossentropy loss increases as the predicted probability diverges from the actual label. So predicting a probability of .012 when the actual observation label is 1 would be bad and result in a high loss value. A perfect model would have a log loss of 0.
 Stochastic gradient descent (SGD)
 use only one or a subset of training samples from your training set to do the update for a parameter in a particular iteration

initialize network weights (often small random values)
do
   forEach training example named ex
      prediction = neural-net-output(network, ex)  // forward pass
      actual = teacher-output(ex)
      compute error (prediction - actual) at the output units
      compute Δw_h for all weights from hidden layer to output layer  // backward pass
      compute Δw_i for all weights from input layer to hidden layer  // backward pass continued
      update network weights  // input layer not modified by error estimate
until all examples classified correctly or another stopping criterion satisfied
return the network

 SGD: suitable for large sample sizes, requires little memory; but each step may not be toward the optimal solution direction;
 Newton's method: fast convergence; but it places strict requirements on the objective function (it must have continuous second-order derivatives), and computing the second-order partial derivatives is expensive.
 Pros: converges much faster compared to GD
 Cons: the error function is not as well minimized as in the case of GD
 Batch size (64, 128, or 256)

Size of the subset used as an approximation of the whole dataset; it impacts the time efficiency of training and the noisiness of the gradient estimate. Updating the parameters using all training data is not efficient; updating by a single sample (online updating) is noisy if the sample is not a good representation of the whole data.


Batch normalization
 It is always great if the layer gets data that is normalized (mean 0, variance 1).
 We just normalize each batch with its running mean and variance.
 Batch normalization potentially helps in two ways: faster learning and higher overall accuracy. The improved method also allows you to use a higher learning rate, potentially providing another boost in speed.
 Automatic differentiation
 Automatic differentiation is a general technique that numerically evaluates the derivative of a function
 Error backpropagation
 Compute the gradients of a loss function using the chain rule
 Proceed ‘backwards’ with respect to the computations performed to compute the loss itself
 Feedforward and recurrent neural network
 Feedforward
 The connections between the units do not form a cycle
 Recurrent Neural Network
 Sequence of words and images, unsegmented, connected handwriting recognition, or speech recognition
 Feedforward
 Autoencoder / autoassociator / Diabolo network
 Used for unsupervised learning of efficient codings; learn a representation (encoding) for a set of data, typically for the purpose of dimensionality reduction; widely used for learning generative models of data.
 Encoder / Decoder
 The encoder encodes the input sequence to an internal representation called ‘context vector’ which is used by the decoder to generate the output sequence.
 The lengths of input and output sequences can be different, as there is no explicit one on one relation between the input and output sequences.
 An Encoder/Decoder is not an autoencoder
 Reinforcement learning
 Reinforcement learning differs from standard supervised learning in that correct input/output pairs are never presented, nor suboptimal actions explicitly corrected. Instead the focus is on online performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge).
 Reinforcement learning is also called approximate dynamic programming, or neuro-dynamic programming
 Limitations
 Firstly, it is often too memory expensive to store values of each state, since the problems can be pretty complex. Solving this involves looking into value approximation techniques, such as Decision Trees or Neural Networks. There are many consequence of introducing these imperfect value estimations, and research tries to minimise their impact on the quality of the solution.
 Moreover, problems are also generally very modular; similar behaviours reappear often, and modularity can be introduced to avoid learning everything all over again. Hierarchical approaches are commonplace for this, but doing this automatically is proving a challenge. Finally, due to limited perception, it is often impossible to fully determine the current state. This also affects the performance of the algorithm, and much work has been done to compensate this Perceptual Aliasing.
 Skip connections
 Residual network
 U-Net
 Classification
 Object detection: YOLO, R-CNN, Fast R-CNN, Faster R-CNN, Mask R-CNN, SSD, etc.
 The YOLO algorithm directly uses a convolutional neural network to output the location of each object and its associated category. It is an end-to-end system, so the detection speed is extremely fast and can meet real-time requirements.
 Regression
 RNN
 A recurrent neural network (RNN) is a class of artificial neural network where connections between nodes form a directed graph along a sequence. This allows it to exhibit temporal dynamic behavior for a time sequence.
 LSTM: long short-term memory networks were introduced by Hochreiter and Schmidhuber in 1997 and set accuracy records in multiple application domains
 Siamese Network
 A Siamese network consists of two identical neural networks, each taking one of the two input images.
 The last layers of the two networks are then fed to a contrastive loss function , which calculates the similarity between the two images.
 The contrastive loss function is a Distancebased Loss function (as opposed to prediction errorbased Loss functions like Logistic loss or Hinge loss used in Classification).
 The two networks share the same weights
 Suppose you have crossdomain data
 Generative Adversarial Networks
How to prevent overfitting?
 Early stopping
 No improvement in N epochs
 1 epoch
 Go through the entire training set once
 Minibatch
 How much data do you put in one batch?
 Extending / Augmenting the datasets
 Acquire more
 Copy data and add noise
 Resampling
 Estimate the distribution of the data and generate
 Regularization / Loss Function
 L1 (Mean Absolute Error)
 $$ C = C_0 + \frac{\lambda}{n} \sum_w |w| $$
 cannot compute derivative at 0
 L2 (Mean Squared Error)
 $$ C = C_0 + \frac{\lambda}{2n} \sum_w w^2 $$
 smaller w means complexity of the model is lower
 Crossentropy loss, or log loss (softmax), measures the performance of a classification model whose output is a probability value between 0 and 1. Crossentropy loss increases as the predicted probability diverges from the actual label. So predicting a probability of .012 when the actual observation label is 1 would be bad and result in a high loss value. A perfect model would have a log loss of 0.
 Dropout (trick)
Denoising Monte Carlo Renderings
KernelPredicting Convolutional Networks for Denoising Monte Carlo Renderings
Specular has large dynamic range
 Take advantage of additional information (besides RGB) produced by renderer
 Network
 8 hidden layers, 100 kernels of 5×5 in each layer
 L1 loss
 Training data
 600 frames from the movie "Finding Dory"
 Noise: 32 spp
 Reference: 1024 spp
 Test data
 25 frames from other movies
In terms of timing, for an HD image of 1920×1080, our network takes about 12 seconds to evaluate and output a full denoised image
Denoising animations
Frame-by-frame denoising leads to temporal flickering in animations
Need to filter in 3D spacetime
How to implement this with CNNs?
Chaitanya et al. Interactive Reconstruction of Monte Carlo Image Sequences using a Recurrent Denoising Autoencoder, SIGGRAPH 2017
Input to network
 RGB colors
 G-buffer with view-space shading normals, depth, material roughness
 Divided by diffuse albedo (remove texture complexity from input)
Loss
 Spatial: L1, L1 on spatial gradients
 Temporal: L1 on temporal gradients
 Back propagation through time
 Replicate feed forward parts to unroll recurrent connections
 “Adam” SGD solver
 Gradients are noisy in SGD
 Compute moving average over SGD steps
 feedforward network
Input with 1spp
 54.9ms for 720p on Titan X
 Path tracer from 70ms to 260ms for 1spp
What else could we learn to improve rendering efficiency?
 Superresolution
 Foveated rendering
 Anti-aliasing
 Multiple Importance Sampling
 Anti-aliasing for light fields
 Learning Light Transport the Reinforced Way, Dahm and Keller
 Use Q learning to learn importance sampling incrementally during path tracing
Deep Learning: The Future of RealTime Rendering
Reference
 How to prevent overfitting. https://blog.csdn.net/heyongluoyao8/article/details/49429629