CMSC 740 Fall 2017 is taught by an awesome teacher, Prof. Matthias Zwicker.
Projects
The course project is based on Nori, an educational ray tracer.
Notes
Acceleration structures
 Object subdivision
  Bounding volume hierarchies (BVHs) (a minimal build sketch follows this list)
 Spatial subdivision
  Uniform grid: partition space into non-overlapping cells; each cell stores a reference to all objects that overlap it
  Binary space partitioning (BSP) trees: recursively divide space into two parts using dividing planes (with arbitrary position and orientation)
  kd-trees, nearest neighbor search
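To make the BVH idea concrete, here is a minimal build sketch in Python (illustrative only, not Nori's actual C++ implementation; the AABB class, the median split, and leaf_size are my choices):

```python
class AABB:
    """Axis-aligned bounding box with min/max corners as 3-tuples."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    @staticmethod
    def union(a, b):
        return AABB(tuple(map(min, a.lo, b.lo)), tuple(map(max, a.hi, b.hi)))

class BVHNode:
    def __init__(self, box, left=None, right=None, prims=None):
        self.box, self.left, self.right, self.prims = box, left, right, prims

def build(prims, boxes, leaf_size=4):
    """Object subdivision: recursively split primitives at the median of
    their bounding-box centers along the node's longest axis."""
    box = boxes[0]
    for b in boxes[1:]:
        box = AABB.union(box, b)
    if len(prims) <= leaf_size:
        return BVHNode(box, prims=prims)
    axis = max(range(3), key=lambda i: box.hi[i] - box.lo[i])
    order = sorted(range(len(prims)),
                   key=lambda i: boxes[i].lo[axis] + boxes[i].hi[axis])
    mid = len(order) // 2
    l, r = order[:mid], order[mid:]
    return BVHNode(box,
                   left=build([prims[i] for i in l], [boxes[i] for i in l], leaf_size),
                   right=build([prims[i] for i in r], [boxes[i] for i in r], leaf_size))
```

Traversal then descends into any child whose box the ray intersects and tests primitives only at the leaves.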
Light Transport Effects
Fluorescence
Reflection
Thin film interference: https://www.shadertoy.com/view/XddXRj
Diffraction
Refraction
Dispersion
Rayleigh scattering: https://www.shadertoy.com/view/MdXSzX
Subsurface scattering
Polarization
Radiometry
Quantify spatial energy distribution of light
(in a way that is compatible with the geometrical optics model)
Physical measurement of electromagnetic energy
Unit: Watts (radiant flux, i.e. power)
Assume light consists of photons with
Position x
Direction of motion w
Wavelength lambda
Each photon has an energy of $h\nu$
$h$: Planck's constant
$\nu = c / \lambda$: frequency (c: speed of light)
Measuring energy means “counting photons”
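As a concrete check (standard physics, values rounded), the energy of a single photon at, say, $\lambda = 550\,\mathrm{nm}$ is

$$E = h\nu = \frac{hc}{\lambda} = \frac{(6.626\times10^{-34}\,\mathrm{J\,s})(3.0\times10^{8}\,\mathrm{m/s})}{550\times10^{-9}\,\mathrm{m}} \approx 3.6\times10^{-19}\,\mathrm{J}$$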
Spectral Radiance
Energy per time per wavelength per solid angle per area
"area" here means differential area perpendicular to the ray
Power carried along a ray is measured in radiance
Radiance along ray is constant (in vacuum)
In practice: assume steady state, measure at discrete wavelengths R, G, B
Radiance L: power per solid angle per area, vector of 3 values for R, G, B
Irradiance: Power per unit area
Cosine-weighted integration of radiance over the hemisphere
Irradiance E: power per area perpendicular to the surface normal
Radiance L: power per area perpendicular to the incident ray \omega
When we do ray tracing, we use radiance for rays
Integration in spherical coordinates
Integrating radiant intensity I over all directions gives the total power
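Written out (standard radiometric definitions, consistent with the notes above):

$$E(x) = \int_{H^2} L_i(x,\omega)\,\cos\theta\,\mathrm{d}\omega, \qquad \mathrm{d}\omega = \sin\theta\,\mathrm{d}\theta\,\mathrm{d}\phi$$
$$\Phi = \int_{S^2} I(\omega)\,\mathrm{d}\omega \quad (\Phi = 4\pi I \text{ for a uniform point source})$$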
Photometry
Perceptual measurement of perceived brightness
Unit: Lumen
Deep Learning
Concepts
 Layers
 Activation function (a quick code reference follows this list)
 logistic sigmoid function: \frac{1}{1+e^{-x}}
 unit step
 sign
 linear
 piecewise linear
 hyperbolic tangent
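The activations above as NumPy one-liners (the piecewise-linear variant shown is one common choice among several):

```python
import numpy as np

def sigmoid(x):          return 1.0 / (1.0 + np.exp(-x))  # logistic sigmoid
def unit_step(x):        return (x >= 0).astype(float)
def sign(x):             return np.sign(x)
def linear(x):           return x
def piecewise_linear(x): return np.clip(x, -1.0, 1.0)     # one common variant
def tanh_act(x):         return np.tanh(x)                # hyperbolic tangent
```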
 Feature map – the output of one filter applied to the previous layer
 Fully connected layer
 Convolutional layer (convolutional neural networks, CNN)
 Loss functions
 The loss function computes the error for a single training example.
 The cost function is the average of the loss functions of the entire training set.
 Stochastic gradient descent
 use only one sample, or a small subset of samples, from the training set to update the parameters in each iteration
 Pros: converges much faster compared to GD
 Cons: the error function is not as well minimized as in the case of GD
 Batch size (64, 128, or 256)

The batch size is the size of the subset used to approximate the whole training set; it trades off the time efficiency of training against the noisiness of the gradient estimate. Updating the parameters using all training data is not efficient; updating by one single sample (online updating) is noisy if the sample is not a good representation of the whole data.
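A minimal minibatch SGD sketch (my own illustration using linear regression with an MSE gradient; lr, batch_size, and epochs are arbitrary defaults):

```python
import numpy as np

def sgd(X, y, lr=0.01, batch_size=64, epochs=10):
    """Each iteration updates the weights from one small batch
    instead of the full training set."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        idx = np.random.permutation(n)          # reshuffle every epoch
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            grad = 2.0 * X[b].T @ (X[b] @ w - y[b]) / len(b)  # MSE gradient
            w -= lr * grad
    return w
```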

 Batch normalization
 It helps if each layer gets inputs that are normalized (mean 0, variance 1).
 We normalize each batch with its own mean and variance during training (running estimates are used at test time).
 Batch normalization potentially helps in two ways: faster learning and higher overall accuracy. The improved method also allows you to use a higher learning rate, potentially providing another boost in speed.
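A sketch of the training-time computation (gamma and beta are the learnable scale and shift; inference would substitute running statistics for the batch statistics):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """x: (batch, features). Normalize each feature over the batch,
    then scale and shift with the learnable gamma and beta."""
    mean = x.mean(axis=0)                  # per-feature batch mean
    var = x.var(axis=0)                    # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```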
 Automatic differentiation – numerically evaluate the derivative of a function
 Error backpropagation
 compute gradients of a loss function using the chain rule
 proceed ‘backwards’ with respect to the computations performed to compute the loss itself
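A small worked example of the chain-rule steps above, for a two-layer net with a sigmoid hidden layer and MSE loss (all names are illustrative):

```python
import numpy as np

def forward_backward(x, t, W1, W2):
    """Forward pass, then gradients via the chain rule, proceeding
    backwards from the loss. x: input, t: target."""
    h = 1.0 / (1.0 + np.exp(-(W1 @ x)))    # hidden activations (sigmoid)
    y = W2 @ h                             # linear output
    loss = 0.5 * np.sum((y - t) ** 2)
    dy = y - t                             # dL/dy
    dW2 = np.outer(dy, h)                  # dL/dW2
    dh = W2.T @ dy                         # dL/dh (backwards through W2)
    dW1 = np.outer(dh * h * (1 - h), x)    # dL/dW1, using sigmoid'(z) = h(1-h)
    return loss, dW1, dW2
```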
 Feedforward and recurrent neural network
 Feedforward
 The connections between the units do not form a cycle
 Recurrent Neural Network
 Connections form cycles; suited to sequential data, e.g. unsegmented, connected handwriting recognition or speech recognition
 Autoencoder / autoassociator / Diabolo network
 Used for unsupervised learning of efficient codings: learn a representation (encoding) for a set of data, typically for the purpose of dimensionality reduction; widely used for learning generative models of data.
 Encoder / Decoder
 The encoder encodes the input sequence into an internal representation called the 'context vector', which the decoder uses to generate the output sequence. The lengths of the input and output sequences can differ, as there is no explicit one-to-one relation between them.
 An Encoder/Decoder is not an autoencoder
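A minimal autoencoder forward pass, to make the bottleneck idea above concrete (the sizes and initialization are arbitrary placeholders):

```python
import numpy as np

def autoencoder_forward(x, W_enc, W_dec):
    """Encoder compresses the input to a low-dimensional code (the
    bottleneck); decoder reconstructs the input from the code. Training
    would minimize reconstruction error, e.g. ||x - x_hat||^2."""
    code = np.tanh(W_enc @ x)   # encoder: d -> k, with k << d
    x_hat = W_dec @ code        # decoder: k -> d
    return code, x_hat

d, k = 784, 32                  # hypothetical sizes (e.g. a 28x28 image)
W_enc = 0.01 * np.random.randn(k, d)
W_dec = 0.01 * np.random.randn(d, k)
code, x_hat = autoencoder_forward(np.random.randn(d), W_enc, W_dec)
```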
 Reinforcement learning
 Reinforcement learning differs from standard supervised learning in that correct input/output pairs are never presented, nor suboptimal actions explicitly corrected. Instead the focus is on online performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge).
 Skip connections (a minimal residual-block sketch follows this list)
 Residual network
 U-Net
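A minimal residual-block sketch (my illustration; real ResNets use convolutions and batch norm, but the shortcut structure is the point, and U-Net uses analogous skip connections between encoder and decoder stages):

```python
import numpy as np

def residual_block(x, W1, W2):
    """Output is x + F(x): the layers only learn the residual F, and
    gradients flow unchanged through the identity shortcut."""
    h = np.maximum(0.0, W1 @ x)   # ReLU
    return x + W2 @ h             # identity shortcut + learned residual
```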
Denoising Monte Carlo Renderings
Kernel-Predicting Convolutional Networks for Denoising Monte Carlo Renderings, Bako et al., SIGGRAPH 2017
Specular component has a large dynamic range; the network predicts filter kernels rather than output colors directly (see the sketch after this section)
 Take advantage of additional information (besides RGB) produced by renderer
 Network
 8 hidden layers, 100 kernels of 5×5 in each layer
 L1 loss
 Training data
 600 frames from the movie "Finding Dory"
 Noise: 32 spp
 Reference: 1024 spp
 Test data
 25 frames from other movies
In terms of timing, for an HD image of 1920×1080, the network takes about 12 seconds to evaluate and output a full denoised image.
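A simplified sketch of the kernel-predicting step as I read it: the network's raw per-pixel outputs are turned into normalized weights via a softmax, and the denoised pixel is the weighted average of its noisy neighborhood. The shapes, the kernel size, and apply_kernels are illustrative:

```python
import numpy as np

def apply_kernels(noisy, raw_weights, k=21):
    """noisy: (H, W, 3) radiance; raw_weights: (H, W, k*k) network outputs.
    Each pixel becomes a softmax-weighted average of its k x k neighborhood."""
    H, W, _ = noisy.shape
    r = k // 2
    pad = np.pad(noisy, ((r, r), (r, r), (0, 0)), mode='edge')
    out = np.empty_like(noisy)
    for i in range(H):
        for j in range(W):
            w = np.exp(raw_weights[i, j] - raw_weights[i, j].max())
            w /= w.sum()                       # softmax: weights sum to 1
            patch = pad[i:i + k, j:j + k].reshape(k * k, 3)
            out[i, j] = w @ patch              # weighted average of neighbors
    return out
```

Normalizing the weights guarantees the filter preserves energy, which is one reason to predict kernels instead of colors.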
Denoising animations
Frame-by-frame denoising leads to temporal flickering in animations
Need to filter in 3D spacetime
How to implement this with CNNs?
Chaitanya et al. Interactive Reconstruction of Monte Carlo Image Sequences using a Recurrent Denoising Autoencoder, SIGGRAPH 2017
Input to network
 RGB colors
 G-buffer with view-space shading normals, depth, material roughness
 Divided by diffuse albedo (remove texture complexity from input)
Loss
 Spatial: L1, L1 on spatial gradients
 Temporal: L1 on temporal gradients
 Backpropagation through time
 Replicate feed-forward parts to unroll recurrent connections, yielding a feedforward network
 "Adam" SGD solver (see the sketch after this list)
 Gradients are noisy in SGD
 Compute moving averages over SGD steps
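A sketch of one Adam step, showing the moving averages mentioned above (standard Adam defaults; t is the 1-based step count for bias correction):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Exponential moving averages of the gradient (m) and its square (v)
    smooth the noisy SGD gradients; the step is scaled per parameter."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)              # bias correction
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```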
Input with 1spp
 54.9ms for 720p on Titan X
 Path tracing itself takes 70ms to 260ms for 1spp
What else could we learn to improve rendering efficiency?
 Superresolution
 Foveated rendering
 Anti-aliasing
 Multiple Importance Sampling
 Antialiasing for light fields
 Learning Light Transport the Reinforced Way, Dahm and Keller
 Use Q-learning to learn importance sampling incrementally during path tracing (see the sketch below)
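A rough sketch of the idea as I read it: tabulate Q over discretized positions and direction bins, update it online from path-tracing samples, and importance sample scattering directions proportionally to Q. The state/action discretization and the continuation estimate here are heavily simplified placeholders, not the paper's exact update:

```python
import numpy as np

# Q holds one value per (position cell s, direction bin a); it converges
# towards an estimate of incident radiance.
def q_update(Q, s, a, s_next, emitted, alpha=0.3):
    """Blend the old estimate with a new one: radiance emitted at the hit
    point plus a (heavily simplified) estimate of the continuation value."""
    target = emitted + Q[s_next].mean()
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target
    return Q

def sample_direction(Q, s, rng):
    p = Q[s] + 1e-6                  # keep all directions reachable
    p /= p.sum()
    return rng.choice(len(p), p=p)   # sample a direction bin proportional to Q
```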
Deep Learning: The Future of Real-Time Rendering