Large Scale Planet Rendering

In the last few months I’ve been working on a rendering engine capable of rendering an entire planet. I’ve seen many solutions for procedural planet rendering (generating the height map on the fly), so I was focusing on a different problem: how to render a planet using already existing elevation data, and which is the most efficient way to store that data at large scale.



I measured the performance on my (at this point, quite weak) laptop:
CPU: i7-6700HQ 2.6 GHz
GPU: NVIDIA Geforce GTX 950M, 4 GB
HDD: 1TB, 7200 RPM

First, some words about the rendering: the geometry is divided into meshlets (triangle batches containing 64 triangles) to make the pipeline suitable for mesh shaders, and to allow further optimizations like meshlet-culling which is not implemented yet. Currently, there is space allocated for 16384 meshlets, and the “culling” shader doesn’t do much, only decides which meshlet slots are active, and fills the indirect command arguments accordingly.

I measured the GPU performance using PIX. Rendering the above frame (during movement) takes about 11 ms. Most of it (~9.5 ms) is spent in the ExecuteIndirect call that draws the planet with a simple forward rendering method, at the moment. Sky rendering is ~1 ms. The remaining 0.5 ms is for resource barriers and generating new requested meshlets (vertices, normals, uv coords).

The tessellation, lod selection is done by the CPU, on the main thread. I measured the lod update with _rdtsc(). The results were unfortunately quite unstable, but it’s worth noting that the measured cycles were almost always below 1.5 million cycles, even when travelling with high speed close to the ground. When the camera is way above the ground, like in the beginning of the video, the measured values are around 100-200 thousand cycles.

It’s important to note that the loading of the height data from file to gpu memory happens on a different thread, only the requests get issued on the main thread.


Beside the main program, I made some tools for various purpose, I’ll describe them briefly.

  • Shader Compiler: putting together a graphics (even a compute) pipeline in D3D12, allocating descriptor tables, setting the descriptors, binding the descriptor tables, setting root constants could be a pain in the ass. It’s not the worst thing in the world, but it’s better to have something that fprintfs some C++ code based on a short description of the layout, rasterizer settings, render targets, depth-stencil state, struct definitions in hlsl, etc… That’s what this tool does. Also, it compiles the HLSL code using D3DCompile(). Every other program that has shaders uses this tool to compile them.
  • Height Generator: this tool converts the raw elevation data organized in HGT files to the format I use directly for rendering. After developing, and debugging it, it took ~24 hours to generate the necessary data with it.
  • Sky Generator: this tool is for generating the textures required for sky rendering and aerial perspective.

Design Philosophy, Coding

Working on this project the main focus was on designing algorithms, data structures while taking account the capabilities of the hardware. So the focus was not on coding, in fact I wanted to avoid code-related design issues at all. For this I had two major “rules”:

  • Making sure that iteration is as fast as possible, that is, I can change algorithms, data structures, hard-coded constants quickly without thinking about the code and waiting for compilation too much.
  • Writing easily understandable code, that is, by looking at the code it should be relatively easy to figure out what the hardware does when runs that code, what data it loads, how it modifies it, and what data it writes.

In order to achieve these, I don’t use templates, STL, the concept of RAII, or any third party library beside the Win32 API, D3D12, XInput, stb_image.h, and some standard C functions (math functions, functions for string manipulations, memcpy). I use lot’s of raw pointers, relative pointers and plane old data structures with global functions. I have only one compilation unit per program which can be built by running a batch file. For text-editor and debugger I use Visual Studio 2017.

With these concepts, I can compile the main program and the tools from scratch in about 2-3 second per program, using -Od (with -O2 it takes only slightly longer).

Computing the SGGX normal distribution and the projected area of an ellipsoid


Let M \subset\mathbb{R}^3 be an oriented, differentiable 2-manifold, and N: M\to \mathbb{S}^2 its Gauss map (where \mathbb{S}^2 is the 2-sphere). For such a surface let \omega_M\in\Omega^2(M) be the volume form generated by the embedding M\hookrightarrow\mathbb{R}^3, that is, \omega(X,Y) = \langle X\times Y, N(p)\rangle for X,Y\in T_pM. Let’s assume N is invertible. The normal distribution D:\mathbb{S}^2\to\mathbb{R} of M is defined by the equation

\left( N^{-1} \right)^* \omega_M = D\omega_{\mathbb{S}^2}

that is, it’s the coordinate of the pull-back of the volume form of M by the inverse of the Gauss map, expressed in the basis defined by the volume form of the 2-sphere.

Next, let’s defined the projected area of M along a given direction n\in\mathbb{S}^2. The projection associated to n can be defined by the volume form \mu_n\in\Omega^2(M), \mu_n(X,Y) = \langle X\times Y, n\rangle . The projected area of M along n can be defined as

\sigma_M(n) = \int\limits_{M_n} \mu_n, \qquad where \quad M_n = \left\{ p\in M \, | \, \langle N(p), n \rangle > 0\right\}

Computing the SGGX normal distribution

Let’s compute the normal distribution for the case where M is an ellipsoid defined by

\mathbb{E} = \left\{ p\in\mathbb{R}^3 \,| \,\|S^{-1}p\|^2 = 1\right\}

where S is a symmetric, positive definite matrix with eigenvalues a,b,c, which are the sizes of the main axes of \mathbb{E}. For a given n\in\mathbb{S}^2, let’s choose vectors X,Y\in T_n\mathbb{S}^2 such that X,Y, n form an orthonormal basis of \mathbb{R}^3. Then we have

(D\omega_{\mathbb{S}^2})(X,Y) = D(n)\langle X\times Y, n \rangle = D(n)\langle n, n\rangle = D(n),

since the Gauss map of the 2-sphere is the identity map. So we got

D(n) = ( (N^{-1})^* \omega_{\mathbb{E}})(X,Y) \\ \quad = \omega_{\mathbb{E}}(  N^{-1}_*X,  N^{-1}_*Y) \\ \quad = \langle  N^{-1}_*X \times N^{-1}_*Y, N(N^{-1}(n)) \rangle \\ \quad = \langle N^{-1}_*X \times N^{-1}_*Y, n) \rangle

Now let’s compute the map N^{-1}_* : T\mathbb{S}^2\to T\mathbb{E}. This map is the multiplication by the Jacobian of N^{-1} = (p\mapsto\frac{S^{-2}p}{\|S^{-2}p\|})^{-1} = n\mapsto\frac{S^2n}{\|Sn\|} .

This Jacobian can be written as

\frac{\|Sn\|^2S^2-S^2n(S^2n)^T}{\|Sn\|^3} ,

and after some straitforward calculation we get

N^{-1}_*X \times N^{-1} _*Y = \frac{det(S^2)}{\|Sn\|^4}S^{-2}\left(\langle S^2n,n\rangle n + \langle S^2n,X\rangle X + \langle S^2n,Y\rangle Y\right) = \frac{det(S^2)}{\|Sn\|^4}n

where we used that X,Y,n is an orthonormal basis. Putting these together, we have

D(n) = \frac{det(S^2)}{\|Sn\|^4} = \frac{vol(\mathbb{E})^2}{\pi^2\|Sn\|^4}

Computing the projected area

Now let’s fix a direction n\in\mathbb{S}^2, and compute the projected area \sigma_\mathbb{E}(n). Note that the matrix S viewed as a linear transformation, gives a diffeomorphism \hat{S}: \mathbb{S}^2 \to\mathbb{E}.

This diffeomorphism can be used to compute the projected area by transforming the integral to the 2-sphere:

\sigma_M(n) = \int\limits_{M_n} \mu_n = \int\limits_{\hat{S}^{-1}(M_n)} \hat{S}^*\mu_n

Using that the Gauss map is N(p)= \frac{S^{-2}p}{\|S^{-2}p\|}, a simple calculation shows that

\hat{S}^{-1}(M_n) = \left\{ m\in\mathbb{S}^2\,|\, \langle m, S^{-1}n \rangle > 0 \right\}

Moreover, for X,Y\in T\mathbb{S}^2, we have

(\hat{S}^*\mu_n)(X.Y) = \mu_n(\hat{S}_*X,\hat{S}_*Y) \\= \mu_n(SX,SY) \\ =\langle SX\times SY, n\rangle \\ = \langle det(S)S^{-1}(X\times Y), n\rangle \\ = \langle X\times Y, det(S)S^{-1}n\rangle

We can express this volume form as \hat{S}^*\mu_n = f\omega_{\mathbb{S}^2} for some function f:\mathbb{S}^2\to\mathbb{R}. We can compute f by, for example, choosing a point m\in\mathbb{S}^2 and X,Y\in T_m\mathbb{S}^2 such that X, Y, m form an orthonormal basis. Then \omega_{\mathbb{S}^2}(X,Y) = 1, so

f(m) = (\hat{S}^*\mu_n)(X.Y) \\= \langle X\times Y, det(S)S^{-1}n\rangle \\ = \langle m, det(S)S^{-1}n\rangle

By setting v = \frac{ S^{-1}n }{ \|S^{-1}n\|}, we have f(m) = det(S)\|S^{-1}n\|\langle m, v\rangle, and the projected area is

\sigma_M(n) = \int\limits_{\{\langle m, v\rangle\ > 0\}}  f(m) \omega_{\mathbb{S}^2,m} \\ = det(S)\|S^{-1}n\|\int\limits_{\{\langle m, v\rangle\ > 0\}} \langle m, v\rangle\ \omega_{\mathbb{S}^2,m} \\= det(S)\|S^{-1}n\|\pi \\= vol(\mathbb{E})\|S^{-1}n\|

Resource synchronization in Direct3D 12

I’ve been working on an implementation of a mass-spring system using Hamiltonian mechanics which I’d like to apply for hair, fur simulation in the future. I’ll probably write a post about it (it depends on how it goes), but in the meantime, I thought I write something about how I handle resource synchronization between GPU and CPU in my current project (link) (this is the project I use for experimentation, for example the code for my previous posts about normal mapping is in this project).

I use Direct3D 12 which provides the below objects and functionalities that are relevant for us at the moment:

  • ID3D12Resource: This object represents any kind of resource on the GPU (meshes, textures, any piece of memory, or even just a “virtual” resource without memory binding to it).
  • ID3D12CommandQueue: This object represents a command queue of the GPU, it can be used for dispatching commands (e.g. prerecorded command lists, instruction for presentation) to the GPU.
  • ID3D12Fence: This is the main object for synchronization. A fence is just an atomic 64-bit unsigned integer which can be set (both by the CPU and the GPU), can be read (by the CPU), and can be used to block a thread (a thread on the CPU or an execution of a command queue on the GPU) until the fence reaches a certain value.

. Command Queue

I use the struct below to represent command queues.

struct CommandQueue
	ID3D12CommandQueue* d12commandQueue;
	u64 lastSignaledFenceValue;
	ID3D12Fence* d12fence;

Here’s how it works: after the required commands are dispatched to the d12commandQueue, to be able to tell later whether the GPU has finished the execution of the these commands, a synchronization point can be made in the queue by increasing the lastSignaledFenceValue by one, and dispatching a signal command to the d12commandQueue which sets the d12fence to the lastSignaledFenceValue. This process is done by the function called signalNext.

static void signalNext(CommandQueue* commandQueue)

So in the future, to check whether the GPU has executed the dispatched commands, we have to check whether (the current value of) lastSignaledFenceValue is smaller or equal than (the value in the future of) d12fence.

Of course, in theory, the fence can overflow, but since 2^{64} nanoseconds is about 585 years, overflow is basically impossible in practice.

Tracked Resource

I use the structs below to represent a synchronizable resource:

struct FenceSlot
	ID3D12Fence* d12Fence;
	u64 requiredFenceValue;

struct TrackedResource
	ID3D12Resource* d12Resource;
	FenceSlot useFenceSlots[4];
	FenceSlot modifyFenceSlot;
	D3D12_RESOURCE_STATES stateAfterModification;

A fence slot is just a fence associated with a fixed fence value. A fence slot is considered valid if the fence hasn’t reached the given value, and it is invalid otherwise. A fence slot can be waited on by waiting until the fence reaches the given value. A valid fence slot of a resource represents that the resource is being used by a queue.

A resource (in general, not only here) should be modified (write) by only one thing at once, and not used by any other thing during modification, and can be used (read) by multiple thing at once, while it’s not being modified. Keeping this in mind, a TrackedResource object has one fence slot which indicates that the resource is being modified, and four fence slots which indicate that the resource is being used. (Of course the number four is completely arbitrary here, and in fact, the fence slots don’t have to be stored inside the resource as an array, but as a forward list, allocating them dynamically from a memory pool for example, if necessary.)

Additionally, the TrackedResource struct has a D3D12_RESOURCE_STATES member, which indicates the current state of the resource. More precisely, it indicates the state of the resource after a modification finished, that is, after the fence slot used for modification becomes invalid.

Functions for resource synchronization

The two key functions for assuring proper synchronization of the resources are the markUse and markModify functions.

static void markUse(
	TrackedResource* resource,
	CommandQueue* commandQueue

static void markModify(
	TrackedResource* resource,
	CommandQueue* commandQueue,
	ID3D12GraphicsCommandList* commandList,
	D3D12_RESOURCE_STATES stateToModify

To mark that a resource will be used (but not modified) by a command queue, we have to call the markUse function. First, this function checks whether the modifyFenceSlot of the resource is valid. If it is, it will make the queue wait for the modifyFenceSlot. After that, the function records the commandQueue->d12fence and (commandQueue->lastSignaledFenceValue + 1) values to a (currently invalid) useFenceSlot of the resource. (So the function assumes that the queue will use the resource until its fence reaches the next fence value to be signaled. This is also assumed by the markModify function.)

To mark that a resource will be modified by a command queue, we have to call the markModify function. First, this function makes the queue wait for all the valid fence slots (modify and use) of the resource. Then it records the commandQueue->d12fence and (commandQueue->lastSignaledFenceValue + 1) values to the modifyFenceSlot of the resource. Also, if the stateToModify parameter differs from the value of resource->stateAfterModification, the function records a resource barrier to the commandList which makes the proper state transition of the resource and sets resource->stateAfterModification to stateToModify.


In my project this synchronization model is pretty useful. It’s also simple, requires only little code and during debugging it’s relatively easy to see what’s going on. Of course, it has to be used cautiously, otherwise deadlocks can easily occur. For example, after calling markUse and/or markModify on some resources, one has to call the signalNext function on the command queue, otherwise the fence slots set by markUse and/or markModify will stay valid forever.

However, I encountered a pretty nasty deadlock that I considered a bug, not a misuse of the model. Perhaps you’ve already found it… Let’s consider for example what happens if we call markModify on a resource, and after that, in the same “pass” (before calling signalNext on the command queue, so the lastSignaledFenceValue of the queue is unchanged) we call markUse or markModify again on the same resource. In this case the queue is forced to wait for a fence value that should be set by it later… But the solution to this problem is pretty easy: the markModify function just has to check that all the fence slots that we make the queue wait on is not the same as the new fence slot that is being recorded.

Normal Mapping (part 2)

This post is the continuation of the previous post. (link)

In this post, I will continue from where I finished the previous post. First, I will talk about how to compute the normal vectors for a surface defined by a height map, and how they get transformed when we transform the underlying height map. After that, I will derive to proper formula for the normal vector of a surface whose vertices are offset with a height map. I used this formula for computing the normal in the project I talked about in the previous post. (For more info, please check the previous post.)

Connection between the normal and the height map

Let’s take a height map, i.e. a h: [0,1]^2\rightarrow\mathbb{R} function. Let’s assume that this function is smooth. The graph of this function, \tilde{h}: [0,1]^2\rightarrow\mathbb{R}^3  , \tilde{h}(u,v)=(u,v,h(u,v))^T  defines a smooth surface in \mathbb{R}^3  .

What is the normal vector of this surface? Or in other words, what is the normal map, corresponding to the height map h  ? To calculate that, we can take the cross product of the tangent and bitangent vectors, and normalize the result. So, for the unnormalized normal vector (denoted by \hat{n}  ) we have

\hat{n}=\partial _u\tilde{h}\times\partial _v\tilde{h}=\begin{pmatrix} 1 \\ 0 \\ \partial_uh\end{pmatrix} \times\begin{pmatrix} 0 \\ 1 \\ \partial_vh\end{pmatrix} =  \begin{pmatrix} -\partial_uh \\ -\partial_vh \\  1 \end{pmatrix}

and for the normal vector, we have n=\frac{\hat{n}}{\|\hat{n}\|}  . Now, what happens if we scale the height map by some number a\in\mathbb{R}  , that is, we consider the height map (u,v)\mapsto ah(u,v)  ? The partial derivatives will get multiplied by a  as well, so to compute the new normal from the old one, we have to do the following steps:

  1. Divide the old normal by its z coordinate to get the old unnormalized normal back.
  2. Multiply the first two coordinates of the old unnormalized normal by a  to get the new unnormalized normal.
  3. Normalize the new unnormalized normal.

As you can see, the normal vector does not behave so nicely during the transformation of the height map, the partial derivatives (gradient) behave in a much simpler fashion. In fact, if we apply a more complicated transformation to the height map (for example multiply, add, divide two height maps together, take the exponential, logarithm, sine, square root of a height map, or whatever you like), the gradient of the new height map can be computed relatively easily and intuitively from the old gradient map, using the chain rule for derivation. Then the transformed normal map can be computed from the transformed gradient map.

In my opinion, it’s always more straightforward to work with the gradient map directly instead of the normal map. It’s not only that gradient maps can be transformed in a more naturally way, but in case of linear sampling and mip level generation they produce more correct results. This easily follows from the fact that differentiation is a linear operation, but the operation of taking the normal vector is not. For example, in this project, to compute the mip levels for a normal map I generate all mip levels of the underlying height map, and generate each level of the normal map from the corresponding level of the height map, instead of generating the biggest level of the normal map from the height map, and generating the mip levels of it by downsampling.

Also, as we will see below, if we want to apply normal mapping properly, what we need directly is not the normal map of the height map, it’s rather the gradient map of it.

Normal mapping

So let’s see, what we have and what we want. We have a surface r: [0,1]^2\rightarrow\mathbb{R}^3  . As I wrote at the end of the previous post, we can assume that the texture coordinates of a point r(u,v)  is (u,v)  . We also have a height map h: [0,1]^2\rightarrow\mathbb{R}  . Let’s define the function n: [0,1]^2\rightarrow\mathbb{R}^3  in a way that n(u,v)  is the normal vector of the surface at the point r(u,v)  . (This map is called the Gauss map of the surface r  .) From r, n  and h  we can define a new surface r_h  by offsetting the surface r  along its normal by the amount h  , that is,

r_h=r+hn  .

(This is the equation I use in the vertex shader to offset the vertices.) What we want is to find the normal vector of the surface r_h  . For that, we have to compute the tangent and bitangent vectors of this surface, and take the cross product of them. For the tangent vector, we have \partial_ur_h = \partial_ur + (\partial_uh)n + h\partial_un  . The terms look familiar, except for one: we haven’t seen the \partial_un  term yet. This term represents how the normal vector changes if we move along the tangent direction on the surface. It can be viewed as a second order term for describing the surface (the first order terms are the tangent and bitangent vectors), and can be computed from the first and second partial derivatives of the surface r  . Without going into too much details, I say that in differential geometry there is a thing called the shape operator (or Weingarten map) for a given surface. This operator tells us how the normal vector changes along a vector from the tangent plane on the surface, and this is exactly what we need. It can be represented by a 2×2 matrix W  which for the surface r  can be computed as

W = \begin{pmatrix} w_{11} & w_{21} \\ w_{12} & w_{22} \end{pmatrix} = \begin{pmatrix} \langle\partial_ur, \partial_ur\rangle &  \langle\partial_ur, \partial_vr\rangle \\  \langle\partial_vr, \partial_ur\rangle &  \langle\partial_vr, \partial_vr\rangle \end{pmatrix} ^{-1}\ast   \begin{pmatrix} \langle\partial_{uu}r, n\rangle &  \langle\partial_{uv}r, n\rangle \\  \langle\partial_{vu}r, n\rangle &  \langle\partial_{vv}r, n\rangle \end{pmatrix}

where \langle \cdot, \cdot\rangle  denotes the usual scalar product on \mathbb{R}^3  .

I mention that the shape operator is related to the curvature of the surface: its eigenvalues are the principal curvatures, its determinant (the product of the eigenvalues) are the Gaussian curvature, the half of its trace (the sum of the element in the main diagonal, or the sum of the eigenvalues) is the mean curvature of the surface.

For us, the important thing is that we have \partial_un = -w_{11}\partial_u r - w_{12}\partial_vr  . (Don’t care about the negative sign, let’s just accept it as a convention.) So for the tangent vector of the offset surface we have

\partial_ur_h =  \partial_ur + (\partial_uh)n + h(-w_{11} \partial_u r - w_{12}\partial_vr) = (1-hw_{11})\partial_ur -hw_{12}\partial_vr  + (\partial_uh)n

= \left[TBN\right] \begin{pmatrix}  1-hw_{11} \\  -hw_{12} \\  \partial_uh \end{pmatrix}  ,

where \left[TBN\right]  = (\partial_ur, \partial_vr, n)  is the usual (but not orthogonalized!) TBN matrix. Similarly, for the bitangent vector of the offset surface we have

\partial_vr_h =   \left[TBN\right] \begin{pmatrix}  -hw_{21} \\  1-w_{22} \\  \partial_vh   \end{pmatrix}  .

And actually, that’s it. As usually, the cross product of the tangent and bitangent vectors gives the unnormalized normal \hat{n}_h=  \partial_ur_h \times  \partial_vr_h  , and normalizing that we get the required normal vector n_h=\frac{\hat{n}_h}{\|\hat{n}_h\|}  .

Computation in practice

In order to do the above computation in the pixel shader, we need the following data

  • TBN matrix (at least the tangent and bitangent vectors),
  • the height map,
  • the gradient map (or indirectly, the normal map) of the height map,
  • the shape operator of the mesh.

The shape operator can be placed in the vertex data, next to the TBN matrix. Note, that even if the TBN matrix is not orthogonal, the normal vector is always perpendicular to the tangent vectors, and has unit length, so it can be computed on the fly, doesn’t have to be stored.

The shape operator is not large, it’s four floats. But anyway, is this the shape operator really necessary? As I said above, it can be seen as the second order terms of the surface, so in some way, they are less significant as the first order terms (the tangent and bitangent vectors). For the shapes in the video in the previous post I didn’t see too big difference between using the shape operator in the computation, or assuming it is zero. I saw only a little change in the rendered image when I switched on and off the shape operator in the shader (and hot-reloaded it). I want to do more experiments with it in the future, but it’s obvious from the above calculation, that the bigger the vertex offset (the values in the height map), the more significantly the shape operator influences the normal. Also, of course, the matrix of the shape operator depends on the actual surface, the more it curves, the bigger the values in the matrix are. For example, the shape operator of the plane (not surprisingly) is the zero matrix, and the shape operator of a sphere with radius r  is

\begin{pmatrix} 1/r & 0 \\ 0 & 1/r \end{pmatrix}  .

The case when TBN is orthogonal

Let’s finish with assuming that the TBN matrix is orthogonal, the shape operator is zero, and compute the normal in this case. Hopefully, we will get back the formula that is usually used for normal mapping.

If the shape operator is zero, we have

\partial_ur_h =   \left[TBN\right] \begin{pmatrix}  1 \\  0 \\  \partial_uh   \end{pmatrix}  , and

\partial_vr_h =   \left[TBN\right] \begin{pmatrix}  0  \\  1 \\  \partial_vh   \end{pmatrix}  .

And if TBN is orthogonal, we have

\hat{n}_h=\left(  \left[TBN\right] \begin{pmatrix}  1  \\  0 \\  \partial_uh   \end{pmatrix}  \right) \times \left(  \left[TBN\right] \begin{pmatrix}  0  \\  1 \\  \partial_vh   \end{pmatrix}  \right) =  \left[TBN\right]  \left( \begin{pmatrix}  1  \\  0 \\  \partial_uh   \end{pmatrix}  \times  \begin{pmatrix}  0  \\  1 \\  \partial_vh   \end{pmatrix} \right)

=  \left[TBN\right] \begin{pmatrix}  -\partial_uh  \\  -\partial_vh \\  1   \end{pmatrix}  .

Does this look familiar? I hope it does. The last term in the above equation is the unnormalized normal of the height map that we’ve seen before. That is what is usually stored in the normal map, and that’s what we multiply by the TBN matrix to get the normal vector of the normal mapped surface.

Normal Mapping (part 1)

I started a project recently in which I’m experimenting with some random stuff. I think it can be called a “handmade” project, currently I use only the C runtime library, Windows API, Direct3D12 and the AVX instruction set. (I don’t use COM objects or the d3dx12.h helper header.)

Anyway, I made a simple Perlin noise generator, and used it for creating height maps, and some images with “infinite detail” (that is, they can be zoomed into arbitrary many times). Let me call the latter ones simply just fractals in the following. I also mixed the two types, that is, I made a height map which is a fractal as well. Here is the link to the github repo:

I guess the code is in a very messy state, and it will definitely stay in this state for a while, since I don’t have any reason to make a proper interface and code clean-up, and anyway, it’s just the code. Below, there is a demonstrating video. Unfortunately the recording caused some frame rate issues and popping.

Currently the images are generated by the CPU, and for the fractals, they are constantly recomputed asynchronously and uploaded to the GPU when they are ready. The non-fractal, static images (height maps) and the meshes are generated at startup.

About the height mapping: after generating the height maps with Perlin noise, I computed the normal maps from them. I used the height map for offsetting the vertices in the vertex shader (yep, no nice enough tessellation yet, perhaps in the future), and used the normal map in the pixel shader to compute the proper normal. As you can see in the video, the height offsetting is relatively large, so it’s important for the normal mapping to be precise enough, that is, the tangent, bitangent vectors, and the scaling should work well. In the remaining of this post I describe briefly what normal mapping is, how it is usually done (more precisely, I describe what I’ve seen in other tutorials about how it is usually done). In the next post, I will derive the precise formula for how it should be done, at least in theory.


To make the geometry of a 3D object look more detailed, than the geometry defined by the vertices, we usually use normal maps. A normal map is just a texture whose value at a point is the more detailed version of the normal vector, expressed in the basis defined by the tangent, bitangent and the less detailed normal vector. The tangent vector encodes the direction and rate in which the first texture coordinate changes along the surface. The bitangent vector is the same but for the second texture coordinate. More precisely, if we have an r: A\rightarrow \mathbb{R}^3 function, which is the parametrization of our surface, where A  is some 2-dimensional subset of \mathbb{R}^2  , and also, we have a t: A\rightarrow [0,1]^2  function, which gives the texture coordinates (uv coordinates) of the surface; then the tangent and bitangent vectors are the first and second colums of the matrix D(r\circ t^{-1})  , respectively. (For a function f  , let’s denote its Jacobi matrix by Df  .) Note, that D(r\circ t^{-1}) = (Dr)(Dt^{-1})=(Dr)(Dt)^{-1}  .

Before going further, let’s take a well-known example. Let’s say that our surface is just a triangle, with vertices p_0, p_1, p_2  , and with correspondig texture coordinates t_0, t_1, t_2 . Then both r  and t  are just parametrizations of triangles, namely A is the triangle whose vertices are (0,0) , (0,1) and (1,0) ,

r(a,b) = (1-a-b)p_0+ap_1+bp_2  , and

t(a,b) = (1-a-b)t_0+at_1+bt_2  .

Hence Dr=(p_1-p_0, p_2-p_0)  and Dt=(t_1-t_0,t_2-t_0)  , so the tangent and bitangent vectors are the columns of the matrix (p_1-p_0, p_2-p_0)  (t_1-t_0,t_2-t_0) ^{-1}  . This expression is usually used to calculate the tangent and bitangent vectors for a triangulated mesh at the vertices p_0,p_1,p_2  . Of course, a vertex can be a vertex of more than one triangles (and usually is). In that case, for a vertex we can simply take the average of the tangent and bitangent vectors computed for the triangles containing that vertex. It’s important to note that the normal vector is always perpendicular to both tangent vector (after all, this is how the normal vector is defined), so we have to make the averaged tangent and bitangent vectors perpendicular to the normal vector.

It’s important to notice that there is no reason for the tangent and bitangent vectors to be orthonormal (have unit length and perpendicular to each other). But we should usually aim for that, since that would mean that the texture is mapped to the surface without any distortion. Since the mesh is usually fixed, this means that we should define the texture coordinate function t  in a way that the columns of the matrix (Dr)(Dt)^{-1}  are orthonormal. Intuitively, this just means that we have to unfold the surface, flatten it without any stretching. If that can be done, it’s great, but it’s usually not the case. Without going into details, I mention that for a smooth surface, such t exists only if the surface has zero Gaussian curvature, and that’s just simply not the case for most surfaces.

Despite that, (at least what I’ve seen in every tutorial about normal mapping) it is assumed that the TBN matrix (the matrix consisting of the tangent, bitangent, and normal vectors) is orthogonal. I guess the reason for this is that it’s usually a good enough approximation, but as we seen above, it can be good enough only if the texture coordinate function t  is chosen well enough. In this project, this was not the case, and I got horrible results when I orthogonalized the TBN matrix.

Another problem occurs when we want to scale the height map. For example, how should we change the normal vector, if we offset the vertices by the value of the height map multiplied by two? What I’ve seen so far, people usually use an empirical intensity value for the normal map, which is used to scale the coordinates of the normal map corresponding to the tangent and bitangent vectors (the first two coordinates). This, of course, makes sense, but only if there is no actual height map being used for the normal map, otherwise the scaling of the height map and the normal map has to be synchronized properly. Actually, synchronizing them is pretty easy, but I keep the explanation for the next post.

One thing before wrapping up: notice that we can assume that the parametrization of a given surface can be selected in a way that its domain is the texture coordinates, that is, we have r:[0,1]^2\rightarrow\mathbb{R}^3  , and the texture coordinates of the point r(u,v)  is just (u,v)  . This can be achieved by replacing r  with r\circ t^{-1}  . This simplifies things a little bit, since then, the tangent vector is \partial _u r = \frac{\partial r}{\partial u}  , and the bitangent vector is \partial _v r =  \frac{\partial r}{\partial v}  . In the next post, I will assume that this holds for r  .

Perspective correct interpolation

Reading this very useful series about the graphics pipeline I realized I had never thought about what is the proper way to interpolate between vertex attributes (outputted by the vertex, domain or geometry shader to the pixel shader) using the vertices of the triangle after perspective divison. I spent some time thinking on that, and although it’s not hard to figure out and the result is described in the link above, I thought I describe one possible way to get the solution.

Let’s assume we have a triangle before perspective divison with vertices (p_0, w_0), (p_1, w_1)  , (p_2,w_2)  with some corresponding vertex attributes c_0  , c_1  , c_2  (e.g. texture coordinates, color, normal). After perspective divison we have p'_0=\frac{p_0}{w_0}  , p'_1=\frac{p_1}{w_1}  , p'_2=\frac{p_2}{w_2}  . Let’s have p'=(1-t'_1-t'_2)p'_0+t'_1p'_1+t'_2p'_2  . The question is, how should we compute the proper vertex attribute c  corresponding to p'  ?

Let’s think backward: assume we have p=(1-t_1-t_2)p_0+t_1p_1+t_2p_2  . Then, of course, the vertex attribute corresponding to p  is simply c= (1-t_1-t_2)c_0+t_1c_1+t_2c_2  . Now, what is the point p'  we get from p  after perspective divison? Using the notation w =  (1-t_1-t_2)w_0+t_1w_1+t_2w_2  we have

p' =\frac{p}{w} \\ \quad =  \frac{(1-t_1-t_2)p_0+t_1p_1+t_2p_2 }{w} \\ \quad =\frac{(1-t_1-t_2)w_0}{w}\frac{p_0}{w_0}+ \frac{t_1w_1}{w}\frac{p_1}{w_1} +  \frac{t_2w_2}{w}\frac{p_2}{w_2} \\ \quad =  \frac{(1-t_1-t_2)w_0}{w}p'_0+ \frac{t_1w_1}{w}p'_1 + \frac{t_2w_2}{w}p'_2

Well, it’s easy to check that the sum of the coefficients in the above interpolation between p'_0  , p'_1  and p'_2  is 1  . So by setting t_1'= \frac{t_1w_1}{w}  and t_2'= \frac{t_2w_2}{w}  , we can write p'= (1-t'_1-t'_2)p'_0+t'_1p'_1+t'_2p'_2  .

And basically, we are done: for a point p'= (1-t'_1-t'_2)p'_0+t'_1p'_1+t'_2p'_2  , the corresponding vertex attribute is c=  (1-t_1-t_2)c_0+t_1c_1+t_2c_2  , where t_1  and t_2  can be computed from the two equations t_1'= \frac{t_1w_1}{w}  and t_2'= \frac{t_2w_2}{w}  , where w =  (1-t_1-t_2)w_0+t_1w_1+t_2w_2  .

Of course, we can solve for t_1  and t_2  explicitly (it’s just a system a linear equations with two variable), and we get

t_1 = \frac{t'_1w^{-1}_1}{\tilde{w}}  , and t_2 = \frac{t'_2w^{-1}_2}{\tilde{w}}  ,

where \tilde{w} = (1-t'_0-t'_1)w^{-1}_0 +  t'_1w^{-1}_1  +t'_2 w^{-1}_2  . We can go further by substituting t_1  and t_2  into the equation for c

c=  (1-t_1-t_2)c_0+t_1c_1+t_2c_2   \\ (1- \frac{t'_1w^{-1}_1}{\tilde{w}} - \frac{t'_2w^{-1}_2}{\tilde{w}} )c_0 +  \frac{t'_1w^{-1}_1}{\tilde{w}} c_1 +  \frac{t'_2w^{-1}_2}{\tilde{w}} c_2 \\ =   ((\tilde{w} -   t'_1w^{-1}_1  -  t'_2w^{-1}_2)c_0 +   t'_1w^{-1}_1 c_1 +  t'_1w^{-1}_2 c_2)/\tilde{w} \\= ((1-t'_1-t'_2)\frac{c_0}{w_0} + t'_1\frac{c_1}{w_1} + t'_2\frac{c_2}{w_2})/\tilde{w}

Therefore, to do the interpolation in practice, we have two main choices:

  • For given t'_1  and t'_2  values (that is, basically, for a given pixel inside the triangle) we calculate t_1  and t_2  , and use them to calculate all vertex attributes by a simple linear interpolation.
  • For all vertex attributes c_i  , we compute the corresponding \frac{c_i}{w_i}  values, and use them to calculate the vertex attributes directly from given t'_1  and t'_2  .

Of course, in both cases we have to compute \tilde{w}  as well.

Making unions the hard way

Here’s a story I think worth sharing. At the beginning I learned C++ programming mostly on my own by watching tutorials, CppCon videos reading posts and practicing. At some point (in my graphics engine project) I needed dynamic polymorphism, but I didn’t want to use OOP style inheritance, I just wanted to have a piece of memory that knows its type that can be changed dynamically, in a relatively convenient way. So I had this genius idea to make a struct containing an enum value for storing the current type, and a buffer for storing the data. What should the alignment and the size of this buffer be? To calculate that in “the most general way” in compile time I did the following. The possible types were specializations of a struct template, where the template parameter was the enum for the type. Then I used constexpr functions which iterated through the possible types, and calculated the required alignment and size for the buffer that could store them. Now it seems completely ridiculous.

Later I learned that there is this thing called union in C++… It could be a coincidence or it could be completely my fault, but it seems very unfortunate that I hadn’t met the union keyword in the sources I had learned C++ from, and learned it only pretty late . Namely because I think unions are very useful and I’ve been using them a lot.

Transforming the normal and tangent vectors

I have found some sources about how to transform normal vectors and why they transform in a way the do. These sources seemed to rely on the fact that the transformations we use are affine transformations (multiplying with a matrix and adding a vector). Using linear algebra and vector calculus, I would like to give a simple derivation of how the normal and tangent vectors are transformed by a general transformation.

Let f: \mathbb{R}^3\rightarrow\mathbb{R}^3 be a differentiable function, and M\subset\mathbb{R}^3 is a smooth surface. Let’s denote the image of M under f by N=f(M)\subset\mathbb{R}^3. If f is “nice enough” N is another “well-behaved” surface. (For example, you can keep in mind the usual case where f(x)=Ax+a, where a is a vector, and A is an invertible matrix. Then, f is “nice enough”.)

Transforming the tangent vector. The green curves are the \gamma and f\circ\gamma curves referenced in the text.

Let’s fix a point p\in M. Let’s denote the tangent plane of M at p by T_pM. Similarly, let q=f(p) and T_qN be the tangent plane of N at q. Let’s define first how the tangent vectors in T_pM are transformed by f to T_qN, so let’s make a T_pf: T_pM\rightarrow T_qN function from f in some meaningful way… It turns out that the most meaningful way for that is to do the following. Let’s fix a v\in T_pM vector. We can represent this vector with a small piece of curve \gamma in M, for which \gamma (0)=p and \gamma'(0)=v. Now let’s apply f to \gamma. Obviously, f(\gamma (0)) = q, so by taking the derivative of f\circ\gamma at 0, we get a vector w\in T_qN. And that’s what we wanted! That is T_pf(v). Let’s calculate it, using the chain rule for derivation

T_pf(v)=(f\circ\gamma)'(0)=Df(\gamma (0))\gamma '(0)=Df(p)v,

where Df(p) is the Jacobi matrix of f at p.

So we got that T_pf is a linear transformation, it is the multiplication by Df(p). For the case where f(x)=Ax+a we have Df(p) = A, therefore, the Jacobi matrix is constant, it is A for every p\in M.

Now let’s see how the normal vector is transformed. For simplicity, let’s use D=Df(p) in the following. First, what is the normal vector n_p of M at p? It is a (nonzero) vector perpendicular to T_pM. Similarly, the normal vector n_q of N at q is a (nonzero) vector perpendicular to T_qN.

First, let’s assume that D is invertible. In this case, for every vector v\in T_pM we can write

0 = n_p^Tv = n_p^T(D^{-1}D)v = (n_p^TD^{-1})(Dv) = (D^{-1^T}n_p)^T(Dv).

Since D is invertible, every vector w\in T_qN can be written in the form of w=Dv for some v\in T_pM. (Namely, for v = D^{-1}w). So we got that the vector D^{-1^T}n_p is perpendicular to T_qN (and nonzero), so it’s a normal vector of of T_qN. Therefore, we can say that the way f transforms the normal vector at a point p is by multiplying it with the inverse of the transpose of the Jacobi matrix of f at p, if this matrix exists. If D is on orthogonal matrix, then of course, we have D^{-1^T}=D.

What if D is not invertible? This roughly means that f as a function from \mathbb{R}^3 to \mathbb{R}^3 collapses some (one or more) directions into zero at p. If these are not the direction in T_pM, we can still define the normal vector, but I won’t go into details now. Otherwise, we can’t, because actually, in this case T_qN is ill-defined, and there is probably a crack in N at q.

On premultiplied alpha

There seem to be many sources about premultiplied alpha out there, so here’s another one. 🙂 This one makes the most sense to me, and I couldn’t find this explanation in this form elsewhere. (Although I haven’t done too much research.)

When blending images together it’s useful to think about them as filters, functions, applied to the colors already in the framebuffer. An image
A = (c, t) can be viewed as a transparent light source where c=(r, g, b) is the emittance and t=1-a is the transparency. Then A applied to a color x results in the color A(x)=c+tx, that is, it absorbs some light, and adds to the remaining light its own emittance value . This is just the blend equation using premultiplied alpha. Because of that, premultiplied alpha seems a more physically plausible blending, than non-premultiplied alpha.

Let’s see how should we blend together two images. it’s actually quite simple and natural. Let’s have two images A = (c_A, t_A) and B = (c_B,t_B) an apply them to the color x. We get

(A\circ B)(x) = A(B(x))=A(c_B + t_Bx)=c_A+t_A(c_B+t_Bx)=(c_A+t_Ac_B)+t_At_Bx.

So for the image C=(c_A+t_Ac_B, t_At_B) we have C=A\circ B. This shows how to blend images together. Expressing the transparencies with the alpha channels, we got the blend equation for the alpha channel

a_C = 1-t_C=1-t_At_B=1-(1-a_A)(1-a_B) = a_A+ (1-a_A)a_B.

Associativity is usually a required property of blending equations, which is the property that for each images P,Q,S we have (P\circ Q)\circ S=P\circ (Q\circ S). The blending above is by definition associative, since it was defined simply by function composition, which is always associative.