Before discussing the rendering pipeline, I would like to talk about the offline computations required to generate the necessary data for the renderer. These computations are done by a separate program I wrote that generates the sky maps, the terrain height, normal and texture mask maps, and a white noise map. The white noise map is only used by the FFT for real-time ocean rendering.
To generate the required sky maps, I did some research into atmospheric (and, more generally, volumetric) scattering and its rendering. I found some sources about the topic, for example this, this and this paper. Based on these papers I generated two 3D textures by simulating multiple scattering. The first texture contained the results of Mie scattering (wavelength-independent scattering caused by dust and other non-molecular particles), the second texture contained the results of Rayleigh scattering (wavelength-dependent scattering caused by the nitrogen and oxygen molecules). The three texture coordinates correspond to the height (distance from the center of the Earth), the angle between the view direction and the zenith, and the angle between the sun direction and the zenith, where the zenith is the up direction. A 2D texture, the transmittance map, was also generated; it is used to compute the attenuation caused by the atmosphere of the light coming from a light source in outer space (e.g. from the sun).
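For reference, the two scattering types differ mainly in their phase functions, i.e. in how they distribute the scattered light over angles. The snippet below shows the standard Rayleigh phase function and the Cornette-Shanks approximation commonly used for Mie scattering; this is only the angular part, not my precomputation code, and it is written in GLSL just to match the other sketches in this post.

```glsl
// Standard phase functions for the two scattering types. cosTheta is the
// cosine of the angle between the view and light directions, g is the Mie
// asymmetry parameter (g = 0 would be isotropic scattering).
const float PI = 3.14159265359;

float rayleighPhase(float cosTheta) {
    return 3.0 / (16.0 * PI) * (1.0 + cosTheta * cosTheta);
}

float miePhase(float cosTheta, float g) {
    // Cornette-Shanks approximation, commonly used for aerosols
    float g2 = g * g;
    return 3.0 / (8.0 * PI) * ((1.0 - g2) * (1.0 + cosTheta * cosTheta))
         / ((2.0 + g2) * pow(1.0 + g2 - 2.0 * g * cosTheta, 1.5));
}
```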
I generated the terrain height and normal maps using Perlin noise. The texture mask is just a 32-bit value stored in the alpha channel of the normal map. It encodes four 8-bit values representing the weights of the materials used for terrain rendering. Of course, it could encode five materials, with the assumption that the weights have to sum to one, or perhaps even more materials using fewer bits per weight, but four was enough for me. The weights at a position are generated with a simple rule, blending based on the steepness and the height of that position.
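The rule itself is nothing fancy. A minimal sketch of the kind of height-and-steepness blend I mean is shown below; the thresholds, the material order and the exact formulas are made up for the example, and it is written as a GLSL-style function even though the real thing runs in the offline generator.

```glsl
// Hypothetical material-weight rule driven only by the height of the sample
// and its steepness (1 - normal.y). All thresholds are placeholders.
vec4 computeMaterialWeights(float height, vec3 normal) {
    float steepness = 1.0 - clamp(normal.y, 0.0, 1.0);        // 0 = flat, 1 = vertical

    float rock  = smoothstep(0.4, 0.7, steepness);            // steep slopes
    float snow  = smoothstep(600.0, 800.0, height) * (1.0 - rock);
    float sand  = (1.0 - smoothstep(2.0, 10.0, height)) * (1.0 - rock);
    float grass = max(1.0 - rock - snow - sand, 0.0);

    vec4 weights = vec4(grass, rock, snow, sand);
    return weights / max(dot(weights, vec4(1.0)), 1e-5);      // make them sum to one
}
```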
The rendering pipeline
I read this GTA V graphics study, which helped me put together the basics of the rendering pipeline. According to my memory and the GitHub repo, these are the main steps, in order:
- The mip levels of the terrain's height map, normal map and texture mask are stored in sparse textures (using the sparse texture support of Vulkan) and fetched from disk when they are needed. Using a compute shader that runs for every tile of the terrain, I figure out which mip levels are required for drawing the given tile, based on the distance between the tile and the camera. The shader can write a request to a buffer (atomically): if a required mip level is not resident, it writes an allocation request; if the finest resident mip level is not required, it writes a deallocation request; otherwise it does nothing (a small sketch of such a request-writing shader follows the list). Later (on the next frame) the CPU reads the requests and executes them. So… why is this computed on the GPU, since there are definitely not that many tiles? Well, in this project computing this on the GPU, while the result has to be processed by the CPU anyway, is not worth it; it just makes things more complicated than they should be. But again, this was a learning project, and computing this on the GPU made me think about proper synchronization and communication between the GPU and the CPU. So I guess it's acceptable.
- Based on Tessendorf's work, I use another compute shader to generate the height and gradient maps of the ocean from a periodic time parameter, a wind vector, and a texture containing white noise, using the Fast Fourier Transform (FFT). Of course, for performance reasons I didn't do one 2D FFT; I did lots of 1D FFTs, one for every row and then one for every column. I remember that when implementing the FFT, I found the indexing a little bit tricky… The most convenient way would have been to rearrange the elements in the arrays containing the partial results after every iteration of the FFT. I didn't want to do that, so I made the indexing a little trickier instead (the usual way to avoid the rearranging is sketched after the list). I don't remember it exactly, maybe it was completely trivial; maybe I'll check it later. About the graphics… it's definitely not the most realistically shaped water I've ever seen. I haven't spent time tweaking parameters to make it look better, and I didn't implement Tessendorf's paper completely: I didn't implement choppy waves, and I did the shading differently. But it was a very interesting topic, and I would like to dive into the details sometime.
- For shadows, I used cascaded shadow mapping: I divided the view frustum into four smaller frustums and generated four shadow maps accordingly (one possible way to choose the split distances is sketched after the list). I used layered rendering to render into the four shadow maps simultaneously as a basic optimization. As can be seen in the video, just like in the case of the water, I didn't spend too much time tweaking the parameters: choosing a proper shadow map size and the relative sizes of the four smaller frustums.
- Next, I assemble the G-buffer by rendering the opaque objects and the terrain. (The terrain is rendered using tessellation shaders to get the desired resolution of the tiles.) The G-buffer consists of the following four RGBA32F textures: (position.xyz, roughness), (base color.rgb, screen-space ambient occlusion), (metalness, 0, 0, screen-space shadow), (normal.xyz, ambient occlusion of the material). The screen-space ambient occlusion and screen-space shadow values are filled in later in the pipeline. There is nothing special happening here: after a parallax occlusion mapping step and a Gram-Schmidt orthogonalization of the TBN = (tangent, bitangent, normal) matrix (which is maybe overkill; see the sketch after the list), it's basically just some texture sampling. Looking at the shader now, I remember having some performance issues with the parallax occlusion mapping, which I guess makes sense, seeing the while loop in the shader… Besides that, it looks like I didn't use sRGB textures, just the pow(., 2.2) function.
- Next, I compute the screen-space shadow values. To do this, I use the position and normal textures of the G-buffer and the four shadow maps. To get the position and normal vectors, I use the subpass input feature of Vulkan, which means that I can use the output of the previous subpass (the G-buffer pass) directly, without going through texture sampling, because the required values are already on-chip; they simply stayed there from the previous pass. At least, as I understood it from the docs, this is the essence of subpass inputs; I hope this is roughly what really happens on the hardware. (What this looks like in GLSL is sketched after the list.)
- After this, I blur the generated screen-space shadow map, and write it to the screen-space shadow map slot of the G-buffer.
- I repeat steps 5 and 6 for the screen-space ambient occlusion map. It seems I didn't use rotating and/or random kernels, I just iterated through the 5×5 neighborhood of each pixel (the loop is sketched after the list). I guess it seemed decent enough.
- Next, I generate the main image from the G-buffer. I use the PBR model described in the PBR section of https://learnopengl.com. The final color of a pixel is computed by applying the aerial perspective effect (things in the distance look more blueish) to the sum of the three colors of the object caused by the sun, the sky, and the ambient light (the overall shape of this combine is sketched after the list). Looking at the shader now, it seems the computation of the color caused by the sky is flawed: to determine the sky light, I sample the sky map in the direction of the reflected vector (the view vector reflected about the normal vector), which seems wrong. I guess it would be better to sample some irradiance map generated from the sky in the normal direction.
- Next, I draw the sky and the sun. The sky is simply drawn as a cube at infinity, sampling the sky maps (the usual trick for placing the cube at infinity is sketched after the list). The sun is just a disk.
- Next, I generate the refraction image for the ocean rendering by basically copying those pixels of the main image (the image assembled in the previous step) that are below sea level, that is, whose world-space y coordinate is negative. This is done by drawing the xz plane with depth testing on (see the sketch after the list).
- After that, I draw the ocean. It is drawn in tiles, and every tile uses the same ocean height map generated in step 2, which causes the undesired repetitiveness of the ocean. I used tessellation shaders to control the resolution of the tiles based on their distance from the camera. The computation of the color is pretty messy, but the main idea is simple: compute the reflected color and the refracted color (the light coming from below the water), add them together, and apply the aerial perspective effect (the rough shape of this is sketched after the list). One major problem is that to find the point where the refraction image has to be sampled for realistic refraction, the geometry of the ocean bottom has to be known. We have this data in the G-buffer, but finding the required point would require lots of texture samples along a ray. To avoid that, I simply assumed that the bottom is flat between the refracted ray and the non-refracted ray coming from the camera. It's definitely not the best solution; the refraction caused by the ocean looks weird in the video.
- In the end, I add some bloom, using a Gaussian blur (visible around the sun), and apply tone mapping. By "apply tone mapping" I mean applying a function called Uncharted 2 tone mapping that I found on the net somewhere (the usual formulation is given after the list)… I didn't know much about tone mapping, and unfortunately I still don't. I should learn more about it some day.
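To make some of these steps a bit more concrete, here are a few small GLSL sketches. They are not the exact shaders from the repo, just minimal illustrations with made-up names, bindings and constants. First, the per-tile mip request pass from the first step; the distance-to-mip mapping and the request encoding are invented for the example, while the real shader distinguishes allocation and deallocation requests exactly as described above.

```glsl
#version 450
// One invocation per terrain tile: compare the mip level required at the
// current camera distance with the finest mip level that is resident, and
// append a request to a buffer if they differ.
layout(local_size_x = 64) in;

struct TileRequest { uint tileIndex; uint mipLevel; uint isAllocation; };

layout(std430, binding = 0) buffer Requests {
    uint requestCount;
    TileRequest requests[];
};

layout(std430, binding = 1) readonly buffer Tiles {
    vec4 tileCenterAndResidentMip[];   // xyz = tile center, w = finest resident mip
};

layout(push_constant) uniform PC { vec3 cameraPos; uint tileCount; } pc;

void main() {
    uint i = gl_GlobalInvocationID.x;
    if (i >= pc.tileCount) return;

    float dist = distance(pc.cameraPos, tileCenterAndResidentMip[i].xyz);
    uint requiredMip = uint(clamp(log2(dist / 64.0), 0.0, 8.0));   // made-up mapping
    uint residentMip = uint(tileCenterAndResidentMip[i].w);

    if (requiredMip != residentMip) {
        uint slot = atomicAdd(requestCount, 1u);
        // Lower mip index = finer data, so needing a lower index means allocating.
        requests[slot] = TileRequest(i, requiredMip, uint(requiredMip < residentMip));
    }
}
```

The CPU then reads the request count and the request array on the next frame and performs the sparse allocations and deallocations.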
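For the ocean FFT in the second step, the indexing trick I vaguely remember is most likely the usual one for an in-place iterative Cooley-Tukey FFT: permute the input once using bit-reversed indices, after which every butterfly stage can work in place, so the partial results never have to be rearranged between iterations. A sketch of that index bookkeeping, assuming a power-of-two size N = 2^logN:

```glsl
// Bit-reversal permutation index for an N-point FFT (N = 2^logN).
uint reverseBits(uint i, uint logN) {
    return bitfieldReverse(i) >> (32u - logN);
}

// In stage s (0-based, butterfly half-size 2^s), invocation i of the N/2
// butterfly workers combines elements a and b = a + halfSize, in place.
void butterflyIndices(uint i, uint s, out uint a, out uint b) {
    uint halfSize = 1u << s;
    uint group    = i / halfSize;              // which butterfly group
    uint offset   = i % halfSize;              // position inside the group
    a = group * halfSize * 2u + offset;
    b = a + halfSize;
}
```

The same 1D transform is run once per row and then once per column to get the 2D result.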
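For the cascaded shadow maps in the third step, one common way to choose the four split distances is the so-called practical split scheme, which blends a uniform and a logarithmic partition of the view range; I don't remember whether I did exactly this, so treat it as one reasonable option rather than what is in the repo.

```glsl
// lambda = 0 gives uniform splits, lambda = 1 gives logarithmic splits;
// values somewhere around 0.5-0.9 are typical.
void computeCascadeSplits(float nearZ, float farZ, float lambda, out float splits[4]) {
    for (int i = 0; i < 4; ++i) {
        float p        = float(i + 1) / 4.0;
        float logSplit = nearZ * pow(farZ / nearZ, p);
        float uniSplit = nearZ + (farZ - nearZ) * p;
        splits[i]      = mix(uniSplit, logSplit, lambda);
    }
}
```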
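The Gram-Schmidt step mentioned in the fourth step is just the usual re-orthogonalization of the interpolated tangent frame before the normal-map sample is transformed to world space, together with the manual pow(., 2.2) decode used instead of sRGB textures. Roughly:

```glsl
// Make the interpolated (tangent, bitangent, normal) frame orthonormal again.
mat3 buildTBN(vec3 T, vec3 B, vec3 N) {
    N = normalize(N);
    T = normalize(T - dot(T, N) * N);                    // remove the N component from T
    B = normalize(B - dot(B, N) * N - dot(B, T) * T);    // remove the N and T components from B
    return mat3(T, B, N);
}

// Manual gamma decode, since the textures are not sampled as sRGB.
vec3 decodeGamma(vec3 c) {
    return pow(c, vec3(2.2));
}
```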
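Reading the G-buffer through subpass inputs in the fifth step looks like this in Vulkan GLSL. subpassLoad always returns the value at the current fragment's position, so no texture coordinates and no samplers are involved; the attachment indices and bindings here are illustrative.

```glsl
layout(input_attachment_index = 0, set = 0, binding = 0) uniform subpassInput gPosition;
layout(input_attachment_index = 1, set = 0, binding = 1) uniform subpassInput gNormal;

void readGBuffer(out vec3 position, out vec3 normal) {
    position = subpassLoad(gPosition).xyz;
    normal   = normalize(subpassLoad(gNormal).xyz);
}
```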
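The 5×5 neighborhood loop mentioned around the sixth and seventh steps has the same basic shape whether it is used for blurring the screen-space shadow map or for gathering the ambient occlusion samples; I don't remember the exact weights, so it is shown here as a plain unweighted average.

```glsl
layout(binding = 0) uniform sampler2D inputMap;   // e.g. the raw screen-space shadow map

float average5x5(vec2 uv) {
    vec2 texelSize = 1.0 / vec2(textureSize(inputMap, 0));
    float sum = 0.0;
    for (int x = -2; x <= 2; ++x)
        for (int y = -2; y <= 2; ++y)
            sum += texture(inputMap, uv + vec2(x, y) * texelSize).r;
    return sum / 25.0;
}
```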
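The overall shape of the lighting combine in the eighth step, including the questionable sky term, looks roughly like the sketch below. The full Cook-Torrance evaluation from learnopengl.com is folded into the sunTerm parameter, sampleSky is an assumed helper that looks up the precomputed sky maps, and the ambient term, the in-scatter color and the fog falloff are placeholders.

```glsl
vec3 sampleSky(vec3 dir);   // assumed helper, defined elsewhere

// viewDir points from the surface toward the camera.
vec3 shadePixel(vec3 sunTerm, vec3 position, vec3 normal, vec3 viewDir, vec3 cameraPos) {
    // Sky contribution, sampled along the reflected view vector. As noted above,
    // an irradiance map sampled along the normal would probably be more correct.
    vec3 skyTerm = sampleSky(reflect(-viewDir, normal));
    vec3 ambient = vec3(0.03);                              // placeholder ambient term

    vec3 color = sunTerm + skyTerm + ambient;

    // Aerial perspective: blend toward an in-scattered color with distance.
    // The real shader derives this from the precomputed sky textures.
    float fog = 1.0 - exp(-distance(cameraPos, position) * 2.0e-4);
    return mix(color, vec3(0.5, 0.6, 0.8), fog);
}
```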
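The cube at infinity from the ninth step is usually done with a small vertex-shader trick (I don't recall my exact setup, but it was something to this effect): the cube is kept centered on the camera by dropping the translation from the view matrix, and its clip-space z is set to w, so after the perspective divide the sky sits on the far plane and only covers pixels nothing else was drawn to.

```glsl
#version 450
layout(location = 0) in vec3 inPos;
layout(location = 0) out vec3 skyDir;

layout(push_constant) uniform PC { mat4 viewProjNoTranslation; } pc;

void main() {
    skyDir = inPos;                                     // direction used to sample the sky maps
    vec4 clipPos = pc.viewProjNoTranslation * vec4(inPos, 1.0);
    gl_Position = clipPos.xyww;                         // depth becomes 1.0 after the divide
}
```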
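The refraction image from the tenth step only needs a tiny fragment shader. The xz (sea-level) plane is rasterized with depth testing enabled against the already-rendered scene, so its fragments survive only where the geometry behind them is below sea level, and those fragments simply copy the main image.

```glsl
#version 450
layout(binding = 0) uniform sampler2D mainImage;   // the image assembled so far
layout(location = 0) out vec4 outColor;

void main() {
    // Depth testing has already restricted us to pixels whose geometry lies
    // below the sea-level plane; just copy the main image there.
    vec2 uv = gl_FragCoord.xy / vec2(textureSize(mainImage, 0));
    outColor = texture(mainImage, uv);
}
```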
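The ocean color in the eleventh step is, in its rough shape, the sketch below. In my shader the reflected and refracted colors are simply added, as described above; the more standard combine shown here weights them with Schlick's Fresnel approximation. sampleSky and sampleRefraction are assumed helpers, and the refraction lookup is where the flat-bottom assumption lives.

```glsl
vec3 sampleSky(vec3 dir);                                        // assumed helper
vec3 sampleRefraction(vec3 position, vec3 normal, vec3 viewDir); // assumed helper (flat-bottom lookup)

vec3 shadeOcean(vec3 position, vec3 normal, vec3 viewDir, vec3 cameraPos) {
    // Schlick's approximation with the base reflectance of water (about 0.02).
    float cosTheta = clamp(dot(normal, viewDir), 0.0, 1.0);
    float fresnel  = 0.02 + 0.98 * pow(1.0 - cosTheta, 5.0);

    vec3 reflected = sampleSky(reflect(-viewDir, normal));
    vec3 refracted = sampleRefraction(position, normal, viewDir);
    vec3 color     = mix(refracted, reflected, fresnel);

    // Same placeholder aerial-perspective blend as in the lighting sketch.
    float fog = 1.0 - exp(-distance(cameraPos, position) * 2.0e-4);
    return mix(color, vec3(0.5, 0.6, 0.8), fog);
}
```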
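Finally, the tone mapping from the last step: the function that circulates on the net as "Uncharted 2 tone mapping" is John Hable's filmic curve, usually written with the constants below (exposure and gamma correction are handled separately).

```glsl
// John Hable's filmic curve ("Uncharted 2 tone mapping").
vec3 hableCurve(vec3 x) {
    const float A = 0.15, B = 0.50, C = 0.10, D = 0.20, E = 0.02, F = 0.30;
    return ((x * (A * x + C * B) + D * E) / (x * (A * x + B) + D * F)) - E / F;
}

vec3 uncharted2Tonemap(vec3 color, float exposure) {
    const float W = 11.2;                            // linear white point
    vec3 mapped     = hableCurve(color * exposure);
    vec3 whiteScale = vec3(1.0) / hableCurve(vec3(W));
    return mapped * whiteScale;
}
```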