In the last few months I’ve been working on a rendering engine capable of rendering an entire planet. I’ve seen many solutions for procedural planet rendering (generating the height map on the fly), so I focused on a different problem: how to render a planet from already existing elevation data, and what the most efficient way is to store that data at large scale.
Sources
- For development/debug/demonstration purposes I’ve been using the elevation data of the Earth available at http://viewfinderpanoramas.org/dem3.html. I use the 3 arc second version which has full coverage.
- For texturing, I use the images available at https://www.h-schmidt.net/map/.
- For loading the JPEG textures I use Sean Barrett’s stb_image.h from https://github.com/nothings/stb.
- For rendering the atmosphere I implemented the algorithm described in the article https://hal.inria.fr/inria-00288758/document. (I haven’t implemented everything from it; for example, there are no light shafts yet.)
Performance
I measured the performance on my (at this point, quite weak) laptop:
CPU: i7-6700HQ 2.6 GHz
GPU: NVIDIA GeForce GTX 950M, 4 GB
HDD: 1TB, 7200 RPM
First, some words about the rendering: the geometry is divided into meshlets (triangle batches containing 64 triangles) to make the pipeline suitable for mesh shaders, and to allow further optimizations like meshlet culling, which is not implemented yet. Currently, there is space allocated for 16384 meshlets, and the “culling” shader doesn’t do much: it only decides which meshlet slots are active and fills the indirect command arguments accordingly.

I measured the GPU performance using PIX. Rendering the above frame (during movement) takes about 11 ms. Most of it (~9.5 ms) is spent in the ExecuteIndirect call that draws the planet, which currently uses a simple forward rendering method. Sky rendering takes ~1 ms. The remaining 0.5 ms goes to resource barriers and to generating newly requested meshlets (vertices, normals, uv coords).
Tessellation and LOD selection are done on the CPU, on the main thread. I measured the LOD update with _rdtsc(). The results were unfortunately quite unstable, but it’s worth noting that the measurements were almost always below 1.5 million cycles, even when travelling at high speed close to the ground. When the camera is way above the ground, like at the beginning of the video, the measured values are around 100-200 thousand cycles.
It’s important to note that loading the height data from file to GPU memory happens on a different thread; only the requests are issued on the main thread.
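As a rough illustration of the slot selection that feeds ExecuteIndirect, here is a minimal CPU-side sketch in C++. In the engine this work happens in a shader on the GPU; the function name `BuildIndirectArgs`, the `DrawArgs` struct (which mirrors the field layout of `D3D12_DRAW_ARGUMENTS`), and the assumption that each slot owns 192 consecutive non-indexed vertices in one shared vertex buffer are all hypothetical and not taken from the actual code.

```cpp
#include <cassert>
#include <cstdint>

// Constants taken from the text.
constexpr uint32_t kTrianglesPerMeshlet = 64;
constexpr uint32_t kMaxMeshletSlots     = 16384;
// Upper bound on triangles per frame: 16384 * 64 = 1,048,576.
constexpr uint32_t kMaxTriangles = kMaxMeshletSlots * kTrianglesPerMeshlet;

// Same field layout as D3D12_DRAW_ARGUMENTS, redefined here so the
// sketch builds without d3d12.h.
struct DrawArgs {
    uint32_t vertexCountPerInstance;
    uint32_t instanceCount;
    uint32_t startVertexLocation;
    uint32_t startInstanceLocation;
};

// Hypothetical helper: writes one draw per active slot into outArgs
// (compacted, in slot order) and returns the number of draws written.
// Assumes slot i owns vertices [i * 192, (i + 1) * 192).
uint32_t BuildIndirectArgs(const uint8_t* slotActive, uint32_t slotCount,
                           DrawArgs* outArgs)
{
    assert(slotCount <= kMaxMeshletSlots);
    const uint32_t verticesPerMeshlet = kTrianglesPerMeshlet * 3;
    uint32_t drawCount = 0;
    for (uint32_t slot = 0; slot < slotCount; ++slot) {
        if (!slotActive[slot]) continue;  // inactive slots emit nothing
        DrawArgs& a = outArgs[drawCount++];
        a.vertexCountPerInstance = verticesPerMeshlet;
        a.instanceCount          = 1;
        a.startVertexLocation    = slot * verticesPerMeshlet;
        a.startInstanceLocation  = 0;
    }
    return drawCount;
}
```

With 16384 slots of 64 triangles each, the allocated budget caps a frame at 1,048,576 triangles, which gives a sense of the workload behind the ~9.5 ms figure.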
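One common way to hand requests from the main thread to a loader thread is a single-producer/single-consumer ring buffer. The sketch below is only an assumption about how such a hand-off could look: `LoadRequest`, `RequestQueue`, `PushRequest` and `PopRequest` are hypothetical names, and `std::atomic` is used here for portability even though the engine itself avoids the standard library beyond a few C functions.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical request: which height tile to load, and at which detail level.
struct LoadRequest {
    uint32_t tileIndex;
    uint32_t lod;
};

// Single-producer/single-consumer ring buffer.
// The main thread only writes `tail`; the loader thread only writes `head`.
struct RequestQueue {
    static constexpr uint32_t kCapacity = 256;  // power of two, so wraparound is safe
    LoadRequest items[kCapacity];
    std::atomic<uint32_t> head{0};  // next index to read
    std::atomic<uint32_t> tail{0};  // next index to write
};

// Called on the main thread. Returns false if the queue is full,
// in which case the caller can retry on a later frame.
bool PushRequest(RequestQueue* q, LoadRequest r)
{
    assert(q);
    uint32_t tail = q->tail.load(std::memory_order_relaxed);
    uint32_t head = q->head.load(std::memory_order_acquire);
    if (tail - head == RequestQueue::kCapacity) return false;
    q->items[tail % RequestQueue::kCapacity] = r;
    q->tail.store(tail + 1, std::memory_order_release);  // publish the item
    return true;
}

// Called on the loader thread. Returns false if there is nothing to do.
bool PopRequest(RequestQueue* q, LoadRequest* out)
{
    assert(q && out);
    uint32_t head = q->head.load(std::memory_order_relaxed);
    uint32_t tail = q->tail.load(std::memory_order_acquire);
    if (head == tail) return false;
    *out = q->items[head % RequestQueue::kCapacity];
    q->head.store(head + 1, std::memory_order_release);  // free the slot
    return true;
}
```

The main thread never blocks in this scheme: a full queue just means the request is dropped for this frame and can be issued again during the next LOD update.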
Tools
Besides the main program, I made some tools for various purposes; I’ll describe them briefly.
- Shader Compiler: putting together a graphics (or even a compute) pipeline in D3D12, allocating descriptor tables, setting the descriptors, binding the descriptor tables, and setting root constants can be a pain in the ass. It’s not the worst thing in the world, but it’s better to have something that fprintfs some C++ code based on a short description of the layout, rasterizer settings, render targets, depth-stencil state, struct definitions in HLSL, etc… That’s what this tool does. It also compiles the HLSL code using D3DCompile(). Every other program that has shaders uses this tool to compile them.
- Height Generator: this tool converts the raw elevation data organized in HGT files to the format I use directly for rendering. After I finished developing and debugging it, it took ~24 hours to generate the necessary data with it.
- Sky Generator: this tool is for generating the textures required for sky rendering and aerial perspective.
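For context on the Height Generator's input: a 3 arc second HGT tile is a headerless grid of 1201 × 1201 samples, each a signed 16-bit big-endian elevation in meters, stored row by row from north to south, with -32768 marking missing data. The sketch below reads one sample from a tile already loaded into memory. `HgtSample` and the constant names are hypothetical, and the output format the tool actually writes is not described in this post, so it is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr int     kHgtSamplesPerSide = 1201;    // 3 arc second tiles
constexpr int16_t kHgtVoid           = -32768;  // marks missing data

// Hypothetical helper: returns the elevation in meters at (row, col) of a
// raw HGT tile. Row 0 is the northernmost row, column 0 the westernmost.
int16_t HgtSample(const uint8_t* bytes, int row, int col)
{
    assert(row >= 0 && row < kHgtSamplesPerSide);
    assert(col >= 0 && col < kHgtSamplesPerSide);
    size_t i = (size_t(row) * kHgtSamplesPerSide + size_t(col)) * 2;
    // Samples are big-endian: high byte first.
    uint16_t raw = uint16_t((bytes[i] << 8) | bytes[i + 1]);
    return int16_t(raw);
}
```

A converter would typically check each sample against `kHgtVoid` and fill the gaps (for example from neighbouring samples) before writing its own format.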
Design Philosophy, Coding
While working on this project, the main focus was on designing algorithms and data structures that take the capabilities of the hardware into account. The focus was not on coding; in fact, I wanted to avoid code-related design issues altogether. For this I had two major “rules”:
- Making sure that iteration is as fast as possible, that is, I can change algorithms, data structures, hard-coded constants quickly without thinking about the code and waiting for compilation too much.
- Writing easily understandable code, that is, by looking at the code it should be relatively easy to figure out what the hardware does when it runs that code: what data it loads, how it modifies it, and what data it writes.
In order to achieve these, I don’t use templates, STL, the concept of RAII, or any third-party library besides the Win32 API, D3D12, XInput, stb_image.h, and some standard C functions (math functions, functions for string manipulation, memcpy). I use lots of raw pointers, relative pointers and plain old data structures with global functions. I have only one compilation unit per program, which can be built by running a batch file. As text editor and debugger I use Visual Studio 2017.
With this approach, I can compile the main program and the tools from scratch in about 2-3 seconds per program, using -Od (with -O2 it takes only slightly longer).
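To illustrate what relative pointers over plain old data can look like, here is a minimal sketch. `RelPtr`, `RelSet`, `RelGet` and `Node` are hypothetical names, not taken from the project. The idea is that a link stores a byte offset from its own address instead of an absolute address, so a whole block can be copied with memcpy or written to a file and read back without fixing up any pointers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Byte offset from the RelPtr's own address to its target; 0 means null.
// Source and target must live in the same contiguous block.
struct RelPtr {
    int32_t offset;
};

void RelSet(RelPtr* p, const void* target)
{
    p->offset = target ? int32_t((const char*)target - (const char*)p) : 0;
}

void* RelGet(RelPtr* p)
{
    return p->offset ? (char*)p + p->offset : nullptr;
}

// Example plain old data structure using a relative link.
struct Node {
    float  value;
    RelPtr next;
};

// Links count nodes stored back to back into a list; the last link is null.
void LinkNodes(Node* nodes, int count)
{
    assert(count > 0);
    for (int i = 0; i + 1 < count; ++i) {
        RelSet(&nodes[i].next, &nodes[i + 1]);
    }
    RelSet(&nodes[count - 1].next, nullptr);
}
```

After `LinkNodes(a, 2)` and `memcpy(b, a, sizeof(a))`, calling `RelGet(&b[0].next)` yields `&b[1]`, because the offsets stay valid wherever the block ends up.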