Tech Notes
These are notes about the technology in the DarkPlaces engine. This is not really a blog, more a spillage of techniques and ideas that may be of interest to engine programmers, and may also answer some questions about how the engine works for those who are interested.
Rendering Process - DarkPlaces walks the scene in multiple different ways each time it renders (rather than using a SceneGraph - read Tom Forsyth's article "Scene Graphs - just say no" for a good explanation of why they are undesirable).
The basic flow is:
- -- Setup --
- The screen is cleared.
- The visible set of surfaces (geometry in the world model) and BSP leafs (convex regions of the world model) are determined by recursive portal flow through the world model's BSP leafs by the function R_View_WorldVisibility.
- If High Dynamic Range Bloom is enabled (r_hdr console variable), a scene is rendered at lower resolution and copied to a texture, then blurred by two 1D blur passes, and saved for later use (the bloom texture).
- -- Scene Render --
- If any sky surfaces are visible, a second scene (the sky) is rendered and the depth buffer is cleared afterward to prepare for rendering of the main scene. All the sky surfaces are then rendered into the depth buffer, protecting the sky from being overwritten by other geometry that is further away than the sky surfaces. This depth protection is done so that the Quake1 levels (which expected sky to occlude other geometry) are rendered correctly. One example of this behavior is the Quake logo on the floor in e1m5 over a teleporter: the level designers encased a Quake logo in a sky brush, so the lighting passed through the sky and was blocked by the logo, but rendering could not see the logo itself, only its shadow on the floor below.
- If the r_depthfirst console variable is enabled, all world surfaces are rendered into the depth buffer (and all later rendering will not write depth), eliminating redundant shading of pixels.
- The world surfaces are rendered using their diffuse texture and lightmaps, and optionally deluxemap texturing (a texture similar to a lightmap that stores the primary light direction, which enables per pixel bumpmapping and specular lighting). Their lightmaps may be dynamically animated (Quake flickering/switchable lights), the brightness of this entire rendering pass is multiplied by the console variable r_hdr_scenebrightness, and fogging may also be applied.
- The dynamic model entities (objects) in the world are rendered in a similar way, using lighting conditions queried at their matrix origin rather than lightmaps.
- Each dynamic light source is processed in turn, each one rendered using the following steps:
  - Calculate the region of the framebuffer that can be affected by the light source, and set it as a glScissor rectangle (avoiding any rendering outside this region).
  - If the light source has shadows enabled, clear the stencil buffer (this only affects the scissored rectangle).
  - If the light source has shadows enabled, render shadow volumes describing the shadowed regions of the scene, both from the world model and from dynamic models. For the world model, the surfaces are treated as one single mesh, and only the triangles that are visible to the light source are used (unless r_shadow_frontsidecasting is set to 0, in which case all triangles facing away from the light source are used instead). The shadow volumes create non-zero values in the stencil buffer where pixels are inside them.
  - Loop over the world model surfaces within the area of effect of the light, as well as dynamic models, rendering their geometry with an OpenGL Shading Language shader (or other methods on older hardware) to perform per pixel lighting on the relevant pixels. This is done using an additive blend mode (so that multiple lights are accumulated onto the pixels), and is masked by the stencil buffer if shadowing is enabled on this light source (the stencil buffer prevents light from being rendered on shadowed pixels).
- Transparent surfaces are rendered in sorted order from back to front. These include water and other geometry from the world, as well as particles, lightning beams, and other effects. If several consecutive transparent surfaces use the same type of rendering (surfaces belonging to a model, or particle, or lightning beam, etc) they are combined into one array of surface numbers and passed to the rendering function together (particles in particular tend to be in groups of hundreds in a row without interruption, so they can be rendered more quickly together).
- A full screen rectangular polygon may be rendered over the view if necessary (recent damage causing a red pain flash, or an underwater view coloration).
- If High Dynamic Range Bloom is not enabled but Bloom is enabled (r_bloom), the screen is copied to a texture and reduced in size, and its brightness modified in multiple ways, and then blurred, and copied to a new texture (the bloom texture).
- If High Dynamic Range Bloom or Bloom is enabled (r_hdr or r_bloom), the bloom texture is rendered onto a full screen rectangular polygon, using an additive blend (accumulating the glow effect known as bloom, which appears around bright pixels).
- -- 2D rendering --
- If the HUD is enabled (viewsize less than 120), the appropriate numbers and images are drawn onto the screen; the crosshair, showfps, showtime, showdate, r_speeds, scoreboard, and centerprint (messages sent by the server recently to announce things to the player) are drawn similarly.
- If the menu is currently active, it is rendered over the view.
- If the console is currently active, it is rendered over the view and menu and anything else.
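The transparent surface step in the flow above (sort back to front, then combine consecutive items of the same type) can be sketched roughly as follows. This is only an illustration in C, not the engine's code: the types transitem_t and transbatch_t and the function BuildTransparentBatches are hypothetical names, and a real renderer would hand each batch to the rendering function for that item type.

```c
#include <stdlib.h>

/* Hypothetical item types for this sketch (not engine structures). */
typedef enum { ITEM_SURFACE, ITEM_PARTICLE, ITEM_BEAM } itemtype_t;

typedef struct {
    float distance;   /* distance from the viewer */
    itemtype_t type;  /* which rendering function would handle this item */
    int number;       /* surface/particle/beam number */
} transitem_t;

/* A run of consecutive same-type items in the sorted array. */
typedef struct { itemtype_t type; int first; int count; } transbatch_t;

/* qsort comparator: farthest items first (back to front). */
static int CompareBackToFront(const void *a, const void *b)
{
    float da = ((const transitem_t *)a)->distance;
    float db = ((const transitem_t *)b)->distance;
    return (da < db) - (da > db);
}

/* Sorts items in place and fills batches (room for numitems entries).
   Returns the number of batches, each of which needs only one render call. */
static int BuildTransparentBatches(transitem_t *items, int numitems, transbatch_t *batches)
{
    int i, numbatches = 0;
    qsort(items, (size_t)numitems, sizeof(*items), CompareBackToFront);
    for (i = 0; i < numitems; i++) {
        if (numbatches > 0 && batches[numbatches - 1].type == items[i].type) {
            batches[numbatches - 1].count++; /* extend the current run */
        } else {
            batches[numbatches].type = items[i].type;
            batches[numbatches].first = i;
            batches[numbatches].count = 1;
            numbatches++;
        }
    }
    return numbatches;
}
```

Only items that end up adjacent after sorting are merged, so the back to front order is never violated by the batching.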
Culling methods
- Potentially Visible Set - a bit array indicating whether each leaf (convex region) of the world model is to be rendered or not. This is compiled by the vis utility during map compilation, which checks whether each portal polygon intersects a view frustum created from another pair of portals. Basically, imagine that a room is looking outward through a portal: it sees another portal which restricts its view of the outside world beyond that portal, and that restricted view is intersected with another portal beyond that to determine if a leaf beyond it can be seen. The world model contains one of these PVS bit arrays for each leaf, indicating which of the other leafs it can see.
- BSP recursion - dynamic light sources use BSP recursion to determine the affected world geometry. Leafs may additionally be enabled or disabled by the PVS array (looked up at the light position), by the view PVS (looked up at the viewer position), and by the two frustums (view area, and shadow-casting area - that one is hard to explain).
- Surface PVS - a bit array, refreshed each frame by R_View_WorldVisibility, indicating whether each surface (a renderable primitive such as a polygon, triangle mesh, or quadratic Bezier patch) can be seen from the current position (for details of how, see the code).
- View Frustum - the 4 side planes (top, bottom, left, and right) of the view are utilized for basic plane side checks, if an object is entirely behind any one of these planes it is not rendered (with the exception of shadow volumes - because they may be cast from an off-screen location onto the visible view area).
- Shadow Frustum - if a light source is within the View Frustum, the Shadow Frustum is the same as the View Frustum. If the light is outside the View Frustum, the Shadow Frustum consists of 5 planes formed by the 5 lines involved (the 4 corners of the view, and the light source position). This describes the region of space that an object must be inside to be eligible for shadow casting (anything outside those 4 or 5 planes in the Shadow Frustum is incapable of casting a shadow onto the visible area, and is thus skipped).
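Two of the checks above are simple enough to sketch in C. The following is a hedged illustration, not the engine's code: plane_t, PVS_LeafVisible, BoxBehindPlane and CullBox are hypothetical names, the bit order within each PVS byte is an assumption, and planes are assumed to face inward (positive side is inside the frustum).

```c
/* A plane as dot(normal, p) - dist; positive values are in front (inside). */
typedef struct { float normal[3]; float dist; } plane_t;

/* Returns 1 if the bit for leafindex is set in a PVS bit array.
   Assumes bit 0 of each byte is the lowest numbered leaf. */
static int PVS_LeafVisible(const unsigned char *pvs, int leafindex)
{
    return (pvs[leafindex >> 3] >> (leafindex & 7)) & 1;
}

/* Returns 1 if the axis-aligned box lies entirely behind the plane. */
static int BoxBehindPlane(const float mins[3], const float maxs[3], const plane_t *p)
{
    float d = -p->dist;
    int i;
    /* Test only the box corner that is farthest along the plane normal. */
    for (i = 0; i < 3; i++)
        d += p->normal[i] * (p->normal[i] >= 0 ? maxs[i] : mins[i]);
    return d < 0;
}

/* Returns 1 if the box is entirely behind any one plane and can be skipped. */
static int CullBox(const float mins[3], const float maxs[3], const plane_t *planes, int numplanes)
{
    int i;
    for (i = 0; i < numplanes; i++)
        if (BoxBehindPlane(mins, maxs, &planes[i]))
            return 1;
    return 0;
}
```

The same CullBox loop works for a 4 plane view frustum or a 5 plane shadow frustum, since it only depends on the number of planes passed in.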
Surface rendering tricks
All geometry in each model is merged into one set of vertex arrays (non-interleaved) and uploaded in a Vertex Buffer Object. This minimizes state changes during the rendering of the world model, as surfaces are simply ranges of triangles to render. Additionally, multiple consecutive surfaces with the same texture are rendered with a single call (since their triangle ranges can be combined easily), which substantially reduces driver overhead in higher poly scenes.
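As a rough sketch of the range merging described above, written in C with hypothetical names (surface_t, drawrange_t and BuildDrawRanges are made up for this example and are not engine structures), consecutive surfaces are combined whenever they share a texture and their triangle ranges are adjacent:

```c
/* Hypothetical surface record: a range of triangles in the model's shared arrays. */
typedef struct {
    int texture;        /* texture identifier */
    int firsttriangle;  /* first triangle of this surface */
    int numtriangles;   /* number of triangles in this surface */
} surface_t;

/* One draw call covering one or more merged surfaces. */
typedef struct { int texture; int firsttriangle; int numtriangles; } drawrange_t;

/* Fills ranges (room for numsurfaces entries) and returns how many draw calls are needed. */
static int BuildDrawRanges(const surface_t *surfaces, int numsurfaces, drawrange_t *ranges)
{
    int i, numranges = 0;
    for (i = 0; i < numsurfaces; i++) {
        const surface_t *s = &surfaces[i];
        if (numranges > 0
         && ranges[numranges - 1].texture == s->texture
         && ranges[numranges - 1].firsttriangle + ranges[numranges - 1].numtriangles == s->firsttriangle) {
            /* same texture and contiguous triangles: extend the previous range */
            ranges[numranges - 1].numtriangles += s->numtriangles;
        } else {
            ranges[numranges].texture = s->texture;
            ranges[numranges].firsttriangle = s->firsttriangle;
            ranges[numranges].numtriangles = s->numtriangles;
            numranges++;
        }
    }
    return numranges;
}
```

Each resulting range would then map to one indexed draw call over the shared element array.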
Additionally, any texture scrolling effects requested by a Quake3 shader file are usually performed using glMatrixMode(GL_TEXTURE), rather than being done manually on the CPU as Quake3 did. Other Quake3 shader effects remain CPU-based, however (but are infrequently used - things like wavy water surfaces, and autosprite billboards).
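To illustrate the texture matrix approach, here is a small C sketch that builds a scrolling matrix in the column-major layout that glLoadMatrixf expects. The function name ScrollTextureMatrix and its parameters are hypothetical, and wrapping the offset to its fractional part is an assumption made here to keep precision stable over long play times.

```c
/* Builds a column-major 4x4 matrix that translates texture coordinates by
   speed * time, keeping only the fractional part of the offset. */
static void ScrollTextureMatrix(float speed_s, float speed_t, double time, float out[16])
{
    double offset[2];
    int i;
    offset[0] = speed_s * time;
    offset[1] = speed_t * time;
    for (i = 0; i < 2; i++) {
        /* manual floor, so the result lands in the range 0 to 1 */
        double whole = (double)(long long)offset[i];
        if (whole > offset[i])
            whole -= 1.0;
        offset[i] -= whole;
    }
    for (i = 0; i < 16; i++)
        out[i] = (i % 5 == 0) ? 1.0f : 0.0f; /* start from identity */
    out[12] = (float)offset[0]; /* translation along s */
    out[13] = (float)offset[1]; /* translation along t */
}
```

In fixed-function OpenGL the result would be applied with glMatrixMode(GL_TEXTURE) followed by glLoadMatrixf(out), and glMatrixMode(GL_MODELVIEW) afterward to restore the usual matrix mode, so the per-vertex texture coordinates never need to be touched on the CPU.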
Skeletal model rendering tricks
There are two different approaches to skeletal animation used in skeletal model file formats. One is based on each vertex storing an array of weights with vertex positions relative to the bone the weight references (this method is used by zym, dpm, and assorted id Software formats such as md4, mds, md5mesh). This method consumes more memory (and seems to be slightly slower) than Matrix Palette Skinning, in which the base mesh is deformed by matrices multiplied by their basepose inverse (meaning all the weights take the same vertex position as input, and simply move it to where it should be). Since Matrix Palette Skinning performs better, DarkPlaces engine converts the other type to it at load. This is quite trivial: simply rebuild the base mesh from the weights and basepose of the skeleton, and then only the influence and bone index have to be stored in each weight, discarding the relative position and normal data.
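The conversion and the skinning step can be sketched in C as below. This is a minimal illustration under stated assumptions, not the engine's loader: the names matrix3x4_t, relweight_t, RebuildBaseVertex, BuildPalette and SkinVertex are hypothetical, bone matrices are assumed to be rigid (rotation plus translation, no scale), and normals are left out for brevity.

```c
/* 3x4 matrix: rotation in the left 3 columns, translation in the last column. */
typedef struct { float m[3][4]; } matrix3x4_t;

/* One weight in the bone-relative style (position stored relative to its bone). */
typedef struct { int bone; float influence; float relorigin[3]; } relweight_t;

static void Matrix_Transform(const matrix3x4_t *a, const float in[3], float out[3])
{
    int r;
    for (r = 0; r < 3; r++)
        out[r] = a->m[r][0] * in[0] + a->m[r][1] * in[1] + a->m[r][2] * in[2] + a->m[r][3];
}

/* out = a * b (b is applied first); out must not alias a or b. */
static void Matrix_Concat(const matrix3x4_t *a, const matrix3x4_t *b, matrix3x4_t *out)
{
    int r, c;
    for (r = 0; r < 3; r++) {
        for (c = 0; c < 4; c++)
            out->m[r][c] = a->m[r][0] * b->m[0][c] + a->m[r][1] * b->m[1][c] + a->m[r][2] * b->m[2][c];
        out->m[r][3] += a->m[r][3];
    }
}

/* Inverse of a rigid matrix: transposed rotation, negated rotated translation. */
static void Matrix_InvertRigid(const matrix3x4_t *a, matrix3x4_t *out)
{
    int r, c;
    for (r = 0; r < 3; r++) {
        for (c = 0; c < 3; c++)
            out->m[r][c] = a->m[c][r];
        out->m[r][3] = -(a->m[0][r] * a->m[0][3] + a->m[1][r] * a->m[1][3] + a->m[2][r] * a->m[2][3]);
    }
}

/* Load time: rebuild the base mesh position from the bone-relative weights. */
static void RebuildBaseVertex(const relweight_t *w, int numweights, const matrix3x4_t *basepose, float out[3])
{
    int i, k;
    float p[3];
    out[0] = out[1] = out[2] = 0;
    for (i = 0; i < numweights; i++) {
        Matrix_Transform(&basepose[w[i].bone], w[i].relorigin, p);
        for (k = 0; k < 3; k++)
            out[k] += p[k] * w[i].influence;
    }
}

/* Per frame: palette[b] = pose[b] * inverse(basepose[b]). */
static void BuildPalette(const matrix3x4_t *pose, const matrix3x4_t *basepose, int numbones, matrix3x4_t *palette)
{
    int b;
    matrix3x4_t inv;
    for (b = 0; b < numbones; b++) {
        Matrix_InvertRigid(&basepose[b], &inv);
        Matrix_Concat(&pose[b], &inv, &palette[b]);
    }
}

/* Per vertex: every weight transforms the same base position.
   Only the bone and influence fields of each weight are read here. */
static void SkinVertex(const float base[3], const relweight_t *w, int numweights, const matrix3x4_t *palette, float out[3])
{
    int i, k;
    float p[3];
    out[0] = out[1] = out[2] = 0;
    for (i = 0; i < numweights; i++) {
        Matrix_Transform(&palette[w[i].bone], base, p);
        for (k = 0; k < 3; k++)
            out[k] += p[k] * w[i].influence;
    }
}
```

After RebuildBaseVertex has run once at load, the relorigin data is no longer needed, which is where the memory saving comes from.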
DarkPlaces engine model loaders encode all skeletal weight data as 8 bytes per vertex, in two arrays of 4 bytes per vertex: one stores the influence values rescaled from the 0 to 1 range to 0 to 255, and the other stores bone indices (since most models don't have more than 256 bones). This seems to help with cache performance in the CPU-based skeletal animation.
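A hedged C sketch of this kind of packing follows. PackVertexWeights is a hypothetical name, and two details are assumptions of this example rather than facts about the engine: weights beyond the first 4 are simply dropped, and the largest influence absorbs the rounding error so the four bytes always sum to 255.

```c
/* Packs up to 4 weights of one vertex into 4 influence bytes and 4 bone index bytes. */
static void PackVertexWeights(const int *bones, const float *influences, int numweights,
                              unsigned char outinfluence[4], unsigned char outbone[4])
{
    float sum = 0;
    int i, total = 0, largest = 0;
    for (i = 0; i < 4; i++) {
        outinfluence[i] = 0;
        outbone[i] = 0;
    }
    if (numweights > 4)
        numweights = 4; /* sketch only: extra weights are dropped */
    for (i = 0; i < numweights; i++)
        sum += influences[i];
    if (numweights < 1 || sum <= 0)
        return;
    for (i = 0; i < numweights; i++) {
        /* renormalize so the weights sum to 1, then rescale to 0..255 with rounding */
        int v = (int)(influences[i] * 255.0f / sum + 0.5f);
        outinfluence[i] = (unsigned char)v;
        outbone[i] = (unsigned char)bones[i]; /* assumes fewer than 256 bones */
        total += v;
        if (outinfluence[i] > outinfluence[largest])
            largest = i;
    }
    /* push the rounding error onto the largest weight so the bytes sum to 255 */
    outinfluence[largest] = (unsigned char)(outinfluence[largest] + 255 - total);
}
```

Keeping the sum at exactly 255 means a skinning loop can divide by a constant without the mesh drifting slightly toward or away from the origin.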
This format is also mostly suitable for hardware skeletal animation. However, harsh limits on the number of bones (about 30 on lower end hardware) make it impractical, because the model would have to be sliced up and given multiple matrix palettes (each within the limits of the hardware), so it has not been implemented (NVIDIA GeForce 6 hardware or ATI Radeon X2x00 series and above are capable of using texture reads in the vertex shader, which would avoid this limitation).
Further complicating the use of hardware skeletal animation, stencil shadow volumes require a software transformed version of the mesh to build triangles from (even if the actual projection of the vertices may be done in hardware - indeed it would have to be, otherwise the skeletal animation may not match up exactly and cause z-fighting between a software shadow volume and a hardware lighting pass).
Stencil shadow volume notes
First I would like to dispel a common rumor about stencil shadow volumes requiring specially made models - this is false. As long as the shadow volume is constructed from only lit or only dark faces of the model (meaning triangles duplicated, flipped, and stretched out to the projection distance), there is no need for sealed models. I do not see any particular benefit from other methods (building a volume from both front and back faces, stretched differently) - but if someone wishes to enlighten me on the benefits of this approach I welcome the feedback.
Secondly, on the subject of edge lists (also commonly used for possibly erroneous reasons), I highly recommend using a "triangle neighbors" array instead. Each edge of a triangle can store an index to the neighboring triangle that shares the edge, or -1 if there is no neighbor (or if the edge is shared by 3 or more triangles - yes this does happen on a few models, and it must be handled by storing -1). This format is very simple to understand and very practical; in particular, you can get better performance by scanning edges of caster triangles than by scanning a separate edge list (which may have many edges entirely in the non-casting regions of the model).
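A triangle neighbors array of this kind could be built as in the following C sketch. BuildTriangleNeighbors is a hypothetical name, the brute force search is chosen for clarity rather than speed (a hash of edges would be the usual optimization), and it assumes triangles share vertex indices rather than merely having vertices at the same position. Entry t*3+e describes the edge from vertex e to vertex (e+1)%3 of triangle t.

```c
/* elements: 3 vertex indices per triangle. neighbors: 3 entries per triangle,
   set to the triangle across each edge, or -1 if there is no usable neighbor. */
static void BuildTriangleNeighbors(int *neighbors, const int *elements, int numtriangles)
{
    int i, j, e, f;
    for (i = 0; i < numtriangles; i++) {
        for (e = 0; e < 3; e++) {
            int a = elements[i * 3 + e], b = elements[i * 3 + (e + 1) % 3];
            int match = -1, count = 0;
            for (j = 0; j < numtriangles; j++) {
                if (j == i)
                    continue;
                for (f = 0; f < 3; f++) {
                    int c = elements[j * 3 + f], d = elements[j * 3 + (f + 1) % 3];
                    if ((c == b && d == a) || (c == a && d == b)) {
                        count++; /* another triangle uses this edge */
                        if (c == b && d == a)
                            match = j; /* opposite direction: a properly wound neighbor */
                    }
                }
            }
            /* exactly one other user of the edge is required, otherwise store -1 */
            neighbors[i * 3 + e] = (count == 1) ? match : -1;
        }
    }
}
```

An edge used by 3 or more triangles ends up with count greater than 1, so all of the triangles involved store -1 for it, as the text above requires.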
The DarkPlaces shadow volume builder operates in a multi-stage process:
- A BSP recursion function identifies surfaces touching the light boundary and in the light PVS (except if r_shadow_frontsidecasting is 0), and identifies each triangle that is touching the light boundary and facing toward the light (or away if r_shadow_frontsidecasting is 0). The indices of these triangles are combined in an array. (There are two alternative functions that are used for compiling static light sources: one uses portal recursion clipping, and the other uses Shadow Volume BSP building and querying.)
- A byte array which indicates whether each triangle is a caster or not is cleared to 0.
- A loop over all casting triangles in the supplied array sets their casting flags to 1.
- A remapping array for vertex numbers is cleared.
- A loop over all casting triangles in the supplied array checks if each referenced vertex of the triangle is already used. If it is not, two additional vertices (one at its original position and one at the projected position) are added to the output buffer and the remapping table is set accordingly. (This part of the process could be done in hardware using a vertex shader.)
- The same loop also produces two triangles, one is the original triangle (with vertex indices remapped) and the other is a flipped copy using the corresponding projected vertices.
- The same loop also checks if each triangle neighbor is set as a caster. Any edge without a caster on the far side produces two additional triangles (a quad) between the copies of this edge on the two new triangles produced. (This seals the sides of the shadow volume.)
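The stages after the caster list has been gathered can be sketched in C roughly as follows. This is an illustration under stated assumptions, not the engine's function: BuildShadowVolume and its parameter names are hypothetical, the remapping array is "cleared" to -1 to mean unused, neighbors entry t*3+e refers to the edge from vertex e to vertex (e+1)%3 of triangle t, and for simplicity the projected vertex is produced by scaling the light-to-vertex vector by projectscale rather than pushing it out to a fixed distance.

```c
#include <string.h>

/* vertex3f: 3 floats per input vertex. elements/neighbors: 3 ints per triangle.
   casters: indices of the casting triangles. castflags (numtriangles bytes) and
   remap (numvertices ints) are scratch buffers. outvertex3f needs room for
   numvertices*2 vertices, outelements for numcasters*8 triangles.
   Returns the number of output triangles. */
static int BuildShadowVolume(const float *vertex3f, int numvertices,
                             const int *elements, const int *neighbors, int numtriangles,
                             const int *casters, int numcasters,
                             const float lightorigin[3], float projectscale,
                             unsigned char *castflags, int *remap,
                             float *outvertex3f, int *outnumvertices, int *outelements)
{
    int i, j, k, outvertices = 0;
    int *out = outelements;

    /* clear the caster flags, then mark every casting triangle */
    memset(castflags, 0, (size_t)numtriangles);
    for (i = 0; i < numcasters; i++)
        castflags[casters[i]] = 1;
    /* clear the vertex remapping table (-1 means not yet emitted) */
    for (i = 0; i < numvertices; i++)
        remap[i] = -1;

    for (i = 0; i < numcasters; i++) {
        const int *e = elements + casters[i] * 3;
        const int *n = neighbors + casters[i] * 3;
        int r[3];
        for (j = 0; j < 3; j++) {
            if (remap[e[j]] < 0) {
                /* first use: emit the original vertex and its projected copy */
                const float *v = vertex3f + e[j] * 3;
                float *o = outvertex3f + outvertices * 3;
                for (k = 0; k < 3; k++) {
                    o[k] = v[k];
                    o[3 + k] = v[k] + (v[k] - lightorigin[k]) * projectscale;
                }
                remap[e[j]] = outvertices;
                outvertices += 2;
            }
            r[j] = remap[e[j]];
        }
        /* the original triangle, then a flipped copy using the projected vertices */
        out[0] = r[0]; out[1] = r[1]; out[2] = r[2];
        out[3] = r[2] + 1; out[4] = r[1] + 1; out[5] = r[0] + 1;
        out += 6;
        /* seal each edge that has no caster on the far side with a quad */
        for (j = 0; j < 3; j++) {
            int a = r[j], b = r[(j + 1) % 3];
            if (n[j] >= 0 && castflags[n[j]])
                continue;
            out[0] = b; out[1] = a; out[2] = a + 1;
            out[3] = b; out[4] = a + 1; out[5] = b + 1;
            out += 6;
        }
    }
    *outnumvertices = outvertices;
    return (int)(out - outelements) / 3;
}
```

A lone casting triangle yields 6 vertices and 8 triangles (2 caps plus 3 side quads), and the result is a closed volume even though the source mesh is not sealed, because every output edge is matched by exactly one edge running the opposite way.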
Collision detection method
To be written.
DP7 network protocol concepts
To be written.
Feedback
Email me (details on the email page) if you have ideas for additional sections or questions about other technical details missing on this page.