Tuesday, March 14, 2017

Stingray Renderer Walkthrough #8: stingray-renderer & mini-renderer

Introduction

In the last post we looked at our systems for doing data-driven rendering in Stingray. Today I will go through the two default rendering pipes we ship as templates with Stingray. Both are entirely described in data using two render_config files and a bunch of shader_source files.

We call them the “stingray renderer” and the “mini renderer”.

Stingray Renderer

The “stingray renderer” is the default rendering pipe and is used in almost all template and sample projects. It’s a fairly standard “high-end” real-time rendering pipe and supports the regular buzzword features.

The render_config file is approximately 1500 lines of sjson. While 1500 lines might sound massive, it’s important to remember that this rendering pipe is highly configurable: pretty much all features can be dynamically switched on/off. It also runs on a broad variety of platforms (mobile -> consoles -> high-end PC), supports a bunch of different debug visualization modes, and features four different stereo rendering paths in addition to the default mono path.

If you are interested in taking a closer look at the actual implementation, you can download Stingray and you’ll find it under core/stingray_renderer/renderer.render_config.

Going through the entire file and all the implementation details would require multiple blog posts, so instead I will try to do a high-level breakdown of the default layer_configuration and talk a bit about the feature set. Before we begin, please keep in mind that this rendering pipe is designed to handle lots of different content and run on lots of different platforms. A game project would typically use it as a base and then extend, optimize and simplify it based on project-specific knowledge of the content and target platforms.

Here’s a somewhat simplified dump of the contents of the layer_configs/default array found in core/stingray_renderer/renderer.render_config in Stingray v1.8:

// run any render_config_extensions that have requested to insert work at the insertion point named "first"
{ extension_insertion_point = "first" }

// kick resource generator for rendering all shadow maps
{ resource_generator="shadow_mapping" profiling_scope="shadow mapping" }

// kick resource generator for assigning light sources to clustered shading structure
{ resource_generator="clustered_shading" profiling_scope="clustered shading" }

// special layer, only responsible for clearing hdr0, gbuffer2 and the depth_stencil_buffer
{ render_targets=["hdr0", "gbuffer2"] depth_stencil_target="depth_stencil_buffer" 
    clear_flags=["SURFACE", "DEPTH", "STENCIL"] profiling_scope="clears" }      

// if vr is supported kick a resource generator laying down a stencil mask to reject pixels outside of the lens shape
{ type="static_branch" platforms=["win"] render_settings={ vr_supported=true }
    pass = [
        { resource_generator="vr_mask" profiling_scope="vr_mask" }
    ]
}

// g-buffer layer, bulk of all materials renders into this
{ name="gbuffer" render_targets=["gbuffer0", "gbuffer1", "gbuffer2", "gbuffer3"] 
    depth_stencil_target="depth_stencil_buffer" sort="FRONT_BACK" profiling_scope="gbuffer" }

{ extension_insertion_point = "gbuffer" }

// linearize depth into a R32F surface
{ resource_generator="stabilize_and_linearize_depth" profiling_scope="linearize_depth" }

// layer for blending decals into the gbuffer0 and gbuffer1
{ name="decals" render_targets=["gbuffer0" "gbuffer1"] depth_stencil_target="depth_stencil_buffer" 
    profiling_scope="decal" sort="EXPLICIT" }

{ extension_insertion_point = "decals" }

// generate and merge motion vectors for non written pixels with motion vectors in gbuffer
{ type="static_branch" platforms=["win", "xb1", "ps4", "web", "linux"]
    pass = [
        { resource_generator="generate_motion_vectors" profiling_scope="motion vectors" }
    ]
}

// render localized reflection probes into hdr1
{ name="reflections" render_targets=["hdr1"] depth_stencil_target="depth_stencil_buffer" 
    sort="FRONT_BACK" profiling_scope="reflections probes" }

{ extension_insertion_point = "reflections" }

// kick resource generator for screen space reflections
{ type="static_branch" platforms=["win", "xb1", "ps4"]
    pass = [
        { resource_generator="ssr_reflections" profiling_scope="ssr" }
    ]
}

// kick resource generator for main scene lighting
{ resource_generator="lighting" profiling_scope="lighting" }
{ extension_insertion_point = "lighting" }

// layer for emissive materials
{ name="emissive" render_targets=["hdr0"] depth_stencil_target="depth_stencil_buffer" 
    sort="FRONT_BACK" profiling_scope="emissive" }

// kick debug visualization
{ type="static_branch" render_caps={ development=true }
    pass=[
        { resource_generator="debug_visualization" profiling_scope="debug_visualization" }
    ]
}

// kick resource generator for laying down fog 
{ resource_generator="fog" profiling_scope="fog" }

// layer for skydome rendering
{ name="skydome" render_targets=["hdr0"] depth_stencil_target="depth_stencil_buffer" 
    sort="BACK_FRONT" profiling_scope="skydome" }
{ extension_insertion_point = "skydome" }

// layer for transparent materials 
{ name="hdr_transparent" render_targets=["hdr0"] depth_stencil_target="depth_stencil_buffer" 
    sort="BACK_FRONT" profiling_scope="hdr_transparent" }
{ extension_insertion_point = "hdr_transparent" }

// kick resource generator for reading back any requested render targets / buffers to the CPU
{ resource_generator="stream_capture_buffers" profiling_scope="stream_capture" }

// kick resource generator for capturing reflection probes
{ type="static_branch" platforms=["win"] render_caps={ development=true }
    pass = [
        { resource_generator="cubemap_capture" }
    ]
}

// layer for rendering object selections from the editor
{ type="static_branch" platforms=["win", "ps4", "xb1"]
    pass = [
        { type = "static_branch" render_settings={ selection_enabled=true }
            pass = [
                { name="selection" render_targets=["gbuffer0" "ldr1_dev_r"] 
                    depth_stencil_target="depth_stencil_buffer_selection" sort="BACK_FRONT" 
                    clear_flags=["SURFACE" "DEPTH"] profiling_scope="selection"}
            ]
        }
    ]
}

// kick resource generators for AA resolve and post processing
{ resource_generator="post_processing" profiling_scope="post_processing" }
{ extension_insertion_point = "post_processing" }

// layer for rendering LDR materials, primarily used for rendering HUD and debug rendering
{ name="transparent" render_targets=["output_target"] depth_stencil_target="stable_depth_stencil_buffer_alias" 
    sort="BACK_FRONT" profiling_scope="transparent" }

// kick resource generator for rendering shadow map debug overlay
{ type="static_branch" render_caps={ development=true }
    pass = [
        { resource_generator="debug_shadows" profiling_scope="debug_shadows" }
    ]
}

// kick resource generator for compositing left/right eye
{ type="static_branch" platforms=["win"] render_settings={ vr_supported=true }
    pass = [
        { resource_generator="vr_present" profiling_scope="present" }
    ]
}

{ extension_insertion_point = "last" }

So what we have above is a fairly standard breakdown of a rendered frame. If you have worked with real-time rendering before, there shouldn’t be many surprises in there. Something that is kind of cool about having the frame flow in this representation, and pairing that with the hot-reloading functionality of render_configs, is that it really encourages experimentation: move things around, comment stuff out, inject new resource generators, etc.

Let’s go through the frame in a bit more detail:

Extension insertion points

First of all, there are a bunch of extension_insertion_point entries at various locations in the frame. These are used by render_config_extensions to schedule work into an existing render_config. You could argue that an extension system for render_configs is a bit superfluous, and for an in-house game engine targeting a specific industry that might very well be the case. But for us the extension system allows features to be built in a more modular way, and it also encourages sharing of rendering features across teams.
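
To make the idea concrete, here is a minimal Python sketch of how extension work could be spliced into a layer list at the named insertion points. This is not how the engine implements it; the function name `apply_extensions` and the shape of the `extensions` dictionary are assumptions made purely for illustration.

```python
def apply_extensions(layer_config, extensions):
    """Return a new layer list with extension entries spliced in.

    layer_config: list of dicts, shaped like the layer_configs/default array.
    extensions:   hypothetical mapping from insertion point name to the
                  list of entries that extensions want to run there.
    """
    result = []
    for entry in layer_config:
        point = entry.get("extension_insertion_point")
        if point is None:
            result.append(entry)  # regular layer or resource generator
        else:
            # Replace the marker with whatever work was requested for it.
            result.extend(extensions.get(point, []))
    return result


base = [
    {"extension_insertion_point": "first"},
    {"name": "gbuffer"},
    {"extension_insertion_point": "gbuffer"},
    {"resource_generator": "lighting"},
]
# "my_gbuffer_effect" is a made-up resource generator name.
extensions = {"gbuffer": [{"resource_generator": "my_gbuffer_effect"}]}
flattened = apply_extensions(base, extensions)
```

Insertion points that nobody hooks into simply disappear from the flattened list, so an unused extension point costs nothing at runtime in this sketch.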

Shadows

// kick resource generator for rendering all shadow maps
{ resource_generator="shadow_mapping" profiling_scope="shadow mapping" }

We start off by rendering shadow maps. As we want to handle shadow receiving on alpha blended geometry there’s no simple way to reuse our shadow maps by interleaving the rendering of them into the lighting code. Instead we simply gather all shadow casting lights, try to prioritize them based on screen coverage, intensity, etc. and then render all shadows into two shadow maps.

One shadow map is dedicated to handling a single directional light, which uses a cascaded shadow map approach, rendering each cascade into a region of a larger shadow map atlas. The other shadow map is an atlas for all local light sources, such as spot and point lights (a point light is interpreted as 6 spot lights).
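
The two atlases can be sketched roughly as follows. The grid layout for cascades, the `priority` field (standing in for screen coverage, intensity, etc.) and both function names are assumptions; the post does not describe the real packing or prioritization logic.

```python
def cascade_viewports(atlas_size, cascade_count):
    """Tile cascades into a square atlas, returning (x, y, w, h) rects.

    Assumes a simple grid layout, which may differ from the engine's.
    """
    grid = 1
    while grid * grid < cascade_count:
        grid += 1
    tile = atlas_size // grid
    return [((i % grid) * tile, (i // grid) * tile, tile, tile)
            for i in range(cascade_count)]


def local_light_slots(lights, max_slots):
    """Pick the highest priority local lights that fit in the atlas.

    A spot light needs one slot, a point light six (one per cube face).
    """
    chosen, used = [], 0
    for light in sorted(lights, key=lambda l: l["priority"], reverse=True):
        cost = 6 if light["type"] == "point" else 1
        if used + cost <= max_slots:
            chosen.append(light["name"])
            used += cost
    return chosen
```

For example, four cascades in a 2048x2048 atlas each get a 1024x1024 region, and an atlas with 8 slots has room for one point light plus two spot lights.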

Clustered shading

// kick resource generator for assigning light sources to clustered shading structure
{ resource_generator="clustered_shading" profiling_scope="clustered shading" }

We separate local light sources into two kinds: “simple” and “custom”. Simple lights are either spot lights or point lights that don’t have a custom material graph assigned. Simple light sources, which tend to be the bulk of all visible light sources in a frame, get inserted into a clustered shading acceleration structure.

While simple lights will affect both opaque and transparent materials, custom lights will only affect opaque geometry as they run a more traditional deferred shading path. We will touch on the lighting a bit more soon.
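
As a rough illustration of what inserting lights into a clustered structure means, the Python sketch below bins lights into depth slices only. The post does not describe the cluster layout, so the exponential depth slicing here is an assumption borrowed from common clustered shading implementations, and a full version would also bin in screen-space x/y.

```python
import math


def cluster_slice(view_depth, near, far, slice_count):
    """Map a view-space depth to an exponentially spaced depth slice."""
    t = math.log(view_depth / near) / math.log(far / near)
    return min(slice_count - 1, max(0, int(t * slice_count)))


def assign_lights(lights, near, far, slice_count):
    """Return, per depth slice, the indices of lights that may touch it.

    Each light is (view_depth_of_center, radius) and is assumed to
    overlap the [near, far] range. Only depth is handled here.
    """
    clusters = [[] for _ in range(slice_count)]
    for index, (depth, radius) in enumerate(lights):
        first = cluster_slice(max(near, depth - radius), near, far, slice_count)
        last = cluster_slice(min(far, depth + radius), near, far, slice_count)
        for s in range(first, last + 1):
            clusters[s].append(index)
    return clusters
```

At shading time a pixel computes its own cluster index the same way and loops only over the lights listed there, which is what lets both opaque and transparent materials share the structure.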

Clearing & VR mask

// special layer, only responsible for clearing hdr0, gbuffer2 and the depth_stencil_buffer
{ render_targets=["hdr0", "gbuffer2"] depth_stencil_target="depth_stencil_buffer" 
    clear_flags=["SURFACE", "DEPTH", "STENCIL"] profiling_scope="clears" }      

// if vr is supported kick a resource generator laying down a stencil mask to reject pixels outside of the lens shape
{ type="static_branch" platforms=["win"] render_settings={ vr_supported=true }
    pass = [
        { resource_generator="vr_mask" profiling_scope="vr_mask" }
    ]
}

Here we use the layer system to record a bind and a clear for a few render targets into a RenderContext generated by the LayerManager.

Then, depending on whether the vr_supported render setting is true, we kick a resource generator that marks in the stencil buffer any pixels falling outside of the lens region. This resource generator only does something if the renderer is running in stereo mode. Also note that the branch above is a static_branch, so if vr_supported is set to false the execution of the vr_mask resource generator gets eliminated completely during boot-up of the renderer.
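
To illustrate what being eliminated at boot means, here is a minimal Python sketch that flattens static_branch entries once, up front. Only the key names (type, platforms, render_settings, render_caps, pass) come from the config dump; the function name and the exact matching rules are assumptions.

```python
def resolve_static_branches(entries, platform, render_settings, render_caps):
    """Flatten static_branch entries once, at renderer boot.

    A branch is kept only if every condition it lists matches; its
    'pass' entries are then inlined (recursively, so nesting works).
    """
    flat = []
    for entry in entries:
        if entry.get("type") != "static_branch":
            flat.append(entry)
            continue
        if "platforms" in entry and platform not in entry["platforms"]:
            continue
        wanted_settings = entry.get("render_settings", {})
        if any(render_settings.get(k) != v for k, v in wanted_settings.items()):
            continue
        wanted_caps = entry.get("render_caps", {})
        if any(render_caps.get(k) != v for k, v in wanted_caps.items()):
            continue
        flat.extend(resolve_static_branches(
            entry["pass"], platform, render_settings, render_caps))
    return flat
```

After this step the per-frame code never sees a branch at all, only the plain list of layers and resource generators that survived.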

G-buffer

// g-buffer layer, bulk of all materials renders into this
{ name="gbuffer" render_targets=["gbuffer0", "gbuffer1", "gbuffer2", "gbuffer3"] 
    depth_stencil_target="depth_stencil_buffer" sort="FRONT_BACK" profiling_scope="gbuffer" }

{ extension_insertion_point = "gbuffer" }

// linearize depth into a R32F surface
{ resource_generator="stabilize_and_linearize_depth" profiling_scope="linearize_depth" }

// layer for blending decals into the gbuffer0 and gbuffer1
{ name="decals" render_targets=["gbuffer0" "gbuffer1"] depth_stencil_target="depth_stencil_buffer" 
    profiling_scope="decal" sort="EXPLICIT" }

{ extension_insertion_point = "decals" }

// generate and merge motion vectors for non written pixels with motion vectors in gbuffer
{ type="static_branch" platforms=["win", "xb1", "ps4", "web", "linux"]
    pass = [
        { resource_generator="generate_motion_vectors" profiling_scope="motion vectors" }
    ]
}

Next we lay down the gbuffer. We are using a fairly fat “floating” gbuffer representation. By floating I mean that we interpret the gbuffer channels differently depending on material. I won’t go into details of the gbuffer layout in this post, but everything builds upon a standard metallic PBR material model, the same as most modern engines run today. We also stash high precision motion vectors to be able to do accurate reprojection for TAA, RGBM encoded irradiance from light maps (if present, else irradiance is looked up from an IBL probe), high precision normals, AO, etc. Things quickly add up: in the default configuration on PC we are looking at 192 bpp for the color targets (i.e. not counting depth/stencil). The gbuffer layout could use some love; I think we should be able to shrink it somewhat without losing any features.
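
Two of the points above are easy to put numbers on. The Python sketch below computes the memory footprint implied by 192 bpp and shows one common way to do RGBM encoding. The range constant of 6.0 and the function names are assumptions; the post does not state which RGBM variant or range Stingray uses.

```python
import math

RGBM_RANGE = 6.0  # assumed maximum HDR value; the engine's range is not stated


def gbuffer_mebibytes(width, height, bits_per_pixel=192):
    """Size of the g-buffer color targets in MiB (depth/stencil excluded)."""
    return width * height * bits_per_pixel / 8 / (1024 * 1024)


def rgbm_encode(r, g, b):
    """Pack an HDR color into RGB in [0, 1] plus a shared multiplier M."""
    m = max(r, g, b, 1e-6) / RGBM_RANGE
    # Round M up to the next 8-bit step so the RGB part never exceeds 1.
    m = min(1.0, math.ceil(m * 255.0) / 255.0)
    scale = 1.0 / (m * RGBM_RANGE)
    return (min(1.0, r * scale), min(1.0, g * scale), min(1.0, b * scale), m)


def rgbm_decode(r, g, b, m):
    """Recover the HDR color from an RGBM value."""
    scale = m * RGBM_RANGE
    return (r * scale, g * scale, b * scale)
```

At 1920x1080 the color targets alone come to roughly 47.5 MiB, and about four times that at 3840x2160, which is why shrinking the layout is tempting.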

We then kick a resource generator called stabilize_and_linearize_depth. This resource generator does two things:

  1. It linearizes the depth buffer and stores the result in an R32F target using a fullscreen_pass.
  2. It does a hacky TAA resolve pass for depth in an attempt to remove some intersection flickering for materials rendered after the TAA resolve. We call the output of this pass stable_depth and use it when rendering editor selections, gizmos, debug lines, etc. We also use this buffer during post processing for any effects that depend on depth (e.g. depth of field), as those run after the AA resolve.
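
For readers unfamiliar with depth linearization, the first step boils down to inverting the projection's depth mapping. The Python sketch below assumes a standard, non-reversed D3D-style perspective projection with depth in [0, 1]; the post does not say which depth convention Stingray uses, so treat the formula as an example rather than the engine's shader.

```python
def linearize_depth(d, near, far):
    """Convert a [0, 1] depth buffer value to view-space distance.

    Assumes a standard (non-reversed) D3D-style perspective projection.
    """
    return near * far / (far - d * (far - near))


def project_depth(z, near, far):
    """Inverse of linearize_depth: view-space distance to [0, 1] depth."""
    return far * (z - near) / (z * (far - near))


# With near=1 and far=100, a depth buffer value of 0.5 is only about
# 1.98 units from the camera, which shows how non-linear the raw
# depth buffer is.
half_depth_distance = linearize_depth(0.5, 1.0, 100.0)
```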

After that we have another more minimalistic gbuffer layer for splatting deferred decals.

Last but not least, we kick another resource generator that calculates per-pixel velocity for any pixels that haven’t been rendered to during the gbuffer pass (e.g. the skydome).

Reflections & Lighting

// render localized reflection probes into hdr1
{ name="reflections" render_targets=["hdr1"] depth_stencil_target="depth_stencil_buffer" 
    sort="FRONT_BACK" profiling_scope="reflections probes" }

{ extension_insertion_point = "reflections" }

// kick resource generator for screen space reflections
{ type="static_branch" platforms=["win", "xb1", "ps4"]
    pass = [
        { resource_generator="ssr_reflections" profiling_scope="ssr" }
    ]
}

// kick resource generator for main scene lighting
{ resource_generator="lighting" profiling_scope="lighting" }
{ extension_insertion_point = "lighting" }

At this point we are fully done with the gbuffer population and are ready to do some lighting. We start by laying down the indirect specular / reflections into a separate buffer. We use a rather standard three-step fallback scheme for our reflections: screen-space reflections, falling back to localized parallax corrected pre-convoluted radiance cubemaps, falling back to a global pre-convoluted radiance cubemap.
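
The fallback chain can be thought of as two nested blends. In the Python sketch below, `local_weight` and `ssr_confidence` are hypothetical per-pixel weights introduced for illustration; how Stingray actually derives and combines them is not covered in this post.

```python
def lerp(a, b, t):
    """Linear interpolation from a (t=0) to b (t=1)."""
    return a + (b - a) * t


def resolve_reflection(global_probe, local_probe, local_weight, ssr, ssr_confidence):
    """Three-step fallback for one color channel.

    local_weight:   0..1, how much a localized probe covers the pixel.
    ssr_confidence: 0..1, 0 where the screen-space ray found no hit.
    """
    # Start from the global cubemap and let local probes override it.
    probe = lerp(global_probe, local_probe, local_weight)
    # Screen-space reflections win wherever they found a valid hit.
    return lerp(probe, ssr, ssr_confidence)
```

With both weights at zero the result is the global cubemap, so every pixel always has some reflection to fall back on.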

The reflections layer is the target layer for all cubemap based reflections. We are naively rendering the cubemap reflections by treating each reflection probe as a light source with a custom material. These lights get picked up by a resource generator performing traditional deferred shading, i.e. it renders proxy volumes for each light. One thing that some people struggle to wrap their heads around is that the resource generator responsible for running the deferred shading modifier isn’t kicked until a few lines down (in the lighting resource generator). If you’ve paid attention in my previous posts this shouldn’t come as a surprise, as what we describe here is the GPU scheduling of a frame, nothing else.
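
A toy model may help here: work is recorded in whatever order the CPU gets to it, and the GPU order is decided afterwards by sorting on the target layer. The Python sketch below is a simplification with made-up names (`schedule`, the command tuples); the real engine sorts on keys generated by the layer system rather than on layer names.

```python
def schedule(commands, layer_order):
    """Sort recorded GPU work by layer, independent of submission order.

    commands: list of (layer_name, description) in CPU submission order.
    Python's sort is stable, so work inside a layer keeps its order.
    """
    index = {name: i for i, name in enumerate(layer_order)}
    return sorted(commands, key=lambda command: index[command[0]])


layer_order = ["gbuffer", "reflections", "lighting"]
recorded = [
    ("gbuffer", "draw scene geometry"),
    ("lighting", "global lighting fullscreen pass"),
    # Recorded while running the lighting generator, but targeting the
    # earlier "reflections" layer:
    ("reflections", "reflection probe proxy volume"),
]
gpu_order = [description for _, description in schedule(recorded, layer_order)]
```

Even though the probe's proxy volume was recorded last, it ends up between the gbuffer work and the global lighting pass on the GPU.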

When the reflection probes are laid down we move on and run a resource generator for doing Screen-Space Reflections. As SSR typically runs in half-res we store the result in a separate render target.

We then finally kick the lighting resource generator, which is responsible for the following:

  1. Build a screen space mask for sun shadows. This is done by running multiple fullscreen_passes that transform the pixels into cascaded shadow map space and perform PCF. Stencil culling makes sure the shader only runs for pixels within a certain cascade.
  2. SSAO with a bunch of different quality settings.
  3. A fullscreen pass we refer to as the “global lighting” pass. This is the pass that does most of the heavy lifting when it comes to the lighting. It handles mixing SSR with probe reflections, mixing SSAO with material AO, lighting from all simple lights looked up from the clustered shading structure, and sun lighting masked with the sun shadow mask (step 1).
  4. Run a traditional deferred shading modifier for all light sources that have a material graph assigned. If the shader doesn’t target a specific layer, the light’s proxy volume will be rendered at this point; otherwise it will be scheduled to render into whatever layer the shader has specified.
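
The PCF (percentage closer filtering) mentioned in step 1 averages several depth comparisons instead of doing just one. Here is a minimal CPU-side Python sketch; the 3x3 kernel, the bias value and the function name are assumptions, and a real implementation would run in a shader using hardware comparison samplers.

```python
def pcf_shadow(shadow_map, x, y, receiver_depth, bias=0.002):
    """Fraction of a 3x3 neighborhood that is lit (1.0 = fully lit).

    shadow_map is a 2D list of depths as seen from the light; x and y
    are integer texel coordinates, clamped at the map border.
    """
    height, width = len(shadow_map), len(shadow_map[0])
    lit = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            sx = min(width - 1, max(0, x + dx))
            sy = min(height - 1, max(0, y + dy))
            # Lit if nothing closer to the light was recorded in this texel.
            if receiver_depth - bias <= shadow_map[sy][sx]:
                lit += 1
    return lit / 9.0
```

A pixel next to a shadow edge gets a fractional value, which is what softens the edge in the resulting sun shadow mask.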

At this point we have a fully lit HDR output for all of our opaque materials.

Various stuff

// layer for emissive materials
{ name="emissive" render_targets=["hdr0"] depth_stencil_target="depth_stencil_buffer" 
    sort="FRONT_BACK" profiling_scope="emissive" }

// kick debug visualization
{ type="static_branch" render_caps={ development=true }
    pass=[
        { resource_generator="debug_visualization" profiling_scope="debug_visualization" }
    ]
}

// kick resource generator for laying down fog 
{ resource_generator="fog" profiling_scope="fog" }

// layer for skydome rendering
{ name="skydome" render_targets=["hdr0"] depth_stencil_target="depth_stencil_buffer" 
    sort="BACK_FRONT" profiling_scope="skydome" }
{ extension_insertion_point = "skydome" }

// layer for transparent materials 
{ name="hdr_transparent" render_targets=["hdr0"] depth_stencil_target="depth_stencil_buffer" 
    sort="BACK_FRONT" profiling_scope="hdr_transparent" }
{ extension_insertion_point = "hdr_transparent" }

// kick resource generator for reading back any requested render targets / buffers to the CPU
{ resource_generator="stream_capture_buffers" profiling_scope="stream_capture" }

// kick resource generator for capturing reflection probes
{ type="static_branch" platforms=["win"] render_caps={ development=true }
    pass = [
        { resource_generator="cubemap_capture" }
    ]
}

// layer for rendering object selections from the editor
{ type="static_branch" platforms=["win", "ps4", "xb1"]
    pass = [
        { type = "static_branch" render_settings={ selection_enabled=true }
            pass = [
                { name="selection" render_targets=["gbuffer0" "ldr1_dev_r"] 
                    depth_stencil_target="depth_stencil_buffer_selection" sort="BACK_FRONT" 
                    clear_flags=["SURFACE" "DEPTH"] profiling_scope="selection"}
            ]
        }
    ]
}

Next follows a bunch of layers for doing various stuff, most of this is straightforward:

  • emissive - Layer for adding any emissive material influences to the light accumulation target (hdr0)
  • debug_visualization - Kick off a resource generator for doing debug rendering. When debug rendering is enabled, the post processing pipe is disabled so we can render straight to the output target / back buffer here. Note: This doesn’t need to be scheduled exactly here, it could be moved further down the pipe.
  • fog - Kick off a resource generator for blending fog into the accumulation target.
  • skydome - Layer for rendering anything skydome related.
  • hdr_transparent - Layer for rendering transparent materials, traditional forward shading using the clustered shading acceleration structure for lighting. VFX with blending usually also goes into this layer.
  • stream_capture_buffers - Arbitrary location for capturing various render targets and dumping them into system memory.
  • cubemap_capture - Capturing point for reflection cubemap probes.
  • selection - Layer for rendering selection outlines.
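
Several of these layers differ only in their sort mode. As a hedged Python illustration of the two depth-based modes seen in the config, assuming FRONT_BACK means nearest first and BACK_FRONT means farthest first (the function name and the draw tuples are made up for this sketch):

```python
def sort_draws(draws, mode):
    """Order draw calls by view-space depth.

    draws: list of (name, depth). FRONT_BACK suits opaque layers, where
    near objects can reject hidden pixels early; BACK_FRONT suits alpha
    blended layers, where far objects must be drawn first.
    """
    if mode == "FRONT_BACK":
        return sorted(draws, key=lambda draw: draw[1])
    if mode == "BACK_FRONT":
        return sorted(draws, key=lambda draw: draw[1], reverse=True)
    raise ValueError("unsupported sort mode: " + mode)
```

This is why the emissive layer can sort FRONT_BACK while hdr_transparent, which relies on blending, sorts BACK_FRONT.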

So basically a bunch of miscellaneous stuff that needs to happen before we enter post processing…

Post Processing

// kick resource generators for AA resolve and post processing
{ resource_generator="post_processing" profiling_scope="post_processing" }
{ extension_insertion_point = "post_processing" }

Up until this point we’ve been in linear color space, accumulating lighting into a 4xf16 render target (hdr0). Now it’s time to take that buffer and push it through the post processing resource generator.

The post processing pipe in the Stingray Renderer does:

  1. Temporal AA resolve
  2. Depth of Field
  3. Motion Blur
  4. Lens Effects (chromatic aberration, distortion)
  5. Bloom
  6. Auto exposure
  7. Scene Combine (exposure, tone map, sRGB, LUT color grading)
  8. Debug rendering

All steps of the post processing pipe can dynamically be enabled/disabled (not entirely true, we will always have to run some variation of step 7 as we need to output our result to the back buffer).
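
To show why some form of step 7 is always needed, here is a minimal Python sketch of a scene combine for a single color channel. It uses the simple Reinhard operator as a stand-in tone mapper, since the post does not say which operator Stingray uses, and it leaves out LUT color grading; the function names are assumptions.

```python
def srgb_encode(c):
    """Linear [0, 1] value to sRGB (IEC 61966-2-1 transfer function)."""
    if c <= 0.0031308:
        return 12.92 * c
    return 1.055 * c ** (1.0 / 2.4) - 0.055


def scene_combine(hdr, exposure):
    """Expose, tone map and sRGB encode a single linear HDR channel.

    Uses Reinhard, x / (1 + x), as an assumed stand-in tone mapper.
    """
    exposed = hdr * exposure
    tone_mapped = exposed / (1.0 + exposed)
    return srgb_encode(tone_mapped)
```

However bright the HDR input gets, the output stays below 1.0, which is what makes it safe to write to an LDR back buffer.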

Final touches

// layer for rendering LDR materials, primarily used for rendering HUD and debug rendering
{ name="transparent" render_targets=["output_target"] depth_stencil_target="stable_depth_stencil_buffer_alias" 
    sort="BACK_FRONT" profiling_scope="transparent" }

// kick resource generator for rendering shadow map debug overlay
{ type="static_branch" render_caps={ development=true }
    pass = [
        { resource_generator="debug_shadows" profiling_scope="debug_shadows" }
    ]
}

// kick resource generator for compositing left/right eye
{ type="static_branch" platforms=["win"] render_settings={ vr_supported=true }
    pass = [
        { resource_generator="vr_present" profiling_scope="present" }
    ]
}

Before we present, we allow rendering of unlit geometry in LDR (mainly used for HUDs and debug rendering) and potentially do some more debug rendering. If we’re in VR mode, we also kick a resource generator that handles left/right eye combining (if needed).

That’s it - a very high-level breakdown of a rendered frame when running Stingray with the default “Stingray Renderer” render_config file.

Mini Renderer

We also ship a second rendering pipe with Stingray called the “Mini Renderer” - mini as in minimalistic. It is not as broadly used as the Stingray Renderer so I won’t walk you through it; I just want to mention that it’s there and say a few words about it.

The main design goal behind the mini renderer was to build a rendering pipe with as little overhead from advanced lighting effects and post processing as possible. It’s primarily used for doing mobile VR rendering. High-resolution, high-performance rendering on mobile devices is hard! You pretty much need to avoid all kinds of fullscreen effects to hit target frame rate. Therefore the mini renderer has a very limited feature set:

  • It’s a forward renderer. While it’s capable of doing per pixel lighting through clustered shading it rarely gets used, instead most applications tend to bake their lighting completely or run with only a single directional light source.
  • No post processing.
  • While all lighting is done in linear color space we don’t store anything in HDR, instead we expose, tonemap and output sRGB directly into an LDR target (usually directly to the back buffer).

The mini_renderer.render_config file is ~400 lines, i.e. less than 1/3 of the stingray renderer. It is still in a somewhat experimental state but is the fastest way to get up and running doing mobile VR. I also feel that it makes sense for us to ship an example of a more lightweight rendering pipe; it is simpler to follow than the render_config for the full stingray renderer, and it makes it easy to grasp the benefits of data-driven rendering compared to a more static hard-coded rendering pipe (especially if you don’t have source access to the full engine as then the hard-coded rendering pipe would likely be a complete black box for the user).

Wrap up

I realize that some of you might have hoped for a more complete walkthrough of the various lighting and post processing techniques we use in the Stingray renderer. Unfortunately that would have become a very long post, and it also feels a bit out of context, as my goal with this blog series has been to focus on the architecture of the Stingray rendering pipe rather than specific rendering techniques. Most of the techniques we use can probably be considered “industry standard” within real-time rendering nowadays. If you are interested in learning more, there is lots of excellent information available.

In the next and final post of this series we will take a look at the shader and material system we have in Stingray.
