Many virtual reality games, at least in the short term, will be based on current rendering systems on the market today. On the one hand, this is a good thing in that it allows people to create compelling content quickly. On the other hand, these rendering systems weren’t built with VR in mind, and there are several challenges in getting the best performance and experience out of them. Through talking with people trying to integrate VR into their own rendering stacks, and through experimenting with different techniques myself, I’ve started gathering a list of gotchas that will hopefully help others as we move into this new realm.
Rendering stereo efficiently
The first issue is how to generate stereoscopic rendering efficiently. The easiest “bolt on” solution is to just invoke your rendering system twice, once per eye & viewport. However, there are a bunch of problems with this. Rendering a scene normally involves many different GPU pipeline states, switching between them as you switch between different materials, rendering techniques, etc… These state switches are expensive, and you end up having to do all of them again for the second eye. For example, this means a scene that has 100 different pipeline states now ends up switching between 200.
It is much, much cheaper to switch a viewport, update a constant (the view matrix), and redraw the same vertices using the pipeline state that’s already loaded. This means that if you pair up your rendering of objects and draw both eyes back to back, per object, you get a very significant performance savings. You only switch between pipeline states as much as necessary, the same as with a single eye.
Let’s look at an example to make it really clear what I mean:
Imagine your scene is made up of a few objects: a large piece of geometry representing a room rendered with pipeline state A, and a few skinned characters all rendered with pipeline state B, and a fire in the fireplace rendered with pipeline state C.
Traditionally, this would be rendered something like this, which is 3 state changes:
Set A, draw room. Set B, draw characters. Set C, draw fire.
The naïve, but easy, approach to rendering this scene in stereo is to do this:
Setup camera for eye 1, Set A, draw room. Set B, draw characters. Set C, draw fire.
Setup camera for eye 2, Set A, draw room. Set B, draw characters. Set C, draw fire.
That’s now 6 state changes, and if the vertices for the entire scene don’t all fit in VRAM, we may have incurred some paging activity as well as we rotate through all the data, binding & unbinding it.
Here’s what a more optimal approach may look like:
Set A, setup eye 1, draw room, setup eye 2, draw room.
Set B, setup eye 1, draw characters, setup eye 2, draw characters.
Set C, setup eye 1, draw fire, setup eye 2, draw fire.
Here, we still only go through 3 state changes and we continue to leverage the vertices as they are already bound. This makes a tremendous difference in more complicated scenes. The only drawback is that you now have to pass enough information throughout your rendering pipeline to know how to setup both eyes & viewports, as opposed to just a single camera.
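The ordering above can be sketched in a few lines. This is a minimal, renderer-agnostic sketch: the `PipelineState` class and the `log` list are stand-ins I made up for a real GPU API, and only the ordering of operations is the point.

```python
log = []  # records GPU-ish operations so we can inspect the ordering

class PipelineState:
    def __init__(self, name):
        self.name = name

    def bind(self):
        # Stand-in for an expensive GPU pipeline state change.
        log.append(("bind", self.name))

def draw_scene_stereo(batches, eyes):
    """batches: [(PipelineState, [object names])]; eyes: [eye names]."""
    for state, objects in batches:
        state.bind()                      # once per state, not once per eye
        for eye in eyes:
            log.append(("eye", eye))      # cheap viewport/constant update
            for obj in objects:
                log.append(("draw", obj))

batches = [(PipelineState("A"), ["room"]),
           (PipelineState("B"), ["characters"]),
           (PipelineState("C"), ["fire"])]
draw_scene_stereo(batches, ["eye 1", "eye 2"])
```

Running this produces exactly three `bind` entries in `log`, versus the six the naive per-eye loop would generate.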
It may be tempting to just treat each eye as its own individual camera, but this poses several problems. The first is that you usually determine which objects to render based on the camera. While it would certainly work to run the visibility determination logic per eye and generate a list of objects to render for each, it’s incredibly inefficient: the vast majority of the two lists would be identical, with only a few items unique to one eye or the other. It is usually much better to build a “fattened” view frustum which encompasses both eyes, use that to do a single visibility culling pass, and then use the resulting list for both eyes. There will occasionally be an object that was visible only out of the corner of one eye and generates no pixels for the other, but these are rare corner cases and the approach is conservative. This still produces the proper image, and saves a tremendous amount of processing time on the CPU.
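One simple way to fatten the frustum is to push each plane outward by half the interpupillary distance, which conservatively encloses both eyes’ frusta. The sketch below assumes a plane is stored as `(normal, d)` with `dot(normal, p) + d >= 0` meaning “inside”, and objects are bounded by spheres; the plane convention and helper names are my own, not from any particular engine.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fatten_planes(planes, ipd):
    # Push each plane outward by half the interpupillary distance so the
    # volume conservatively encloses the frusta of both eyes.
    return [(n, d + ipd / 2.0) for (n, d) in planes]

def sphere_visible(planes, center, radius):
    # Visible when the signed distance to every plane is >= -radius.
    return all(dot(n, center) + d + radius >= 0.0 for (n, d) in planes)

def cull(planes, spheres, ipd):
    fat = fatten_planes(planes, ipd)
    # One culling pass; the resulting list is reused for both eyes.
    return [s for s in spheres if sphere_visible(fat, *s)]

# Toy example: a "frustum" that is just two planes bounding -1 <= x <= 1.
planes = [((1.0, 0.0, 0.0), 1.0), ((-1.0, 0.0, 0.0), 1.0)]
spheres = [((0.0, 0.0, 0.0), 0.5),   # clearly inside
           ((1.2, 0.0, 0.0), 0.1),   # just outside the right plane
           ((5.0, 0.0, 0.0), 0.1)]   # far outside
```

With an IPD of zero only the first sphere survives culling; with a nonzero IPD the borderline second sphere is conservatively kept for both eyes.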
Level of Detail (LOD)
Similar to visibility determination, LOD levels are normally chosen on a per-camera basis, and this causes problems if you pick LODs per eye. An object out at the far left or far right of the view might sit right on the boundary between two detail levels, such that one eye picks level X and the other picks level X + 1. If you have good blending between LODs, this may not be very obvious, but it’s still undesirable. The solution, again, is to use a single fattened camera view with a single average position of the two eyes to do the LOD determination, and then use the result for both eyes.
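Here is a small sketch of the boundary problem and the midpoint fix. The distance thresholds and helper names are made up for illustration; real engines typically select LODs on screen-space error rather than raw distance, but the disagreement between eyes is the same either way.

```python
LOD_BOUNDARIES = [10.0, 25.0, 50.0]  # made-up distances between levels 0/1, 1/2, 2/3

def pick_lod(view_pos, obj_pos):
    dist = sum((a - b) ** 2 for a, b in zip(view_pos, obj_pos)) ** 0.5
    for level, boundary in enumerate(LOD_BOUNDARIES):
        if dist < boundary:
            return level
    return len(LOD_BOUNDARIES)

def eye_midpoint(left_eye, right_eye):
    return tuple((a + b) / 2.0 for a, b in zip(left_eye, right_eye))

# An object sitting right on the level 0/1 boundary: picked per eye, the
# two eyes disagree; the midpoint gives one consistent answer for both.
left_eye, right_eye = (-0.032, 0.0, 0.0), (0.032, 0.0, 0.0)
obj = (10.0, 0.0, 0.0)
```

With these positions the left eye picks level 1, the right eye picks level 0, and the midpoint resolves to a single level used for both.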
Normal maps
Through my experimentation, it appears something about rendering an image in stereo diminishes the effect of normal maps significantly. Normal maps occupying a small amount of view space (such as small items in the distance) look fine, but large walls and surfaces up close start looking quite flat and cheesy, with the normal maps looking like they’re just cheaply painted onto the surface (which is, in fact, exactly what they are). I’m not sure why, but I suspect it has to do with the discrepancy between the true depth of the surface your eyes perceive and the fake perturbations the normal map is trying to impose. I believe actual displacement techniques, such as displacement maps and tessellation, will become more popular as we build richer VR experiences. I also hope this means we can push the GPU vendors to finally fix their tessellation engines, as most have performance issues and inefficiencies.
View dependent lighting
It seems logical that view-dependent computations, such as those for specular highlights or reflections, would benefit from using a per-eye view vector. In practice, however, this actually looks odd, and I’m not sure why. In my experience, using separate view vectors per eye for rendering specular highlights makes the final lighting or reflected image look blurry and incorrect. Using a single view vector from the average or “middle” eye generally looks much better, even if it feels physically incorrect. I need to do more research and experimentation here to see what’s going on, but for now my recommendation is to use a single vector to get better-looking output. Certainly try both, though, and see which works best for your particular setup.
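The recommendation above amounts to computing the view vector once from the averaged eyes and reusing it for both images. A minimal sketch, using Blinn-Phong as just one common specular model (the helper names and positions are mine, not from any particular engine):

```python
def normalize(v):
    length = sum(x * x for x in v) ** 0.5
    return tuple(x / length for x in v)

def center_view_dir(left_eye, right_eye, surface_point):
    # One view vector from the averaged "middle" eye, reused for both eyes.
    middle = tuple((a + b) / 2.0 for a, b in zip(left_eye, right_eye))
    return normalize(tuple(e - p for e, p in zip(middle, surface_point)))

def blinn_phong_specular(normal, light_dir, view_dir, shininess):
    half = normalize(tuple(l + v for l, v in zip(light_dir, view_dir)))
    n_dot_h = max(0.0, sum(n * h for n, h in zip(normal, half)))
    return n_dot_h ** shininess

# Shade a surface point with one shared view vector, so the highlight
# lands in the same place in both eyes' images.
view_dir = center_view_dir((-0.032, 0.0, 1.7), (0.032, 0.0, 1.7), (0.0, 0.0, 0.0))
spec = blinn_phong_specular((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), view_dir, 32.0)
```

Swapping `center_view_dir` for a true per-eye vector is a one-line change, which makes it easy to A/B the two approaches in your own setup.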
That’s it for now; I wanted to keep this post fairly short and just summarize a few things that may help people building content for VR. I’ll continue to document additional issues as I run into them, or hear about others running into them. Good luck getting your content onto the VR platform, and please let me know if you have additional questions or comments.