Implementing a Graphic Driver Abstraction

March 3, 2008 @ 19:26 | In Programming
A screenshot from a mesh in wireframe mode

If you are a graphics programmer, you have probably implemented many times what I will refer to here as the graphic driver of the engine. The graphic driver is an abstraction over a low-level graphics API such as DirectX, OpenGL or libgcm. I want to dedicate this article to some ideas and rules that have worked well for me in the past when implementing this part of an engine. Your mileage may vary, so the comment section is open to discuss any detail you want.

  • Why you need it

    You could use the native API directly in your project instead of adding a seemingly useless abstraction. If you are targeting a single platform, you may well end up doing that. I don’t think it is a good idea. Even when targeting a single platform, you want to hide a lot of API details. Graphics APIs are too generic, and you probably only want a subset of them. Apart from being generic, not every capability is supported by every piece of hardware. You do not want to expose that combinatorial explosion to the rest of the engine. All those details and the corresponding fallbacks must be kept under control, and a good place for that is the graphic driver abstraction. Furthermore, if an API upgrade is needed (for example, from DirectX 9 to DirectX 10), having the graphics API properly encapsulated will make the transition far less painful.

    So, before making the decision, you must have a clear idea of the pros and cons of each option.

  • How to abstract

    I like to start by abstracting with interfaces (classes made of pure virtual functions). This way you hide all the implementation details from the rest of the project, which reduces compile times. In most scenarios this virtual layer should not add any noticeable inefficiency. If you discover later, when profiling, that the virtual calls add a significant overhead, you can easily typedef the interface name to the implementation class without altering the rest of the project. You can even enable this only in specific final configurations. Obviously, you can only do this if you are targeting one platform per executable. If you want to support, for example, DirectX 9 and DirectX 10 within the same executable, this trick is not available.
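    A minimal sketch of the interface-plus-typedef idea follows. All names (IGraphicDriver, NullDriver, GraphicDriver, ENGINE_FINAL_BUILD) are hypothetical and chosen only for illustration. Marking the implementation as final is an extra assumption of this sketch: it gives the compiler the opportunity to resolve the calls statically once the alias points at the concrete class.

```cpp
#include <string>

// Hypothetical engine-facing interface (names are illustrative).
class IGraphicDriver {
public:
    virtual ~IGraphicDriver() {}
    virtual std::string Name() const = 0;
    // Submits 'count' triangles and returns the running total submitted.
    virtual int DrawTriangles(int count) = 0;
};

// Stand-in implementation; a real one would wrap DirectX, OpenGL, libgcm...
class NullDriver final : public IGraphicDriver {
public:
    std::string Name() const override { return "null"; }
    int DrawTriangles(int count) override {
        submitted_ += count;
        return submitted_;
    }
private:
    int submitted_ = 0;
};

// The rest of the engine only ever names GraphicDriver. In a single-platform
// final build the alias points straight at the concrete class.
#ifdef ENGINE_FINAL_BUILD
typedef NullDriver GraphicDriver;
#else
typedef IGraphicDriver GraphicDriver;
#endif

// Engine code is identical in both configurations.
int SubmitFrame(GraphicDriver& driver) {
    driver.DrawTriangles(100);
    return driver.DrawTriangles(50);
}
```

    The calling code does not change between configurations; only the alias does.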

  • Choose the correct abstraction

    Choosing the level at which to start abstracting is a difficult decision. If you abstract at a very low level, you end up using the lowest common denominator among the features of all the APIs you are abstracting. This is something to avoid. If you abstract at a very high level, porting your architecture to a new platform becomes hard work. Taken to the extreme, this means having a totally different engine for each platform.

    I have tried in the past abstracting at the level of render primitives. You define some basic primitives (e.g. mesh, skinned mesh, morph mesh, particles, billboards) and implement them without any restriction for each platform. This is the option taken by some public engines (GameBryo, for example).

    On the other hand, you can opt for a low-level abstraction that exposes streams of data directly and implement the high-level primitives on top of it. This way, new primitives can be added without touching the graphic abstraction. You are more limited when implementing the primitives because you are using one abstraction for all the platforms, but all the platforms I know of have the concept of vertex streams, so this should not be a real problem. This option is chosen by a lot of engines (OGRE 3D and Unreal Engine, for example) and is the one that has worked best for me in the past. To avoid being restricted to the capabilities common to all the platforms you are abstracting, I recommend delegating that part to the shader subsystem and not exposing anything related to it in the API. This is the data-driven approach described later.
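    As a rough illustration of the low-level option, the sketch below exposes only vertex streams and builds a high-level primitive (a billboard quad) on top of them. The types VertexStream and Geometry and the function MakeBillboard are hypothetical names, not taken from any of the engines mentioned.

```cpp
#include <cstddef>
#include <vector>

enum class Semantic { Position, Normal, TexCoord };

// One attribute inside an interleaved stream.
struct VertexElement {
    Semantic semantic;
    std::size_t componentCount;  // number of floats
};

// The only thing the driver abstraction needs to understand.
struct VertexStream {
    std::vector<VertexElement> elements;
    std::vector<float> data;

    // Floats per vertex.
    std::size_t Stride() const {
        std::size_t stride = 0;
        for (const VertexElement& e : elements) stride += e.componentCount;
        return stride;
    }
    std::size_t VertexCount() const {
        const std::size_t stride = Stride();
        return stride ? data.size() / stride : 0;
    }
};

struct Geometry {
    std::vector<VertexStream> streams;
    std::vector<unsigned> indices;
};

// A high-level primitive implemented on top of the stream abstraction;
// adding it requires no change to the driver interface.
Geometry MakeBillboard(float h) {
    VertexStream s;
    s.elements = { {Semantic::Position, 3}, {Semantic::TexCoord, 2} };
    s.data = {
        -h, -h, 0.0f, 0.0f, 1.0f,
         h, -h, 0.0f, 1.0f, 1.0f,
         h,  h, 0.0f, 1.0f, 0.0f,
        -h,  h, 0.0f, 0.0f, 0.0f,
    };
    Geometry g;
    g.streams.push_back(s);
    g.indices = { 0, 1, 2, 0, 2, 3 };  // two triangles
    return g;
}
```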

  • Support Immediate Mode Rendering

    Allow for immediate rendering of primitives. This means that the functions exposed in your abstraction should allow direct rendering of primitives without any kind of batching or retained-mode behavior. Of course, you will need a retained mode for rendering your scene. Batching by shader, for example, is absolutely necessary for maximum performance. There is a great article by Tom Forsyth describing the costs of changing render states. But you must build it on top of your graphic abstraction. There are many scenarios where immediate mode is needed: rendering the GUI and the HUD are good examples.

    There are short entries in Wikipedia explaining the concepts of retained mode and immediate mode.
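    The layering described above could look roughly like this: an immediate layer that executes every call as it arrives, and a retained layer on top that collects calls and sorts them by shader before flushing. ImmediateDriver, RenderQueue and DrawCall are hypothetical names, and shader changes are only counted, not actually performed.

```cpp
#include <algorithm>
#include <vector>

struct DrawCall {
    int shaderId;
    int geometryId;
};

// Immediate layer: no batching, no reordering.
class ImmediateDriver {
public:
    void Draw(const DrawCall& call) {
        if (call.shaderId != currentShader_) {
            currentShader_ = call.shaderId;
            ++shaderChanges_;
        }
        ++drawCount_;
    }
    int ShaderChanges() const { return shaderChanges_; }
    int DrawCount() const { return drawCount_; }
private:
    int currentShader_ = -1;
    int shaderChanges_ = 0;
    int drawCount_ = 0;
};

// Retained layer built on top of the immediate one.
class RenderQueue {
public:
    void Add(const DrawCall& call) { calls_.push_back(call); }

    // Sorts by shader to minimize state changes, then submits everything.
    void Flush(ImmediateDriver& driver) {
        std::stable_sort(calls_.begin(), calls_.end(),
                         [](const DrawCall& a, const DrawCall& b) {
                             return a.shaderId < b.shaderId;
                         });
        for (const DrawCall& call : calls_) driver.Draw(call);
        calls_.clear();
    }
private:
    std::vector<DrawCall> calls_;
};
```

    A GUI or debug renderer can call ImmediateDriver directly or use its own queue with a different sorting policy.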

  • Do not expose individual render states

    Individual render states are hard to track, hard to batch and hard to debug. I even choose not to allow setting states at all in the interface (read the next point). Newer graphics APIs like DirectX 10 are oriented toward the concept of state blocks. A state block is a group of render states stored on the driver side. With state blocks, render states are not transferred one by one but cached inside the driver: you do not send individual render states, you activate them from a previously created object. I really recommend this in the abstraction. Even for older APIs (like DirectX 9), this is a huge benefit. To avoid redundancy, those blocks should be grouped by usage frequency: world - material - object - instance.
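    A small sketch of how state blocks might surface in the abstraction: blocks are created once from a description, referenced by handle, and activating the block that is already active is skipped. StateBlockDesc, StateBlockCache and the three states shown are assumptions made for the example, not a real API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical group of render states; a real block would hold many more.
struct StateBlockDesc {
    bool depthTest;
    bool alphaBlend;
    bool wireframe;
};

inline bool operator==(const StateBlockDesc& a, const StateBlockDesc& b) {
    return a.depthTest == b.depthTest && a.alphaBlend == b.alphaBlend &&
           a.wireframe == b.wireframe;
}

typedef std::size_t StateBlockHandle;

class StateBlockCache {
public:
    // Returns an existing handle when an identical block was already created.
    StateBlockHandle Create(const StateBlockDesc& desc) {
        for (std::size_t i = 0; i < blocks_.size(); ++i) {
            if (blocks_[i] == desc) return i;
        }
        blocks_.push_back(desc);
        return blocks_.size() - 1;
    }

    // Returns true only when a state change is actually issued.
    bool Apply(StateBlockHandle handle) {
        if (handle >= blocks_.size()) return false;         // unknown handle
        if (hasActive_ && handle == active_) return false;  // redundant
        active_ = handle;
        hasActive_ = true;
        return true;
    }

    std::size_t Count() const { return blocks_.size(); }
private:
    std::vector<StateBlockDesc> blocks_;
    StateBlockHandle active_ = 0;
    bool hasActive_ = false;
};
```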

  • Choose a shader-centric abstraction

    Shaders are a powerful abstraction for exposing platform capabilities. The term shader is used here for the combination of vertex shader, pixel shader, geometry shader, shader variables and render states. I recommend a format similar to the Effect File Format provided by Microsoft or CgFX provided by Nvidia. You implement different shaders for different platforms, and when the shader is loaded the proper one is chosen. With some preprocessor tricks, you can have all your shaders written in a common language valid for DirectX 9, DirectX 10, OpenGL, Xbox and PS3.

    On top of that, you should have your material system. A material can be a shader plus a state block. For example, you could have your shader for rendering solid colors, and a solid red material would be the solid shader plus the state block containing the red value. State blocks can be layered, overriding values by material, by object, etc. For example:

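    renderTarget->BeginRender();
        renderTarget->Clear();
        renderTarget->BeginShader(shader);
            // Material-level values are set once for everything drawn below.
            renderTarget->SetConstantBuffer(materialBlock);
            // Per-object values override the material layer for this draw.
            renderTarget->SetConstantBuffer(objectBlock);
            renderTarget->DrawGeometry(geom);
            // Next draw: only the per-object block is set again.
            renderTarget->SetConstantBuffer(objectBlock);
            renderTarget->DrawGeometry(geom);
        renderTarget->EndShader();
    renderTarget->EndRender();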

    As you can see, the API does not need functions for setting states. You delegate that to the shaders, following a data-driven approach. Describing a robust material system deserves a full post and is not the purpose of this article. Some interesting links can be found at the end of this article.
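    The layering of state blocks mentioned above (material values overridden per object) can be sketched as a simple merge in which later layers win. ConstantBlock and Resolve are hypothetical names, and a name-to-float map is only a stand-in for a real constant buffer layout.

```cpp
#include <map>
#include <string>
#include <vector>

// Stand-in for a block of shader constants.
typedef std::map<std::string, float> ConstantBlock;

// Merges the layers in order; a value set in a later layer
// (e.g. per object) overrides the same value from an earlier
// one (e.g. per material).
ConstantBlock Resolve(const std::vector<const ConstantBlock*>& layers) {
    ConstantBlock result;
    for (const ConstantBlock* layer : layers) {
        for (const auto& entry : *layer) {
            result[entry.first] = entry.second;
        }
    }
    return result;
}
```

    For instance, a solid red material could provide red and alpha, while one semi-transparent object overrides only alpha.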

  • Allow for caching commands

    A powerful abstraction must allow for caching commands. This is very similar to display lists in OpenGL or command buffers on Xbox. It is a very efficient way to avoid the abstraction overhead altogether and save CPU cycles. Instead of sending shaders, states and primitives to the driver one by one, you compile all these commands into a new object that can be applied later, giving the same results as the individual commands. This compiled object can be implemented very efficiently on some platforms. Consoles are an ideal target for this because you have a lot more control over the hardware. Command buffers are a clear win on Xbox, for example.
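    In its simplest form, command caching means recording the calls into an object and replaying that object later. The sketch below shows only that idea: CommandBuffer, Device and the command set are hypothetical, and a real implementation would compile to a platform-native format rather than keep a vector of structs.

```cpp
#include <vector>

enum class CommandType { SetShader, SetConstants, Draw };

struct Command {
    CommandType type;
    int argument;  // id of the shader, constant block or geometry
};

// Stand-in for the platform device; it just logs what it executes.
class Device {
public:
    void Execute(const Command& command) { log_.push_back(command); }
    const std::vector<Command>& Log() const { return log_; }
private:
    std::vector<Command> log_;
};

// Records commands once; replaying costs one pass over the list.
class CommandBuffer {
public:
    void SetShader(int id) { Record(CommandType::SetShader, id); }
    void SetConstants(int id) { Record(CommandType::SetConstants, id); }
    void Draw(int geometryId) { Record(CommandType::Draw, geometryId); }

    // Produces the same device calls as issuing the commands individually.
    void Replay(Device& device) const {
        for (const Command& command : commands_) device.Execute(command);
    }
private:
    void Record(CommandType type, int argument) {
        commands_.push_back(Command{type, argument});
    }
    std::vector<Command> commands_;
};
```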

  • Take advantage of multi-core architectures

    Although a graphic driver is inherently single-threaded (graphics commands must be processed in sequential order to make sense), you can benefit from having a dedicated thread for dispatching commands to the graphics card. This approach hides the driver and runtime latency from the main thread. As observed in some commercial projects, the driver and runtime overhead can reach 25-40% of the CPU time of a frame. Apart from this, you will need a render thread on top of the graphic driver. This thread is in charge of traversing the scene and sorting render commands. This implies having your scene data double-buffered to some degree, but that is outside the scope of this article. To take advantage of that layer, you must allow render commands to be generated in parallel. This way, for example, render commands for several shadow maps can be generated in different threads.
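    The parallel generation step could be sketched as follows: each pass (a shadow map, for example) builds its own command list on a separate thread, and the lists are then concatenated in a fixed order so that submission stays sequential. RenderCommand, BuildPass and BuildFrame are hypothetical names; one std::thread per pass is used for brevity where a real engine would more likely use a job system, and passCount is assumed to be non-negative.

```cpp
#include <thread>
#include <vector>

struct RenderCommand {
    int passId;
    int drawId;
};

typedef std::vector<RenderCommand> CommandList;

// Generates the commands of one pass; touches no shared state.
CommandList BuildPass(int passId, int drawCount) {
    CommandList list;
    for (int i = 0; i < drawCount; ++i) {
        list.push_back(RenderCommand{passId, i});
    }
    return list;
}

// Builds all passes in parallel, then merges them in pass order.
CommandList BuildFrame(int passCount, int drawsPerPass) {
    std::vector<CommandList> lists(passCount);
    std::vector<std::thread> workers;
    for (int p = 0; p < passCount; ++p) {
        // Each worker writes only to its own slot, so no locking is needed.
        workers.emplace_back([&lists, p, drawsPerPass] {
            lists[p] = BuildPass(p, drawsPerPass);
        });
    }
    for (std::thread& worker : workers) worker.join();

    // Submission order is deterministic regardless of thread timing.
    CommandList frame;
    for (const CommandList& list : lists) {
        frame.insert(frame.end(), list.begin(), list.end());
    }
    return frame;
}
```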

I think I have covered the most important points to consider when implementing a graphic abstraction from scratch. You can find more details in the bibliography below. Any suggestions, corrections or new ideas for improving the article are welcome.

Thanks for reading.

 
 


  1. Frostbite Rendering Architecture (GDC 2007)
  2. Rapid-fire Material Systems with Direct3D 10
  3. Shader Abstraction by Tom Forsyth (Shader X2)
  4. From the Trenches: Xbox 360 Development War Stories
  5. Optimizing DirectX on Multi-core Architectures
  6. A Flexible Material System In Design



  1. Great post! It was nice to read and still have interesting tips.

    I’ve taken the Gamebryo approach which, if I understood correctly, doesn’t have a custom vertex stream object but only high-level objects, right?

    The only thing that surprised me was the ‘immediate mode’ in the GUI. Really? I thought immediate mode was very inefficient, so I only use it for debugging.



    Comment by Miguel Herrero
    March 4, 2008 @ 9:38 #

  2. Pretty interesting

    I have applied some of these tips in my engine for DS and Symbian :P



    Comment by Zalo
    March 4, 2008 @ 14:23 #

  3. Hi Miguel,

    Yes, the GameBryo approach exposes only high-level primitives: meshes, particle systems. You implement those primitives on each platform; for example, in DX10 you could implement the particles using geometry shaders and export capabilities, while in DX9 you could use dynamic vertex buffers.

    By immediate mode I mean the mode in which the commands sent to the API are executed immediately. Of course, you need some kind of batching for the GUI. You should render all the quads with the same texture in a single call.

    So, instead of having only a retained mode API, I prefer having a retained mode API implemented on top of an immediate mode API. This way you can have several retained mode APIs: one for the 3D scene, one for the GUI, one for drawing debug elements, etc.



    Comment by ent
    March 4, 2008 @ 17:03 #

  4. Now it makes more sense :D
    Though it seems like it’s hard work.



    Comment by Miguel Herrero
    March 4, 2008 @ 18:47 #

  5. Great post!!
    I’m looking forward to learning more about this subject in the upcoming course about Videogame Graphics Engines you are going to give for the GameLab in April. :)



    Comment by Ricky
    March 5, 2008 @ 11:35 #

  6. Implementing a Graphic Driver Abstraction…

    In this blog entry, Jesus de Santos explains in broad strokes the key points to keep in mind when creating the abstraction of the rasterizer API. Highly recommended for people who are building engines. Don’t miss…



    Trackback by pixelame.net
    March 6, 2008 @ 14:01 #

  7. I hate the idea of using pure virtual functions, not because of the overhead but because it’s another, useless class to change when you have to modify a definition. Also it’s not safe, someone could add a function only in the DirectX10 implementation and forget to add it to the abstract interface. Also they are an abuse, an interface that has only one implementation (because the PS3 one won’t be in the X360 project, so in a given project you end up having only one)…
    What I did is to have a common header, with platform specific includes for the inline functions and platform specific implementations.



    Comment by Angelo Pesce
    April 23, 2008 @ 3:14 #

  8. Hi Angelo,

    To me there is one clear advantage in using interfaces: hiding the implementation details or, what amounts to the same thing, reducing incremental build times. As the architecture grows and grows, build times can easily become a serious bottleneck.

    If, when profiling, you discover that the vtable is adding a notable overhead, you can always do a simple:

    typedef MyImplementationClass MyInterface;

    and voila!, your problems disappear. :)



    Comment by ent
    April 23, 2008 @ 11:34 #

