EntBlog
Code, 3D, Games, Linux and much more...
Compile-Time Strings
April 28, 2009 @ 23:19 | In CodeGems, Programming | 6 Comments
It would be nice if we had such a feature in the C language, wouldn’t it? Here, the term ‘compile-time string’ refers to strings that are converted to unique integer identifiers at compile time. At run time those identifiers are plain integers that can be compared and hashed very quickly. Other languages, Smalltalk for example, implement a similar idea with the concept of a Symbol. This post describes a possible implementation of this feature in C/C++.
Imagine, for example, a generic object factory where object instances are created from unique identifiers. The classical solution is a header shared by all code in which every identifier is declared in a C enumeration. Apart from creating a serious physical dependency, since adding a new identifier to the enumeration forces a recompilation of the whole project, this solution is infeasible in modular architectures where modules are isolated from each other. In those architectures a global header is not an option.
One viable solution may be to use strings as identifiers. But strings are heavy objects, expensive to compare, and prone to typing errors, because misspelled names are only detected at run time instead of compile time. Other equally insufficient solutions to this problem include FourCC codes and esoteric template tricks for generating a hash at compile time (don’t bother: it is not possible to solve this 100% with templates, because strings cannot be used as template parameters, and hashing a string is not collision-free anyway. More information in this usenet thread). Mick West proposes more solutions in his Practical Hash IDs article.
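To make the FourCC limitation concrete: a FourCC packs exactly four characters into a 32-bit integer. That value is a genuine compile-time constant, but it cannot hold a name such as CubeMesh. The FOURCC macro and DescribeMesh() below are hypothetical helpers written for illustration (the byte order is an arbitrary choice), not part of any particular library:

```cpp
#include <cstdint>

// Hypothetical helper: pack four characters into one 32-bit value,
// first character in the lowest byte.
#define FOURCC(a, b, c, d)                         \
    ((std::uint32_t)(unsigned char)(a)         |   \
     ((std::uint32_t)(unsigned char)(b) << 8)  |   \
     ((std::uint32_t)(unsigned char)(c) << 16) |   \
     ((std::uint32_t)(unsigned char)(d) << 24))

// FOURCC() yields an integral constant expression, so it can label a
// switch case. The catch: every name must fit in exactly four characters,
// so "CubeMesh" has to be squeezed into something like 'CUBE'.
const char* DescribeMesh(std::uint32_t id)
{
    switch (id)
    {
    case FOURCC('C', 'U', 'B', 'E'): return "cube";
    case FOURCC('S', 'P', 'H', 'R'): return "sphere";
    default:                         return "unknown";
    }
}
```

The four-character cap is exactly why FourCC does not scale to descriptive identifiers.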
What follows is an implementation that has been working nicely for me and that satisfactorily meets the requirements for simulating compile-time strings. First let me show you two examples of its usage:
namespace
{
    DECLARE_SYMBOL(CubeMesh)
    DECLARE_SYMBOL(SphereMesh)
    DECLARE_SYMBOL(DuckMesh)
}

void CollectNodes(Ptr<Node>& node)
{
    Ptr<Mesh> mesh0 = CreateObject(S(CubeMesh));
    node->Add(mesh0);
    Ptr<Mesh> mesh1 = CreateObject(S(SphereMesh));
    node->Add(mesh1);
    Ptr<Mesh> mesh2 = CreateObject(S(DuckMesh));
    node->Add(mesh2);
}
namespace
{
    DECLARE_SYMBOL(FirstMessage)
    DECLARE_SYMBOL(SecondMessage)
    DECLARE_SYMBOL(ThirdMessage)
}

void ProcessMessage(const Message& msg)
{
    if (msg.id == S(FirstMessage))
    {
        /// ...
    }
    else if (msg.id == S(SecondMessage))
    {
        /// ...
    }
    else if (msg.id == S(ThirdMessage))
    {
        /// ...
    }
}
A symbol represents a compile-time string. Symbols must be declared before they are used. The macro that declares a symbol hides an inline function containing a static variable:
#define DECLARE_SYMBOL(id)\
    inline Symbol __GetSymbol##id() throw()\
    {\
        static size_t sym;\
        if (sym == 0)\
        {\
            sym = GetIdFromString(#id);\
        }\
        return Symbol(sym);\
    }

The function GetIdFromString() hashes the string, stores it in an internal table and returns the table position for that string (the Symbol class is a simple wrapper around the identifier). This is done only the first time the symbol is requested; later requests return the cached static ID. Compared with plain integers as symbols, this adds a little overhead: a test against 0 on every request. Beware of local static initializations: C++03 does not guarantee that they are thread-safe. That is the reason for the manual comparison against 0 (a zero-initialized static needs no dynamic initialization). If two threads request the same symbol for the first time simultaneously, both may call GetIdFromString(), so GetIdFromString() must be thread-safe, and must return the same ID for the same string, for this code to work.
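The post does not show GetIdFromString() itself, so here is a minimal sketch under stated assumptions (C++11): a std::unordered_map hashes each name to its position in a std::deque, a std::mutex makes the lookup thread-safe, and position 0 is reserved so the `sym == 0` test keeps meaning "not looked up yet". GetStringFromId() is a hypothetical reverse lookup of the kind useful while debugging. Two deliberate deviations from the macro above: the cached ID is a std::atomic, because in C++11 an unsynchronized read and write of a plain size_t from two threads is a data race, and the generated function is named GetSymbol_##id, because identifiers beginning with a double underscore are reserved for the implementation.

```cpp
#include <atomic>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <unordered_map>

// Assumed table layout (not from the post). A deque is used because
// push_back() never moves existing elements, so c_str() pointers stay valid.
namespace
{
    std::mutex g_symbolMutex;
    std::unordered_map<std::string, std::size_t> g_idsByName;
    std::deque<std::string> g_namesById(1, std::string());  // ID 0 reserved
}

// Thread-safe: returns the existing ID for name or registers a new one.
std::size_t GetIdFromString(const char* name)
{
    std::lock_guard<std::mutex> lock(g_symbolMutex);
    std::unordered_map<std::string, std::size_t>::const_iterator it =
        g_idsByName.find(name);
    if (it != g_idsByName.end())
        return it->second;
    std::size_t id = g_namesById.size();  // first real ID is 1, never 0
    g_namesById.push_back(name);
    g_idsByName[name] = id;
    return id;
}

// Hypothetical reverse lookup for debugging; returns "" for unknown IDs.
const char* GetStringFromId(std::size_t id)
{
    std::lock_guard<std::mutex> lock(g_symbolMutex);
    return id < g_namesById.size() ? g_namesById[id].c_str() : "";
}

// Simple wrapper around the identifier, as described in the post.
class Symbol
{
public:
    Symbol(std::size_t id) : _id(id) {}
    std::size_t getID() const { return _id; }
    bool operator==(const Symbol& other) const { return _id == other._id; }
    bool operator!=(const Symbol& other) const { return _id != other._id; }
private:
    std::size_t _id;
};

// Same lazy lookup as DECLARE_SYMBOL above, with an atomic cache.
#define DECLARE_SYMBOL(id)                          \
    inline Symbol GetSymbol_##id()                  \
    {                                               \
        static std::atomic<std::size_t> sym(0);     \
        std::size_t value = sym.load();             \
        if (value == 0)                             \
        {                                           \
            value = GetIdFromString(#id);           \
            sym.store(value);                       \
        }                                           \
        return Symbol(value);                       \
    }

#define S(id) GetSymbol_##id()

namespace
{
    DECLARE_SYMBOL(CubeMesh)
    DECLARE_SYMBOL(SphereMesh)
}
```

With this sketch, S(CubeMesh) == S(CubeMesh) holds on every call and only the first request per symbol takes the mutex. Since the declarations sit in an unnamed namespace, each translation unit caches its own copy, but every copy resolves to the same ID through the shared table.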
The S macro simply invokes the local function previously generated:
#define S(id) __GetSymbol##id()

And there you have it: compile-time strings whose overhead is negligible as soon as you do anything more with them than simply compare symbols. If you need 100% efficient code, you could pre-generate a table of the symbols used by your project (by searching for all DECLARE_SYMBOL blocks) and replace each S() with a truly unique identifier generated at compile time. And that would be so easy if the preprocessor could be extended in a standard way…
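The pre-generation idea could start with a small build tool that scans the sources for DECLARE_SYMBOL blocks. The sketch below is hypothetical: CollectSymbolNames() and EmitSymbolHeader() are illustrative names, the scanner works on an in-memory string rather than real files, and the kSymbol_ naming of the emitted constants is an arbitrary choice. How S() would be redirected to those constants is left open, as in the post.

```cpp
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical build-step helper: find every DECLARE_SYMBOL(Name) in a piece
// of source text and return the distinct names in sorted order.
// Limitations of this sketch: it does not skip comments, does not trim
// whitespace inside the parentheses, and would also match the macro's own
// "#define DECLARE_SYMBOL(id)" line.
std::vector<std::string> CollectSymbolNames(const std::string& source)
{
    static const std::string kMarker = "DECLARE_SYMBOL(";
    std::set<std::string> names;
    std::string::size_type pos = source.find(kMarker);
    while (pos != std::string::npos)
    {
        std::string::size_type begin = pos + kMarker.size();
        std::string::size_type end = source.find(')', begin);
        if (end == std::string::npos)
            break;  // unterminated macro call: stop scanning
        names.insert(source.substr(begin, end - begin));
        pos = source.find(kMarker, end);
    }
    return std::vector<std::string>(names.begin(), names.end());
}

// Emit one integer constant per name; IDs start at 1 so that 0 stays free.
std::string EmitSymbolHeader(const std::vector<std::string>& names)
{
    std::ostringstream out;
    for (std::size_t i = 0; i < names.size(); ++i)
        out << "const unsigned kSymbol_" << names[i] << " = " << (i + 1) << ";\n";
    return out.str();
}
```

Because the names are sorted, adding a symbol can shift other IDs, which is acceptable only if the generated output is rebuilt together with every module that uses it.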
Hope this makes sense. Thank you for reading.
Why didn’t you choose to do something like this?
typedef size_t Symbol;
#define DECLARE_SYMBOL(id)\
static Symbol SYMBOL_##id = GetIdFromString(#id);
#define S(id) SYMBOL_##id
That way you don’t have to worry about making GetIdFromString thread-safe, because it will only be called from the C runtime static initializer, which is guaranteed to be single-threaded, and all the cost associated with it is moved out of the run time. Is there any downside to this approach?
Comment by hcpizzi
April 29, 2009 @ 13:16
Do you have a way to find the string associated with a Symbol (number), for instance when you are debugging?
Comment by Gus
May 2, 2009 @ 13:28
Gus, I’ve implemented that using #defines for debug configurations in VS.
My Symbol class has a string field and a constructor taking the string, both of which are only visible when the _DEBUG symbol is defined:
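#include <cstddef>
#include <string>

class Symbol
{
public:
#ifdef _DEBUG
    // #id yields a string literal, so the parameter must be const char*
    Symbol(std::size_t id, const char* str) : _str(str), _id(id) {}
    std::string getStr() const { return _str; }
#else
    Symbol(std::size_t id) : _id(id) {}
#endif
    std::size_t getID() const { return _id; }
private:
#ifdef _DEBUG
    const char* _str;   // symbol name, kept only in debug builds
#endif
    std::size_t _id;
};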
so inside DECLARE_SYMBOL you could use:
…
return Symbol(sym, #id);\
…
I’ve just realized that a cleaner approach would be to leave that constructor and field available in all configurations but store the string only in debug builds, something like this:
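Symbol::Symbol(std::size_t id, const char* str)
{
    _id = id;
#ifdef _DEBUG
    _str = str;   // keep the name only in debug builds
#else
    (void)str;    // unused in release builds
    _str = 0;
#endif
}

const char* Symbol::getStr()
{
    if (!_str) return "";   // no name stored (release build)
    return _str;
}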
Comment by Rickyah
May 4, 2009 @ 9:57
Hi pizzi,
we try to avoid statics in our architecture. That way, we keep the initialization order of the different subsystems under control (for example, the symbol system depends on the log system and the memory system).
If you do not have that restriction, your solution seems right.
Comment by ent
May 4, 2009 @ 12:56
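On the initialization-order point raised about hcpizzi’s variant: a namespace-scope `static Symbol SYMBOL_x = GetIdFromString("x");` in one translation unit may run before the table used by GetIdFromString() in another translation unit has been constructed, because C++ leaves the order of dynamic initialization across translation units unspecified. One common mitigation is the construct-on-first-use idiom, sketched below with an assumed std::map-based table (SymbolTable() is a hypothetical name). It only fixes the table’s own construction; it does not address ordering against other subsystems such as logging or memory.

```cpp
#include <cstddef>
#include <map>
#include <string>

typedef std::size_t Symbol;

// Construct-on-first-use: the table is a function-local static, so it is
// built on the first call, even if that call comes from another translation
// unit's static initializer.
std::map<std::string, Symbol>& SymbolTable()
{
    static std::map<std::string, Symbol> table;
    return table;
}

// Assumed lookup: IDs start at 1 in registration order. No locking, relying
// on hcpizzi's premise that static initialization runs on a single thread.
Symbol GetIdFromString(const char* name)
{
    std::map<std::string, Symbol>& table = SymbolTable();
    std::map<std::string, Symbol>::iterator it = table.find(name);
    if (it != table.end())
        return it->second;
    Symbol id = table.size() + 1;
    table[name] = id;
    return id;
}

// hcpizzi's macros, unchanged apart from layout.
#define DECLARE_SYMBOL(id) \
    static Symbol SYMBOL_##id = GetIdFromString(#id);
#define S(id) SYMBOL_##id

DECLARE_SYMBOL(FirstMessage)
DECLARE_SYMBOL(SecondMessage)
```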
Gus,
yes, we have static functions that can be invoked from the watch window of the debugger:
Comment by ent
May 4, 2009 @ 12:59
Actually, the C standard says that “inline” doesn’t force the function to be inlined. It only hints to the compiler that the function is a good candidate for inlining.
Comment by Daniel "NeoStrider" Monteiro
June 24, 2009 @ 4:32