Tuesday, September 1, 2009

Adventures in C++ land

I have recently discovered an interesting bug in a prototype implementation that I did for my research. For the current discussion, what exactly the implementation is about is irrelevant. It suffices to say that the code is C++ code.

I thought I'd share the experience so that others can avoid the same pitfall. The code in question is slightly complicated, but the problem is well illustrated by the following abstracted version:


std::vector<Foo> foos;

void bar(const Foo &foo)
{
while (*) {
foos.push_back(foo);
}
}

int main()
{
// ...
assert(foos.size() > 0);
while (*) {
bar(foos[0]);
}
}


Here "Foo" is a very well behaved user defined class, such as the favorite class of every C++ book author, the "ComplexNumber" class. The tests in the while loops are irrelevant, so I replaced them by nondeterministic tests (denoted by "*"; so just imagine for example that each loop runs for a small number of iterations).

The above code can crash. I experienced two types of crashes occurring: either the program terminates with an uncaught std::bad_alloc, or simply the program is terminated by a segfault.

So let me describe why the crash happens. The problem is that the "foo" argument is passed by reference. The std::vector class implements a variable-sized array. Therefore, when the underlying implementation of the vector runs out of space, the std::vector class
decides to allocate a bigger piece of memory, copies the old values of the vector into the newly allocated memory, and deletes the old memory.

So this entire thing could very well happen during the push_back call. But what is "foo"? Well, "foo" is exactly a reference to memory that might be deleted during the push_back call. Ooops. Accessing such memory is of course undefined behavior which means very bad things could happen. I solved the problem by passing the argument by value.

So the next time you do the classic "best practice" of passing arguments by const reference you might want to be more careful. It is interesting of course to analyze what led to this bug. One of the problems is having a global variable ("foos") which is used in an imperative fashion. This leads to the surprising sharing of state between the global variable "foos" and the local variable "foo".

The second problem is the leaky mental model that I had of the vector class. I thought of it as an array that can grow. But in fact it's not, it's an array that is sometimes replaced by a bigger array...

Also note that the bug lies entirely in the "bar" function, which is not correct in the particular case where "foo" is a reference to an element of "foos". However, the syntax of passing arguments by reference (indistinguishable from passing by value) does not help in detecting the possible bug at the point of call.

I scratched my head for a long while without finding the source of the bug. I finally installed valgrind, which is a great piece of software of course, and which I recommend whole-heartedly.