
GCC Undefined Behavior Sanitizer – ubsan

This article is written as opinion. The opinions expressed within are solely those of the author, and do not represent the views of Red Hat.


Not every software bug has consequences as serious as the Ariane 5 rocket crash. Nevertheless, bugs cost software companies a lot of money every year and upset customers, users, and developers. Some bugs are the result of undefined behavior occurring in the program. Undefined behavior is a concept known especially from the C and C++ languages: the semantics of certain operations are undefined, and the compiler presumes that such operations never happen. For instance, using a non-static variable before it has been initialized is undefined. If undefined behavior occurs, the compiler is free to do anything. The application can produce wrong results, crash, or print the complete text of Proust’s oeuvre.

Luckily, there are ways to detect at least some of the undefined behavior in a program. The compiler can issue a warning at compile time, but only if it can statically detect the problem. Often this is not possible, and the checking has to take place at run time.

Enter ubsan

GCC recently (version 4.9) gained the Undefined Behavior Sanitizer (ubsan), a run-time checker for the C and C++ languages. To check your program with ubsan, compile and link it with the -fsanitize=undefined option. The instrumented binary then has to be executed; if ubsan detects a problem, it outputs a “runtime error:” message and, in most cases, continues executing the program. To make the program abort on such a diagnostic instead, use the -fno-sanitize-recover option.
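As a rough sketch of that workflow, the file below holds one function whose addition is undefined for a single input. The file names overflow.c and main.c, the output name demo, and the function next_value are made up for illustration; only the two GCC options come from the text above.

```c
/*
 * overflow.c (hypothetical file name)
 *
 * Build with instrumentation, together with a main.c of your own
 * that calls next_value():
 *   gcc -fsanitize=undefined overflow.c main.c -o demo
 *
 * Same build, but the first report ends the program:
 *   gcc -fsanitize=undefined -fno-sanitize-recover overflow.c main.c -o demo
 */
#include <limits.h>

/* Signed addition: undefined when x == INT_MAX, well defined otherwise. */
int next_value(int x)
{
    return x + 1;
}
```

In an instrumented build, a call such as next_value(INT_MAX) is the kind of operation that should produce a “runtime error:” report when the binary runs; calls with smaller arguments run silently.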

At present, ubsan offers a handful of checks. The simplest is probably the integer division by zero check: if a division by zero occurs, or INT_MIN / -1 is evaluated for a signed type, a run-time error is issued. Checking of floating-point division by zero is off by default, but can be turned on with the -fsanitize=float-divide-by-zero command-line option.
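The two integer cases named above can also be guarded by hand. The following is a minimal sketch of such a pre-check, not a description of how ubsan is implemented; the helper name checked_div is hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a / b in *out and returns true when the division is defined.
 * Returns false, leaving *out untouched, for the two undefined cases. */
bool checked_div(int a, int b, int *out)
{
    if (b == 0)
        return false;               /* division by zero */
    if (a == INT_MIN && b == -1)
        return false;               /* quotient does not fit in int */
    *out = a / b;
    return true;
}
```

A caller can then treat a false return as an error instead of relying on whatever the hardware happens to do.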

Shifts

Sanitization of the shift operation ensures that the result of a shift operation is not undefined. Note that what exactly is considered undefined differs slightly between C and C++, as well as between ISO C90 and C99. Generally, the right operand must not be negative and must not be greater than or equal to the width of the (promoted) left operand. An example of an invalid shift operation, assuming a 32-bit int, is the following:

int i = 23;
i <<= 32;
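The general rule on the shift count can be sketched as a hand-written check. The helper names are hypothetical, the left operand is unsigned so that only the shift count matters, and the width computation assumes unsigned int has no padding bits.

```c
#include <limits.h>
#include <stdbool.h>

/* True when count is a valid shift count for an unsigned int operand:
 * not negative and strictly less than the operand's width in bits. */
bool shift_count_ok(int count)
{
    const int width = (int)(sizeof(unsigned int) * CHAR_BIT);
    return count >= 0 && count < width;
}

/* Stores value << count in *out only when the shift count is valid. */
bool checked_shl(unsigned int value, int count, unsigned int *out)
{
    if (!shift_count_ok(count))
        return false;
    *out = value << count;
    return true;
}
```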

Overflow

One of the most important checks is signed integer overflow checking. Practice shows that this undefined behavior is very common in real programs. Ubsan is able to check that the result of addition, subtraction, multiplication, and negation does not overflow in signed arithmetic. For instance, in the example below ubsan would issue a run-time error:

int i = INT_MIN;
int j = -i;
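For comparison, this is roughly what avoiding the overflow by hand looks like for addition and negation. It is a sketch with hypothetical helper names, and the negation check assumes a two's complement int, where INT_MIN has no positive counterpart.

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *out when the sum fits in int; returns false otherwise. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}

/* Stores -a in *out unless a is INT_MIN, whose negation does not fit. */
bool checked_neg(int a, int *out)
{
    if (a == INT_MIN)
        return false;
    *out = -a;
    return true;
}
```

The comparisons are arranged so that the checks themselves never overflow: INT_MAX - b is only evaluated for positive b, and INT_MIN - b only for negative b.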

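But since one has to take the integer promotions into account, the following snippet is valid: c is promoted to int before the increment, so the arithmetic itself does not overflow, and converting the result back to signed char is implementation-defined rather than undefined: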

signed char c = SCHAR_MAX;
c++;
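The effect of the promotion can be made visible by keeping the intermediate int result instead of storing it back into a signed char. The function name is hypothetical, and the sketch assumes int is wider than signed char, as it is on common platforms.

```c
#include <limits.h>

/* c is promoted to int before the addition, so the sum is computed
 * in int and cannot overflow for any signed char value. */
int increment_promoted(signed char c)
{
    return c + 1;
}
```

Called with SCHAR_MAX, the function returns SCHAR_MAX + 1 as an ordinary int value; only the narrowing back to signed char, which this function does not perform, would change the value.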

Even a conversion of a floating-point value to an integer value can overflow. Such a case is not diagnosed by default, but can be enabled specifically with the -fsanitize=float-cast-overflow option.
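A hand-written range check before the cast might look like the sketch below. The helper name is hypothetical, and the bounds assume that int is at most 32 bits wide so that INT_MIN - 1 and INT_MAX + 1 are exactly representable as double.

```c
#include <limits.h>
#include <stdbool.h>

/* Stores (int)d in *out when the truncated value fits in int.
 * The conversion truncates toward zero, so every d strictly between
 * INT_MIN - 1 and INT_MAX + 1 is acceptable. NaN fails both
 * comparisons and is therefore rejected as well. */
bool checked_double_to_int(double d, int *out)
{
    if (!(d > (double)INT_MIN - 1.0 && d < (double)INT_MAX + 1.0))
        return false;
    *out = (int)d;
    return true;
}
```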

NULL

Ubsan also provides NULL pointer dereference checking. If a program tries to load from or store to a NULL pointer, a run-time error is displayed. Furthermore, the NULL pointer checking even handles the case when a method is invoked on an object pointed to by a NULL pointer.

Instrumentation of __builtin_unreachable calls simply raises a run-time error any time __builtin_unreachable is reached in the program. Return statement instrumentation applies only to C++ programs. It triggers when the end of a non-void function is reached without actually returning a value.
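A typical place where __builtin_unreachable appears is after a switch that is meant to cover every enumerator. The enum and function below are hypothetical and only illustrate where such a call usually sits.

```c
enum color { RED, GREEN, BLUE };

/* Returns a name for each enumerator. The builtin after the switch is
 * a promise to the compiler that control never gets there; passing a
 * value outside the enumeration would break that promise. */
const char *color_name(enum color c)
{
    switch (c) {
    case RED:   return "red";
    case GREEN: return "green";
    case BLUE:  return "blue";
    }
    __builtin_unreachable();
}
```

Without instrumentation, reaching the builtin is undefined behavior; with it, that same path is the one that should be reported at run time.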

Bounds

Out-of-bounds access is one of the most serious mistakes. Ubsan can help here, since it is able to instrument out-of-bounds accesses as well. Note that a pointer that points just past the end of an array is valid in C; a single object is treated as a 1-element array. Bounds instrumentation works on variable length arrays (VLAs) as well, but flexible array members are not instrumented.
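The distinction between a valid one-past-the-end pointer and an invalid access can be sketched as follows. The array, its length, and the helper names are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define TABLE_LEN 4   /* hypothetical array length */

static const int table[TABLE_LEN] = { 10, 20, 30, 40 };

/* Reads table[index] only when the index is inside the array. */
bool table_get(size_t index, int *out)
{
    if (index >= TABLE_LEN)
        return false;
    *out = table[index];
    return true;
}

/* The pointer table + TABLE_LEN points just past the end: it may be
 * formed and compared against, but must never be dereferenced. */
int table_sum(void)
{
    int sum = 0;
    const int *end = table + TABLE_LEN;
    for (const int *p = table; p != end; ++p)
        sum += *p;
    return sum;
}
```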

Similarly, VLA checking merely verifies that a VLA’s size is a positive integer.
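The same condition can be checked by hand before the array is declared. In this sketch the function name and the upper cap MAX_ELEMS are assumptions; the cap is not something ubsan checks, it only keeps the stack usage of the example bounded.

```c
#include <stdbool.h>

#define MAX_ELEMS 1024   /* hypothetical cap to keep stack usage small */

/* Sums the squares 0*0 + 1*1 + ... + (n-1)*(n-1) using a VLA of n
 * elements. Declaring a VLA with n <= 0 would be undefined, so that
 * case is rejected before the declaration is reached. */
bool sum_of_squares(int n, long *out)
{
    if (n <= 0 || n > MAX_ELEMS)
        return false;

    long squares[n];
    for (int i = 0; i < n; ++i)
        squares[i] = (long)i * i;

    long sum = 0;
    for (int i = 0; i < n; ++i)
        sum += squares[i];

    *out = sum;
    return true;
}
```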

Alignment

Dereferencing a misaligned pointer also results in undefined behavior. Ubsan checks the alignment of pointers as they are dereferenced. Calling a method or a constructor on an improperly aligned object is not valid either, and ubsan is able to detect this mistake as well.
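Two common ways to stay clear of misaligned accesses are to test the address first or to copy the bytes instead of dereferencing. The sketch below shows both; the helper names are hypothetical, _Alignof requires C11, and the address test assumes the usual mapping from pointers to uintptr_t.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True when p is suitably aligned to be dereferenced as an int. */
bool is_aligned_for_int(const void *p)
{
    return (uintptr_t)p % _Alignof(int) == 0;
}

/* Reads an int from any address, aligned or not, by copying its
 * bytes rather than dereferencing an int pointer. */
int load_int_unaligned(const void *p)
{
    int value;
    memcpy(&value, p, sizeof value);
    return value;
}
```

Casting an odd address inside a byte buffer to int * and dereferencing it is the pattern the alignment check is meant to catch; load_int_unaligned gives the same value without that dereference.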

Arguments

GCC provides two attributes that tell the compiler that a function should never receive NULL as an argument (the nonnull attribute), or that a function never returns NULL (returns_nonnull). With this information, the compiler is able to optimize the code better. But if the function receives or returns a NULL pointer nevertheless, all bets are off. Ubsan’s nonnull attribute checking can be used to catch such wrongdoing.
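For readers who have not used these attributes, a small sketch of how they are spelled in GCC follows. The two functions are hypothetical and exist only to carry the attributes.

```c
#include <stddef.h>

/* The first (and only) parameter must never be NULL. */
__attribute__((nonnull(1)))
size_t count_chars(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        ++n;
    return n;
}

/* The function promises never to return NULL. */
__attribute__((returns_nonnull))
const char *default_name(void)
{
    return "anonymous";
}
```

Passing NULL to count_chars, or changing default_name so that it returns NULL, are the violations that the attribute checking is meant to report.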

Enums

Yet another feature is bool and enum load checking, which makes sure that loading a boolean that holds a value other than 0 or 1 does not go unnoticed, and likewise loading an object of an enumerated type that holds a value outside the values of that type.
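Such invalid values usually arrive as raw bytes, for example from a file or through memcpy into a struct. One way to avoid ever loading an invalid boolean is to validate the byte first; the sketch below uses a hypothetical helper name.

```c
#include <stdbool.h>

/* Converts a raw byte to bool only when it holds 0 or 1.
 * Any other value is rejected rather than reinterpreted. */
bool byte_to_bool(unsigned char raw, bool *out)
{
    if (raw > 1)
        return false;
    *out = (raw == 1);
    return true;
}
```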

And more to come

Some features are currently under development. The first one is object size checking. This makes use of the __builtin_object_size function, which returns the size of an object. Typically, compiler optimizations must be enabled for __builtin_object_size to work properly. If the compiler can prove that the program is accessing bytes outside an object, it churns out a run-time error.
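To give an idea of what __builtin_object_size reports, the sketch below queries it for a local array and for a pointer parameter. The function names are hypothetical, and the results depend on what the compiler can see at the call, which is why optimization matters.

```c
#include <stddef.h>

/* For a local array the compiler can usually see the size (16 here).
 * When it cannot, the builtin yields (size_t)-1 for type 0. */
size_t size_of_local(void)
{
    char buf[16];
    buf[0] = '\0';
    return __builtin_object_size(buf, 0);
}

/* For a pointer parameter the size is typically unknown, so the
 * result is (size_t)-1 unless inlining exposes the pointed-to object. */
size_t size_of_param(const char *p)
{
    return __builtin_object_size(p, 0);
}
```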

And finally, another feature that is currently in the works is virtual pointer checking. As the name suggests, it is intended for C++ programs and ought to verify that virtual pointers are in order; if they are not, the application is likely wrong and prone to fail.

With this work we attempted to discover as many bugs in programs as possible. That said, ubsan can’t prove that a program does not contain any bugs. Yet, especially together with -fsanitize=address, it has proved useful in hunting down creeping bugs when used regularly.

We’re always interested in receiving your feedback and questions, so feel free to add a comment or drop us an email at RHELdevelop AT redhat DOT com or tweet!



  1. I find it amusing that Red Hat pretend to take undefined behaviour so seriously. Most GNOME projects are maintained or contributed to by Red Hat and the use of reserved identifiers is pervasive among those projects. By pervasive, I mean close to 100% of GNOME projects do it and many of the examples in the documentation do too. It’s almost like some kind of style guide there. Red Hat desperately need to read section 7.1.3 of the C99/11 spec and collect the low hanging fruit before they start working on tools like this.


    1. We’re delighted to see GCC support ubsan, asan, and tsan. The more reach, the more users, the better. Bugs are the common enemy.


    2. The sanitizer projects came from Google, and they did ports to both LLVM and GCC (first to GCC iirc actually). Pointless fanboyism like this is purely noise.


  2. The example for shifts could be improved. Indeed, with "int i = 23; i <<= 32;", the behavior could be undefined because of signed integer overflow (if the specific rule on the shift count were not present). "unsigned int i = 23; i <<= 32;" or "int i = 0; i <<= 32;" would be better.


  3. The examples for shifts assume that an int is 32 bits, which is generally a fairly good assumption, but when it comes to picking nits and knowing the underlying details of the target hardware architecture (which is what the sanitizer is all about), may not always be true.

    C99 defines int32_t in <stdint.h>; it may be worth adding a one-liner to clarify the code a little:

    int32_t i; /* 32-bit signed integer, from C99’s <stdint.h>. */


  4. I am using g++ (tdm64) version 5.1.0. When I try to compile my C++ program using the -fsanitize=undefined option, it gives me an error that it cannot find -lubsan. What is the problem here? Please help me.

