
Why <cstdlib> is more complicated than you might think

One of the reasons that C++ has been so successful and become so widely used is that it was (at least initially) compatible with C, which was already very popular. C++ programs were able to make use of lots of pre-existing C code, and in particular to make use of the C Standard Library. Although the C and C++ languages have diverged, the C++ Standard Library incorporates most of the C99 library by reference.

This article explains some of the difficulties that arise when implementing the parts of the C++ standard library that are shared with C. Some of these difficulties caused longstanding bugs in the GNU C++ standard library, bugs which have only recently been fixed for the forthcoming GCC 6 release. Understanding these issues involves some history of the relevant standards, and some gory details of how a C++ standard library interacts with a C standard library. I will keep the gory details to a minimum, though, and only describe them at a high level.

Note: Throughout this article I will use <cxxx> and <xxx.h> as placeholders for C++ standard library headers that correspond to headers inherited from C. For example, <cxxx> refers to headers such as <cstdio>, <cstdlib> and <cmath>, and <xxx.h> refers to headers such as <stdio.h>, <stdlib.h> and <math.h>.

An end to global pollution

The original 1998 C++ standard (C++98) said that the correct way to use the functions inherited from the C library was via a <cxxx> header, such as <cstring>. Those headers would declare their contents in namespace std and would not “pollute” the global namespace. Accordingly, the traditional <xxx.h> headers (which were deprecated in the 1998 standard, and remain so today) were required to do:

#include <cxxx>
using std::foo;
using std::bar;
using std::baz;

This specification meant that including <cxxx> would only declare names in namespace std, but including <xxx.h> would declare those same names in both namespace std and the global namespace. This allowed code ported from C to include <xxx.h> and still compile, because functions such as strlen would be found in the global namespace and so could be used without qualification. New, modern code written from scratch as C++ was supposed to use the <cxxx> headers and qualify names, for example std::strlen.

In practice, few C++ implementations on Unix-like systems implemented their headers this way. Most C++ implementations didn’t provide their own implementation of the C library functions; they just piggy-backed on top of the native C library already provided by the OS. On Unix the C library is an integral part of the OS, and is always present, so it made sense to build an implementation of the C++ library on top of that.

This meant that many C++98 implementations failed to conform to the requirement that <cxxx> should only declare names in namespace std. Instead, they did it “backwards”, so that C++ compilers came with <cxxx> headers which looked like this:

extern "C" {
#include <xxx.h>  // header from native C library
}
// C++ disallows macros that "mask" functions of the same name:
#undef foo
#undef bar
#undef baz
namespace std
{
  // Re-declare each required function in namespace std:
  using ::foo;
  using ::bar;
  using ::baz;
}

(If the native C library is at least minimally C++-aware, which wasn’t always true in 1998 but is almost universally true now, then the extern "C" linkage specification shown above will be present in the C headers themselves, and so isn’t needed in the <cxxx> header.)

The technique above was not a valid implementation of the C++98 standard library, because although the <cxxx> headers did declare their contents in namespace std, they also “polluted” the global namespace, declaring the names there too. Providing a valid C++98 library would have required re-implementing the entire C library, or close co-operation with the C library vendor to make the C headers fully C++-conforming. Neither is practical for cross-platform compilers such as G++, which can be used on dozens of different platforms with slightly different native C libraries. As a result, non-conforming C++98 headers were the norm for a long time after the 1998 standard was published.

Global pollution is here to stay

By the mid-2000s, a carefully written C++ program could use the common intersection of conforming and non-conforming C++ implementations and be mostly portable between implementations. Such code needed to consistently use std:: to qualify names after including a <cxxx> header, and consistently not qualify names after including a <xxx.h> header. But in the real world, developers aren’t always so careful. Even if they are aware of the precise requirements of the standard, it’s easy to accidentally use strlen (without qualification) after including <cstring>, and if the code compiles then it will get shipped. This meant that millions of lines of C++ code were unwittingly relying on properties of their implementation which violated the standard, creating a serious portability problem.

Another aspect of the portability problem is that a <cxxx> header might expose additional non-standard names if the corresponding <xxx.h> header declares them. For example, POSIX functions declared in <time.h> might become available via <ctime> (but only in the global namespace, because <ctime> has no using-declarations for them). In some cases this is convenient for developers, as many programs do want to use the richer APIs defined by POSIX or GNU rather than only those functions defined in the C++ standard. But it’s not portable to rely on those names being declared by <cxxx> headers, because that only works when the headers use the common but non-conforming implementation technique shown above.

This non-conformance issue was so widespread among C++ implementations, and the portability problem affected so much code, that for the 2011 revision of the C++ standard (aka C++11) the ISO C++ committee took the pragmatic step of accepting the common approach as a valid implementation strategy. Thanks to Library Working Group (LWG) Defect Report 456, it is now unspecified whether <cxxx> headers include <xxx.h> and import the names from the global namespace into namespace std, or vice versa.

The convention described above, where programs only use the portable intersection of conforming and non-conforming implementations, is now the correct way to write C++. You must not assume that <cxxx> adds any names to the global namespace, and you must not assume that <xxx.h> adds any names to namespace std. The standard no longer specifies which of the headers has the original declarations and which header just includes the other and re-declares the names in another namespace.

So as of C++11, it’s easy to implement conforming <cxxx> headers, right? Surely C++ implementations that didn’t conform to C++98 don’t need to change anything and their headers are now conforming? Unfortunately it’s still a bit more complicated than that.

Language overload

The C++ standard requires extra overloads of several functions, either to improve usability or to avoid type-safety holes. An example of the changes for type-safety (or more precisely, const-correctness) is strchr. The C library defines this in <string.h>:

char *strchr(const char *s, int c);

This function takes a const string and returns a pointer into that string, as a non-const pointer, effectively casting away the const. Because C++ supports overloading of functions it replaces that signature with a pair of overloads:

char* strchr(char* s, int c);
const char* strchr(const char* s, int c);

This means that in C++ if you call strchr with a const pointer you get a const pointer back, and if you call it with a non-const pointer you get a non-const pointer back. C++ mandates similar changes to strrchr, strpbrk, strstr, memchr, wcschr, wcspbrk, wcsrchr, wcsstr, and wmemchr, which is good for const-correctness. But this is difficult to implement if the C++ library just piggy-backs on the native C library. The C library has to co-operate and adjust its <string.h> header to be compatible with C++, omitting the C version of strchr and then either declaring the two C++ signatures itself or leaving the C++ library to add those declarations.

The other kind of change that C++ mandates to the C library headers is to overload functions to work with different argument types. The C header <math.h> defines the functions log, logf, and logl to calculate the natural logarithm of a double, float, or long double respectively. Again, this is necessary because C doesn’t support overloading. But C++ does, so in addition to those functions, C++ requires that in <math.h> (and <cmath>) log is overloaded to work with any floating-point type, and similarly for the other groups of math functions such as exp/expf/expl, pow/powf/powl, etc. This means in C++ you can call log(x) for any floating-point type and it will do the right thing, returning a result of the same type as x.

Unlike for strchr, these extra overloads can be implemented purely in the C++ headers, without co-operation from the C library. A simplified version of <cmath> might look like this:

#include <math.h>
#undef log
#undef logf
#undef logl
// ...
namespace std
{
  // Import C functions into namespace std:
  using ::log;
  using ::logf;
  using ::logl;
  // Define additional overloads required by C++:
  inline float log(float x) { return logf(x); }
  inline long double log(long double x) { return logl(x); }
  // ...
}

Prior to GCC 6 this is how the GNU C++ Standard Library implemented <cmath>, but there’s a problem: the additional overloads are only declared if <cmath> is included. If only <math.h> is included then only the double log(double) signature is declared (and, in addition, any masking macros for log etc. are not undefined).

This creates yet another portability problem: the <cmath> header might be fully conforming, but <math.h> isn’t. The two headers don’t just define names in different namespaces, but one only has a subset of the required function declarations, and might have masking macros that the C++ standard doesn’t allow. The differences between the two headers, and what works with one and what works with the other, are often not obvious and can be very confusing to developers who have been told that the headers should declare the same functions.

For GCC 6 this has been fixed, by making the C++ library install its own version of <math.h>, which includes <cmath>, which then includes the real <math.h> from the C library. If this sounds confusing, that’s because it is! In order to ensure that <cmath> finds the right <math.h> from the C library and not the C++ library’s own version it is necessary to use a non-standard GNU extension: #include_next. This allows the C++ library to selectively ignore its own <math.h> and include the C library’s version. The new <math.h> header in GCC 6 looks like:

#include <cmath>  // as above, but using #include_next <math.h>
// Re-declare the additional overloads in the global namespace:
using std::log;
using std::exp;
using std::pow;
// ...

This means that whether a program includes <cmath> or <math.h> it gets all the required overloads, and no masking macros.

An absolute mess

To complicate things further, C++ also requires different overloads of abs to be defined in two separate headers.

  • C defines the functions abs, labs, and llabs in <stdlib.h> to get the absolute value of an int, long, or long long respectively.
  • C defines the functions fabs, fabsf, and fabsl in <math.h> to get the absolute value of a double, float, or long double respectively.

C++ inherits all of these functions, but also overloads abs so it works for any of those types. The problem is that calling the overloaded abs(x) only works as intended if the right header has been included. If you only include <cmath> and then call std::abs(1), it might call abs(double) rather than abs(int), because the latter is only declared in <cstdlib>. This is being treated as a defect in the C++ standard (see LWG 2294 and LWG 2192), so at some point it should be possible to include either header and get all the overloads of abs for all the signed integer and floating-point types.

New standards bring new functions, and some old ones too

The next complication in our story arises because C99 added lots more functions to <math.h>, which were then also added to C++11 by rebasing the C++ library on the C99 library, rather than the original C89 library. C99 support is still not as universally available as C89 support, so when GCC is built, the GNU C++ library detects whether the C library supports C99. This means that the installed C++ headers will only add the new functions to namespace std when they are provided by the C library.

C99 supports a form of function overloading via “type-generic” macros that select different functions for different types, and this feature is used so that some functions like sin and cos work for all floating-point and complex types. C++ doesn’t support the C99 generic macros feature, so to allow sin and cos to work for different types a C++ library must #undef the generic macros and define overloaded functions (or function templates) instead. C defines those type-generic macros in <tgmath.h> but C++ requires that the additional overloads are added directly to <cmath> and <ccomplex> instead (and <ctgmath> just includes those two headers).

As well as the type-generic macros in <tgmath.h>, C99 provides “classification macros” that work with any floating-point type: fpclassify, isfinite, isinf, isnan, isnormal, and signbit. In the log and sin examples discussed above, the C library defines both functions and type-generic macros, and C++ keeps the functions but replaces the macros with overloads. However, for classification macros like isnan the C library may not define any actual functions at all, only macros. That means the C++ library must provide all the overloads, not just the float and long double ones.

But it isn’t that simple, of course. Many C libraries do provide functions for some of the classification macros. Specifically, the X/Open Portability Guide (which was later incorporated into the POSIX specification) defined int isnan(double) and int isinf(double) functions. Those functions were superseded by the C99 macros of the same name when POSIX was rebased on the C99 standard, but many C libraries still provide the obsolete X/Open functions for backwards compatibility. In C programs the obsolete isnan and isinf functions are hidden by the newer C99 classification macros, but the C++ <math.h> and <cmath> headers undefine those macros, unintentionally exposing the obsolete X/Open functions.

Prior to GCC 6 the <cmath> header declared bool std::isnan(double) and bool std::isinf(double) functions for C++11 and later, and the <math.h> header (which came from the C library) declared the X/Open int ::isnan(double) and int ::isinf(double) functions. If a program added a using namespace std; directive at global scope, then the two sets of functions would clash, and the program would fail to compile (but only in C++11 mode!).

Sadly, this means that some applications wanting to use isnan had to weave a complex and fragile web of configure tests and preprocessor macros just to work out how to reliably call isnan for a given platform or C++ standard.

GCC 6 now detects when the obsolete functions are present in the C <math.h> header and uses them, avoiding the clash (but meaning that the function might return int instead of bool as C++11 requires):

namespace std
{
#if _GLIBCXX_HAVE_OBSOLETE_ISINF
  using ::isinf;  // int isinf(double)
#else
  bool isinf(double);
#endif
  bool isinf(float);
  bool isinf(long double);
}

In addition to fixing GCC 6 to handle the obsolete X/Open functions, the GNU C library has been patched so that those X/Open functions are not declared when <math.h> is included in a C++11 (or later) source file (this change will be in Glibc 2.23).

Light at the end of the tunnel

In summary, with GCC 6 all of the important differences between the <cxxx> and <xxx.h> headers are fixed, with both headers providing all the required overloads in the correct namespace. (Although depending on your C library, the return types of isinf(double) and isnan(double) might be wrong, and for now you still need to include the right header to get the right overload of abs!).

The changes in GCC 6 will cause some short-term pain, breaking some code that makes non-portable assumptions. In the long-term though it means a more conforming implementation of the C++ Standard Library, and a simpler, more predictable experience for C++ developers using functions from the C library.



  1. To add to the abs() confusion, currently libc++ and libstdc++ treat abs() of an unsigned type differently, with libc++ treating it as ill-formed in some cases. While libstdc++ is doing the correct thing with respect to the standard, libc++'s treatment probably catches a lot of bugs. I found this odd edge case because clang/libc++ helped me catch a bug due to this difference, and I was so fascinated I ended up writing a self-answered SO question to document what I found: http://stackoverflow.com/questions/29750946/is-stdabs0u-ill-formed

    Hopefully LWG 2192 is resolved soon.


  2. Nice! Explains at least some of the issues [compile and runtime] I’ve had over the years related to #include hell – sometimes fixed by re-ordering the #includes in code
    To make the transition to using <cxxx> rather than the old <xxx.h> for system headers, an optional compiler ‘-W’ flag that generated a warning when user code includes the old header version [when a new <cxxx> was available] would be useful🙂 Obviously not a warning when a system header includes a …
    M
