JacksonDunstan.com

Today’s article continues the series by introducing C++’s build model, which is very different from C#. We’ll go into preprocessing, compiling, linking, header files, the one definition rule, and many other aspects of how our source code gets built into an executable.

Compiling and Linking

With C#, we compile all our source code files (.cs) into an assembly such as an executable (.exe) or a library (.dll).

[Figure: C# Build Model]

With C++, we compile all our translation units (source code files with .cpp, .cxx, .cc, .C, or .c++) into object files (.obj or .o) and then link them together into an executable (app.exe or app), static library (.lib or .a), or dynamic library (.dll or .so).

[Figure: C++ Build Model]

If any of the source code files changed, we recompile them to generate new object files and then run the linker on those together with all the unchanged object files.

This model brings up a couple of questions. First, what is an object file? This is known as an “intermediate” file since it’s neither the source code nor an output file like an executable. The C++ language standard doesn’t say anything about what the format of this file is. In practice, it’s a binary file that is specific to a particular version of a particular compiler configured with particular settings. If the compiler, version, or settings change, all the code needs to be rebuilt.

Second, what is the difference between a static library and a dynamic library? A C++ dynamic library plays much the same role as a library (.dll) in C#. It’s a library of machine code, just like an executable, and it can be loaded and unloaded by an executable or other dynamic library at runtime. A static library, on the other hand, can only be linked in at build time and can never be unloaded. In this way, it functions more like just another object file:

[Figure: Building with a Static Library]

Because static libraries are available at build time, the linker builds them directly into the resulting executable. This means there’s no need to distribute a separate dynamic library file to end users, no need to open it from the file system separately, and no possibility of overriding its location such as by setting the LD_LIBRARY_PATH environment variable.

Critically for performance, all calls into functions in the static library are just normal function calls. This means there’s no indirection through a pointer that is set at runtime when a dynamic library is loaded. It also means that the linker can perform “link time optimizations” such as inlining these functions.
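
To make the difference concrete, here is a minimal single-file sketch. It does not load a real dynamic library. A plain function pointer stands in for the address that would be filled in when a library is loaded, and the names AddDirect, AddIndirect, ResolveAdd, and g_loadedAdd are hypothetical.

```cpp
// Stand-in for a function that lives in a library.
int Add(int a, int b)
{
    return a + b;
}

// Static-library style: the call target is fixed when the program is linked,
// so this is an ordinary direct call that the optimizer is free to inline.
int AddDirect(int a, int b)
{
    return Add(a, b);
}

// Dynamic-library style: the address is only known at runtime, so it is
// kept in a pointer. (Hypothetical name; filled in by ResolveAdd below.)
using AddFunction = int (*)(int, int);
AddFunction g_loadedAdd = nullptr;

// Stand-in for the lookup that happens when a dynamic library is loaded.
void ResolveAdd()
{
    g_loadedAdd = &Add;
}

// Every call goes through the pointer: one extra level of indirection.
int AddIndirect(int a, int b)
{
    return g_loadedAdd(a, b);
}
```

Both paths compute the same result once ResolveAdd has run. The indirect one just can't be resolved, or inlined, until the pointer is set at runtime.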

The main downsides stem from needing the static libraries to be present at build time. This makes them unsuitable for tasks such as loading user-created plugins. Perhaps most importantly for large projects, they must be linked in every build even if just one small source file was changed. Link times grow with the amount of code being linked and can hinder rapid iteration. As a result, sometimes dynamic libraries will be used in development builds and static libraries will be used in release builds.

We won’t discuss the specifics of how to run the compiler and linker in this series. This is heavily dependent on the specific compiler, OS, and game engine being used. Usually game engines or console vendors will provide documentation for this. It’s also typical to use an IDE like Microsoft Visual Studio or Xcode that provides a “project” abstraction for managing source code files, compiler settings, and so forth.

Header Files and the Preprocessor

In C#, we add using directives to reference code in other files. C++ has a similar “module” system added in C++20 which we’ll cover in a future article in this series. For now, we’ll pretend that it doesn’t exist and only discuss the way that C++ has traditionally been built.

Header files (.h, .hpp, .hxx, .hh, .H, .h++, or no extension) are by far the most common way for code in one file to reference code in another file. These are simply C++ source code files that are intended to be copy-and-pasted into another C++ source code file. The copy-and-paste operation is performed by the preprocessor.

Just like in C#, preprocessor directives like #if are evaluated before the main phase of compilation. There is no separate preprocessor executable that must be called to produce an intermediate file that the compiler receives. Preprocessing is simply an earlier step for the compiler.

C++ uses a preprocessor directive called #include to copy and paste a header file’s contents into another header file (.h) or a translation unit (.cpp). Here’s how it looks:

// math.h
int Add(int a, int b);
 
 
// math.cpp
#include "math.h"
int Add(int a, int b)
{
    return a + b;
}

The #include "math.h" tells the preprocessor to search the directory that math.cpp is in for a file named math.h. If it finds such a file, it reads its contents and replaces the #include directive with them. Otherwise, it searches the “include paths” it’s been configured with. The C++ Standard Library is implicitly searched. If math.h isn’t found in any of these locations, the compiler produces an error.

Afterward, math.cpp looks like this:

int Add(int a, int b);
int Add(int a, int b)
{
    return a + b;
}

Recall from last week’s article that the first Add is a function declaration and the second is a function definition. Since the signatures match, the compiler knows we’re defining the earlier declaration.

So far we’ve split the declaration and definition across two files, but without much benefit. Now let’s make this pay off by adding another translation unit:

// user.cpp
#include "math.h"
int AddThree(int a, int b, int c)
{
    return Add(a, Add(b, c));
}

This shows how user.cpp can add the same #include "math.h" to access the declaration of Add, resulting in this:

int Add(int a, int b);
int AddThree(int a, int b, int c)
{
    return Add(a, Add(b, c));
}

Now the compiler will encounter the declaration of Add and be OK with AddThree calling it even though there’s no definition of Add yet. It simply makes a note in the object file it outputs (user.obj) that Add is an unsatisfied dependency.

When the linker executes, it reads in user.obj and math.obj. math.obj contains the definition of Add and user.obj contains the definition of AddThree. At that point, the linker really needs the definition of Add, so it uses the one it found in math.obj.
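
As a rough single-file stand-in for the two translation units, the sketch below puts the declaration first, then the caller, and only then the definition. In the real two-file build the definition would sit in math.obj and the linker would connect it to the calls. Here the compiler sees both halves, so this only illustrates the ordering.

```cpp
// --- what user.cpp sees after preprocessing ---
int Add(int a, int b); // declaration only: a promise that Add exists somewhere

int AddThree(int a, int b, int c)
{
    // This compiles with just the declaration in view.
    return Add(a, Add(b, c));
}

// --- what math.cpp contributes ---
// The single definition that satisfies the dependency noted above.
int Add(int a, int b)
{
    return a + b;
}
```

If the definition at the bottom were deleted, compiling would still succeed, but the build would fail once something tried to resolve the call to Add.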

There is an alternative version of #include that’s commonly seen:

#include <math.h>

This version is meant to search only for C++ Standard Library headers and other header files that the compiler provides. For example, Microsoft Visual Studio allows #include <windows.h> to make Windows OS calls. This is useful to disambiguate file names that are both in the application’s codebase and provided by the compiler. Imagine this program:

#include "math.h"
bool IsNearlyZero(float val)
{
    return fabsf(val) < 0.000001f;
}

fabsf is a function in the C Standard Library to take the absolute value of a float. When the preprocessor runs with the quotes version of #include it finds our math.h, so we get this:

int Add(int a, int b);
 
bool IsNearlyZero(float val)
{
    return fabsf(val) < 0.000001f;
}

Then the compiler can’t find fabsf so it errors. Instead, we should use the angle brackets version of #include since we’re looking for the compiler-provided math.h:

#include <math.h>
bool IsNearlyZero(float val)
{
    return fabsf(val) < 0.000001f;
}

This produces what we wanted:

float fabsf(float arg);
// ...and many, many more math function declarations...
 
bool IsNearlyZero(float val)
{
    return fabsf(val) < 0.000001f;
}

Also note that we can specify paths in the #include that correspond to a directory structure:

#include "utils/math.h"
#include <nlohmann/json.hpp>

Finally, while it’s esoteric and usually best avoided, there is nothing stopping us from using #include to pull in non-header files. We can #include any file as long as the result is legal C++. Sometimes #include is even placed in the middle of a function to fill in part of its body!

ODR and Include Guards

C++ has what it calls the “one definition rule,” commonly abbreviated to ODR. This says that there may be only one definition of something in a translation unit. This includes variables and functions, which presents us with some problems as our codebase grows. Imagine we’ve expanded our math library and added a vector math library on top of it:

// math.h
int Add(int a, int b);
float PI = 3.14f;
 
 
// vector.h
#include "math.h"
float Dot(float aX, float aY, float bX, float bY);
 
 
// user.cpp
#include "math.h"
#include "vector.h"
int AddThree(int a, int b, int c)
{
    return Add(a, Add(b, c));
}
bool IsOrthogonal(float aX, float aY, float bX, float bY)
{
    return Dot(aX, aY, bX, bY) == 0.0f;
}

Here we have vector.h using #include to pull in math.h. We also have user.cpp using #include to pull in both vector.h and math.h. This is a good practice since it avoids an implicit dependency on math.h that would break if vector.h was ever changed to remove the #include "math.h". Still, we’re about to see that this presents a problem. Let’s look at user.cpp after the preprocessor has replaced the #include "math.h" directive:

int Add(int a, int b);
float PI = 3.14f;
#include "vector.h"
int AddThree(int a, int b, int c)
{
    return Add(a, Add(b, c));
}
bool IsOrthogonal(float aX, float aY, float bX, float bY)
{
    return Dot(aX, aY, bX, bY) == 0.0f;
}

Now the preprocessor replaces the #include "vector.h":

int Add(int a, int b);
float PI = 3.14f;
#include "math.h"
float Dot(float aX, float aY, float bX, float bY);
int AddThree(int a, int b, int c)
{
    return Add(a, Add(b, c));
}
bool IsOrthogonal(float aX, float aY, float bX, float bY)
{
    return Dot(aX, aY, bX, bY) == 0.0f;
}

Finally, it replaces the #include "math.h" from the contents of vector.h that it copied in:

int Add(int a, int b);
float PI = 3.14f;
int Add(int a, int b);
float PI = 3.14f;
float Dot(float aX, float aY, float bX, float bY);
int AddThree(int a, int b, int c)
{
    return Add(a, Add(b, c));
}
bool IsOrthogonal(float aX, float aY, float bX, float bY)
{
    return Dot(aX, aY, bX, bY) == 0.0f;
}

Multiple declarations of the Add function are OK because they’re not definitions so they don’t violate the ODR. The compiler simply ignores the duplicate declarations.
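
A small sketch of the distinction, reduced to a single file. The commented-out line is the one that would trigger the error if it were enabled.

```cpp
int Add(int a, int b); // declaration
int Add(int a, int b); // duplicate declaration: allowed

float PI = 3.14f;      // definition
// float PI = 3.14f;   // uncommenting this line is a redefinition error

int Add(int a, int b)  // the one and only definition of Add
{
    return a + b;
}
```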

The definition of PI, on the other hand, is most certainly a definition. Having two definitions of the same variable name violates the ODR and we get a compiler error.

To work around this, we add what’s called an “include guard” to our header files. There are two basic forms this can take, but both make use of the preprocessor. Here’s the first form in math.h:

#if (!defined MATH_H)
#define MATH_H
 
int Add(int a, int b);
float PI = 3.14f;
 
#endif

This makes use of the #if, #define, and #endif directives, which are similar to their C# counterparts. The only real difference in this case is the use of !defined MATH_H in C++ instead of just !MATH_H in C#.

One variant of this is to make use of a C++-only #ifndef MATH_H as a sort of shorthand for #if (!defined MATH_H):

#ifndef MATH_H
#define MATH_H
 
int Add(int a, int b);
float PI = 3.14f;
 
#endif
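
The long and short spellings test the same condition. As a small sketch outside the context of include guards, here they are side by side along with #ifdef, the positive counterpart of #ifndef. The MYGAME_DEBUG and MYGAME_RELEASE symbols and the function names are hypothetical.

```cpp
// Hypothetical symbol, defined here only for illustration.
#define MYGAME_DEBUG

bool LongFormSeesDebug()
{
#if (defined MYGAME_DEBUG)
    return true;
#else
    return false;
#endif
}

bool ShortFormSeesDebug()
{
#ifdef MYGAME_DEBUG // shorthand for #if (defined MYGAME_DEBUG)
    return true;
#else
    return false;
#endif
}

bool ReleaseIsUndefined()
{
#ifndef MYGAME_RELEASE // shorthand for #if (!defined MYGAME_RELEASE)
    return true;
#else
    return false;
#endif
}
```

All three functions return true here because MYGAME_DEBUG is defined and MYGAME_RELEASE is not.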

In either case, we choose a naming convention and apply our file name to it to generate a unique identifier for the file. There are many popular forms for this including these:

math_h
MATH_H
MATH_H_
MYGAME_MATH_H

To avoid needing to come up with unique names, all common compilers offer the non-standard #pragma once directive:

#pragma once
 
int Add(int a, int b);
float PI = 3.14f;

Regardless of the form chosen, let’s look at how this helps avoid the ODR violation. Here’s how user.cpp looks after all the #include directives are resolved (indentation added for clarity):

#ifndef MATH_H
#define MATH_H
 
    int Add(int a, int b);
    float PI = 3.14f;
 
#endif
 
#ifndef VECTOR_H
#define VECTOR_H
 
    #ifndef MATH_H
    #define MATH_H
 
        int Add(int a, int b);
        float PI = 3.14f;
 
    #endif
 
    float Dot(float aX, float aY, float bX, float bY);
 
#endif
 
int AddThree(int a, int b, int c)
{
    return Add(a, Add(b, c));
}
bool IsOrthogonal(float aX, float aY, float bX, float bY)
{
    return Dot(aX, aY, bX, bY) == 0.0f;
}

On the first line (#ifndef MATH_H), the preprocessor finds that MATH_H isn’t defined so it keeps all the code until the #endif. That includes a #define MATH_H, so now it’s defined.

Likewise, the #ifndef VECTOR_H succeeds and allows VECTOR_H to be defined. The nested #ifndef MATH_H, however, fails because MATH_H is now defined. Everything until the matching #endif is stripped out.

In the end, we have this result:

int Add(int a, int b);
float PI = 3.14f;
 
float Dot(float aX, float aY, float bX, float bY);
 
int AddThree(int a, int b, int c)
{
    return Add(a, Add(b, c));
}
bool IsOrthogonal(float aX, float aY, float bX, float bY)
{
    return Dot(aX, aY, bX, bY) == 0.0f;
}

The duplicate definition of PI has been effectively removed from the translation unit by the include guard, so we no longer get a compiler error for the ODR violation.

Inline

Even with the ODR compiler error fixed, we still have a problem: a linker error. The reason for this is that the vector.cpp translation unit also contains a copy of PI. Here’s how it looks originally:

#include "vector.h"
 
float Dot(float aX, float aY, float bX, float bY)
{
    return Add(aX*bX, aY+bY);
}

Here it is after the preprocessor resolves the #include directives:

#ifndef VECTOR_H
#define VECTOR_H
 
    #ifndef MATH_H
    #define MATH_H
 
        int Add(int a, int b);
        float PI = 3.14f;
 
    #endif
 
    float Dot(float aX, float aY, float bX, float bY);
 
#endif
 
float Dot(float aX, float aY, float bX, float bY)
{
    return Add(aX*bX, aY+bY);
}

Remember that each translation unit is compiled separately. In this translation unit, MATH_H and VECTOR_H have not been set with #define as they were in the user.cpp translation unit. So both of the include guards succeed and we get this:

int Add(int a, int b);
float PI = 3.14f;
 
float Dot(float aX, float aY, float bX, float bY);
 
float Dot(float aX, float aY, float bX, float bY)
{
    return Add(aX*bX, aY+bY);
}

That’s great for the purposes of compiling this translation unit since there are no duplicate definitions to violate the ODR. Compilation will succeed, but linking will fail.

The reason for the linker error is that, by default, we can’t have duplicate definitions of PI at link time either. If we want to do that, we need to add the inline keyword (available for variables since C++17) to PI to tell the compiler that multiple definitions should be allowed. That’ll result in these translation units:

// user.cpp
int Add(int a, int b);
inline float PI = 3.14f;
 
float Dot(float aX, float aY, float bX, float bY);
 
int AddThree(int a, int b, int c)
{
    return Add(a, Add(b, c));
}
bool IsOrthogonal(float aX, float aY, float bX, float bY)
{
    return Dot(aX, aY, bX, bY) == 0.0f;
}
 
 
// vector.cpp
int Add(int a, int b);
inline float PI = 3.14f;
 
float Dot(float aX, float aY, float bX, float bY);
 
float Dot(float aX, float aY, float bX, float bY)
{
    return Add(aX*bX, aY+bY);
}

It may seem strange that inline is a keyword applied to variables. The historical reason for this is that it was originally a hint to the compiler that it should inline functions but, like the register keyword, this was non-binding and virtually always ignored. It’s come to mean “multiple definitions are allowed” instead, so it can now be applied to both variables and functions.

For example, we could add a function definition to math.h as long as it’s inline:

inline int Sub(int a, int b)
{
    return a - b;
}

This is often avoided though because any change to the function will require recompiling all of the translation units that include it, directly or indirectly, which may take quite a while in a big codebase.
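
Putting the pieces together, here is a sketch of a header that combines an include guard with an inline variable and an inline function, followed by code that uses them. It assumes a compiler set to C++17 or later for the inline variable, and CircleArea is a hypothetical function added for illustration.

```cpp
// --- contents of a hypothetical math.h ---
#ifndef MATH_H
#define MATH_H

// Inline variable: every translation unit that includes this header may
// contain this definition. Requires C++17 or later.
inline float PI = 3.14f;

// Inline function: the same rule applied to a function definition.
inline int Sub(int a, int b)
{
    return a - b;
}

#endif

// --- a translation unit that includes the header ---
float CircleArea(float radius) // hypothetical user of PI
{
    return PI * radius * radius;
}
```

The include guard handles repeats within one translation unit, while inline handles repeats across translation units, so a header like this typically needs both.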

Linkage

Finally for today, C++ has the concept of “linkage.” By default, variables like PI have external linkage. This means they can be referenced by other translation units. For example, say we added a variable to math.cpp:

float SQRT2 = 1.4f;

Now say we want to reference it from user.cpp. The #include "math.h" won’t work because SQRT2 is in math.cpp, not math.h. We can still reference it using the extern keyword:

extern float SQRT2;
 
float GetDiagonalOfSquare(float widthOrHeight)
{
    return SQRT2 * widthOrHeight;
}

This is similar to a function declaration in that we’re telling the compiler to trust us and pretend a float exists with the name SQRT2. So when it compiles user.cpp it makes a note in the user.obj object file that we haven’t yet satisfied the dependency for SQRT2. When the compiler compiles math.cpp, it makes a note that there is a float named SQRT2 available for linking.

Later on, the linker runs and reads in user.obj as well as all the other object files including math.obj. While processing user.obj, it reads that note from the compiler saying that the definition of SQRT2 is missing and it goes looking through the other object files to find it. Lo and behold, it finds a note in math.obj saying that there’s a float named SQRT2 so the linker makes GetDiagonalOfSquare refer to that variable.
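
The same flow can be sketched in one file, with the extern declaration standing in for user.cpp’s view and the definition standing in for math.cpp. This is only an approximation, since a single translation unit needs no help from the linker to find SQRT2.

```cpp
// --- user.cpp's view ---
extern float SQRT2; // declaration only: no variable is created here

float GetDiagonalOfSquare(float widthOrHeight)
{
    return SQRT2 * widthOrHeight;
}

// --- math.cpp's contribution ---
float SQRT2 = 1.4f; // the single definition, with external linkage
```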

Quick note: the extern keyword can also be applied in math.cpp, but this has no effect since external linkage is the default. Still, here’s how it’d look:

extern float SQRT2 = 1.4f;

One way to prevent this behavior is to add the static keyword to SQRT2. This changes the linkage to “internal” and prevents the compiler from adding that note to math.obj to say that a float variable named SQRT2 is available for linking.

static float SQRT2 = 1.4f;

Now if we try to link user.obj and math.obj, the linker can’t find any available definition of SQRT2 in any of the object files so it produces an error.

Both extern and static can be used with functions, too. For example:

// math.cpp
int Sub(int a, int b)
{
    return a - b;
}
static int Mul(int a, int b)
{
    return a * b;
}
 
 
// user.cpp
extern int Sub(int a, int b);
 
int SubThree(int a, int b, int c)
{
    return Sub(Sub(a, b), c);
}
 
extern int Mul(int a, int b); // compiles, but calling Mul from here is a linker error: Mul is `static` in math.cpp
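
Internal linkage doesn’t stop a function from being used inside its own translation unit. A minimal sketch, assuming a hypothetical Square function that wraps the static helper:

```cpp
// Internal linkage: only code in this translation unit can call Mul.
static int Mul(int a, int b)
{
    return a * b;
}

// External linkage (the default): other translation units can declare and
// call Square, which uses the internal helper on their behalf.
int Square(int x) // hypothetical function for illustration
{
    return Mul(x, x);
}
```

This is a common reason to mark a function static: it stays an implementation detail of one .cpp file while the functions built on it remain available to the rest of the program.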

Conclusion

Today we’ve seen C++’s very different approach to building source code. The “compile then link” approach combined with header files has domino effects into the ODR, linkage, and include guards. We’ll go into C++20’s module system that solves a lot of these problems and results in a much more C#-like build model later on in the series, but header files will still be very relevant even with modules. There’s also a lot more detail to go into with respect to the ODR and linkage, but we’ll cover that incrementally as we introduce more language concepts like templates and thread-local variables.

#1 by Jonathan Pace on June 17th, 2020

“If math.cpp isn’t found in any of these locations, the compiler produces an error.”
I think you meant ‘If math.h isn’t found…’

#2 by jackson on June 17th, 2020

Thanks for pointing this out. I’ve fixed the typo.

#3 by M. S. Farzan on October 7th, 2020

One of the best explanations of header guards that I’ve seen. Thank you!

#4 by typoman on March 24th, 2021

typo: “porportionally”

#5 by jackson on March 26th, 2021

Thanks, Typoman!

#6 by Rick on May 14th, 2021

Misspelling: The correct spelling is “Lo and behold…” (not “Low…”)
https://getproofed.com/writing-tips/idiom-tips-lo-and-behold-or-low-and-behold/

#7 by jackson on May 14th, 2021

Thanks for letting me know. I’ve updated the article to fix the typo.

#8 by typoman's squire on November 4th, 2022

‘…and no possibily of overriding’ you prob meant possibility

#9 by jackson on November 4th, 2022

Fixed. Thanks!

#10 by Chipboard on December 28th, 2022

I just wanted to say thank you for the depth you have gone to properly cover and teach all of these complex C++ topics. I really appreciate this course, and would like to mention that as a C# developer, this is the best course I’ve found so far. Well done, and much appreciated! Seriously! I cannot express the gratitude I have enough!

C++ For C# Developers: Part 5 – Build Model