Today we’ll explore some of the lower-level concepts in C++. These are tools that get brought out of the toolchest when performance really matters and interoperability is paramount. Read on to learn about C++’s escape hatches and take fine-grained control over memory!

Alignof

Let’s start with a simple operator: alignof. We specify a type and it evaluates to a std::size_t indicating the type’s alignment requirement in terms of number of bytes:

struct EmptyStruct
{
};
 
struct Vector3
{
    float X;
    float Y;
    float Z;
};
 
// Examples on x64 macOS with Clang compiler
DebugLog(alignof(char)); // 1
DebugLog(alignof(int)); // 4
DebugLog(alignof(bool)); // 1
DebugLog(alignof(int*)); // 8
DebugLog(alignof(EmptyStruct)); // 1
DebugLog(alignof(Vector3)); // 4
DebugLog(alignof(int[100])); // 4

Because the alignment requirements of all types in C++ are known at compile time, the alignof operator is evaluated at compile time. This means the above is compiled to the same machine code as if we logged constants:

DebugLog(1);
DebugLog(4);
DebugLog(1);
DebugLog(8);
DebugLog(1);
DebugLog(4);
DebugLog(4);

Note that when using alignof with an array, like we did with int[100], we get the alignment of the array’s element type. That means we get 4 for int, not 8 for int*, even though arrays and pointers are very similar in C++.
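Since alignof is a constant expression, we can also hand it to static_assert and let the compiler check our assumptions. Here is a minimal sketch that avoids platform-specific numbers. ArrayAlignment is a hypothetical helper made up for this example:

```cpp
#include <cstddef>

struct Vector3
{
    float X;
    float Y;
    float Z;
};

// alignof is a constant expression, so the compiler checks these
static_assert(alignof(char) == 1, "char always has an alignment of 1");
static_assert(alignof(int[100]) == alignof(int),
              "an array is aligned like its element type");
static_assert(alignof(Vector3) >= alignof(float),
              "a struct is at least as aligned as each of its members");

// Hypothetical helper: alignment of an array of Count elements of type T
template<typename T, std::size_t Count>
constexpr std::size_t ArrayAlignment()
{
    return alignof(T[Count]);
}
```

If any of these assumptions were wrong for the platform being compiled for, the build would fail instead of the program misbehaving at runtime.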

Alignas

Next we have the alignas specifier. This is applied to classes, data members, and variables to control how they’re aligned. All kinds of variables are supported except bit fields, parameters, or variables in catch clauses. For example, say we wanted to align a struct to 16-byte boundaries:

// Use default alignment
struct Vector3
{
    float X;
    float Y;
    float Z;
};
 
// Change alignment to 16 bytes
struct alignas(16) AlignedVector3
{
    float X;
    float Y;
    float Z;
};
 
DebugLog(alignof(Vector3)); // 4
DebugLog(alignof(AlignedVector3)); // 16
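To see what the alignment means for actual objects, we can look at their addresses. The following is a small sketch, where IsAligned is a hypothetical helper rather than anything from the standard library:

```cpp
#include <cstddef>
#include <cstdint>

struct alignas(16) AlignedVector3
{
    float X;
    float Y;
    float Z;
};

// Hypothetical helper: true if the address is a multiple of the alignment
bool IsAligned(const void* ptr, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// The size is padded up to a multiple of the alignment so that
// every element of an array of AlignedVector3 is aligned too
static_assert(alignof(AlignedVector3) == 16, "alignas(16) was applied");
static_assert(sizeof(AlignedVector3) % alignof(AlignedVector3) == 0,
              "size is a multiple of alignment");
```

Calling IsAligned(&v, 16) on any AlignedVector3 variable v should evaluate to true, whether v is a local, a global, or an array element.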

We’re not allowed to reduce the alignment requirements because the resulting code wouldn’t work on the CPU. If we try, we’ll get a compiler error:

// Compiler error: requested alignment (1) is lower than the default (4)
struct alignas(1) AlignedVector3
{
    float X;
    float Y;
    float Z;
};

Similarly, invalid alignments also produce a compiler error. Alignments must be powers of two, and which of the larger ones are supported depends on the compiler and the CPU architecture being compiled for:

// Compiler error: requested alignment (3) is invalid
struct alignas(3) AlignedVector3
{
    float X;
    float Y;
    float Z;
};

Aligning to 0 is simply ignored:

// OK, but requested alignment (0) is ignored
struct alignas(0) AlignedVector3
{
    float X;
    float Y;
    float Z;
};
 
DebugLog(alignof(AlignedVector3)); // 4

As a shorthand, we can also use alignas(type). This is equivalent to alignas(alignof(type)) and it’s useful when we want the alignment to match another type’s alignment:

struct AlignedToDouble
{
    double Double;
 
    // Each data member has the same alignment as the double type
    alignas(double) float Float;
    alignas(double) uint16_t Short;
    alignas(double) uint8_t Byte;
};
 
// Struct is 32 bytes because of alignment requirements
DebugLog(sizeof(AlignedToDouble)); // 32
 
// Print distances between data members to see 8-byte alignment
AlignedToDouble atd;
DebugLog((char*)&atd.Float - (char*)&atd.Double); // 8
DebugLog((char*)&atd.Short - (char*)&atd.Double); // 16
DebugLog((char*)&atd.Byte - (char*)&atd.Double); // 24
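As an alternative to subtracting pointers at runtime, the same layout can be checked at compile time. This sketch assumes the struct stays a simple standard-layout type, which is what offsetof from <cstddef> expects:

```cpp
#include <cstddef>
#include <cstdint>

struct AlignedToDouble
{
    double Double;
    alignas(double) float Float;
    alignas(double) std::uint16_t Short;
    alignas(double) std::uint8_t Byte;
};

// Every data member starts at a multiple of double's alignment
static_assert(offsetof(AlignedToDouble, Float) % alignof(double) == 0,
              "Float is aligned like a double");
static_assert(offsetof(AlignedToDouble, Short) % alignof(double) == 0,
              "Short is aligned like a double");
static_assert(offsetof(AlignedToDouble, Byte) % alignof(double) == 0,
              "Byte is aligned like a double");

// The struct as a whole is padded to a multiple of that alignment
static_assert(sizeof(AlignedToDouble) % alignof(double) == 0,
              "size is a multiple of double's alignment");
```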

It’s rare, but if we specify multiple alignas then the largest value is used:

struct Aligned
{
    // 16 is the largest, so it's used as the alignment
    alignas(4) alignas(8) alignas(16) int First = 123;
 
    alignas(16) int Second = 456;
};
 
DebugLog(sizeof(Aligned)); // 32
 
Aligned a;
DebugLog((char*)&a.Second - (char*)&a.First); // 16

This leads to the third form of alignas where we pass a template parameter pack instead of an integer or a type. In this case, it’s just like we specified one alignas per element of the parameter pack and therefore the largest value is chosen:

template<int... Alignments>
struct Aligned
{
    alignas(Alignments...) int First = 123;
    alignas(16) int Second = 456;
};
 
DebugLog(sizeof(Aligned<1, 2, 4, 8, 16>)); // 32
 
Aligned<1, 2, 4, 8, 16> a;
DebugLog((char*)&a.Second - (char*)&a.First); // 16

Assembly

C++ allows us to embed assembly code. This is called “inline assembly” and its meaning is highly specific to the compiler and the CPU being compiled for. All that the C++ language standard says is that we write asm("source code") and the rest is left up to the compiler. For example, here’s some inline assembly that subtracts 5 from 20 on x86 as compiled by Clang on macOS:

int difference = 0;
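asm(
    "movl $20, %%eax;" // Put 20 in the eax register
    "movl $5, %%ebx;" // Put 5 in the ebx register
    "subl %%ebx, %%eax;" // eax = eax - ebx
    : "=a"(difference) // Output: copy eax into difference
    : // No inputs
    : "ebx", "cc"); // Tell the compiler that ebx and the flags are overwritten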
DebugLog(difference); // 15

Also compiler-specific is how the assembly code interacts with the surrounding code. In this case, Clang allows us to write "=a"(difference) after the first colon to use the difference local variable as an output of the asm statement.

Each compiler will put its own constraints on inline assembly code. This includes whether the Intel or AT&T assembly syntax is used, how C++ code interacts with the inline assembly, and of course the supported CPU architecture instruction sets.
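To illustrate how C++ variables can flow into and out of inline assembly, here is a sketch using the GCC and Clang “extended asm” syntax on x86. Subtract is a hypothetical function written for this example, and the preprocessor check falls back to plain C++ on other compilers and CPUs:

```cpp
// Hypothetical helper: computes a - b
int Subtract(int a, int b)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    // "+r"(a): a is kept in a register the compiler picks, read and written
    // "r"(b): b is a read-only input in another register
    // "cc": the CPU flags are modified by subl
    // %0 and %1 refer to the operands in the order they are listed
    asm("subl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    // Fallback for compilers and CPUs this sketch doesn't cover
    return a - b;
#endif
}
```

Letting the compiler pick the registers with the "r" constraint, rather than naming eax and ebx directly, leaves the register allocator free to fit the assembly into the surrounding code.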

All of this inconsistency has led to most uses of inline assembly being eschewed in favor of so-called “intrinsics.” These are functions that the compiler typically replaces with a single CPU instruction. They are almost always named after that CPU instruction, take the parameters that the CPU instruction operates on, and evaluate to the result of the CPU instruction. There’s a lot of variance in just what this means, but it’s a much simpler and more natural way to embed assembly in a C++ program:

// x86 SSE intrinsics
#include <xmmintrin.h>
 
// Component-wise addition of four floats in two arrays into a third array
void Add4(const float* a, const float* b, float* c)
{
    // Load a's four floats from memory into a 128-bit register
    __m128 reg1 = _mm_load_ps(a);
 
    // Load b's four floats from memory into a 128-bit register
    const auto reg2 = _mm_load_ps(b);
 
    // Add corresponding floats of a and b into the first 128-bit register
    reg1 = _mm_add_ps(reg1, reg2);
 
    // Store the result register into c's memory
    _mm_store_ps(c, reg1);
}
 
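// _mm_load_ps and _mm_store_ps require 16-byte-aligned addresses
alignas(16) float a[] = { 1, 1, 1, 1 };
alignas(16) float b[] = { 1, 2, 3, 4 };
alignas(16) float c[] = { 9, 9, 9, 9 };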
 
Add4(a, b, c);
 
DebugLog(a[0], a[1], a[2], a[3]); // 1, 1, 1, 1 (unmodified)
DebugLog(b[0], b[1], b[2], b[3]); // 1, 2, 3, 4 (unmodified)
DebugLog(c[0], c[1], c[2], c[3]); // 2, 3, 4, 5 (sum)

There are several advantages to this approach. Specific registers don’t need to be named as the compiler’s register allocator simply does its normal work. We’re allowed to use normal C++ conventions like parameters, return values, const variables, and even auto typing. Those variables are strongly typed, meaning we get the compiler error-checking we’re used to:

// Compiler error: too many arguments
__m128 reg1 = _mm_load_ps(a, b);
 
// Compiler error: return value is __m128, not bool
bool reg2 = _mm_load_ps(b);
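Alignment matters for these intrinsics too: _mm_load_ps and _mm_store_ps require addresses that are aligned to 16 bytes. When we can’t guarantee that, SSE also provides the unaligned variants _mm_loadu_ps and _mm_storeu_ps. Here is a sketch of how that might look. Add4Unaligned and the EXAMPLE_HAS_SSE macro are hypothetical names, and a plain loop is used when SSE doesn’t appear to be available:

```cpp
#if defined(__SSE__) || defined(_M_X64)
    #include <xmmintrin.h>
    #define EXAMPLE_HAS_SSE 1
#else
    #define EXAMPLE_HAS_SSE 0
#endif

// Component-wise addition of four floats with no alignment requirement
void Add4Unaligned(const float* a, const float* b, float* c)
{
#if EXAMPLE_HAS_SSE
    // The "u" variants accept any address, aligned or not
    const __m128 reg1 = _mm_loadu_ps(a);
    const __m128 reg2 = _mm_loadu_ps(b);
    _mm_storeu_ps(c, _mm_add_ps(reg1, reg2));
#else
    // Fallback for CPUs without SSE
    for (int i = 0; i < 4; ++i)
    {
        c[i] = a[i] + b[i];
    }
#endif
}
```

This version can be pointed at the middle of a larger float array, where the starting address is only guaranteed to be a multiple of 4.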

Language Linkage

When object files are linked together by the linker, it’s important that they follow the same conventions. Normally this isn’t a problem because we’re linking together object files compiled from source code in the same language (C++) that was compiled by the same version of the same compiler with the same compiler settings.

In other cases, we want to link together code that was compiled differently. One common scenario is to link together C++ and C code, such as when C++ code is using a C library or vice versa. In this case, the languages have different object file conventions that cause them to clash. Take, for example, the case of overloaded functions in C++. These aren’t supported in C, so C’s object files simply name the function the same as in the source code. C++ needs to disambiguate, so it “mangles” the names to make them unique. It does this even if there’s only one overload of the function:

////////////////////
// library.h (C++)
////////////////////
 
int Madd(int a, int b, int c);
 
////////////////////
// library.cpp (C++)
////////////////////
 
#include "library.h"
 
// Compiled into object file with name Maddiii_i
// Example name only. Actual name is effectively unpredictable.
int Madd(int a, int b, int c)
{
    return a*b + c;
}
 
////////////////////
// main.c (C)
////////////////////
 
#include <stdio.h>
#include "library.h"
 
void Foo()
{
    // OK: library.h declares a Madd that takes three ints and returns an int
    int result = Madd(2, 4, 6);
 
    // Print the result
    printf("%d\n", result);
}

Both library.cpp and main.c compile, but the linker that takes in library.o and main.o fails to link them together. The problem is that main.o is trying to find a function called Madd but there isn’t one. There’s a function called Maddiii_i, but that doesn’t count because only exact names are matched.

To solve this problem, C++ provides a way to tell the compiler that code should be compiled with the same language linkage rules as C:

////////////////////
// library.h (C++)
////////////////////
 
// Everything in this block should be compiled with C's linkage rules
extern "C"
{
    int Madd(int a, int b, int c);
}
 
////////////////////
// library.cpp (C++)
////////////////////
 
#include "library.h"
 
// Definitions need to match the language linkage of their declarations
extern "C"
{
    // Compiled into object file with name Madd
    // Not mangled into Maddiii_i
    int Madd(int a, int b, int c)
    {
        return a*b + c;
    }
}

Now that Madd doesn’t have its name mangled the linker can find it and produce a working executable.
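C language linkage is also handy in the other direction, when a C library calls back into our C++ code. As a sketch, here is a comparison function handed to the C standard library’s qsort. CompareInts and SortInts are hypothetical names chosen for this example:

```cpp
#include <cstddef>
#include <cstdlib>

// Callback with C language linkage, as a C library would expect
extern "C" int CompareInts(const void* a, const void* b)
{
    const int lhs = *static_cast<const int*>(a);
    const int rhs = *static_cast<const int*>(b);

    // Negative if lhs < rhs, zero if equal, positive if lhs > rhs
    return (lhs > rhs) - (lhs < rhs);
}

// Hypothetical helper: sorts the array in ascending order
void SortInts(int* values, std::size_t count)
{
    std::qsort(values, count, sizeof(int), CompareInts);
}
```

The body of CompareInts is still ordinary C++, so it’s free to use features like static_cast. Only the way the function is named and called follows C’s conventions.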

Some special rules apply to code that’s been switched to C language linkage. First, class members always have C++ linkage regardless of whether C linkage is specified.

Second, because C doesn’t support function overloading, any functions with the same name are assumed to be the same function. This means we’ll typically get compiler errors for redefining the same function when we try to make an overload.

Third, and similarly, variables in different namespaces with the same name are assumed by the compiler to be the same variable. This is because C doesn’t support namespaces. We’ll typically get the same compiler errors for trying to redefine these variables.

Fourth, and again similarly, variables and functions can’t have the same name even if they’re in different namespaces. All of these rules stem from C’s requirement that everything has a unique name.
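A couple of these rules can be seen in a short sketch. The Audio and Video namespaces, the GetVersion function, and the Player struct are all hypothetical names used only for illustration:

```cpp
namespace Audio
{
    // Defined with C language linkage: the linker sees the name GetVersion
    extern "C" int GetVersion()
    {
        return 3;
    }
}

namespace Video
{
    // Not a new function: this refers to the same GetVersion defined above
    // because namespaces are ignored for names with C language linkage
    extern "C" int GetVersion();
}

extern "C"
{
    struct Player
    {
        // Class members keep C++ language linkage,
        // so overloading is still allowed here
        int Play(int track) { return track; }
        int Play(int track, int volume) { return track + volume; }
    };
}
```

Calling Video::GetVersion() runs the function defined in Audio. Adding a second definition in Video, or an overload such as GetVersion(int) with C language linkage, should produce the compiler errors described above.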

Unlike other curly braces, the ones in a language linkage block don’t create a scope. If only a single entity needs its language linkage changed, they can be omitted, similar to how they’re optional for one-statement if blocks:

extern "C" int Madd(int a, int b, int c);
 
extern "C" int Madd(int a, int b, int c)
{
    return a*b + c;
}

The C++ language guarantees that two language linkages are supported: C and C++. Compilers are free to implement support for more languages. The default language linkage is C++, but it can be specified explicitly if so desired:

extern "C++" int Madd(int a, int b, int c);
 
extern "C++" int Madd(int a, int b, int c)
{
    return a*b + c;
}

Linkage rules can be nested. In this case, the innermost linkage is used:

// Change linkage to C
extern "C"
{
    // Change linkage back to C++
    extern "C++" int Madd(int a, int b, int c);
}
 
// OK: linkage is C++
extern "C++" int Madd(int a, int b, int c)
{
    return a*b + c;
}

Finally, a common convention is to use the preprocessor to check for __cplusplus, which indicates whether the code is being compiled as C++. If it is, C language linkage is applied. This allows the code to be compiled either as C++, producing a library that can be linked with C code, or directly as C, such as when a C++ compiler isn’t available. This approach requires the code to use only the subset that is legal in both languages:

////////////////////
// library.h
////////////////////
 
// If compiled as C++, this is defined
#ifdef __cplusplus
    // Make a macro called EXPORTED with the code to set C language linkage
    #define EXPORTED extern "C"
// If compiled as C (assumed since not C++)
#else
    // Make an empty macro called EXPORTED
    #define EXPORTED
#endif
 
// Add EXPORTED at the beginning
// For C++, this sets the language linkage to C
// For C, this does nothing
EXPORTED int Madd(int a, int b, int c);
 
////////////////////
// library.c
////////////////////
 
#include "library.h"
 
// Compiled into object file with name Madd regardless of language
EXPORTED int Madd(int a, int b, int c)
{
    return a*b + c;
}
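A widespread variation on this convention wraps whole groups of declarations in a block instead of prefixing each one with a macro. The following sketch shows both files in one listing for brevity:

```cpp
////////////////////
// library.h
////////////////////

// Open a C language linkage block, but only when compiled as C++
#ifdef __cplusplus
extern "C" {
#endif

int Madd(int a, int b, int c);

// Close the block, again only when compiled as C++
#ifdef __cplusplus
}
#endif

////////////////////
// library.c
////////////////////

#ifdef __cplusplus
extern "C" {
#endif

// Compiled into object file with name Madd regardless of language
int Madd(int a, int b, int c)
{
    return a*b + c;
}

#ifdef __cplusplus
}
#endif
```

The block form tends to be convenient for headers with many declarations, while the EXPORTED macro makes it visible at each declaration that the function is part of the C-compatible interface.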

Conclusion

In addition to the myriad low-level controls C++ gives us, these features provide us with even more control. We can query and set the alignment of various data types and variables to make optimal use of specific CPU architectures’ requirements to improve performance in a variety of ways. C# provides some control over struct field layout, but that’s a far more limited tool than alignas in C++.

One way to use this control over alignment is by writing inline assembly, another C++ feature, or by making use of CPU-specific intrinsics. These features combine together to provide precise control over what machine code actually gets executed and without the need to write entire programs in assembly. C# is beginning to offer intrinsics starting with x86 in .NET Core 3.0 and Unity’s Burst-specific intrinsics.

C++ also allows for a high level of compatibility with its predecessor: C. Despite having far more features, C++ code can be easily integrated with C code by setting the language linkage and following a few special rules. This makes our C++ libraries available for usage in C and in environments that follow C’s linkage rules. There are quite a few of those, including language bindings for C#, Rust, Python, and JavaScript via Node.js. The same goes for C# with its P/Invoke system of language bindings that enables interoperability with the C linkage model.