C++ For C# Developers: Part 33 – Alignment, Assembly, and Language Linkage
Today we’ll explore some of the lower-level concepts in C++. These are tools that get brought out of the toolchest when performance really matters and interoperability is paramount. Read on to learn about C++’s escape hatches and take fine-grained control over memory!
Alignof
Let’s start with a simple operator: alignof. We specify a type and it evaluates to a std::size_t indicating the type’s alignment requirement in bytes:
struct EmptyStruct
{
};

struct Vector3
{
    float X;
    float Y;
    float Z;
};

// Examples on x64 macOS with Clang compiler
DebugLog(alignof(char)); // 1
DebugLog(alignof(int)); // 4
DebugLog(alignof(bool)); // 1
DebugLog(alignof(int*)); // 8
DebugLog(alignof(EmptyStruct)); // 1
DebugLog(alignof(Vector3)); // 4
DebugLog(alignof(int[100])); // 4
Because the alignment requirements of all types in C++ are known at compile time, the alignof operator is evaluated at compile time. This means the above is compiled to the same machine code as if we logged constants:
DebugLog(1);
DebugLog(4);
DebugLog(1);
DebugLog(8);
DebugLog(1);
DebugLog(4);
DebugLog(4);
Note that when using alignof with an array, like we did with int[100], we get the alignment of the array’s element type. That means we get 4 for int, not 8 for int*, even though arrays and pointers are very similar in C++.
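Because alignof produces a compile-time constant, relationships like these can be checked with static_assert. The following is a minimal sketch rather than anything from the standard library. It asserts only relationships that hold regardless of the exact numbers on a given platform, and IntAlignment and IsAlignedFor are hypothetical names introduced for illustration:

```cpp
#include <cstddef>

struct Vector3
{
    float X;
    float Y;
    float Z;
};

// alignof yields a std::size_t constant, so it can initialize a constexpr variable
constexpr std::size_t IntAlignment = alignof(int);

// An array has the alignment of its element type, not of a pointer
static_assert(alignof(int[100]) == alignof(int), "array aligns like its element");

// char always has the weakest possible alignment requirement
static_assert(alignof(char) == 1, "char is 1-byte aligned");

// A struct is at least as strictly aligned as its strictest data member
static_assert(alignof(Vector3) >= alignof(float), "struct covers its members");

// Hypothetical helper: would a T be properly aligned at this byte offset?
template<typename T>
constexpr bool IsAlignedFor(std::size_t offset)
{
    return offset % alignof(T) == 0;
}
```

If any of these assumptions were wrong for a particular compiler, the build would fail instead of the program misbehaving at runtime.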
Alignas
Next we have the alignas specifier. This is applied to classes, data members, and variables to control how they’re aligned. All kinds of variables are supported except bit fields, parameters, and variables in catch clauses. For example, say we wanted to align a struct to 16-byte boundaries:
// Use default alignment
struct Vector3
{
    float X;
    float Y;
    float Z;
};

// Change alignment to 16 bytes
struct alignas(16) AlignedVector3
{
    float X;
    float Y;
    float Z;
};

DebugLog(alignof(Vector3)); // 4
DebugLog(alignof(AlignedVector3)); // 16
We’re not allowed to reduce the alignment requirements because the resulting code wouldn’t work on the CPU. If we try, we’ll get a compiler error:
// Compiler error: requested alignment (1) is lower than the default (4)
struct alignas(1) AlignedVector3
{
    float X;
    float Y;
    float Z;
};
Similarly, invalid alignments also produce a compiler error. Valid alignments are always powers of two, and the largest supported value depends on the compiler and the CPU architecture being compiled for:
// Compiler error: requested alignment (3) is invalid
struct alignas(3) AlignedVector3
{
    float X;
    float Y;
    float Z;
};
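When an alignment arrives through a template parameter, the compiler's own error for a bad value can be hard to read. As a rough sketch, a static_assert can reject invalid values with a clearer message. IsPowerOfTwo and AlignedFloats are hypothetical names, not part of any library:

```cpp
#include <cstddef>

// Hypothetical helper: true if n is a power of two
constexpr bool IsPowerOfTwo(std::size_t n)
{
    // A power of two has exactly one bit set,
    // so clearing its lowest set bit leaves zero
    return n != 0 && (n & (n - 1)) == 0;
}

static_assert(IsPowerOfTwo(16), "16 is a power of two");
static_assert(!IsPowerOfTwo(3), "3 is not a power of two");

// Hypothetical wrapper that validates its alignment before applying it
// Alignment must also be at least alignof(float) or alignas is rejected
template<std::size_t Alignment>
struct AlignedFloats
{
    static_assert(IsPowerOfTwo(Alignment), "Alignment must be a power of two");
    alignas(Alignment) float Values[4];
};

static_assert(alignof(AlignedFloats<16>) == 16, "alignment was applied");
```

Instantiating AlignedFloats<3> would then fail with the message from the static_assert.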
Aligning to 0 is simply ignored:
// OK, but requested alignment (0) is ignored
struct alignas(0) AlignedVector3
{
    float X;
    float Y;
    float Z;
};

DebugLog(alignof(AlignedVector3)); // 4
As a shorthand, we can also use alignas(type). This is equivalent to alignas(alignof(type)) and it’s useful when we want the alignment to match another type’s alignment:
struct AlignedToDouble
{
    double Double;

    // Each data member has the same alignment as the double type
    alignas(double) float Float;
    alignas(double) uint16_t Short;
    alignas(double) uint8_t Byte;
};

// Struct is 32 bytes because of alignment requirements
DebugLog(sizeof(AlignedToDouble)); // 32

// Print distances between data members to see 8-byte alignment
AlignedToDouble atd;
DebugLog((char*)&atd.Float - (char*)&atd.Double); // 8
DebugLog((char*)&atd.Short - (char*)&atd.Double); // 16
DebugLog((char*)&atd.Byte - (char*)&atd.Double); // 24
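The offsets and the total size follow from simple rounding arithmetic: each data member starts at the end of the previous one, rounded up to its alignment. Here is a sketch of that calculation, assuming the sequential layout used by mainstream compilers. AlignUp and the offset constants are hypothetical names:

```cpp
#include <cstddef>
#include <cstdint>

// Round offset up to the next multiple of alignment (must be a power of two)
constexpr std::size_t AlignUp(std::size_t offset, std::size_t alignment)
{
    return (offset + alignment - 1) & ~(alignment - 1);
}

struct AlignedToDouble
{
    double Double;
    alignas(double) float Float;
    alignas(double) std::uint16_t Short;
    alignas(double) std::uint8_t Byte;
};

// Each member begins where the previous one ended, rounded up
constexpr std::size_t FloatOffset =
    AlignUp(sizeof(double), alignof(double));
constexpr std::size_t ShortOffset =
    AlignUp(FloatOffset + sizeof(float), alignof(double));
constexpr std::size_t ByteOffset =
    AlignUp(ShortOffset + sizeof(std::uint16_t), alignof(double));

// The size is rounded up too so every element of an array stays aligned
constexpr std::size_t TotalSize =
    AlignUp(ByteOffset + sizeof(std::uint8_t), alignof(double));

// Compare the hand-computed layout against the compiler's
static_assert(offsetof(AlignedToDouble, Float) == FloatOffset, "Float offset");
static_assert(offsetof(AlignedToDouble, Short) == ShortOffset, "Short offset");
static_assert(offsetof(AlignedToDouble, Byte) == ByteOffset, "Byte offset");
static_assert(sizeof(AlignedToDouble) == TotalSize, "total size");
```

With an 8-byte double, the arithmetic produces the offsets 8, 16, and 24 and the size 32, matching the output above.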
It’s rare, but if we specify multiple alignas specifiers then the largest value is used:
struct Aligned
{
    // 16 is the largest, so it's used as the alignment
    alignas(4) alignas(8) alignas(16) int First = 123;
    alignas(16) int Second = 456;
};

DebugLog(sizeof(Aligned)); // 32

Aligned a;
DebugLog((char*)&a.Second - (char*)&a.First); // 16
This leads to the third form of alignas where we pass a template parameter pack instead of an integer or a type. In this case, it’s just like we specified one alignas per element of the parameter pack and therefore the largest value is chosen:
template<int... Alignments>
struct Aligned
{
    alignas(Alignments...) int First = 123;
    alignas(16) int Second = 456;
};

DebugLog(sizeof(Aligned<1, 2, 4, 8, 16>)); // 32

Aligned<1, 2, 4, 8, 16> a;
DebugLog((char*)&a.Second - (char*)&a.First); // 16
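The pack can hold types as well as integers. One plausible use is raw storage that is suitably aligned for any of several types. This is only a sketch: StorageFor and NumberStorage are hypothetical names, it assumes C++14 or later for the constexpr std::max, and it needs a compiler that supports pack expansion in alignas:

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical raw storage that is large enough and aligned strictly enough
// to hold any one of the types in Ts
template<typename... Ts>
struct StorageFor
{
    // Expands to alignas(T1) alignas(T2) ..., so the strictest alignment wins
    alignas(Ts...) unsigned char Bytes[std::max({sizeof(Ts)...})];
};

using NumberStorage = StorageFor<char, int, double>;

// The storage is aligned like the most strictly aligned type
static_assert(
    alignof(NumberStorage) ==
        std::max({alignof(char), alignof(int), alignof(double)}),
    "aligned like the strictest type");

// The storage is big enough for the largest type
static_assert(
    sizeof(NumberStorage) >= sizeof(double),
    "large enough for the largest type");
```

This has the same effect as writing one alignas per type by hand, but it keeps working when the list of types changes.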
Assembly
C++ allows us to embed assembly code. This is called “inline assembly” and its meaning is highly specific to the compiler and the CPU being compiled for. All that the C++ language standard says is that we write asm("source code") and the rest is left up to the compiler. For example, here’s some inline assembly that subtracts 5 from 20 on x86 as compiled by Clang on macOS:
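int difference = 0;
asm(
    "movl $20, %%eax;" // Put 20 in the eax register
    "movl $5, %%ebx;" // Put 5 in the ebx register
    "subl %%ebx, %%eax" // eax = eax - ebx
    :"=a"(difference) // Output: copy eax to difference
    : // No inputs
    : "ebx", "cc"); // Tell the compiler that ebx and the flags are overwritten
DebugLog(difference); // 15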
Also compiler-specific is how the assembly code interacts with the surrounding code. In this case, Clang allows us to write :"=a"(difference) to reference the difference local variable as an output from inside the asm statement.
Each compiler will put its own constraints on inline assembly code. This includes whether the Intel or AT&T assembly syntax is used, how C++ code interacts with the inline assembly, and of course the supported CPU architecture instruction sets.
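Given that variability, inline assembly is often wrapped in preprocessor checks with a plain C++ fallback. The following is a sketch under the assumption of GCC or Clang's extended asm with AT&T syntax on x86. Subtract and SubtractDemo are hypothetical names. Rather than naming registers, it uses operand constraints so the compiler picks them:

```cpp
#include <cassert>

// Subtract b from a, using inline assembly only where its syntax is known
int Subtract(int a, int b)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    // GCC/Clang extended asm in AT&T syntax: %0 = %0 - %1
    // "+r"(a): a is read and written in a register the compiler chooses
    // "r"(b):  b is a read-only input in any register
    // "cc":    the instruction modifies the CPU's flags
    asm("subl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    // Portable fallback for other compilers and CPUs
    return a - b;
#endif
}

// Example usage
void SubtractDemo()
{
    assert(Subtract(20, 5) == 15);
}
```

Since no specific register is named, no register needs to be listed as clobbered and the compiler is free to choose whichever registers suit the surrounding code.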
All of this inconsistency has led to most uses of inline assembly being eschewed in favor of so-called “intrinsics.” These are functions that the compiler typically replaces with a single CPU instruction. They are almost always named after that CPU instruction, take the parameters that the CPU instruction operates on, and evaluate to the result of the CPU instruction. There’s a lot of variance in just what this means, but it’s a much simpler and more natural way to embed assembly in a C++ program:
// x86 SSE intrinsics
#include <xmmintrin.h>

// Component-wise addition of four floats in two arrays into a third array
// All three pointers must be 16-byte aligned
void Add4(const float* a, const float* b, float* c)
{
    // Load a's four floats from memory into a 128-bit register
    __m128 reg1 = _mm_load_ps(a);

    // Load b's four floats from memory into a 128-bit register
    const auto reg2 = _mm_load_ps(b);

    // Add corresponding floats of a and b into the first 128-bit register
    reg1 = _mm_add_ps(reg1, reg2);

    // Store the result register into c's memory
    _mm_store_ps(c, reg1);
}

// _mm_load_ps and _mm_store_ps require 16-byte alignment, so use alignas
alignas(16) float a[] = { 1, 1, 1, 1 };
alignas(16) float b[] = { 1, 2, 3, 4 };
alignas(16) float c[] = { 9, 9, 9, 9 };
Add4(a, b, c);
DebugLog(a[0], a[1], a[2], a[3]); // 1, 1, 1, 1 (unmodified)
DebugLog(b[0], b[1], b[2], b[3]); // 1, 2, 3, 4 (unmodified)
DebugLog(c[0], c[1], c[2], c[3]); // 2, 3, 4, 5 (sum)
There are several advantages to this approach. Specific registers don’t need to be named as the compiler’s register allocator simply does its normal work. We’re allowed to use normal C++ conventions like parameters, return values, const variables, and even auto typing. Those variables are strongly-typed, meaning we get the compiler error-checking we’re used to:
// Compiler error: too many arguments
__m128 reg1 = _mm_load_ps(a, b);

// Compiler error: return value is __m128, not bool
bool reg2 = _mm_load_ps(b);
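Intrinsics are still CPU-specific and alignment still matters: _mm_load_ps and _mm_store_ps require 16-byte aligned addresses, while _mm_loadu_ps and _mm_storeu_ps accept any address. Below is a sketch of a variant that tolerates unaligned pointers and falls back to plain C++ when SSE isn't available. HAVE_SSE, Add4Unaligned, and Add4UnalignedDemo are hypothetical names:

```cpp
#include <cassert>

// Detect SSE support on GCC, Clang, and MSVC
#if defined(__SSE__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
    #include <xmmintrin.h>
    #define HAVE_SSE 1
#else
    #define HAVE_SSE 0
#endif

// Component-wise addition of four floats; a, b, and c may be unaligned
void Add4Unaligned(const float* a, const float* b, float* c)
{
#if HAVE_SSE
    // The "u" variants of load and store work with any address
    const __m128 sum = _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    _mm_storeu_ps(c, sum);
#else
    // Scalar fallback for CPUs without SSE
    for (int i = 0; i < 4; ++i)
    {
        c[i] = a[i] + b[i];
    }
#endif
}

// Example usage with arrays that have only float's default alignment
void Add4UnalignedDemo()
{
    float a[] = { 1, 1, 1, 1 };
    float b[] = { 1, 2, 3, 4 };
    float c[] = { 9, 9, 9, 9 };
    Add4Unaligned(a, b, c);
    assert(c[0] == 2 && c[1] == 3 && c[2] == 4 && c[3] == 5);
}
```

The aligned variants can be faster on some CPUs, which is one practical reason to apply alignas(16) to data that SSE code will process.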
Language Linkage
When object files are linked together by the linker, it’s important that they follow the same conventions. Normally this isn’t a problem because we’re linking together object files compiled from source code in the same language (C++) that was compiled by the same version of the same compiler with the same compiler settings.
In other cases, we want to link together code that was compiled differently. One common scenario is to link together C++ and C code, such as when C++ code is using a C library or vice versa. In this case, the languages have different object file conventions that cause them to clash. Take, for example, the case of overloaded functions in C++. These aren’t supported in C, so C’s object files simply name the function the same as in the source code. C++ needs to disambiguate, so it “mangles” the names to make them unique. It does this even if there’s only one overload of the function:
////////////////////
// library.h (C++)
////////////////////

int Madd(int a, int b, int c);

////////////////////
// library.cpp (C++)
////////////////////

#include "library.h"

// Compiled into object file with name Maddiii_i
// Example name only. Actual name depends on the compiler.
int Madd(int a, int b, int c)
{
    return a*b + c;
}

////////////////////
// main.c (C)
////////////////////

#include <stdio.h>
#include "library.h"

void Foo()
{
    // OK: library.h declares a Madd that takes three ints and returns an int
    int result = Madd(2, 4, 6);

    // Print the result
    printf("%d\n", result);
}
Both library.cpp and main.c compile, but the linker that takes in library.o and main.o fails to link them together. The problem is that main.o is trying to find a function called Madd but there isn’t one. There’s a function called Maddiii_i, but that doesn’t count because only exact names are matched.
To solve this problem, C++ provides a way to tell the compiler that code should be compiled with the same language linkage rules as C:
////////////////////
// library.h (C++)
////////////////////

// Everything in this block should be compiled with C's linkage rules
extern "C"
{
    int Madd(int a, int b, int c);
}

////////////////////
// library.cpp (C++)
////////////////////

#include "library.h"

// Definitions need to match the language linkage of their declarations
extern "C"
{
    // Compiled into object file with name Madd
    // Not mangled into Maddiii_i
    int Madd(int a, int b, int c)
    {
        return a*b + c;
    }
}
Now that Madd doesn’t have its name mangled, the linker can find it and produce a working executable.
Some special rules apply to code that’s been switched to C language linkage. First, class members always have C++ linkage regardless of whether C linkage is specified.
Second, because C doesn’t support function overloading, any functions with the same name are assumed to be the same function. This means we’ll typically get compiler errors for redefining the same function when we try to make an overload.
Third, and similarly, variables in different namespaces with the same name are assumed by the compiler to be the same variable. This is because C doesn’t support namespaces. We’ll typically get the same compiler errors for trying to redefine these variables.
Fourth, and again similarly, variables and functions can’t have the same name even if they’re in different namespaces. All of these rules stem from C’s requirement that everything has a unique name.
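These rules can be seen side by side in a short sketch. The names Accumulator, Physics, Graphics, Gravity, and LinkageRulesDemo are hypothetical and exist only for illustration. The lines that would fail to compile are left commented out:

```cpp
#include <cassert>

extern "C"
{
    // Rule 1: member functions keep C++ linkage, so they can be overloaded
    struct Accumulator
    {
        int Total;
        void Add(int x) { Total += x; }
        void Add(int x, int y) { Total += x + y; }
    };

    // Rule 2: free functions have C linkage, so Madd can't be overloaded
    int Madd(int a, int b, int c)
    {
        return a*b + c;
    }

    // Compiler error if uncommented: conflicts with the Madd above
    // float Madd(float a, float b, float c);
}

namespace Physics
{
    // Rules 3 and 4: despite the namespace, the linker sees plain "Gravity"
    extern "C"
    {
        int Gravity = 10;
    }
}

namespace Graphics
{
    // Compiler error if uncommented: same entity as Physics::Gravity
    // extern "C" { int Gravity = 20; }
}

// Example usage
void LinkageRulesDemo()
{
    Accumulator acc{0};
    acc.Add(1);
    acc.Add(2, 3);
    assert(acc.Total == 6);
    assert(Madd(2, 4, 6) == 14);
    assert(Physics::Gravity == 10);
}
```

Within C++ code the namespace is still used to refer to the variable, as in Physics::Gravity. Only the name the linker sees ignores it.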
If only a single entity needs its language linkage changed, the curly braces can be omitted, similar to how they’re optional for one-statement if blocks. Unlike other curly braces, though, those of a language linkage block don’t create a block scope:
extern "C" int Madd(int a, int b, int c);

extern "C" int Madd(int a, int b, int c)
{
    return a*b + c;
}
The C++ language guarantees that language linkage for two languages is supported: C and C++. Compilers are free to implement support for more languages. The default language linkage is, of course, C++, but it can be specified explicitly if so desired:
extern "C++" int Madd(int a, int b, int c);

extern "C++" int Madd(int a, int b, int c)
{
    return a*b + c;
}
Linkage rules can be nested. In this case, the innermost linkage is used:
// Change linkage to C
extern "C"
{
    // Change linkage back to C++
    extern "C++" int Madd(int a, int b, int c);
}

// OK: linkage is C++
extern "C++" int Madd(int a, int b, int c)
{
    return a*b + c;
}
Finally, a common convention is to use the preprocessor to check for __cplusplus, which is defined only when the code is being compiled as C++. If it is, C language linkage is applied. This allows the code to be compiled as C++ into a library that can be linked with C code. The code can also be compiled directly as C, such as when a C++ compiler isn’t available. This approach requires the code to use only the subset that is legal in both languages:
////////////////////
// library.h
////////////////////

// If compiled as C++, this is defined
#ifdef __cplusplus
    // Make a macro called EXPORTED with the code to set C language linkage
    #define EXPORTED extern "C"
// If compiled as C (assumed since not C++)
#else
    // Make an empty macro called EXPORTED
    #define EXPORTED
#endif

// Add EXPORTED at the beginning
// For C++, this sets the language linkage to C
// For C, this does nothing
EXPORTED int Madd(int a, int b, int c);

////////////////////
// library.c
////////////////////

#include "library.h"

// Compiled into object file with name Madd regardless of language
EXPORTED int Madd(int a, int b, int c)
{
    return a*b + c;
}
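Another widely-used variant wraps the whole header in a language linkage block instead of prefixing each declaration with a macro. The sketch below shows both files in one listing for brevity. Lerp and LibraryDemo are hypothetical additions for illustration:

```cpp
////////////////////
// library.h
////////////////////

// Open a C language linkage block only when compiled as C++
#ifdef __cplusplus
extern "C" {
#endif

// Everything declared here gets an unmangled name in both languages
int Madd(int a, int b, int c);
int Lerp(int from, int to, int percent);

// Close the block only when it was opened
#ifdef __cplusplus
}
#endif

////////////////////
// library.c
////////////////////

// Would normally be #include "library.h"
#include <assert.h>

// The definitions keep the C language linkage of their earlier declarations
int Madd(int a, int b, int c)
{
    return a*b + c;
}

int Lerp(int from, int to, int percent)
{
    return from + (to - from) * percent / 100;
}

// Example usage
void LibraryDemo(void)
{
    assert(Madd(2, 4, 6) == 14);
    assert(Lerp(0, 10, 50) == 5);
}
```

The block form scales better to headers with many declarations, while the macro form makes the linkage visible on each individual declaration.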
Conclusion
In addition to the myriad low-level controls C++ gives us, these features provide us with even more control. We can query and set the alignment of various data types and variables to make optimal use of specific CPU architectures’ requirements to improve performance in a variety of ways. C# provides some control over struct field layout, but that’s a far more limited tool than alignas in C++.
One way to use this control over alignment is by writing inline assembly, another C++ feature, or by making use of CPU-specific intrinsics. These features combine together to provide precise control over what machine code actually gets executed and without the need to write entire programs in assembly. C# is beginning to offer intrinsics starting with x86 in .NET Core 3.0 and Unity’s Burst-specific intrinsics.
C++ also allows for a high level of compatibility with its predecessor: C. Despite having far more features, C++ code can be easily integrated with C code by setting the language linkage and following a few special rules. This makes our C++ libraries available for usage in C and in environments that follow C’s linkage rules. There are quite a few of those, including language bindings for C#, Rust, Python, and JavaScript via Node.js. The same goes for C# with its P/Invoke system of language bindings that enables interoperability with the C linkage model.