C++ Scripting: Part 28 – Value Types Overhaul
Value types like int, structs, and enums seem simple, but much of what we think we know about them just isn’t true. This article explores how value types actually work in C# and uses that knowledge to improve how they’re implemented in the C++ scripting system.
Table of Contents
- Part 1: C#/C++ Communication
- Part 2: Update C++ Without Restarting the Editor
- Part 3: Object-Oriented Bindings
- Part 4: Performance Validation
- Part 5: Bindings Code Generator
- Part 6: Building the C++ Plugin
- Part 7: MonoBehaviour Messages
- Part 8: Platform-Dependent Compilation
- Part 9: Out and Ref Parameters
- Part 10: Full Generics Support
- Part 11: Collaborators, Structs, and Enums
- Part 12: Exceptions
- Part 13: Operator Overloading, Indexers, and Type Conversion
- Part 14: Arrays
- Part 15: Delegates
- Part 16: Events
- Part 17: Boxing and Unboxing
- Part 18: Array Index Operator
- Part 19: Implement C# Interfaces with C++ Classes
- Part 20: Performance Improvements
- Part 21: Implement C# Properties and Indexers in C++
- Part 22: Full Base Type Support
- Part 23: Base Type APIs
- Part 24: Default Parameters
- Part 25: Full Type Hierarchy
- Part 26: Hot Reloading
- Part 27: Foreach Loops
- Part 28: Value Types Overhaul
- Part 29: Factory Functions and New MonoBehaviours
- Part 30: Overloaded Types and Decimal
Let’s start off with some misconceptions about how value types work. First up, one common phrase programmers say about C# is “everything extends from Object.” C# goes a long way to make it seem as though this is true. Even Microsoft’s MSDN documentation says as much when it describes ValueType like so:
Provides the base class for value types.
It also shows this under “Inheritance Hierarchy”:
System.Object
  System.ValueType
    System.Enum
Likewise, the documentation for Enum says this:
Provides the base class for enumerations.
Then again, their documentation on value types says this:
All value types are derived implicitly from the System.ValueType.
And this:
Unlike with reference types, you cannot derive a new type from a value type. However, like reference types, structs can implement interfaces.
Adding weight to the interfaces claim, the documentation for Int32 (a.k.a. int) shows all the interfaces it supposedly implements:
public struct Int32 : IComparable, IFormattable, IConvertible, IComparable<int>, IEquatable<int>
To summarize, the official documentation has told us that value types including primitives, structs, and enums ultimately derive from Object and implement interfaces. Unfortunately, none of that is true.
It would be much more accurate to say that value types can be boxed to a class that ultimately derives from Object and can implement interfaces. Let’s see this in action by treating a value type like an Object:
static class Test
{
    static void Foo()
    {
        ReturnHashCode(123);
    }

    static int ReturnHashCode(object o)
    {
        return o.GetHashCode();
    }
}
ReturnHashCode takes an Object, but 123 is a primitive of type System.Int32 (a.k.a. int). If it were really true that all value types derived from ValueType which derives from Object, there’d be no need to do anything here. Since that’s not true, 123 gets boxed into an instance of a class that does actually derive from ValueType and Object. We can see this clearly when looking at the IL that this C# gets compiled to:
.method private hidebysig static void Foo() cil managed
{
  .maxstack 8
  IL_0000: nop
  IL_0001: ldc.i4.s 123
  IL_0003: box [mscorlib]System.Int32
  IL_0008: call int32 Test::ReturnHashCode(object)
  IL_000d: pop
  IL_000e: ret
}

.method private hidebysig static int32 ReturnHashCode(object o) cil managed
{
  .maxstack 1
  .locals init (int32 V_0)
  IL_0000: nop
  IL_0001: ldarg.0
  IL_0002: callvirt instance int32 [mscorlib]System.Object::GetHashCode()
  IL_0007: stloc.0
  IL_0008: br.s IL_000a
  IL_000a: ldloc.0
  IL_000b: ret
}
The key line is this one in Foo where the Int32 is boxed:
IL_0003: box [mscorlib]System.Int32
This line is also a hint:
IL_0002: callvirt instance int32 [mscorlib]System.Object::GetHashCode()
An Int32, by definition, is just 32 bits. For it to support virtual functions such as GetHashCode, it would need a virtual method table, which would add size beyond the integer value. It has to grow through boxing to support virtual functions.
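The same size argument can be checked in C++, where virtual functions work in a similar way. This is only a rough sketch: PlainInt32, VirtualInt32, VirtualOverhead, and SizeDemo are hypothetical names, and the exact overhead depends on the compiler and platform (mainstream compilers store one hidden pointer to the virtual method table).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an unboxed Int32: only the 32-bit payload
struct PlainInt32
{
    int32_t Value;

    // Non-virtual methods are resolved at compile time and add no size
    int32_t GetHashCode() const { return Value; }
};

// Hypothetical stand-in for a type that supports virtual dispatch
struct VirtualInt32
{
    int32_t Value;

    explicit VirtualInt32(int32_t value) : Value(value) { }
    virtual ~VirtualInt32() = default;
    virtual int32_t GetHashCode() const { return Value; }
};

// Holds on mainstream compilers; the C++ standard itself doesn't promise it
static_assert(sizeof(PlainInt32) == sizeof(int32_t), "no hidden fields");
static_assert(sizeof(VirtualInt32) > sizeof(int32_t), "virtual dispatch adds size");

// Bytes that virtual dispatch costs per instance on this compiler
constexpr std::size_t VirtualOverhead()
{
    return sizeof(VirtualInt32) - sizeof(PlainInt32);
}

void SizeDemo()
{
    const PlainInt32 plain{123};
    const VirtualInt32 withVirtual(123);

    // Both compute the same result, but only one is still 32 bits
    assert(plain.GetHashCode() == withVirtual.GetHashCode());
    assert(VirtualOverhead() > 0);
}
```

Calling SizeDemo() runs the checks at runtime, while the static_assert lines fail the build if the size assumptions don’t hold.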
This also means that structs can’t implement interfaces, despite what the documentation says. Methods, properties, events, and indexers in interfaces are all implicitly virtual functions. Interface types are also reference types, again by definition. Let’s look at another example to show this:
static class Test
{
    static void Goo()
    {
        Compare(123, 456);
    }

    static int Compare(IComparable a, IComparable b)
    {
        return a.CompareTo(b);
    }
}
Compare takes two IComparable parameters. Since the documentation says that Int32 implements IComparable, there shouldn’t be anything to do. Let’s look at the IL to see what actually happens.
.method private hidebysig static void Goo() cil managed
{
  .maxstack 8
  IL_0000: nop
  IL_0001: ldc.i4.s 123
  IL_0003: box [mscorlib]System.Int32
  IL_0008: ldc.i4 0x1c8
  IL_000d: box [mscorlib]System.Int32
  IL_0012: call int32 Test::Compare(class [mscorlib]System.IComparable, class [mscorlib]System.IComparable)
  IL_0017: pop
  IL_0018: ret
}

.method private hidebysig static int32 Compare(
    class [mscorlib]System.IComparable a,
    class [mscorlib]System.IComparable b) cil managed
{
  .maxstack 2
  .locals init (int32 V_0)
  IL_0000: nop
  IL_0001: ldarg.0
  IL_0002: ldarg.1
  IL_0003: callvirt instance int32 [mscorlib]System.IComparable::CompareTo(object)
  IL_0008: stloc.0
  IL_0009: br.s IL_000b
  IL_000b: ldloc.0
  IL_000c: ret
}
Again we see the Int32 parameters get boxed:
IL_0001: ldc.i4.s 123
IL_0003: box [mscorlib]System.Int32
IL_0008: ldc.i4 0x1c8
IL_000d: box [mscorlib]System.Int32
Then we see the virtual function call for the interface method:
IL_0003: callvirt instance int32 [mscorlib]System.IComparable::CompareTo(object)
So how does boxing solve this problem of value types not actually deriving from Object and not actually implementing interfaces? Well, the boxed type is free to do that. While the actual name and details of the class our value types are boxed to are hidden from us, we can imagine a class like this:
class BoxedInt32
    : ValueType
    , IComparable
    , IFormattable
    , IConvertible
    , IComparable<int>
    , IEquatable<int>
{
    private Int32 Value;

    // Boxing calls this to create an instance of the boxed type class
    public BoxedInt32(int value)
    {
        Value = value;
    }

    // Implement this to satisfy IComparable
    public int CompareTo(object other)
    {
        if (!(other is Int32))
        {
            throw new ArgumentException("Object must be of type Int32");
        }
        return Value.CompareTo((Int32)other);
    }

    // ... methods implementing the other interfaces
}
This class fulfills all the requirements. It’s a class, so it’s a reference type. It extends from ValueType, so it ultimately derives from Object. It implements all the interfaces that an Int32 supposedly does. It also holds the actual value type as an Int32 field. So this class is like a reference type version of what the documentation says an Int32 is.
One final detail: notice how the boxed type doesn’t do much work of its own. Instead, its interface functions are implemented by calling the actual function with the same name on the value type. This is possible because non-virtual methods don’t require a virtual method table and therefore don’t add any size to the value type. To show this in action, let’s make another little example:
static class Test
{
    struct StructWithInterface : IComparable
    {
        public int CompareTo(object o)
        {
            Debug.Log("StructWithInterface.CompareTo(object) called!");
            return 0;
        }
    }

    static void Bar()
    {
        StructWithInterface swi = new StructWithInterface();
        Compare(swi, 123);
    }

    // ... same Compare() as before
}
Here we have a struct that supposedly implements an interface. Let’s look at the IL to see what happens:
.method private hidebysig static void Bar() cil managed
{
  .maxstack 2
  .locals init (valuetype Test/StructWithInterface V_0)
  IL_0000: nop
  IL_0001: ldloca.s V_0
  IL_0003: initobj Test/StructWithInterface
  IL_0009: ldloc.0
  IL_000a: box Test/StructWithInterface
  IL_000f: ldc.i4.s 123
  IL_0011: box [mscorlib]System.Int32
  IL_0016: call int32 Test::Compare(class [mscorlib]System.IComparable, class [mscorlib]System.IComparable)
  IL_001b: pop
  IL_001c: ret
}
First we see the boxing of both the StructWithInterface struct and the Int32 primitive:
IL_000a: box Test/StructWithInterface
IL_000f: ldc.i4.s 123
IL_0011: box [mscorlib]System.Int32
Then we run the code and see the debug log message get printed:
StructWithInterface.CompareTo(object) called!
This means that whatever class StructWithInterface got boxed to, its CompareTo was implemented by calling our CompareTo in StructWithInterface.
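To make that forwarding concrete, here’s a rough C++ analogy. It’s a sketch under stated assumptions and not how the CLR actually implements boxing: Boxed, Box, CompareViaInterface, and BoxDemo are hypothetical names, and CompareTo takes an int32_t rather than an object just to keep the example short.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical interface, loosely modeled on IComparable
struct IComparable
{
    virtual ~IComparable() = default;
    virtual int CompareTo(int32_t other) const = 0;
};

// A value type: a non-virtual CompareTo and no virtual method table
struct Int32
{
    int32_t Value;

    int CompareTo(int32_t other) const
    {
        return (Value > other) - (Value < other);
    }
};

// Hypothetical box: a heap-allocated type holding a copy of the value
template <typename T>
struct Boxed : IComparable
{
    T Value;

    explicit Boxed(T value) : Value(value) { }

    // The virtual override does no work itself; it forwards to the value type
    int CompareTo(int32_t other) const override
    {
        return Value.CompareTo(other);
    }
};

// Boxing copies the value to the heap and hands it back as the interface type
template <typename T>
std::unique_ptr<IComparable> Box(T value)
{
    return std::make_unique<Boxed<T>>(value);
}

// Only sees the interface, so the call goes through virtual dispatch
int CompareViaInterface(const IComparable& a, int32_t b)
{
    return a.CompareTo(b);
}

void BoxDemo()
{
    Int32 value{123};
    std::unique_ptr<IComparable> boxed = Box(value);

    // The boxed call and the direct call agree because one forwards to the other
    assert(CompareViaInterface(*boxed, 456) == value.CompareTo(456));
}
```

The value type stays four bytes because its CompareTo is non-virtual. Only the box pays for the virtual method table.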
So how does this relate to the C++ scripting system? Well, since part 11 we’ve strived to implement all three forms of value types: primitives, enums, and structs. Boxing and unboxing were added in part 17. Still, there were several discrepancies that needed to be addressed for a more accurate representation in C++ of how value types really work in C#.
Let’s start with primitives. Until now, C# primitives were represented as just C++ primitives. An Int32 would turn into int32_t, Single would turn into float, and Byte would turn into uint8_t. This is very close to being right, but needed a tweak to how they’re boxed. Previously, we’d box primitives by calling a constructor on Object:
int32_t i = 123;
Object o(123);
That worked great as long as we just wanted an Object, but it didn’t give us any way to get a ValueType or any interfaces such as IComparable. To get that, we need to wrap the primitive in its own struct so we can provide conversion operators:
namespace System
{
    struct Int32
    {
        // Just holds one field, so this is still just the size of the field
        int32_t Value;

        // Default to 0
        Int32() : Value(0) { }

        // Implicitly convert from int32_t primitive to Int32 struct
        Int32(int32_t value) : Value(value) { }

        // Implicitly convert from Int32 struct to int32_t primitive
        operator int32_t() const { return Value; }

        // Explicitly box to all base classes
        explicit operator Object() const { return Object(BoxInt32(Value)); }
        explicit operator ValueType() const { return ValueType(BoxInt32(Value)); }

        // Explicitly box to all interfaces
        explicit operator IComparable() const { return IComparable(BoxInt32(Value)); }
        explicit operator IFormattable() const { return IFormattable(BoxInt32(Value)); }
        explicit operator IConvertible() const { return IConvertible(BoxInt32(Value)); }
    };
}
This struct gives us the ability to interoperate with the primitive type and to box to any base class or interface:
// Implicitly convert between struct and primitive
Int32 i = 123;
int32_t i2 = i;

// Box to any base class or interface
ValueType v = (ValueType)i;
IComparable c = (IComparable)i;
Note that boxing is explicit in the C++ scripting system, unlike in C# where it is implicit. This is intentionally different from C# because boxing creates garbage and it is far too easy to box accidentally in C#. Examples like the ones above, and many cases involving generics, are ample proof that even experienced C# programmers will inadvertently trigger GC allocations quite frequently. Forcing explicit boxing via a cast, similar to unboxing, should provide a little helpful friction and hopefully avoid inadvertent boxing.
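The effect of the explicit keyword can be verified at compile time. The following is a self-contained sketch with mock types, not the real bindings: BoxHandle, BoxCount, and ExplicitBoxingDemo are hypothetical, and this BoxInt32 only increments a counter to simulate the GC allocation that real boxing would cause in C#.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical handle to a boxed object living on the C# side
struct BoxHandle
{
    int32_t Payload;
};

// Counts simulated GC allocations
int BoxCount = 0;

// Hypothetical stand-in for the generated boxing function
BoxHandle BoxInt32(int32_t value)
{
    ++BoxCount;
    return BoxHandle{value};
}

// Mock of the Object wrapper: just holds the handle
struct Object
{
    BoxHandle Handle;

    explicit Object(BoxHandle handle) : Handle(handle) { }
};

struct Int32
{
    int32_t Value;

    Int32(int32_t value) : Value(value) { }
    operator int32_t() const { return Value; }
    explicit operator Object() const { return Object(BoxInt32(Value)); }
};

// The wrapper adds no size on top of the primitive
static_assert(sizeof(Int32) == sizeof(int32_t), "wrapper adds no size");
// Implicit boxing, as in "Object o = i;", does not compile
static_assert(!std::is_convertible<Int32, Object>::value, "no implicit boxing");
// Explicit boxing, as in "(Object)i", does compile
static_assert(std::is_constructible<Object, Int32>::value, "explicit boxing works");

void ExplicitBoxingDemo()
{
    const int before = BoxCount;

    Int32 i = 123;
    int32_t raw = i; // converting to the primitive never boxes
    assert(raw == 123);
    assert(BoxCount == before);

    Object o = (Object)i; // the cast is the only place an allocation happens
    assert(BoxCount == before + 1);
    assert(o.Handle.Payload == 123);
}
```

With this arrangement, passing an Int32 directly to a function that takes an Object is a compile error, so every simulated allocation is visible as a cast at the call site.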
Now let’s consider enums. Previously, these were using enum struct as it matched C# enums quite well:
enum struct Name : int32_t
{
    First = 0,
    Middle = 1,
    Last = 2
};
We could use it like this:
// Create and convert enums
Name n = Name::First;
Name n2 = (Name)1;
int32_t i = (int32_t)n;

// Box, but only to Object
Object o(n);
This also didn’t provide the ability to box the enum to various base classes and interfaces. So the enum struct was swapped out for a struct with static fields for each enumerator:
struct Name
{
    static const Name First;
    static const Name Middle;
    static const Name Last;

    int32_t Value;

    // Explicit conversion from the primitive type
    explicit Name(int32_t value) : Value(value) { }

    // Explicit conversion to the primitive type
    explicit operator int32_t() const { return Value; }

    // Equality and inequality operators
    // (const so they also work on the static const enumerators)
    bool operator==(Name other) const { return Value == other.Value; }
    bool operator!=(Name other) const { return Value != other.Value; }

    // Explicitly box to all base types
    explicit operator Enum() const { return Enum(BoxName(Value)); }
    explicit operator ValueType() const { return ValueType(BoxName(Value)); }
    explicit operator Object() const { return Object(BoxName(Value)); }

    // Explicitly box to all interface types
    explicit operator IFormattable() const { return IFormattable(BoxName(Value)); }
    explicit operator IConvertible() const { return IConvertible(BoxName(Value)); }
    explicit operator IComparable() const { return IComparable(BoxName(Value)); }
};

// Initialize static constants
const Name Name::First(0);
const Name Name::Middle(1);
const Name Name::Last(2);
The enum behaves very similarly to before with the enum struct approach, but it now supports boxing to more types:
// Explicitly convert between enum and primitive
Name n(1);
int32_t i = (int32_t)n;

// Overloaded operators make comparison feel natural
if (n == Name::Middle)
{
    String msg = "Middle name: ";
    Debug::Log(msg);
}

// Box to any base class or interface
Enum e = (Enum)n;
IComparable c = (IComparable)n;
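For anyone who wants to try the struct-based enum in isolation, here’s a trimmed-down sketch that compiles without the rest of the bindings. The boxing operators are left out, the comparison operators are const-qualified so they can be called on the static const enumerators, and Describe and EnumDemo are hypothetical helpers added only for illustration.

```cpp
#include <cassert>
#include <cstdint>

struct Name
{
    // Declared here, defined below once the struct is a complete type
    static const Name First;
    static const Name Middle;
    static const Name Last;

    int32_t Value;

    explicit Name(int32_t value) : Value(value) { }
    explicit operator int32_t() const { return Value; }

    bool operator==(Name other) const { return Value == other.Value; }
    bool operator!=(Name other) const { return Value != other.Value; }
};

const Name Name::First(0);
const Name Name::Middle(1);
const Name Name::Last(2);

// Hypothetical helper: the enumerators aren't compile-time constants,
// so this compares with == instead of using a switch
const char* Describe(Name n)
{
    if (n == Name::First) { return "First"; }
    if (n == Name::Middle) { return "Middle"; }
    if (n == Name::Last) { return "Last"; }
    return "Unknown";
}

void EnumDemo()
{
    Name n(1);
    assert(n == Name::Middle);
    assert(n != Name::Last);
    assert((int32_t)Name::Last == 2);
    assert(Describe(n)[0] == 'M');
}
```

One trade-off worth knowing about: unlike the enum struct version, these enumerators can’t be used as case labels, which is why the helper falls back to a chain of comparisons.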
Finally, let’s look at structs. The C++ scripting system draws a distinction between “full structs” like Vector3 that can be represented in C++ and “managed structs” like RaycastHit that can’t because of some field like transform. They looked like this:
struct Vector3
{
    float x;
    float y;
    float z;

    // ... methods
};

struct RaycastHit : ValueType
{
    // ... methods
};
And we’d use them like this:
// Box a "full struct" to an Object (and only Object)
Vector3 v(1.0f, 2.0f, 3.0f);
Object o(v);

// Pass a "managed struct" as an Object with no boxing
void Foo(Object o)
{
    o.ToString();
}
RaycastHit r;
Foo(r);
Note that the latter case, where no boxing is required, is actually incorrect and will cause either an exception or incorrect behavior. That’s because calling ToString will pass the Handle field into C# where it’ll be used to look up the Object in its ObjectStore. However, the Handle for a “managed struct” actually refers to the StructStore for that type of struct. So either the Object won’t be found in the ObjectStore or the wrong Object will be found. We need to fix this!
We now know that RaycastHit doesn’t really derive from ValueType but instead boxes to a type that derives from ValueType. So we need to adjust the “managed struct” to not derive from ValueType. Boxing operators need to be added to both kinds of structs. So we end up with this:
struct Vector3
{
    float x;
    float y;
    float z;

    // Boxing to all base classes
    explicit operator Object() const { return Object(BoxVector3(*this)); }
    explicit operator ValueType() const { return ValueType(BoxVector3(*this)); }

    // ... boxing to all interface types (Vector3 has none)
};

struct ManagedType
{
    // C# StructStore handle
    int32_t Handle;
};

struct RaycastHit : ManagedType
{
    // Boxing to all base classes
    explicit operator Object() const { return Object(BoxRaycastHit(Handle)); }
    explicit operator ValueType() const { return ValueType(BoxRaycastHit(Handle)); }

    // ... boxing to all interface types (RaycastHit has none)
};
Now we have full boxing support and we’ve fixed the problem of using the wrong kind of handle:
// Box a "full struct" to any base class or interface
Vector3 v(1.0f, 2.0f, 3.0f);
ValueType vt(v);

// Box a "managed struct" to any base class or interface
void Foo(Object o)
{
    o.ToString();
}
RaycastHit r;
Object o(r);
Foo(o);
Now that we’re passing the boxed RaycastHit instead of the actual RaycastHit, we’ll be using the boxed struct’s Handle field. That’s the one returned from the boxing function, which actually refers to the ObjectStore in C#. So we’re using the correct handle now and won’t have any incorrect behavior or exceptions as we did before.
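The handle mix-up is easy to reproduce with a toy model. In this sketch the two C# stores are simulated by a hypothetical Store type holding strings, and this BoxRaycastHit is a stand-in that takes the stores as parameters. The real ObjectStore and StructStore live in C# and work differently, so treat this only as an illustration of why a handle must be used with the store that issued it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical store: hands out its own handles, so the same integer
// can refer to different things in different stores
struct Store
{
    std::unordered_map<int32_t, std::string> Items;
    int32_t NextHandle = 1;

    int32_t Add(const std::string& item)
    {
        int32_t handle = NextHandle++;
        Items[handle] = item;
        return handle;
    }

    // Returns nullptr when the handle isn't in this store
    const std::string* Find(int32_t handle) const
    {
        auto it = Items.find(handle);
        return it == Items.end() ? nullptr : &it->second;
    }
};

// Hypothetical boxing function: copies the struct into the object store
// and returns a handle that is valid there
int32_t BoxRaycastHit(const Store& structStore, Store& objectStore, int32_t structHandle)
{
    const std::string* hit = structStore.Find(structHandle);
    assert(hit != nullptr);
    return objectStore.Add("boxed " + *hit);
}

void HandleDemo()
{
    Store objectStore;
    Store structStore;
    int32_t objectHandle = objectStore.Add("some GameObject");
    int32_t hitHandle = structStore.Add("RaycastHit");

    // Both stores issued the same number for unrelated things
    assert(objectHandle == hitHandle);

    // Bug: a StructStore handle used on the ObjectStore finds the wrong object
    assert(*objectStore.Find(hitHandle) == "some GameObject");

    // Fix: box first, then use the handle that boxing returned
    int32_t boxedHandle = BoxRaycastHit(structStore, objectStore, hitHandle);
    assert(*objectStore.Find(boxedHandle) == "boxed RaycastHit");
}
```

The first lookup succeeds but silently returns an unrelated object, which is the “wrong Object” case. With an empty object store it would return nullptr instead, which corresponds to the exception case.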
That wraps up the discussion of value types for this week. Hopefully this has been helpful both for understanding how value types and boxing work and for how they’re implemented in the C++ scripting system. As usual, this is all pushed now to the GitHub project so feel free to check it out. If you’ve got any questions or comments, feel free to speak up.