IL2CPP Output for C# 7.3: Tuples
Unity 2018.3 officially launched last Thursday and with it comes support for the very latest version of C#: 7.3. This includes four new versions—7.0, 7.1, 7.2, and 7.3—so it’s a big upgrade from the C# 6 that we’ve had since 2018.1. Today we’ll begin an article series to learn what happens when we use some of the new features with IL2CPP. We’ll look at the C++ it outputs and even what the C++ compiles to so we know what the CPU will end up executing. Specifically, we’ll focus on the new tuples feature and talk about creating, naming, deconstructing, and comparing them.
Creating Simple Tuples
Let’s start out by creating a simple tuple with the (1, 2)
syntax. Since 1
and 2
have the int
type, this tuple has the (int, int)
type.
static class TestClass { static (int, int) TestCreateTuple() { return (1, 2); } }
Now let’s see the C++ that IL2CPP outputs with Unity 2018.3.0f2:
extern "C" IL2CPP_METHOD_ATTR ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 TestClass_TestCreateTuple_mB7F9180F25D30B4F025CB7FB81F5FE9A3EECC606 (const RuntimeMethod* method) { static bool s_Il2CppMethodInitialized; if (!s_Il2CppMethodInitialized) { il2cpp_codegen_initialize_method (TestClass_TestCreateTuple_mB7F9180F25D30B4F025CB7FB81F5FE9A3EECC606_MetadataUsageId); s_Il2CppMethodInitialized = true; } { ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 L_0; memset(&L_0, 0, sizeof(L_0)); ValueTuple_2__ctor_mCDA3078E87F827C6490EBD90430507642CECC6BF((&L_0), 1, 2, /*hidden argument*/ValueTuple_2__ctor_mCDA3078E87F827C6490EBD90430507642CECC6BF_RuntimeMethod_var); return L_0; } }
The signature of the function shows the tuple type that’s been created for us: ValueTuple_2_t...
. This is essentially equivalent to ValueTuple<int, int>
in C#.
The body of the function starts with the usual method initialization overhead that’s generated any time a generic constructor is called. The first time through, il2cpp_codegen_initialize_method
is called, but after that we only pay for checking the s_Il2CppMethodInitialized
flag.
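The pattern here is ordinary lazy, one-time initialization guarded by a function-local static flag. Here’s a minimal C++ sketch of that shape. Note that initialize_method_metadata, g_initCalls, create_sum, and demo_two_calls are hypothetical names invented for illustration, not IL2CPP names, and the real il2cpp_codegen_initialize_method certainly does more than bump a counter.

```cpp
#include <cassert>

// Hypothetical stand-in for il2cpp_codegen_initialize_method.
// It only counts how often it runs so the effect is observable.
static int g_initCalls = 0;

static void initialize_method_metadata()
{
    ++g_initCalls;
}

// Same shape as the generated function: check a static flag,
// initialize on the first call only, then do the real work.
int create_sum()
{
    static bool s_initialized = false;
    if (!s_initialized)
    {
        initialize_method_metadata();
        s_initialized = true;
    }
    return 1 + 2;
}

// Calls the function twice and checks that initialization ran once.
void demo_two_calls()
{
    create_sum();
    create_sum();
    assert(g_initCalls == 1);
}
```

Every call after the first pays only for the flag check and the branch around the initialization, which matches what the generated function does.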
After that, we see the actual work of the function. The ValueTuple_2_t...
variable is declared to hold the return type, cleared to all zeroes with memset
, the “constructor” ValueTuple_2__ctor_m...
is called with the 1
and 2
values we want to store in the tuple, and finally the variable is returned. Note that the “constructor” isn’t a real C++ constructor, but rather a global function whose purpose is to construct an object.
Let’s take a look at the ValueTuple_2_t...
type to see what was generated for us:
struct ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 { public: // T1 System.ValueTuple`2::Item1 int32_t ___Item1_0; // T2 System.ValueTuple`2::Item2 int32_t ___Item2_1; public: inline static int32_t get_offset_of_Item1_0() { return static_cast<int32_t>(offsetof(ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186, ___Item1_0)); } inline int32_t get_Item1_0() const { return ___Item1_0; } inline int32_t* get_address_of_Item1_0() { return &___Item1_0; } inline void set_Item1_0(int32_t value) { ___Item1_0 = value; } inline static int32_t get_offset_of_Item2_1() { return static_cast<int32_t>(offsetof(ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186, ___Item2_1)); } inline int32_t get_Item2_1() const { return ___Item2_1; } inline int32_t* get_address_of_Item2_1() { return &___Item2_1; } inline void set_Item2_1(int32_t value) { ___Item2_1 = value; } };
This type just contains the two 32-bit integers we’d expect. There are no other fields or base classes, so this is truly optimal.
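To make the “no other fields or base classes” point concrete, here’s a hedged C++ sketch of an equivalent plain struct with the accessors stripped away. IntPair is a hypothetical name, and the size and offset checks assume a typical platform where a struct of two int32_t fields has no padding.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical equivalent of the generated ValueTuple_2_t... struct,
// with the accessor functions left out.
struct IntPair
{
    int32_t Item1;
    int32_t Item2;
};

// Two 4-byte fields laid out back to back (assumes no padding,
// which holds on mainstream compilers and targets).
static_assert(sizeof(IntPair) == 8, "two int32_t fields and nothing else");
static_assert(offsetof(IntPair, Item1) == 0, "Item1 comes first");
static_assert(offsetof(IntPair, Item2) == 4, "Item2 follows immediately");
```

If the checks compile, the type occupies exactly the eight bytes its two fields need.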
Now let’s look at the constructor for this type:
inline void ValueTuple_2__ctor_mCDA3078E87F827C6490EBD90430507642CECC6BF (ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 * __this, int32_t p0, int32_t p1, const RuntimeMethod* method) { (( void (*) (ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 *, int32_t, int32_t, const RuntimeMethod*))ValueTuple_2__ctor_mCDA3078E87F827C6490EBD90430507642CECC6BF_gshared)(__this, p0, p1, method); }
The first part is a big, ugly cast of a global function to a function pointer: ValueTuple_2__ctor_m..._gshared
. Then the function pointer is called with the two int
parameters: 1
and 2
. A pointer to the instance (__this
) and runtime information for the method come along for the ride.
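The cast-then-call pattern is easier to read with shorter names. This is only a sketch: IntPair, IntPair_ctor_shared, IntPair_ctor, and make_pair_1_2 are hypothetical names, and a const void* stands in for the hidden RuntimeMethod* argument. In this simplified version the cast is a no-op because the types already match.

```cpp
#include <cassert>
#include <cstdint>

struct IntPair
{
    int32_t Item1;
    int32_t Item2;
};

// Hypothetical stand-in for the "_gshared" implementation.
// The last parameter mimics the hidden RuntimeMethod* argument.
void IntPair_ctor_shared(IntPair* self, int32_t a, int32_t b, const void* /*method*/)
{
    self->Item1 = a;
    self->Item2 = b;
}

// Wrapper in the style of the generated code: cast the function to an
// explicit function-pointer type, then call through that pointer.
inline void IntPair_ctor(IntPair* self, int32_t a, int32_t b, const void* method)
{
    ((void (*)(IntPair*, int32_t, int32_t, const void*))IntPair_ctor_shared)(self, a, b, method);
}

// Builds a pair the way the generated code does: zero it, then construct it.
inline IntPair make_pair_1_2()
{
    IntPair p = {0, 0};
    IntPair_ctor(&p, 1, 2, nullptr);
    assert(p.Item1 == 1 && p.Item2 == 2);
    return p;
}
```

Calling make_pair_1_2() goes through the same three steps as the generated function: zero the storage, call the wrapper, and return the value.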
Let’s see what that shared constructor function looks like:
extern "C" IL2CPP_METHOD_ATTR void ValueTuple_2__ctor_mCDA3078E87F827C6490EBD90430507642CECC6BF_gshared (ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 * __this, int32_t ___item10, int32_t ___item21, const RuntimeMethod* method) { { int32_t L_0 = ___item10; __this->set_Item1_0(L_0); int32_t L_1 = ___item21; __this->set_Item2_1(L_1); return; } }
As expected, the constructor simply uses the accessor functions to set Item1
and Item2
. These are the names of the fields in a ValueTuple<T1, T2>
in C# and they carry over to C++.
Finally, let’s look at what the function ends up compiling to with a release build for iOS in Xcode 9.4.1. There’s no need to have a deep understanding of 32-bit ARM assembly to read this.
push {r4, r5, r7, lr} add r7, sp, #8 movw r5, :lower16:(__ZZ67TestClass_TestCreateTuple_mB7F9180F25D30B4F025CB7FB81F5FE9A3EECC606E25s_Il2CppMethodInitialized-(LPC10_0+4)) mov r4, r0 movt r5, :upper16:(__ZZ67TestClass_TestCreateTuple_mB7F9180F25D30B4F025CB7FB81F5FE9A3EECC606E25s_Il2CppMethodInitialized-(LPC10_0+4)) LPC10_0: add r5, pc ldrb r0, [r5] cbnz r0, LBB10_2 movw r0, :lower16:(L_TestClass_TestCreateTuple_mB7F9180F25D30B4F025CB7FB81F5FE9A3EECC606_MetadataUsageId$non_lazy_ptr-(LPC10_1+4)) movt r0, :upper16:(L_TestClass_TestCreateTuple_mB7F9180F25D30B4F025CB7FB81F5FE9A3EECC606_MetadataUsageId$non_lazy_ptr-(LPC10_1+4)) LPC10_1: add r0, pc ldr r0, [r0] ldr r0, [r0] bl __ZN6il2cpp2vm13MetadataCache24InitializeMethodMetadataEj movs r0, #1 strb r0, [r5] LBB10_2: movw r0, :lower16:(L_ValueTuple_2__ctor_mCDA3078E87F827C6490EBD90430507642CECC6BF_RuntimeMethod_var$non_lazy_ptr-(LPC10_2+4)) movs r1, #0 movt r0, :upper16:(L_ValueTuple_2__ctor_mCDA3078E87F827C6490EBD90430507642CECC6BF_RuntimeMethod_var$non_lazy_ptr-(LPC10_2+4)) str r1, [r4, #4] LPC10_2: add r0, pc str r1, [r4] movs r1, #1 movs r2, #2 ldr r0, [r0] ldr r3, [r0] mov r0, r4 bl _ValueTuple_2__ctor_mCDA3078E87F827C6490EBD90430507642CECC6BF_gshared pop {r4, r5, r7, pc}
This begins with a lot of code for method initialization and then we see the 1
and 2
literals being passed to the constructor function. Since that wasn’t inlined, let’s go take a look at it:
str r2, [r0, #4] str r1, [r0] bx lr
All this does is set the Item1
and Item2
fields based on the parameters. The accessor functions have been inlined, so this is now minimal.
Conclusion: Simple tuple types are minimal, containing only the necessary fields. Unfortunately, creating them adds method initialization overhead to a function and involves a function call to set the fields.
Creating Named Tuples
Now let’s try creating a tuple with explicit names. Previously we had Item1
and Item2
, but we’ll specify new names instead:
static class TestClass { static int TestCreateTupleWithNames() { var t = (horizontal: 1, vertical: 2); return t.horizontal + t.vertical; } }
In C#, we can access the fields of the tuple using the names we gave them during creation: horizontal
and vertical
, not Item1
and Item2
. Let’s see how this carries over to C++:
extern "C" IL2CPP_METHOD_ATTR int32_t TestClass_TestCreateTupleWithNames_m02420FFEBC2762E1F03DF0C07B0B18A1865C91A1 (const RuntimeMethod* method) { static bool s_Il2CppMethodInitialized; if (!s_Il2CppMethodInitialized) { il2cpp_codegen_initialize_method (TestClass_TestCreateTupleWithNames_m02420FFEBC2762E1F03DF0C07B0B18A1865C91A1_MetadataUsageId); s_Il2CppMethodInitialized = true; } ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 V_0; memset(&V_0, 0, sizeof(V_0)); { ValueTuple_2__ctor_mCDA3078E87F827C6490EBD90430507642CECC6BF((ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 *)(&V_0), 1, 2, /*hidden argument*/ValueTuple_2__ctor_mCDA3078E87F827C6490EBD90430507642CECC6BF_RuntimeMethod_var); ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 L_0 = V_0; int32_t L_1 = L_0.get_Item1_0(); ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 L_2 = V_0; int32_t L_3 = L_2.get_Item2_1(); return ((int32_t)il2cpp_codegen_add((int32_t)L_1, (int32_t)L_3)); } }
Again, the function body begins with method initialization. After that, we see the same ValueTuple_2_t...
type being used as a local variable. That means C++ is using the same ValueTuple<int, int>
type as in the first example, even though its fields are named Item1
and Item2
instead of horizontal
and vertical
. Creation continues in the same way: memset
to zero then call the constructor. The constructor called is also the same as in the first example.
Immediately afterward, we see that the calls to get the horizontal
and vertical
fields have been converted to use the accessors for Item1
and Item2
. This makes it apparent that the names we give tuple fields in C# are only syntactic sugar and the real names used are always Item1
, Item2
, and so forth.
The function wraps up with the odd il2cpp_codegen_add
call, which, as we’ve seen before, simply expands to the +
operator.
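A wrapper like that can be as small as a one-line template. The following is a sketch of what such a helper might look like, not the actual IL2CPP definition, and codegen_add_sketch is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical sketch of an addition helper in the spirit of
// il2cpp_codegen_add: a thin template that forwards to operator+.
template <typename T>
constexpr T codegen_add_sketch(T left, T right)
{
    return left + right;
}

// After inlining, a call is indistinguishable from writing a + b directly.
static_assert(codegen_add_sketch<int32_t>(1, 2) == 3, "1 + 2 == 3");
```

Since a helper of this kind is trivially inlined, it shouldn’t cost anything in the final assembly.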
Now let’s look at the ARM assembly to see how the function is compiled:
push {r4, r7, lr} add r7, sp, #4 sub sp, #8 movw r4, :lower16:(__ZZ76TestClass_TestCreateTupleWithNames_m02420FFEBC2762E1F03DF0C07B0B18A1865C91A1E25s_Il2CppMethodInitialized-(LPC11_0+4)) movt r4, :upper16:(__ZZ76TestClass_TestCreateTupleWithNames_m02420FFEBC2762E1F03DF0C07B0B18A1865C91A1E25s_Il2CppMethodInitialized-(LPC11_0+4)) LPC11_0: add r4, pc ldrb r0, [r4] cbnz r0, LBB11_2 movw r0, :lower16:(L_TestClass_TestCreateTupleWithNames_m02420FFEBC2762E1F03DF0C07B0B18A1865C91A1_MetadataUsageId$non_lazy_ptr-(LPC11_1+4)) movt r0, :upper16:(L_TestClass_TestCreateTupleWithNames_m02420FFEBC2762E1F03DF0C07B0B18A1865C91A1_MetadataUsageId$non_lazy_ptr-(LPC11_1+4)) LPC11_1: add r0, pc ldr r0, [r0] ldr r0, [r0] bl __ZN6il2cpp2vm13MetadataCache24InitializeMethodMetadataEj movs r0, #1 strb r0, [r4] LBB11_2: movw r0, :lower16:(L_ValueTuple_2__ctor_mCDA3078E87F827C6490EBD90430507642CECC6BF_RuntimeMethod_var$non_lazy_ptr-(LPC11_2+4)) movs r1, #1 movt r0, :upper16:(L_ValueTuple_2__ctor_mCDA3078E87F827C6490EBD90430507642CECC6BF_RuntimeMethod_var$non_lazy_ptr-(LPC11_2+4)) movs r2, #2 LPC11_2: add r0, pc ldr r0, [r0] ldr r3, [r0] movs r0, #0 strd r0, r0, [sp] mov r0, sp bl _ValueTuple_2__ctor_mCDA3078E87F827C6490EBD90430507642CECC6BF_gshared ldrd r0, r1, [sp] add r0, r1 add sp, #8 pop {r4, r7, pc}
This is very similar to the first example. We see the method initialization followed by the call to the constructor function, but this function ends by adding together the two fields. It’s unfortunate that the compiler produced such a literal translation here as there was no need to create the tuple in the first place or perform the addition. It could have just returned 3
, or even better inlined all calls to this function with the literal 3
.
Conclusion: Tuple field names are syntactic sugar. The same type and constructor are used regardless of field names. Using tuples in general sometimes defeats compiler optimizations.
Creating Tuples with Inferred Names
Next we’ll create a tuple from variables, which allows the C# compiler to infer the field names as the same as the variable names:
static class TestClass { static int TestCreateTupleWithInferredNames(int x, int y) { var t = (x, y); return t.x + t.y; } }
Now let’s look at the IL2CPP output:
extern "C" IL2CPP_METHOD_ATTR int32_t TestClass_TestCreateTupleWithInferredNames_mE7F8534AD9E551B268E05DEBB0306B3B72D73A46 (int32_t ___x0, int32_t ___y1, const RuntimeMethod* method) { static bool s_Il2CppMethodInitialized; if (!s_Il2CppMethodInitialized) { il2cpp_codegen_initialize_method (TestClass_TestCreateTupleWithInferredNames_mE7F8534AD9E551B268E05DEBB0306B3B72D73A46_MetadataUsageId); s_Il2CppMethodInitialized = true; } ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 V_0; memset(&V_0, 0, sizeof(V_0)); { int32_t L_0 = ___x0; int32_t L_1 = ___y1; ValueTuple_2__ctor_mCDA3078E87F827C6490EBD90430507642CECC6BF((ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 *)(&V_0), L_0, L_1, /*hidden argument*/ValueTuple_2__ctor_mCDA3078E87F827C6490EBD90430507642CECC6BF_RuntimeMethod_var); ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 L_2 = V_0; int32_t L_3 = L_2.get_Item1_0(); ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 L_4 = V_0; int32_t L_5 = L_4.get_Item2_1(); return ((int32_t)il2cpp_codegen_add((int32_t)L_3, (int32_t)L_5)); } }
This is essentially identical to the C++ that was generated when we explicitly gave names to the fields. That makes sense since the names were syntactic sugar anyhow.
Conclusion: Compiler-inferred tuple field names are syntactic sugar just like explicitly-provided tuple field names.
Deconstructing Simple Tuples
Now let’s start to “deconstruct” tuples. This is syntax to extract the tuple’s fields into local variables all in one line. Here’s how it looks:
static class TestClass { static int TestDeconstructTuple() { (int x, int y) = TestCreateTuple(); return x + y; } }
Let’s see what kind of C++ is generated for this:
extern "C" IL2CPP_METHOD_ATTR int32_t TestClass_TestDeconstructTuple_mFD8EA59ED69A94BCCE1CAA0FBB579925C1C34FF1 (const RuntimeMethod* method) { int32_t V_0 = 0; int32_t V_1 = 0; { ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 L_0 = TestClass_TestCreateTuple_mB7F9180F25D30B4F025CB7FB81F5FE9A3EECC606(/*hidden argument*/NULL); ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 L_1 = L_0; int32_t L_2 = L_1.get_Item1_0(); V_0 = L_2; int32_t L_3 = L_1.get_Item2_1(); V_1 = L_3; int32_t L_4 = V_0; int32_t L_5 = V_1; return ((int32_t)il2cpp_codegen_add((int32_t)L_4, (int32_t)L_5)); } }
This function doesn’t have method initialization overhead since it didn’t call a generic constructor. Instead, it just calls a function that happens to call a generic constructor, so presumably we’ll take the overhead there.
After the call to TestCreateTuple
, we see the same accessors for Item1
and Item2
called to get the tuple’s fields. The return values are stored in local variables: L_2
and L_3
. They also get copied, redundantly, to V_0
and V_1
before being finally added together.
So far it looks like deconstructing a tuple is just syntactic sugar for copying its fields one-by-one. Let’s look at the assembly the C++ compiler generates to see how this all boils down into what the CPU actually executes:
push {r7, lr} mov r7, sp sub sp, #8 mov r0, sp bl _TestClass_TestCreateTuple_mB7F9180F25D30B4F025CB7FB81F5FE9A3EECC606 ldrd r0, r1, [sp] add r0, r1 add sp, #8 pop {r7, pc}
After the call to TestCreateTuple
, the fields are loaded out of the tuple, added together, and returned.
This is another missed opportunity by the compiler because TestCreateTuple
always returns (1, 2)
so the return value of this function is known at compile time to be 3
. The compiler still generated code that goes through the motions: create the tuple with method initialization overhead, get the fields, and add them together. This indicates that the complexity of tuples may defeat compiler optimizations like these when constant values are used in real game code.
Conclusion: Deconstructing a tuple is syntactic sugar for individually reading its fields.
Deconstructing Classes
Deconstructing also works with our own class types. All we have to do is provide a Deconstruct
method that takes out
parameters like this:
class DeconstructableClass { public int X; public int Y; public void Deconstruct(out int x, out int y) { x = X; y = Y; } }
Then we can place it on the right side of the =
to deconstruct it:
static class TestClass { static int TestDeconstructClass(DeconstructableClass dc) { (int x, int y) = dc; return x + y; } }
Let’s see what happens when we do this:
extern "C" IL2CPP_METHOD_ATTR int32_t TestClass_TestDeconstructClass_m57AD9127FF5C070409FE98467A541E3FD141C478 (DeconstructableClass_t9072FF25F580F648CD2B2657234A18D85E353D12 * ___dc0, const RuntimeMethod* method) { int32_t V_0 = 0; int32_t V_1 = 0; int32_t V_2 = 0; { DeconstructableClass_t9072FF25F580F648CD2B2657234A18D85E353D12 * L_0 = ___dc0; NullCheck(L_0); DeconstructableClass_Deconstruct_mABF5441DF3A7F5833AAA4F2D1948489E78E6706D(L_0, (int32_t*)(&V_1), (int32_t*)(&V_2), /*hidden argument*/NULL); int32_t L_1 = V_1; int32_t L_2 = V_2; V_0 = L_2; int32_t L_3 = V_0; return ((int32_t)il2cpp_codegen_add((int32_t)L_1, (int32_t)L_3)); } }
The generated C++ begins with a null
check of the class and proceeds to call the DeconstructableClass.Deconstruct
method we wrote with pointers to local variables as the out
parameters. These are then, again redundantly, copied to other local variables before finally being added together.
Let’s look at the Deconstruct
method to see what IL2CPP generated for it:
extern "C" IL2CPP_METHOD_ATTR void DeconstructableClass_Deconstruct_mABF5441DF3A7F5833AAA4F2D1948489E78E6706D (DeconstructableClass_t9072FF25F580F648CD2B2657234A18D85E353D12 * __this, int32_t* ___x0, int32_t* ___y1, const RuntimeMethod* method) { { int32_t* L_0 = ___x0; int32_t L_1 = __this->get_X_0(); *((int32_t*)L_0) = (int32_t)L_1; int32_t* L_2 = ___y1; int32_t L_3 = __this->get_Y_1(); *((int32_t*)L_2) = (int32_t)L_3; return; } }
All this does is call the accessors to get X
and Y
then set them to the out
parameters, which were turned into pointers in C++.
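The out-parameters-as-pointers translation can be sketched in a few lines of C++. Point, deconstruct, and sum_fields are hypothetical names used only for illustration, and the constexpr functions assume C++14 or later.

```cpp
#include <cstdint>

// Hypothetical plain-data version of DeconstructableClass.
struct Point
{
    int32_t X;
    int32_t Y;
};

// C# "out int x, out int y" becomes two pointers the callee writes through.
constexpr void deconstruct(const Point& p, int32_t* x, int32_t* y)
{
    *x = p.X;
    *y = p.Y;
}

// Caller side: declare locals, pass their addresses, then use the results.
constexpr int32_t sum_fields(Point p)
{
    int32_t x = 0;
    int32_t y = 0;
    deconstruct(p, &x, &y);
    return x + y;
}

static_assert(sum_fields(Point{3, 4}) == 7, "3 + 4 == 7");
```

Because the callee is tiny and visible to the compiler, an optimizer is free to inline it and drop the pointers entirely.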
Finally, let’s look at the assembly for the test function to see how it all came together:
push {r4, r7, lr} add r7, sp, #4 mov r4, r0 cbnz r4, LBB14_2 movs r0, #0 bl __ZN6il2cpp2vm9Exception27RaiseNullReferenceExceptionEP19Il2CppSequencePoint LBB14_2: ldr r0, [r4, #8] ldr r1, [r4, #12] add r0, r1 pop {r4, r7, pc}
The first part is the null
check and the second part is the actual work of the function. The call to Deconstruct
has been inlined and we now have just two reads to get the fields of the class. They’re then added together and returned.
Conclusion: Deconstruct
methods provide an efficient and terse way to deconstruct classes.
Deconstructing Structs
If we can add a Deconstruct
method to a class, surely we can do the same with a struct. Let’s try:
struct DeconstructableStruct { public int X; public int Y; public void Deconstruct(out int x, out int y) { x = X; y = Y; } } static class TestClass { static int TestDeconstructStruct(DeconstructableStruct ds) { (int x, int y) = ds; return x + y; } }
This works, so let’s check the C++:
extern "C" IL2CPP_METHOD_ATTR int32_t TestClass_TestDeconstructStruct_m723954E47E070D42C482C0546A40BA24BA91D40D (DeconstructableStruct_t8DD49D68F229C042C4490DE678EA7018F17C80A4 ___ds0, const RuntimeMethod* method) { int32_t V_0 = 0; DeconstructableStruct_t8DD49D68F229C042C4490DE678EA7018F17C80A4 V_1; memset(&V_1, 0, sizeof(V_1)); int32_t V_2 = 0; int32_t V_3 = 0; { DeconstructableStruct_t8DD49D68F229C042C4490DE678EA7018F17C80A4 L_0 = ___ds0; V_1 = L_0; DeconstructableStruct_Deconstruct_m197F44C639A3201D98D99FB2CFD9FE4DEAEE5AFE((DeconstructableStruct_t8DD49D68F229C042C4490DE678EA7018F17C80A4 *)(&V_1), (int32_t*)(&V_2), (int32_t*)(&V_3), /*hidden argument*/NULL); int32_t L_1 = V_2; int32_t L_2 = V_3; V_0 = L_2; int32_t L_3 = V_0; return ((int32_t)il2cpp_codegen_add((int32_t)L_1, (int32_t)L_3)); } }
This is basically the same as the class
version except that it doesn’t have the NullCheck
call. Let’s look at the Deconstruct
function now:
extern "C" IL2CPP_METHOD_ATTR void DeconstructableStruct_Deconstruct_m197F44C639A3201D98D99FB2CFD9FE4DEAEE5AFE (DeconstructableStruct_t8DD49D68F229C042C4490DE678EA7018F17C80A4 * __this, int32_t* ___x0, int32_t* ___y1, const RuntimeMethod* method) { { int32_t* L_0 = ___x0; int32_t L_1 = __this->get_X_0(); *((int32_t*)L_0) = (int32_t)L_1; int32_t* L_2 = ___y1; int32_t L_3 = __this->get_Y_1(); *((int32_t*)L_2) = (int32_t)L_3; return; } }
This is identical to the class
version, so let’s see what assembly was generated:
add r0, r1 bx lr
With the null
check gone, the generated assembly is truly tiny. It now consists of the bare minimum addition and return.
Conclusion: Deconstructing a struct
is free and still provides a terse, error-checked way of extracting its fields.
Deconstructing with Extension Methods
Now let’s try moving the Deconstruct
method out of the struct
and into a static class
as an extension method:
struct NonDeconstructableStruct { public int X; public int Y; } static class NonDeconstructableStructExtensions { public static void Deconstruct( this NonDeconstructableStruct nds, out int x, out int y) { x = nds.X; y = nds.Y; } }
Using it looks exactly like when Deconstruct
was inside the struct
type as an instance method:
static class TestClass { static int TestDeconstructStructExtension(NonDeconstructableStruct ds) { (int x, int y) = ds; return x + y; } }
This compiles just fine, so let’s see what C++ was generated:
extern "C" IL2CPP_METHOD_ATTR int32_t TestClass_TestDeconstructStructExtension_mFD580D0EB417138F4E03F3391B9FB9063526AC4F (NonDeconstructableStruct_tEA3A22DAAA7EFF04989AB14067DCFC13CC30A054 ___ds0, const RuntimeMethod* method) { int32_t V_0 = 0; int32_t V_1 = 0; int32_t V_2 = 0; { NonDeconstructableStruct_tEA3A22DAAA7EFF04989AB14067DCFC13CC30A054 L_0 = ___ds0; NonDeconstructableStructExtensions_Deconstruct_m029D9FA1627896EF3E8F38AAE1D469D0B713DBD2(L_0, (int32_t*)(&V_1), (int32_t*)(&V_2), /*hidden argument*/NULL); int32_t L_1 = V_1; int32_t L_2 = V_2; V_0 = L_2; int32_t L_3 = V_0; return ((int32_t)il2cpp_codegen_add((int32_t)L_1, (int32_t)L_3)); } }
This looks the same as before, except that the call to Deconstruct
is now in NonDeconstructableStructExtensions
. Let’s see it:
extern "C" IL2CPP_METHOD_ATTR void NonDeconstructableStructExtensions_Deconstruct_m029D9FA1627896EF3E8F38AAE1D469D0B713DBD2 (NonDeconstructableStruct_tEA3A22DAAA7EFF04989AB14067DCFC13CC30A054 ___nds0, int32_t* ___x1, int32_t* ___y2, const RuntimeMethod* method) { { int32_t* L_0 = ___x1; NonDeconstructableStruct_tEA3A22DAAA7EFF04989AB14067DCFC13CC30A054 L_1 = ___nds0; int32_t L_2 = L_1.get_X_0(); *((int32_t*)L_0) = (int32_t)L_2; int32_t* L_3 = ___y2; NonDeconstructableStruct_tEA3A22DAAA7EFF04989AB14067DCFC13CC30A054 L_4 = ___nds0; int32_t L_5 = L_4.get_Y_1(); *((int32_t*)L_3) = (int32_t)L_5; return; } }
This is slightly longer due to some unnecessary variable copying, but essentially the same. Let’s see if any of this affected the final assembly:
add r0, r1 bx lr
No, it’s still the same two instructions.
Conclusion: Placing Deconstruct
inside or outside the type doesn’t make any difference.
Deconstructing Enums
If Deconstruct
can be an extension method, then we should be able to apply it to all kinds of types. Let’s try adding one for an enum
:
enum TestEnum { } static class TestEnumExtensions { public static void Deconstruct( this TestEnum te, out int x, out int y) { x = (int)te; y = (int)te + 1; } } static class TestClass { static int TestDeconstructEnum(TestEnum te) { (int x, int y) = te; return x + y; } }
This particular example is just to keep things simple and doesn’t represent a good use of deconstructing an enum
. There may be good uses though, such as being able to write (float x, float y, float z) = Axis.X
and getting the equivalent of float x = 1.0f; float y = 0.0f; float z = 0.0f;
due to a switch
in Deconstruct
that outputs the appropriate values.
In the meantime, let’s return to the example and see how it looks in C++:
extern "C" IL2CPP_METHOD_ATTR int32_t TestClass_TestDeconstructEnum_m25C34CEC10F6CE4E36C917ABE450E5A43592788D (int32_t ___te0, const RuntimeMethod* method) { int32_t V_0 = 0; int32_t V_1 = 0; int32_t V_2 = 0; { int32_t L_0 = ___te0; TestEnumExtensions_Deconstruct_m53305AE31678C7BA3BAA729F3B549598F3DB1D2F(L_0, (int32_t*)(&V_1), (int32_t*)(&V_2), /*hidden argument*/NULL); int32_t L_1 = V_1; int32_t L_2 = V_2; V_0 = L_2; int32_t L_3 = V_0; return ((int32_t)il2cpp_codegen_add((int32_t)L_1, (int32_t)L_3)); } }
Again, this output looks just like the struct
version. Let’s see how the Deconstruct
function works:
extern "C" IL2CPP_METHOD_ATTR void TestEnumExtensions_Deconstruct_m53305AE31678C7BA3BAA729F3B549598F3DB1D2F (int32_t ___te0, int32_t* ___x1, int32_t* ___y2, const RuntimeMethod* method) { { int32_t* L_0 = ___x1; int32_t L_1 = ___te0; *((int32_t*)L_0) = (int32_t)L_1; int32_t* L_2 = ___y2; int32_t L_3 = ___te0; *((int32_t*)L_2) = (int32_t)((int32_t)il2cpp_codegen_add((int32_t)L_3, (int32_t)1)); return; } }
Redundant variables aside, this does just what we wrote in C#: output the enum’s integer value to x
and add one to output y
. Let’s see what assembly we get for this:
movs r1, #1 orr.w r0, r1, r0, lsl #1 bx lr
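This is three instructions because the compiler folded the whole computation together: it shifts the enum value left by one bit to double it and ORs in 1, which gives te + (te + 1) without ever storing x and y. That’s still an absolutely minimal set of instructions for the CPU to execute.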
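The orr.w instruction with lsl #1 computes (te << 1) | 1, which is the same value as te + (te + 1). Here’s a small C++ sketch that checks the identity for a few inputs. The function names are hypothetical, the constexpr functions assume C++14 or later, and the conversion back to int32_t assumes the usual two’s complement behavior.

```cpp
#include <cstdint>

// What the C# says: x = te, y = te + 1, return x + y.
constexpr int32_t sum_literal(int32_t te)
{
    return te + (te + 1);
}

// What the assembly does: shift left by one (doubling te), then OR in 1.
// The shift is done on the unsigned type to keep it well defined for
// negative inputs; the conversion back assumes two's complement.
constexpr int32_t sum_shift_or(int32_t te)
{
    return static_cast<int32_t>((static_cast<uint32_t>(te) << 1) | 1u);
}

static_assert(sum_literal(5) == 11 && sum_shift_or(5) == 11, "positive input");
static_assert(sum_literal(0) == 1 && sum_shift_or(0) == 1, "zero");
static_assert(sum_literal(-3) == -5 && sum_shift_or(-3) == -5, "negative input");
```

The OR works as an addition here because doubling always leaves the lowest bit clear.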
Conclusion: Extension methods can allow for deconstructing an enum
just as quickly and easily as a struct
or class
.
Deconstructing Primitives
Lastly, let’s even try adding a Deconstruct
extension method for a primitive type: int
.
static class IntExtensions { public static void Deconstruct( this int i, out int x, out int y) { x = i; y = i + 1; } } static class TestClass { static int TestDeconstructInt(int i) { (int x, int y) = i; return x + y; } }
The code to do this is basically the same as with enum
, class
, and struct
. It’s debatable whether there are any good uses for deconstructing a primitive type, but we’ll set that aside for now as we go through this example. Let’s see how much the C++ varies:
extern "C" IL2CPP_METHOD_ATTR int32_t TestClass_TestDeconstructInt_mB98888AF0CA4300F0FFD1BD29673067F26CEFBFA (int32_t ___i0, const RuntimeMethod* method) { int32_t V_0 = 0; int32_t V_1 = 0; int32_t V_2 = 0; { int32_t L_0 = ___i0; IntExtensions_Deconstruct_mE01DC9A61BFAF35570CFD0176217F98232D2EC5B(L_0, (int32_t*)(&V_1), (int32_t*)(&V_2), /*hidden argument*/NULL); int32_t L_1 = V_1; int32_t L_2 = V_2; V_0 = L_2; int32_t L_3 = V_0; return ((int32_t)il2cpp_codegen_add((int32_t)L_1, (int32_t)L_3)); } }
So far this looks the same as with struct
and enum
. Let’s look at the C++ for the Deconstruct
extension method:
extern "C" IL2CPP_METHOD_ATTR void IntExtensions_Deconstruct_mE01DC9A61BFAF35570CFD0176217F98232D2EC5B (int32_t ___i0, int32_t* ___x1, int32_t* ___y2, const RuntimeMethod* method) { { int32_t* L_0 = ___x1; int32_t L_1 = ___i0; *((int32_t*)L_0) = (int32_t)L_1; int32_t* L_2 = ___y2; int32_t L_3 = ___i0; *((int32_t*)L_2) = (int32_t)((int32_t)il2cpp_codegen_add((int32_t)L_3, (int32_t)1)); return; } }
This looks the same as with enum
. Here’s how the assembly looks:
movs r1, #1 orr.w r0, r1, r0, lsl #1 bx lr
Again, this is just as with enum
.
Conclusion: Extension methods even allow for deconstructing primitives. It’s just as efficient as enum
and struct
, but might not have any good practical uses.
Tuple Equality
For today’s final example, let’s see how we can use the equality (==
) and inequality (!=
) operators with tuples:
static class TestClass { static bool TestTupleEquality((int, int) t1, (int, int) t2) { return t1 == t2; } static bool TestTupleInequality((int, int) t1, (int, int) t2) { return t1 != t2; } }
Here’s the C++ that IL2CPP generates:
extern "C" IL2CPP_METHOD_ATTR bool TestClass_TestTupleEquality_m2FA8D3F7595837C30A9EF891092F87AF36287C5A (ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 ___t10, ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 ___t21, const RuntimeMethod* method) { ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 V_0; memset(&V_0, 0, sizeof(V_0)); ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 V_1; memset(&V_1, 0, sizeof(V_1)); { ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 L_0 = ___t10; V_0 = L_0; ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 L_1 = ___t21; V_1 = L_1; ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 L_2 = V_0; int32_t L_3 = L_2.get_Item1_0(); ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 L_4 = V_1; int32_t L_5 = L_4.get_Item1_0(); if ((!(((uint32_t)L_3) == ((uint32_t)L_5)))) { goto IL_0021; } } { ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 L_6 = V_0; int32_t L_7 = L_6.get_Item2_1(); ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 L_8 = V_1; int32_t L_9 = L_8.get_Item2_1(); return (bool)((((int32_t)L_7) == ((int32_t)L_9))? 1 : 0); } IL_0021: { return (bool)0; } }
This is pretty long compared to the previous functions, but it’s just simple and verbose. To begin with, two tuples are created: V_0
and V_1
. Both are memset
to zero, just as when we created the tuples directly but unlike when we tested deconstructing a tuple. Then they’re immediately overwritten by the parameters.
Next, Item1
is retrieved from both tuples via its accessor. This is compared for inequality with !(a==b)
and goto
is used to jump to a block that returns false
in that case. Otherwise, Item2
is retrieved and compared with ==
to return true
or false
using a conditional (?:
) operator. This is far from clean, hand-written code, but it’s still rather straightforward in this small example.
Finally, let’s look at the C++ compiler output, translated by the author into pseudo-C#:
eors r1, r3 # r1 = t1.Item2 ^ t2.Item2; // zero only if equal
eors r0, r2 # r0 = t1.Item1 ^ t2.Item1; // zero only if equal
orrs r0, r1 # Z = ((r0 | r1) == 0) ? 1 : 0; // zero only if both equal
mov.w r0, #0 # r0 = 0;
it eq # if (Z == 1) // only if both equal
moveq r0, #1 # r0 = 1;
bx lr # return r0;
Here we see a more radical transformation than in previous assembly code. The if
statement and the conditional operator have been entirely replaced with a more optimal set of instructions. There are no more branches and only one conditional instruction. This should execute much faster than a literal translation of the C++ would.
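The trick the compiler used can be written out in C++ for comparison. This is a sketch with hypothetical names (IntPair, equal_branchy, equal_branch_free) and assumes C++14 or later for the constexpr functions. It isn’t meant as something to write by hand, since the compiler already does this for us.

```cpp
#include <cstdint>

struct IntPair
{
    int32_t Item1;
    int32_t Item2;
};

// Literal translation of the C++: test Item1, bail out early, then test Item2.
constexpr bool equal_branchy(IntPair a, IntPair b)
{
    if (!(a.Item1 == b.Item1))
    {
        return false;
    }
    return a.Item2 == b.Item2;
}

// What the assembly does: XOR each pair of fields (zero only when equal),
// OR the results together, and compare once against zero.
constexpr bool equal_branch_free(IntPair a, IntPair b)
{
    return ((a.Item1 ^ b.Item1) | (a.Item2 ^ b.Item2)) == 0;
}

static_assert(equal_branchy(IntPair{1, 2}, IntPair{1, 2}), "equal pairs");
static_assert(equal_branch_free(IntPair{1, 2}, IntPair{1, 2}), "equal pairs");
static_assert(!equal_branchy(IntPair{1, 2}, IntPair{1, 3}), "Item2 differs");
static_assert(!equal_branch_free(IntPair{1, 2}, IntPair{1, 3}), "Item2 differs");
static_assert(!equal_branch_free(IntPair{0, 2}, IntPair{1, 2}), "Item1 differs");
```

Both versions give the same answers. The second simply trades the early exit for a single comparison.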
The C++ and assembly code for inequality is nearly identical to that of equality, so it’s not shown here.
Conclusion: Tuple equality and inequality are syntactic sugar for directly comparing all fields of the tuples. The resulting assembly may have no branches.
Conclusion
Tuples have provided us a fair amount of syntactic sugar to create structs and name, compare, and extract their fields. The resulting assembly is reasonably good in all cases. In some cases like deconstruction and equality, it’s truly minimal and should execute extremely quickly. Creation in particular is marred by method initialization overhead. Likewise, the C++ compiler can generate some sub-optimal code for usage of a tuple.
Additionally, deconstruction is a flexible tool that we can apply to classes and structs via instance methods. We can also apply it to these types as well as enums, primitives, or anything else via extension methods. Sometimes there aren’t any good use cases for this, but others such as (float x, float y, float z) = someVector3;
are quite compelling.
Stay tuned for next week when we’ll continue the series by looking at more new language features in C# 7.3!
#1 by Nate Allan on January 10th, 2019 ·
This type of measurement and analysis can be so informative, rather than endless speculation. Thank you so much for doing this!
#2 by Draugor on July 10th, 2019 ·
uh now i’m interested in knowing what tuples do to List/Dictionaries instead of structs in il2cpp :D
say a Dictionary or a List
#3 by Draugor on July 10th, 2019 ·
apparently comments don’t like generics
i meant a Dictionary with (int, int) as key and int as value or a list of (int,int,int) for example