Loops in IL2CPP
There are many permutations of loops we can write, but what do they compile to? We should know the consequences of using an array versus a List<T>, for versus foreach, caching Length, and other factors. So today’s article dives into the C++ code that IL2CPP outputs when we write these various types of loops to examine the differences. We’ll even go further and look at the ARM assembly that the C++ compiles to and really find out how much overhead our choices are costing us.
Array for loop, checks disabled, Length cached
Today we’ll progress from the fastest possible loop to the slowest. Along the way we’ll look at the C# source code we write, the C++ source code that IL2CPP in Unity 2017.3 generates for our C#, and the ARM assembly code that Xcode 9.2 compiles to for a release build on iOS. Let’s start with the C# for a for loop on an array when we’ve told IL2CPP to disable null- and bounds-checks and we’re caching the array’s Length field as a local variable. Everything’s exactly the same for while and do-while loops, so only for loops will be covered here.
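using Unity.IL2CPP.CompilerServices;

static class TestClass
{
    // Turn off IL2CPP's null- and bounds-checks for this method only
    [Il2CppSetOption(Option.NullChecks, false)]
    [Il2CppSetOption(Option.ArrayBoundsChecks, false)]
    static int ForArrayChecksDisabled(int[] array)
    {
        int sum = 0;
        int len = array.Length; // cache Length as a local variable
        for (int i = 0; i < len; ++i)
        {
            sum += array[i];
        }
        return sum;
    }
}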
Now the C++ that IL2CPP outputs:
extern "C" int32_t TestClass_ForArrayChecksDisabled_m816776997 (RuntimeObject * __this /* static, unused */, Int32U5BU5D_t385246372* ___array0, const RuntimeMethod* method)
{
    int32_t V_0 = 0;
    int32_t V_1 = 0;
    int32_t V_2 = 0;
    {
        V_0 = 0;
        Int32U5BU5D_t385246372* L_0 = ___array0;
        V_1 = (((int32_t)((int32_t)(((RuntimeArray *)L_0)->max_length))));
        V_2 = 0;
        goto IL_0017;
    }

IL_000d:
    {
        int32_t L_1 = V_0;
        Int32U5BU5D_t385246372* L_2 = ___array0;
        int32_t L_3 = V_2;
        int32_t L_4 = L_3;
        int32_t L_5 = (L_2)->GetAtUnchecked(static_cast<il2cpp_array_size_t>(L_4));
        V_0 = ((int32_t)il2cpp_codegen_add((int32_t)L_1, (int32_t)L_5));
        int32_t L_6 = V_2;
        V_2 = ((int32_t)il2cpp_codegen_add((int32_t)L_6, (int32_t)1));
    }

IL_0017:
    {
        int32_t L_7 = V_2;
        int32_t L_8 = V_1;
        if ((((int32_t)L_7) < ((int32_t)L_8)))
        {
            goto IL_000d;
        }
    }
    {
        int32_t L_9 = V_0;
        return L_9;
    }
}
There are a lot of pointless local variable aliases and unnecessary casts, but other than that it’s pretty clear what’s going on here. We don’t see any NullCheck calls and we see GetAtUnchecked, which means we’ve effectively disabled null- and bounds-checks. Other than that, the only weird parts are the calls to il2cpp_codegen_add. This is an inline templated function whose return type is the larger of the two argument types, so int extends to long and so forth. In this case, it’s equivalent to just using the + operator.
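To make that concrete, here’s a minimal C++ sketch of a helper with the same shape. The names sketch_codegen_add, add_two_ints, and demo_codegen_add are hypothetical, and std::common_type is only a stand-in for however IL2CPP’s own headers pick the wider type, so treat this as an illustration rather than the real definition.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for il2cpp_codegen_add: the result type is the
// wider of the two argument types and the body is a plain addition.
template <typename T, typename U>
inline typename std::common_type<T, U>::type sketch_codegen_add(T left, U right)
{
    return left + right;
}

// With two int32_t arguments, as in the loop above, this is just '+'.
inline int32_t add_two_ints(int32_t a, int32_t b)
{
    return sketch_codegen_add(a, b);
}

// Small usage example: same-width and mixed-width arguments.
inline void demo_codegen_add()
{
    assert(add_two_ints(2, 3) == 5);
    // Mixing a 32-bit and a 64-bit argument yields a 64-bit result type.
    static_assert(
        std::is_same<decltype(sketch_codegen_add(int32_t(1), int64_t(2))), int64_t>::value,
        "the wider argument type is the result type");
    assert(sketch_codegen_add(int32_t(1), int64_t(2)) == 3);
}
```

Since both arguments in the generated loop are int32_t, a helper like this leaves the compiler nothing to do but emit an ordinary add.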
Now let’s see how effective the C++ compiler is at removing all these local variable aliases, casts, and function calls by looking at the assembly it generates:
    ldr r2, [r1, #12]
    cmp r2, #1
    itt lt
    movlt r0, #0
    bxlt lr
    adds r1, #16
    movs r0, #0
LBB0_1:
    ldr r3, [r1], #4
    subs r2, #1
    add r0, r3
    bne LBB0_1
    bx lr
It’s OK to not understand all the details of this assembly. There’s not much going on here though, just some loop setup, the contents of the loop, incrementing the iterator variable, and jumping back to the LBB0_1 label at the start of the loop to keep it going. This is quite minimal and shows that the C# to C++ to assembly pipeline can be really efficient under the right circumstances.
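For readers who’d rather not read ARM, here’s a rough C++ sketch of the loop shape the compiler ended up with. The function name sum_like_assembly and its two parameters are hypothetical. They stand in for the array’s element storage and its length field, and the comments map each statement to the instructions it mirrors.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the loop shape in the assembly above, written back as C++.
// 'items' stands in for the element storage (r1 + 16) and 'length' for
// the value loaded from the length field ([r1, #12]).
inline int32_t sum_like_assembly(const int32_t* items, int32_t length)
{
    // Debug-only precondition for this sketch. The generated code has no
    // such check because null checks were disabled.
    assert(length < 1 || items != nullptr);

    if (length < 1)            // cmp r2, #1 / movlt r0, #0 / bxlt lr
        return 0;

    int32_t sum = 0;           // movs r0, #0
    int32_t remaining = length;
    do
    {
        sum += *items++;       // ldr r3, [r1], #4 / add r0, r3
        --remaining;           // subs r2, #1
    } while (remaining != 0);  // bne LBB0_1
    return sum;
}
```

Notice that the index variable is gone entirely. The compiler walks a pointer through the elements and counts the remaining iterations down to zero.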
Array for loop, checks enabled
Hardly anybody ever disables the null- and bounds-checks everywhere though, so let’s see what a for loop on an array looks like normally when we don’t disable them. Here’s the C#:
static class TestClass
{
    static int ForArray(int[] array)
    {
        int sum = 0;
        int len = array.Length;
        for (int i = 0; i < len; ++i)
        {
            sum += array[i];
        }
        return sum;
    }
}
And the C++:
extern "C" int32_t TestClass_ForArray_m2463453876 (RuntimeObject * __this /* static, unused */, Int32U5BU5D_t385246372* ___array0, const RuntimeMethod* method)
{
    int32_t V_0 = 0;
    int32_t V_1 = 0;
    int32_t V_2 = 0;
    {
        V_0 = 0;
        Int32U5BU5D_t385246372* L_0 = ___array0;
        NullCheck(L_0);
        V_1 = (((int32_t)((int32_t)(((RuntimeArray *)L_0)->max_length))));
        V_2 = 0;
        goto IL_0017;
    }

IL_000d:
    {
        int32_t L_1 = V_0;
        Int32U5BU5D_t385246372* L_2 = ___array0;
        int32_t L_3 = V_2;
        NullCheck(L_2);
        int32_t L_4 = L_3;
        int32_t L_5 = (L_2)->GetAt(static_cast<il2cpp_array_size_t>(L_4));
        V_0 = ((int32_t)il2cpp_codegen_add((int32_t)L_1, (int32_t)L_5));
        int32_t L_6 = V_2;
        V_2 = ((int32_t)il2cpp_codegen_add((int32_t)L_6, (int32_t)1));
    }

IL_0017:
    {
        int32_t L_7 = V_2;
        int32_t L_8 = V_1;
        if ((((int32_t)L_7) < ((int32_t)L_8)))
        {
            goto IL_000d;
        }
    }
    {
        int32_t L_9 = V_0;
        return L_9;
    }
}
As expected, the NullCheck calls are back and GetAtUnchecked has been replaced with GetAt. Let’s see the impact on the ARM assembly:
    push {r4, r5, r6, r7, lr}
    add r7, sp, #12
    push.w {r8, r10}
    mov r8, r1
    cmp.w r8, #0
    it eq
    bleq __ZN6il2cpp2vm9Exception27RaiseNullReferenceExceptionEv
    ldr.w r0, [r8, #12]
    cmp r0, #1
    blt LBB0_6
    sub.w r10, r0, #1
    add.w r4, r8, #16
    movs r5, #0
    movs r6, #0
    b LBB0_3
LBB0_2:
    adds r6, #1
    ldr.w r0, [r8, #12]
LBB0_3:
    cmp r0, r6
    bhi LBB0_5
    bl __ZN6il2cpp2vm9Exception27GetIndexOutOfRangeExceptionEv
    bl __ZN6il2cpp2vm9Exception5RaiseEP15Il2CppException
LBB0_5:
    ldr.w r0, [r4, r6, lsl #2]
    cmp r10, r6
    add r5, r0
    bne LBB0_2
    b LBB0_7
LBB0_6:
    movs r5, #0
LBB0_7:
    mov r0, r5
    pop.w {r8, r10}
    pop {r4, r5, r6, r7, pc}
Well that got a lot longer! Again, there’s no need to understand all the details. The gist here is that a lot more assembly code got generated. There are more branches now (e.g. bhi and blt) to do all the checks, plus bl calls to the functions that raise the exceptions. These can be expensive so they’re best avoided in performance-critical code, but it’s not too bad.
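Here’s a rough C++ sketch of what those checks amount to. Everything in it is hypothetical: SketchIntArray is a simplified layout, sketch_null_check and sketch_get_at only stand in for NullCheck and GetAt, and standard C++ exceptions stand in for the managed ones. The one detail taken directly from the assembly is that the bounds check is a single unsigned compare (cmp followed by bhi).

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical, simplified array: a length field plus element storage.
struct SketchIntArray
{
    uint32_t max_length;
    const int32_t* items;
};

// Stand-in for NullCheck: raise if the reference is null.
inline void sketch_null_check(const void* reference)
{
    if (reference == nullptr)
        throw std::runtime_error("NullReferenceException");
}

// Stand-in for GetAt: one unsigned compare, mirroring 'cmp r0, r6 / bhi'.
inline int32_t sketch_get_at(const SketchIntArray& array, uint32_t index)
{
    if (!(array.max_length > index))
        throw std::out_of_range("IndexOutOfRangeException");
    return array.items[index];
}

// The checked loop: one null check up front, one bounds check per element.
inline int32_t sum_checked(const SketchIntArray* array)
{
    sketch_null_check(array);
    const int32_t len = static_cast<int32_t>(array->max_length);
    int32_t sum = 0;
    for (int32_t i = 0; i < len; ++i)
    {
        sum += sketch_get_at(*array, static_cast<uint32_t>(i));
    }
    return sum;
}
```

The loop body itself is still one load and one add. The extra cost comes from the compare and branch that guard every element access.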
Array for loop, checks enabled, Length not cached
Next, let’s look at another for loop over an array, but this time without caching the Length of the array as a local variable. Many C# programmers don’t do this, so let’s see what difference it makes.
static class TestClass
{
    static int ForArrayNoCache(int[] array)
    {
        int sum = 0;
        for (int i = 0; i < array.Length; ++i)
        {
            sum += array[i];
        }
        return sum;
    }
}
Here’s the C++:
extern "C" int32_t TestClass_ForArrayNoCache_m2316932607 (RuntimeObject * __this /* static, unused */, Int32U5BU5D_t385246372* ___array0, const RuntimeMethod* method) { int32_t V_0 = 0; int32_t V_1 = 0; { V_0 = 0; V_1 = 0; goto IL_0013; } IL_0009: { int32_t L_0 = V_0; Int32U5BU5D_t385246372* L_1 = ___array0; int32_t L_2 = V_1; NullCheck(L_1); int32_t L_3 = L_2; int32_t L_4 = (L_1)->GetAt(static_cast<il2cpp_array_size_t>(L_3)); V_0 = ((int32_t)il2cpp_codegen_add((int32_t)L_0, (int32_t)L_4)); int32_t L_5 = V_1; V_1 = ((int32_t)il2cpp_codegen_add((int32_t)L_5, (int32_t)1)); } IL_0013: { int32_t L_6 = V_1; Int32U5BU5D_t385246372* L_7 = ___array0; NullCheck(L_7); if ((((int32_t)L_6) < ((int32_t)(((int32_t)((int32_t)(((RuntimeArray *)L_7)->max_length))))))) { goto IL_0009; } } { int32_t L_8 = V_0; return L_8; } }
Notice that the if line is no longer just comparing two local variables (if ((((int32_t)L_7) < ((int32_t)L_8)))) but now queries the Length field of the array via max_length. How much of an impact does this have on the assembly? Let’s find out:
    push {r4, r5, r6, r7, lr}
    add r7, sp, #12
    str r8, [sp, #-4]!
    mov r4, r1
    add.w r8, r4, #16
    movs r6, #0
    movs r5, #0
    b LBB1_4
LBB1_1:
    cmp r0, r6
    bhi LBB1_3
    bl __ZN6il2cpp2vm9Exception27GetIndexOutOfRangeExceptionEv
    bl __ZN6il2cpp2vm9Exception5RaiseEP15Il2CppException
LBB1_3:
    ldr.w r0, [r8, r6, lsl #2]
    adds r6, #1
    add r5, r0
LBB1_4:
    cbnz r4, LBB1_6
    bl __ZN6il2cpp2vm9Exception27RaiseNullReferenceExceptionEv
LBB1_6:
    ldr r0, [r4, #12]
    cmp r6, r0
    blt LBB1_1
    mov r0, r5
    ldr r8, [sp], #4
    pop {r4, r5, r6, r7, pc}
This is shorter than before, but structured differently. The array is now checked for null (with cbnz) on every loop iteration. So there is a reason to cache the array length as a local variable. It will reduce the amount of branching required in the loop when null checks are left enabled.
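To put a number on that, here’s a small C++ model of the two loop shapes that counts how many null checks run. It’s only a sketch of what the two assembly listings do: CountedIntArray, counted_null_check, and both sum functions are hypothetical names, and the bounds checks are left out to keep the focus on the null checks.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified array: a length field plus element storage.
struct CountedIntArray
{
    int32_t max_length;
    const int32_t* items;
};

// Stand-in for NullCheck that also counts how often it runs.
inline void counted_null_check(const void* reference, int& null_checks)
{
    ++null_checks;
    assert(reference != nullptr); // the real code raises NullReferenceException
}

// Length cached: the array is null-checked once, before the loop.
inline int32_t sum_length_cached(const CountedIntArray* array, int& null_checks)
{
    counted_null_check(array, null_checks);
    const int32_t len = array->max_length;
    int32_t sum = 0;
    for (int32_t i = 0; i < len; ++i)
        sum += array->items[i];
    return sum;
}

// Length not cached: every evaluation of 'i < array.Length' checks again.
inline int32_t sum_length_not_cached(const CountedIntArray* array, int& null_checks)
{
    int32_t sum = 0;
    int32_t i = 0;
    for (;;)
    {
        counted_null_check(array, null_checks);
        if (!(i < array->max_length))
            break;
        sum += array->items[i];
        ++i;
    }
    return sum;
}
```

In this model an array of N elements costs one null check with Length cached and N + 1 without, because the loop condition is evaluated once more than the body runs.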
Array foreach loop
Now that we’ve established a baseline of what normal for loops look like, let’s try out our first foreach loop using an array:
static class TestClass
{
    static int ForeachArray(int[] array)
    {
        int sum = 0;
        foreach (int cur in array)
        {
            sum += cur;
        }
        return sum;
    }
}
Here’s the C++:
extern "C" int32_t TestClass_ForeachArray_m760813861 (RuntimeObject * __this /* static, unused */, Int32U5BU5D_t385246372* ___array0, const RuntimeMethod* method) { int32_t V_0 = 0; int32_t V_1 = 0; Int32U5BU5D_t385246372* V_2 = NULL; int32_t V_3 = 0; { V_0 = 0; Int32U5BU5D_t385246372* L_0 = ___array0; V_2 = L_0; V_3 = 0; goto IL_0017; } IL_000b: { Int32U5BU5D_t385246372* L_1 = V_2; int32_t L_2 = V_3; NullCheck(L_1); int32_t L_3 = L_2; int32_t L_4 = (L_1)->GetAt(static_cast<il2cpp_array_size_t>(L_3)); V_1 = L_4; int32_t L_5 = V_0; int32_t L_6 = V_1; V_0 = ((int32_t)il2cpp_codegen_add((int32_t)L_5, (int32_t)L_6)); int32_t L_7 = V_3; V_3 = ((int32_t)il2cpp_codegen_add((int32_t)L_7, (int32_t)1)); } IL_0017: { int32_t L_8 = V_3; Int32U5BU5D_t385246372* L_9 = V_2; NullCheck(L_9); if ((((int32_t)L_8) < ((int32_t)(((int32_t)((int32_t)(((RuntimeArray *)L_9)->max_length))))))) { goto IL_000b; } } { int32_t L_10 = V_0; return L_10; } }
This looks very similar but not identical to the previous test where we used a for loop without caching Length. Let’s look at the assembly to strip away the C++ syntax and see what really gets run by the CPU:
    push {r4, r5, r6, r7, lr}
    add r7, sp, #12
    str r8, [sp, #-4]!
    mov r4, r1
    add.w r8, r4, #16
    movs r6, #0
    movs r5, #0
    b LBB1_4
LBB1_1:
    cmp r0, r6
    bhi LBB1_3
    bl __ZN6il2cpp2vm9Exception27GetIndexOutOfRangeExceptionEv
    bl __ZN6il2cpp2vm9Exception5RaiseEP15Il2CppException
LBB1_3:
    ldr.w r0, [r8, r6, lsl #2]
    adds r6, #1
    add r5, r0
LBB1_4:
    cbnz r4, LBB1_6
    bl __ZN6il2cpp2vm9Exception27RaiseNullReferenceExceptionEv
LBB1_6:
    ldr r0, [r4, #12]
    cmp r6, r0
    blt LBB1_1
    mov r0, r5
    ldr r8, [sp], #4
    pop {r4, r5, r6, r7, pc}
This is exactly the same assembly code! Every single character of every single line is the same, even down to the registers that local variables occupy. This means we can very easily state what a foreach loop looks like on an array: it’s the same as a for loop that doesn’t cache the Length field and doesn’t disable null- or bounds-checks. That makes it equivalent to the slowest for loop we can write. No more. No less.
List<T> for loop
Now let’s venture into List<T> territory and explore what it looks like to iterate over one with a for loop:
using System.Collections.Generic;

static class TestClass
{
    static int ForList(List<int> list)
    {
        int sum = 0;
        int len = list.Count; // cache Count as a local variable
        for (int i = 0; i < len; ++i)
        {
            sum += list[i];
        }
        return sum;
    }
}
Here’s the C++ that IL2CPP outputs:
extern "C" int32_t TestClass_ForList_m2093568030 (RuntimeObject * __this /* static, unused */, List_1_t128053199 * ___list0, const RuntimeMethod* method) { static bool s_Il2CppMethodInitialized; if (!s_Il2CppMethodInitialized) { il2cpp_codegen_initialize_method (TestClass_ForList_m2093568030_MetadataUsageId); s_Il2CppMethodInitialized = true; } int32_t V_0 = 0; int32_t V_1 = 0; int32_t V_2 = 0; { V_0 = 0; List_1_t128053199 * L_0 = ___list0; NullCheck(L_0); int32_t L_1 = List_1_get_Count_m186164705(L_0, /*hidden argument*/List_1_get_Count_m186164705_RuntimeMethod_var); V_1 = L_1; V_2 = 0; goto IL_001e; } IL_0010: { int32_t L_2 = V_0; List_1_t128053199 * L_3 = ___list0; int32_t L_4 = V_2; NullCheck(L_3); int32_t L_5 = List_1_get_Item_m888956288(L_3, L_4, /*hidden argument*/List_1_get_Item_m888956288_RuntimeMethod_var); V_0 = ((int32_t)il2cpp_codegen_add((int32_t)L_2, (int32_t)L_5)); int32_t L_6 = V_2; V_2 = ((int32_t)il2cpp_codegen_add((int32_t)L_6, (int32_t)1)); } IL_001e: { int32_t L_7 = V_2; int32_t L_8 = V_1; if ((((int32_t)L_7) < ((int32_t)L_8))) { goto IL_0010; } } { int32_t L_9 = V_0; return L_9; } }
As we know, using methods of a generic type means IL2CPP will generate method initialization overhead for us. The first part of this function is just that and the rest is the actual loop. It’s a pretty literal translation of our C# code with calls to List_1_get_Count_m186164705 to get the Count property and List_1_get_Item_m888956288 to use the indexer. Overall, it looks very similar to the array-based for loop with length caching, null-, and bounds-checks except for those function calls. Let’s look at the functions to find out what they do before diving into the assembly.
#define List_1_get_Count_m186164705(__this, method) (( int32_t (*) (List_1_t128053199 *, const RuntimeMethod*))List_1_get_Count_m186164705_gshared)(__this, method) extern "C" int32_t List_1_get_Count_m186164705_gshared (List_1_t128053199 * __this, const RuntimeMethod* method) { { int32_t L_0 = (int32_t)__this->get__size_1(); return L_0; } } #define List_1_get_Item_m888956288(__this, p0, method) (( int32_t (*) (List_1_t128053199 *, int32_t, const RuntimeMethod*))List_1_get_Item_m888956288_gshared)(__this, p0, method) extern "C" int32_t List_1_get_Item_m888956288_gshared (List_1_t128053199 * __this, int32_t ___index0, const RuntimeMethod* method) { static bool s_Il2CppMethodInitialized; if (!s_Il2CppMethodInitialized) { il2cpp_codegen_initialize_method (List_1_get_Item_m888956288_MetadataUsageId); s_Il2CppMethodInitialized = true; } { int32_t L_0 = ___index0; int32_t L_1 = (int32_t)__this->get__size_1(); if ((!(((uint32_t)L_0) >= ((uint32_t)L_1)))) { goto IL_0017; } } { ArgumentOutOfRangeException_t777629997 * L_2 = (ArgumentOutOfRangeException_t777629997 *)il2cpp_codegen_object_new(ArgumentOutOfRangeException_t777629997_il2cpp_TypeInfo_var); ArgumentOutOfRangeException__ctor_m3628145864(L_2, (String_t*)_stringLiteral797640427, /*hidden argument*/NULL); IL2CPP_RAISE_MANAGED_EXCEPTION(L_2); } IL_0017: { Int32U5BU5D_t385246372* L_3 = (Int32U5BU5D_t385246372*)__this->get__items_0(); int32_t L_4 = ___index0; NullCheck(L_3); int32_t L_5 = L_4; int32_t L_6 = (L_3)->GetAt(static_cast<il2cpp_array_size_t>(L_5)); return L_6; } }
It turns out that each of these functions is really a macro to call an actual function. In the case of the Count getter, it just returns a field of the List. The indexer, however, is quite long. It has method initialization overhead of its own since it can throw an exception. Then it performs its own bounds check to throw an ArgumentOutOfRangeException. By calling GetAt, it performs the exact same bounds check to see if it should throw an IndexOutOfRangeException, which will never happen. It might as well disable bounds-checks since it performs its own bounds check, but it doesn’t.
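The double check is easier to see in a stripped-down C++ sketch. SketchList and sketch_get_item are hypothetical stand-ins for List<int> and its indexer, a std::vector plays the backing array, and std::out_of_range stands in for both managed exceptions. The unsigned casts in the first check are taken from the generated code above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified List<int>: 'size' used slots in a backing
// array whose capacity (items.size()) is always at least 'size'.
struct SketchList
{
    std::vector<int32_t> items;
    int32_t size;
};

inline int32_t sketch_get_item(const SketchList& list, int32_t index)
{
    // First check, from the indexer itself. The unsigned casts make a
    // negative index look huge, so one compare covers both ends.
    if (static_cast<uint32_t>(index) >= static_cast<uint32_t>(list.size))
        throw std::out_of_range("ArgumentOutOfRangeException");

    // Second check, standing in for GetAt on the backing array. Because
    // size <= capacity, it can never fire once the first check passed.
    if (static_cast<std::size_t>(index) >= list.items.size())
        throw std::out_of_range("IndexOutOfRangeException");

    return list.items[static_cast<std::size_t>(index)];
}
```

Both checks still run on every access, so the second one is pure overhead unless the C++ compiler manages to prove it redundant.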
With that in mind, let’s see how these function calls affect the assembly for the function:
push {r4, r5, r6, r7, lr} add r7, sp, #12 push.w {r8, r10} movw r5, :lower16:(__ZZ29TestClass_ForList_m2093568030E25s_Il2CppMethodInitialized-(LPC2_0+4)) mov r4, r1 movt r5, :upper16:(__ZZ29TestClass_ForList_m2093568030E25s_Il2CppMethodInitialized-(LPC2_0+4)) LPC2_0: add r5, pc ldrb r0, [r5] cbnz r0, LBB2_2 movw r0, :lower16:(L_TestClass_ForList_m2093568030_MetadataUsageId$non_lazy_ptr-(LPC2_1+4)) movt r0, :upper16:(L_TestClass_ForList_m2093568030_MetadataUsageId$non_lazy_ptr-(LPC2_1+4)) LPC2_1: add r0, pc ldr r0, [r0] ldr r0, [r0] bl __ZN6il2cpp2vm13MetadataCache24InitializeMethodMetadataEj movs r0, #1 strb r0, [r5] LBB2_2: cbnz r4, LBB2_4 bl __ZN6il2cpp2vm9Exception27RaiseNullReferenceExceptionEv LBB2_4: movw r0, :lower16:(L_List_1_get_Count_m186164705_RuntimeMethod_var$non_lazy_ptr-(LPC2_2+4)) movt r0, :upper16:(L_List_1_get_Count_m186164705_RuntimeMethod_var$non_lazy_ptr-(LPC2_2+4)) LPC2_2: add r0, pc ldr r0, [r0] ldr r1, [r0] mov r0, r4 bl _List_1_get_Count_m186164705_gshared mov r8, r0 movs r5, #0 cmp.w r8, #1 blt LBB2_9 movw r0, :lower16:(L_List_1_get_Item_m888956288_RuntimeMethod_var$non_lazy_ptr-(LPC2_3+4)) movs r6, #0 movt r0, :upper16:(L_List_1_get_Item_m888956288_RuntimeMethod_var$non_lazy_ptr-(LPC2_3+4)) LPC2_3: add r0, pc ldr.w r10, [r0] LBB2_6: cbnz r4, LBB2_8 bl __ZN6il2cpp2vm9Exception27RaiseNullReferenceExceptionEv LBB2_8: ldr.w r2, [r10] mov r0, r4 mov r1, r6 bl _List_1_get_Item_m888956288_gshared adds r6, #1 add r5, r0 cmp r8, r6 bne LBB2_6 LBB2_9: mov r0, r5 pop.w {r8, r10} pop {r4, r5, r6, r7, pc}
There’s a lot of code at the start for the method initialization, but after that it starts to look pretty normal. The major difference remains that there are now function calls for _List_1_get_Count_m186164705_gshared and _List_1_get_Item_m888956288_gshared that we looked at earlier. It’s worth noting that these were not inlined, even though _List_1_get_Count_m186164705_gshared is just a single line of code. So we’ll have to eat the function call overhead once for the cached Count getter call and every loop iteration to index into the contents of the List.
List<T> foreach loop
Moving on, let’s see a foreach loop on a List<T> now that we know what a normal for loop looks like:
using System.Collections.Generic;

static class TestClass
{
    static int ForeachList(List<int> list)
    {
        int sum = 0;
        foreach (int cur in list)
        {
            sum += cur;
        }
        return sum;
    }
}
Here’s the C++:
extern "C" int32_t TestClass_ForeachList_m2132133839 (RuntimeObject * __this /* static, unused */, List_1_t128053199 * ___list0, const RuntimeMethod* method) { static bool s_Il2CppMethodInitialized; if (!s_Il2CppMethodInitialized) { il2cpp_codegen_initialize_method (TestClass_ForeachList_m2132133839_MetadataUsageId); s_Il2CppMethodInitialized = true; } int32_t V_0 = 0; int32_t V_1 = 0; Enumerator_t2017297076 V_2; memset(&V_2, 0, sizeof(V_2)); Exception_t * __last_unhandled_exception = 0; NO_UNUSED_WARNING (__last_unhandled_exception); Exception_t * __exception_local = 0; NO_UNUSED_WARNING (__exception_local); int32_t __leave_target = 0; NO_UNUSED_WARNING (__leave_target); { V_0 = 0; List_1_t128053199 * L_0 = ___list0; NullCheck(L_0); Enumerator_t2017297076 L_1 = List_1_GetEnumerator_m593114157(L_0, /*hidden argument*/List_1_GetEnumerator_m593114157_RuntimeMethod_var); V_2 = L_1; } IL_0009: try { // begin try (depth: 1) { goto IL_001a; } IL_000e: { int32_t L_2 = Enumerator_get_Current_m207670954((&V_2), /*hidden argument*/Enumerator_get_Current_m207670954_RuntimeMethod_var); V_1 = L_2; int32_t L_3 = V_0; int32_t L_4 = V_1; V_0 = ((int32_t)il2cpp_codegen_add((int32_t)L_3, (int32_t)L_4)); } IL_001a: { bool L_5 = Enumerator_MoveNext_m3181700225((&V_2), /*hidden argument*/Enumerator_MoveNext_m3181700225_RuntimeMethod_var); if (L_5) { goto IL_000e; } } IL_0026: { IL2CPP_LEAVE(0x39, FINALLY_002b); } } // end try (depth: 1) catch(Il2CppExceptionWrapper& e) { __last_unhandled_exception = (Exception_t *)e.ex; goto FINALLY_002b; } FINALLY_002b: { // begin finally (depth: 1) Enumerator_Dispose_m222348240((&V_2), /*hidden argument*/Enumerator_Dispose_m222348240_RuntimeMethod_var); IL2CPP_END_FINALLY(43) } // end finally (depth: 1) IL2CPP_CLEANUP(43) { IL2CPP_JUMP_TBL(0x39, IL_0039) IL2CPP_RETHROW_IF_UNHANDLED(Exception_t *) } IL_0039: { int32_t L_6 = V_0; return L_6; } }
That’s a lot more C++, so let’s take it one step at a time. First, we have method initialization because we’re using methods of a generic class. Then we have what a foreach loop breaks down into. It’s roughly equivalent to this C#:
var enumerator = list.GetEnumerator();
try
{
    while (enumerator.MoveNext())
    {
        // Current is the element itself, not an index into the list
        int cur = enumerator.Current;
        sum += cur;
    }
}
finally
{
    enumerator.Dispose();
}
We can see all the parts of it clearly in the C++ code. List_1_GetEnumerator_m593114157 gets the enumerator, Enumerator_MoveNext_m3181700225 advances it, and Enumerator_get_Current_m207670954 gets the current value. Again, we need to look at these to find out what they do:
#define List_1_GetEnumerator_m593114157(__this, method) (( Enumerator_t2017297076 (*) (List_1_t128053199 *, const RuntimeMethod*))List_1_GetEnumerator_m593114157_gshared)(__this, method) extern "C" Enumerator_t2017297076 List_1_GetEnumerator_m593114157_gshared (List_1_t128053199 * __this, const RuntimeMethod* method) { { Enumerator_t2017297076 L_0; memset(&L_0, 0, sizeof(L_0)); Enumerator__ctor_m247851533((&L_0), (List_1_t128053199 *)__this, /*hidden argument*/IL2CPP_RGCTX_METHOD_INFO(method->declaring_type->rgctx_data, 23)); return L_0; } } #define Enumerator_MoveNext_m3181700225(__this, method) (( bool (*) (Enumerator_t2017297076 *, const RuntimeMethod*))Enumerator_MoveNext_m3181700225_gshared)(__this, method) extern "C" bool Enumerator_MoveNext_m3181700225_gshared (Enumerator_t2017297076 * __this, const RuntimeMethod* method) { int32_t V_0 = 0; { Enumerator_VerifyState_m1898450050((Enumerator_t2017297076 *)__this, /*hidden argument*/IL2CPP_RGCTX_METHOD_INFO(InitializedTypeInfo(method->declaring_type)->rgctx_data, 0)); int32_t L_0 = (int32_t)__this->get_next_1(); if ((((int32_t)L_0) >= ((int32_t)0))) { goto IL_0014; } } { return (bool)0; } IL_0014: { int32_t L_1 = (int32_t)__this->get_next_1(); List_1_t128053199 * L_2 = (List_1_t128053199 *)__this->get_l_0(); NullCheck(L_2); int32_t L_3 = (int32_t)L_2->get__size_1(); if ((((int32_t)L_1) >= ((int32_t)L_3))) { goto IL_0053; } } { List_1_t128053199 * L_4 = (List_1_t128053199 *)__this->get_l_0(); NullCheck(L_4); Int32U5BU5D_t385246372* L_5 = (Int32U5BU5D_t385246372*)L_4->get__items_0(); int32_t L_6 = (int32_t)__this->get_next_1(); int32_t L_7 = (int32_t)L_6; V_0 = (int32_t)L_7; __this->set_next_1(((int32_t)il2cpp_codegen_add((int32_t)L_7, (int32_t)1))); int32_t L_8 = V_0; NullCheck(L_5); int32_t L_9 = L_8; int32_t L_10 = (L_5)->GetAt(static_cast<il2cpp_array_size_t>(L_9)); __this->set_current_3(L_10); return (bool)1; } IL_0053: { __this->set_next_1((-1)); return (bool)0; } } #define 
Enumerator_get_Current_m207670954(__this, method) (( int32_t (*) (Enumerator_t2017297076 *, const RuntimeMethod*))Enumerator_get_Current_m207670954_gshared)(__this, method) extern "C" int32_t Enumerator_get_Current_m207670954_gshared (Enumerator_t2017297076 * __this, const RuntimeMethod* method) { { int32_t L_0 = (int32_t)__this->get_current_3(); return L_0; } }
Once again, these are macros that call real functions. List_1_GetEnumerator_m593114157_gshared just returns a struct and Enumerator_get_Current_m207670954_gshared just returns a field of that struct, but Enumerator_MoveNext_m3181700225_gshared has a lot more going on. There’s nothing that surprising as it’s mostly just checking if the index has hit the end and caching the current value. There is a call to Enumerator_VerifyState_m1898450050 though, so let’s check that out:
#define Enumerator_VerifyState_m1898450050(__this, method) (( void (*) (Enumerator_t2017297076 *, const RuntimeMethod*))Enumerator_VerifyState_m1898450050_gshared)(__this, method) extern "C" void Enumerator_VerifyState_m1898450050_gshared (Enumerator_t2017297076 * __this, const RuntimeMethod* method) { static bool s_Il2CppMethodInitialized; if (!s_Il2CppMethodInitialized) { il2cpp_codegen_initialize_method (Enumerator_VerifyState_m1898450050_MetadataUsageId); s_Il2CppMethodInitialized = true; } { List_1_t128053199 * L_0 = (List_1_t128053199 *)__this->get_l_0(); if (L_0) { goto IL_0026; } } { Enumerator_t2017297076 L_1 = (*(Enumerator_t2017297076 *)__this); RuntimeObject * L_2 = Box(IL2CPP_RGCTX_DATA(InitializedTypeInfo(method->declaring_type)->rgctx_data, 2), &L_1); NullCheck((RuntimeObject *)L_2); Type_t * L_3 = Object_GetType_m88164663((RuntimeObject *)L_2, /*hidden argument*/NULL); NullCheck((Type_t *)L_3); String_t* L_4 = VirtFuncInvoker0< String_t* >::Invoke(18 /* System.String System.Type::get_FullName() */, (Type_t *)L_3); ObjectDisposedException_t21392786 * L_5 = (ObjectDisposedException_t21392786 *)il2cpp_codegen_object_new(ObjectDisposedException_t21392786_il2cpp_TypeInfo_var); ObjectDisposedException__ctor_m3603759869(L_5, (String_t*)L_4, /*hidden argument*/NULL); IL2CPP_RAISE_MANAGED_EXCEPTION(L_5); } IL_0026: { int32_t L_6 = (int32_t)__this->get_ver_2(); List_1_t128053199 * L_7 = (List_1_t128053199 *)__this->get_l_0(); NullCheck(L_7); int32_t L_8 = (int32_t)L_7->get__version_2(); if ((((int32_t)L_6) == ((int32_t)L_8))) { goto IL_0047; } } { InvalidOperationException_t56020091 * L_9 = (InvalidOperationException_t56020091 *)il2cpp_codegen_object_new(InvalidOperationException_t56020091_il2cpp_TypeInfo_var); InvalidOperationException__ctor_m237278729(L_9, (String_t*)_stringLiteral1621028992, /*hidden argument*/NULL); IL2CPP_RAISE_MANAGED_EXCEPTION(L_9); } IL_0047: { return; } }
Once again, there’s method initialization overhead here since this can throw an exception when the enumerator is in an invalid state: an ObjectDisposedException when the enumerator no longer has a List (its l field is null) or an InvalidOperationException when the List has been modified during the foreach loop, which shows up as a version mismatch. In addition to the method initialization check, both of these cases are checked for at each iteration of the loop.
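The version-stamp idea behind VerifyState is compact enough to sketch in C++. VersionedList and VersionedEnumerator are hypothetical, heavily simplified stand-ins for List<int> and its enumerator, and the standard C++ exceptions only stand in for ObjectDisposedException and InvalidOperationException. Having Dispose clear the list pointer is an assumption of this sketch, chosen to match the null test in the generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical list: every modification bumps 'version'.
struct VersionedList
{
    std::vector<int32_t> items;
    int32_t version = 0;

    void Add(int32_t value)
    {
        items.push_back(value);
        ++version;
    }
};

// Hypothetical enumerator that snapshots the version when it is created.
struct VersionedEnumerator
{
    const VersionedList* list; // cleared by Dispose in this sketch
    std::size_t next;
    int32_t version;
    int32_t current;

    explicit VersionedEnumerator(const VersionedList& l)
        : list(&l), next(0), version(l.version), current(0) {}

    void VerifyState() const
    {
        if (list == nullptr)
            throw std::logic_error("ObjectDisposedException");
        if (version != list->version)
            throw std::runtime_error("InvalidOperationException");
    }

    bool MoveNext()
    {
        VerifyState(); // both checks run on every iteration
        if (next >= list->items.size())
            return false;
        current = list->items[next++];
        return true;
    }

    void Dispose() { list = nullptr; }
};
```

A loop like while (e.MoveNext()) sum += e.current; therefore pays for a null test and a version compare per element, on top of the end-of-list test and the array access.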
With all of this expensive work present in the C++ code, let’s look at the assembly it compiles to. This is going to be long and there’s no need to understand it all or even read it all. Just taking in the broad strokes of the amount of code generated is probably enough to give some idea of its performance characteristics.
push {r4, r5, r6, r7, lr} add r7, sp, #12 push.w {r8, r10, r11} sub.w r4, sp, #64 bfc r4, #0, #4 mov sp, r4 vst1.64 {d8, d9, d10, d11}, [r4:128]! vst1.64 {d12, d13, d14, d15}, [r4:128] sub sp, #112 movw r5, :lower16:(__ZZ33TestClass_ForeachList_m2132133839E25s_Il2CppMethodInitialized-(LPC3_2+4)) mov r4, r1 movt r5, :upper16:(__ZZ33TestClass_ForeachList_m2132133839E25s_Il2CppMethodInitialized-(LPC3_2+4)) movw r0, :lower16:(L___gxx_personality_sj0$non_lazy_ptr-(LPC3_3+4)) movt r0, :upper16:(L___gxx_personality_sj0$non_lazy_ptr-(LPC3_3+4)) LPC3_2: add r5, pc LPC3_3: add r0, pc ldr r1, LCPI3_0 ldrb r6, [r5] ldr r0, [r0] LPC3_0: add r1, pc str r0, [sp, #84] ldr r0, LCPI3_1 str r1, [sp, #88] orr r0, r0, #1 str r7, [sp, #92] LPC3_1: add r0, pc str.w sp, [sp, #100] str r0, [sp, #96] add r0, sp, #60 bl __Unwind_SjLj_Register cbnz r6, LBB3_2 movw r0, :lower16:(L_TestClass_ForeachList_m2132133839_MetadataUsageId$non_lazy_ptr-(LPC3_4+4)) mov.w r1, #-1 movt r0, :upper16:(L_TestClass_ForeachList_m2132133839_MetadataUsageId$non_lazy_ptr-(LPC3_4+4)) str r1, [sp, #64] LPC3_4: add r0, pc ldr r0, [r0] ldr r0, [r0] bl __ZN6il2cpp2vm13MetadataCache24InitializeMethodMetadataEj movs r0, #1 strb r0, [r5] LBB3_2: vmov.i32 q8, #0x0 add r6, sp, #24 cmp r4, #0 vst1.64 {d16, d17}, [r6] bne LBB3_4 mov.w r0, #-1 str r0, [sp, #64] bl __ZN6il2cpp2vm9Exception27RaiseNullReferenceExceptionEv LBB3_4: movw r0, :lower16:(L_List_1_GetEnumerator_m593114157_RuntimeMethod_var$non_lazy_ptr-(LPC3_5+4)) add r5, sp, #8 movt r0, :upper16:(L_List_1_GetEnumerator_m593114157_RuntimeMethod_var$non_lazy_ptr-(LPC3_5+4)) mov r1, r4 LPC3_5: add r0, pc ldr r0, [r0] ldr r2, [r0] mov.w r0, #-1 str r0, [sp, #64] mov r0, r5 bl _List_1_GetEnumerator_m593114157_gshared vld1.64 {d16, d17}, [r5] movs r0, #0 vst1.64 {d16, d17}, [r6] movw r1, :lower16:(L_Enumerator_MoveNext_m3181700225_RuntimeMethod_var$non_lazy_ptr-(LPC3_6+4)) movt r1, :upper16:(L_Enumerator_MoveNext_m3181700225_RuntimeMethod_var$non_lazy_ptr-(LPC3_6+4)) LPC3_6: 
add r1, pc ldr r1, [r1] str r1, [sp, #4] movw r1, :lower16:(L_Enumerator_get_Current_m207670954_RuntimeMethod_var$non_lazy_ptr-(LPC3_7+4)) movt r1, :upper16:(L_Enumerator_get_Current_m207670954_RuntimeMethod_var$non_lazy_ptr-(LPC3_7+4)) LPC3_7: add r1, pc ldr r1, [r1] str r1, [sp] b LBB3_7 LBB3_5: ldr r1, [sp] movs r2, #1 ldr r1, [r1] str r2, [sp, #64] bl _Enumerator_get_Current_m207670954_gshared ldr r1, [sp, #44] add r6, sp, #24 add r0, r1 LBB3_7: str r0, [sp, #44] ldr r0, [sp, #4] ldr r1, [r0] movs r0, #2 str r0, [sp, #64] mov r0, r6 bl _Enumerator_MoveNext_m3181700225_gshared cmp r0, #0 add r0, sp, #24 bne LBB3_5 movs r5, #0 movs r4, #1 LBB3_10: movw r0, :lower16:(L_Enumerator_Dispose_m222348240_RuntimeMethod_var$non_lazy_ptr-(LPC3_8+4)) movt r0, :upper16:(L_Enumerator_Dispose_m222348240_RuntimeMethod_var$non_lazy_ptr-(LPC3_8+4)) str r5, [sp, #56] LPC3_8: add r0, pc ldr r0, [r0] ldr r1, [r0] mov.w r0, #-1 str r0, [sp, #64] add r0, sp, #24 bl _Enumerator_Dispose_m222348240_gshared ldr r0, [sp, #56] cbnz r4, LBB3_13 cbz r0, LBB3_13 ldr r0, [sp, #56] mov.w r1, #-1 str r1, [sp, #64] bl __ZN6il2cpp2vm9Exception5RaiseEP15Il2CppException add r0, sp, #60 ldr r4, [sp, #44] bl __Unwind_SjLj_Unregister mov r0, r4 add r4, sp, #112 vld1.64 {d8, d9, d10, d11}, [r4:128]! vld1.64 {d12, d13, d14, d15}, [r4:128] sub.w r4, r7, #24 mov sp, r4 pop.w {r8, r10, r11} pop {r4, r5, r6, r7, pc} LBB3_14: ldr r0, [sp, #64] cmp r0, #2 bls LBB3_16 trap LBB3_16: LCPI3_2: tbb [pc, r0] LJTI3_0: LBB3_18: b LBB3_20 LBB3_19: LBB3_20: ldr r0, [sp, #68] ldr r1, [sp, #72] strd r1, r0, [sp, #48] ldr r0, [sp, #48] cmp r0, #1 bne LBB3_22 ldr r0, [sp, #52] bl ___cxa_begin_catch ldr r5, [r0] mov.w r0, #-1 str r0, [sp, #64] bl ___cxa_end_catch movs r4, #0 b LBB3_10 LBB3_22: ldr r0, [sp, #52] mov.w r1, #-1 str r1, [sp, #64] bl __Unwind_SjLj_Resume
Notice that this isn’t the full assembly source for the loop. There are still calls to plenty of other functions like _List_1_GetEnumerator_m593114157_gshared, _Enumerator_get_Current_m207670954_gshared, _Enumerator_MoveNext_m3181700225_gshared, and even _Enumerator_Dispose_m222348240_gshared. The Dispose function is just a single line, but we still get function call overhead for it.
IEnumerable<T> foreach loop
Finally for today, let’s look at a foreach loop over an IEnumerable<T>:
using System.Collections.Generic;

static class TestClass
{
    static int ForeachEnumerable(IEnumerable<int> enumerable)
    {
        int sum = 0;
        foreach (int cur in enumerable)
        {
            sum += cur;
        }
        return sum;
    }
}
And the C++:
extern "C" int32_t TestClass_ForeachEnumerable_m2471184119 (RuntimeObject * __this /* static, unused */, RuntimeObject* ___enumerable0, const RuntimeMethod* method) { static bool s_Il2CppMethodInitialized; if (!s_Il2CppMethodInitialized) { il2cpp_codegen_initialize_method (TestClass_ForeachEnumerable_m2471184119_MetadataUsageId); s_Il2CppMethodInitialized = true; } int32_t V_0 = 0; int32_t V_1 = 0; RuntimeObject* V_2 = NULL; Exception_t * __last_unhandled_exception = 0; NO_UNUSED_WARNING (__last_unhandled_exception); Exception_t * __exception_local = 0; NO_UNUSED_WARNING (__exception_local); int32_t __leave_target = 0; NO_UNUSED_WARNING (__leave_target); { V_0 = 0; RuntimeObject* L_0 = ___enumerable0; NullCheck(L_0); RuntimeObject* L_1 = InterfaceFuncInvoker0< RuntimeObject* >::Invoke(0 /* System.Collections.Generic.IEnumerator`1<!0> System.Collections.Generic.IEnumerable`1<System.Int32>::GetEnumerator() */, IEnumerable_1_t1930798642_il2cpp_TypeInfo_var, L_0); V_2 = L_1; } IL_0009: try { // begin try (depth: 1) { goto IL_0019; } IL_000e: { RuntimeObject* L_2 = V_2; NullCheck(L_2); int32_t L_3 = InterfaceFuncInvoker0< int32_t >::Invoke(0 /* !0 System.Collections.Generic.IEnumerator`1<System.Int32>::get_Current() */, IEnumerator_1_t3383516221_il2cpp_TypeInfo_var, L_2); V_1 = L_3; int32_t L_4 = V_0; int32_t L_5 = V_1; V_0 = ((int32_t)il2cpp_codegen_add((int32_t)L_4, (int32_t)L_5)); } IL_0019: { RuntimeObject* L_6 = V_2; NullCheck(L_6); bool L_7 = InterfaceFuncInvoker0< bool >::Invoke(1 /* System.Boolean System.Collections.IEnumerator::MoveNext() */, IEnumerator_t1853284238_il2cpp_TypeInfo_var, L_6); if (L_7) { goto IL_000e; } } IL_0024: { IL2CPP_LEAVE(0x36, FINALLY_0029); } } // end try (depth: 1) catch(Il2CppExceptionWrapper& e) { __last_unhandled_exception = (Exception_t *)e.ex; goto FINALLY_0029; } FINALLY_0029: { // begin finally (depth: 1) { RuntimeObject* L_8 = V_2; if (!L_8) { goto IL_0035; } } IL_002f: { RuntimeObject* L_9 = V_2; NullCheck(L_9); 
InterfaceActionInvoker0::Invoke(0 /* System.Void System.IDisposable::Dispose() */, IDisposable_t3640265483_il2cpp_TypeInfo_var, L_9); } IL_0035: { IL2CPP_END_FINALLY(41) } } // end finally (depth: 1) IL2CPP_CLEANUP(41) { IL2CPP_JUMP_TBL(0x36, IL_0036) IL2CPP_RETHROW_IF_UNHANDLED(Exception_t *) } IL_0036: { int32_t L_10 = V_0; return L_10; } }
This is pretty similar to the foreach loop on a List<T>. It has method initialization and then the breakdown into GetEnumerator, the Current getter, MoveNext, and Dispose, with exception handling via a finally block. The major difference is that InterfaceFuncInvoker0::Invoke and InterfaceActionInvoker0::Invoke are used to call all these functions, so they’ll all be slower calls than the non-virtual functions that were called when using a foreach loop on a List<T>.
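To make that breakdown easier to picture, here is a minimal C# sketch of roughly what the foreach loop expands to, matching the shape of the C++ above. The class and method names (ForeachSketch, SumExpanded) are made up for illustration and aren’t part of the test code; the comments mark where the generated C++ goes through an interface invoker.

```csharp
using System.Collections.Generic;

static class ForeachSketch
{
    // Hand-expanded version of:
    //     foreach (int cur in enumerable) { sum += cur; }
    // (illustrative names, not from the article's test project)
    public static int SumExpanded(IEnumerable<int> enumerable)
    {
        int sum = 0;
        IEnumerator<int> e = enumerable.GetEnumerator(); // interface call, once
        try
        {
            while (e.MoveNext())       // interface call, every iteration
            {
                int cur = e.Current;   // interface call, every iteration
                sum += cur;
            }
        }
        finally
        {
            if (e != null)
            {
                e.Dispose();           // interface call, once
            }
        }
        return sum;
    }
}
```

Calling ForeachSketch.SumExpanded(new[] { 1, 2, 3 }) gives the same result as the foreach version; the point is only to show where the four calls and the try/finally come from.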
Let’s look at the assembly for this. Just like with the foreach loop on a List<T>, it’s long and can just be skimmed to get the gist of its size:
    push {r4, r5, r6, r7, lr}
    add r7, sp, #12
    push.w {r8, r10, r11}
    sub.w r4, sp, #64
    bfc r4, #0, #4
    mov sp, r4
    vst1.64 {d8, d9, d10, d11}, [r4:128]!
    vst1.64 {d12, d13, d14, d15}, [r4:128]
    sub sp, #96
    movw r5, :lower16:(__ZZ39TestClass_ForeachEnumerable_m2471184119E25s_Il2CppMethodInitialized-(LPC4_2+4))
    mov r4, r1
    movt r5, :upper16:(__ZZ39TestClass_ForeachEnumerable_m2471184119E25s_Il2CppMethodInitialized-(LPC4_2+4))
    movw r0, :lower16:(L___gxx_personality_sj0$non_lazy_ptr-(LPC4_3+4))
    movt r0, :upper16:(L___gxx_personality_sj0$non_lazy_ptr-(LPC4_3+4))
LPC4_2:
    add r5, pc
LPC4_3:
    add r0, pc
    ldr r1, LCPI4_0
    ldrb r6, [r5]
    ldr r0, [r0]
LPC4_0:
    add r1, pc
    str r0, [sp, #68]
    ldr r0, LCPI4_1
    str r1, [sp, #72]
    orr r0, r0, #1
    str r7, [sp, #76]
LPC4_1:
    add r0, pc
    str.w sp, [sp, #84]
    str r0, [sp, #80]
    add r0, sp, #44
    bl __Unwind_SjLj_Register
    cbnz r6, LBB4_2
    movw r0, :lower16:(L_TestClass_ForeachEnumerable_m2471184119_MetadataUsageId$non_lazy_ptr-(LPC4_4+4))
    mov.w r1, #-1
    movt r0, :upper16:(L_TestClass_ForeachEnumerable_m2471184119_MetadataUsageId$non_lazy_ptr-(LPC4_4+4))
    str r1, [sp, #48]
LPC4_4:
    add r0, pc
    ldr r0, [r0]
    ldr r0, [r0]
    bl __ZN6il2cpp2vm13MetadataCache24InitializeMethodMetadataEj
    movs r0, #1
    strb r0, [r5]
LBB4_2:
    cbnz r4, LBB4_4
    mov.w r0, #-1
    str r0, [sp, #48]
    bl __ZN6il2cpp2vm9Exception27RaiseNullReferenceExceptionEv
LBB4_4:
    movw r0, :lower16:(L_IEnumerable_1_t1930798642_il2cpp_TypeInfo_var$non_lazy_ptr-(LPC4_5+4))
    mov r2, r4
    movt r0, :upper16:(L_IEnumerable_1_t1930798642_il2cpp_TypeInfo_var$non_lazy_ptr-(LPC4_5+4))
    movs r5, #0
LPC4_5:
    add r0, pc
    ldr r0, [r0]
    ldr r1, [r0]
    mov.w r0, #-1
    str r0, [sp, #48]
    movs r0, #0
    bl __ZN21InterfaceFuncInvoker0IP12Il2CppObjectE6InvokeEjP11Il2CppClassS1_
    str r0, [sp, #20]
    movw r0, :lower16:(L_IEnumerator_t1853284238_il2cpp_TypeInfo_var$non_lazy_ptr-(LPC4_6+4))
    movt r0, :upper16:(L_IEnumerator_t1853284238_il2cpp_TypeInfo_var$non_lazy_ptr-(LPC4_6+4))
LPC4_6:
    add r0, pc
    ldr r0, [r0]
    str r0, [sp, #16]
    movw r0, :lower16:(L_IEnumerator_1_t3383516221_il2cpp_TypeInfo_var$non_lazy_ptr-(LPC4_7+4))
    movt r0, :upper16:(L_IEnumerator_1_t3383516221_il2cpp_TypeInfo_var$non_lazy_ptr-(LPC4_7+4))
LPC4_7:
    add r0, pc
    ldr r0, [r0]
    str r0, [sp, #12]
    b LBB4_9
LBB4_5:
    ldr r0, [sp, #20]
    cbnz r0, LBB4_7
    movs r0, #1
    str r0, [sp, #48]
    bl __ZN6il2cpp2vm9Exception27RaiseNullReferenceExceptionEv
LBB4_7:
    ldr r0, [sp, #12]
    ldr r1, [r0]
    movs r0, #2
    ldr r2, [sp, #20]
    str r0, [sp, #48]
    movs r0, #0
    bl __ZN21InterfaceFuncInvoker0IiE6InvokeEjP11Il2CppClassP12Il2CppObject
    ldr r1, [sp, #24]
    adds r5, r0, r1
    str r5, [sp, #24]
    ldr r0, [sp, #20]
    cbnz r0, LBB4_11
    movs r0, #3
    str r0, [sp, #48]
    bl __ZN6il2cpp2vm9Exception27RaiseNullReferenceExceptionEv
LBB4_11:
    ldr r0, [sp, #16]
    ldr r1, [r0]
    movs r0, #4
    ldr r2, [sp, #20]
    str r0, [sp, #48]
    movs r0, #1
    bl __ZN21InterfaceFuncInvoker0IbE6InvokeEjP11Il2CppClassP12Il2CppObject
    cmp r0, #0
    bne LBB4_5
    movs r4, #0
    movs r0, #54
LBB4_14:
    strd r0, r4, [sp, #36]
    ldr r0, [sp, #20]
    cbz r0, LBB4_16
    movw r0, :lower16:(L_IDisposable_t3640265483_il2cpp_TypeInfo_var$non_lazy_ptr-(LPC4_8+4))
    movt r0, :upper16:(L_IDisposable_t3640265483_il2cpp_TypeInfo_var$non_lazy_ptr-(LPC4_8+4))
LPC4_8:
    add r0, pc
    ldr r0, [r0]
    ldr r1, [r0]
    mov.w r0, #-1
    ldr r2, [sp, #20]
    str r0, [sp, #48]
    movs r0, #0
    bl __ZN23InterfaceActionInvoker06InvokeEjP11Il2CppClassP12Il2CppObject
LBB4_16:
    ldr r1, [sp, #36]
    ldr r0, [sp, #40]
    cmp r1, #54
    it ne
    cmpne r0, #0
    beq LBB4_18
    ldr r0, [sp, #40]
    mov.w r1, #-1
    str r1, [sp, #48]
    bl __ZN6il2cpp2vm9Exception5RaiseEP15Il2CppException
LBB4_18:
    add r0, sp, #44
    ldr r4, [sp, #24]
    bl __Unwind_SjLj_Unregister
    mov r0, r4
    add r4, sp, #96
    vld1.64 {d8, d9, d10, d11}, [r4:128]!
    vld1.64 {d12, d13, d14, d15}, [r4:128]
    sub.w r4, r7, #24
    mov sp, r4
    pop.w {r8, r10, r11}
    pop {r4, r5, r6, r7, pc}
LBB4_19:
    ldr r0, [sp, #48]
    cmp r0, #4
    bls LBB4_21
    trap
LBB4_21:
LCPI4_2:
    tbb [pc, r0]
LJTI4_0:
LBB4_23:
    b LBB4_27
LBB4_24:
    b LBB4_27
LBB4_25:
    b LBB4_27
LBB4_26:
LBB4_27:
    ldr r0, [sp, #52]
    ldr r1, [sp, #56]
    strd r1, r0, [sp, #28]
    ldr r0, [sp, #28]
    cmp r0, #1
    bne LBB4_29
    ldr r0, [sp, #32]
    bl ___cxa_begin_catch
    ldr r4, [r0]
    mov.w r0, #-1
    str r0, [sp, #48]
    bl ___cxa_end_catch
    movs r0, #0
    b LBB4_14
LBB4_29:
    ldr r0, [sp, #32]
    mov.w r1, #-1
    str r1, [sp, #48]
    bl __Unwind_SjLj_Resume
This looks very much like the assembly that was generated for foreach on a List<T> and lines up with the C++ quite well. We can see all the calls like __ZN23InterfaceActionInvoker06InvokeEjP11Il2CppClassP12Il2CppObject for InterfaceActionInvoker0::Invoke that have their own assembly elsewhere. There are also all the calls to exception-related code like ___cxa_begin_catch and __Unwind_SjLj_Resume.
Conclusion
There is a clear winner here: for loops on an array with the Length field cached and the null- and bounds-checks disabled. It produces tiny, minimal assembly code for the CPU to run and should be preferred whenever we care about performance. Not disabling the null- and bounds-checks triples the number of if checks in every iteration of the loop! Unfortunately, foreach loops on an array use this very form. Even with the checks disabled, a foreach loop fetches the Length field every iteration, and that can’t be avoided by caching it as a local variable (i.e. into a CPU register). foreach loops on arrays should be avoided when performance matters.
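As a rough way to picture the “triple the if checks” point, the sketch below writes the checks out by hand in C#. This is only a conceptual model and not what IL2CPP literally emits; the names CheckedLoopSketch and SumWithExplicitChecks are invented for this example.

```csharp
using System;

static class CheckedLoopSketch
{
    // Conceptual model only: the explicit ifs stand in for the null- and
    // bounds-checks that run every iteration when they are not disabled.
    public static int SumWithExplicitChecks(int[] array)
    {
        int sum = 0;
        if (array == null) throw new NullReferenceException(); // before reading Length
        int len = array.Length;
        for (int i = 0; i < len; ++i)                 // if #1: loop condition
        {
            if (array == null)                        // if #2: null check
                throw new NullReferenceException();
            if ((uint)i >= (uint)array.Length)        // if #3: bounds check
                throw new IndexOutOfRangeException();
            sum += array[i];
        }
        return sum;
    }
}
```

With the checks disabled, only the loop condition is left per iteration, which is the version shown at the start of the article.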
Aside from plain arrays, performance takes a serious dive with List<T>. No matter what kind of loop is used, calling methods of a generic type means type initialization overhead is generated for the function with the loop. Every iteration of the loop is slowed down with range checks, function calls to get the current item, and exception-induced overhead. Compared to the few instructions of assembly code that could have been generated, now we’re looking at dozens or even hundreds with expensive branching, cache misses, and function calls. The foreach loop version is even worse than the for version, but neither is good. Stick to plain arrays to avoid all this overhead.
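For reference, here are the two loop shapes being compared, side by side. This is a sketch with made-up names (ListVersusArray, SumList, SumArray), not code from the tests above.

```csharp
using System.Collections.Generic;

static class ListVersusArray
{
    // for loop on a List<T>: Count and the indexer are both method
    // calls on the list, made on every iteration.
    public static int SumList(List<int> list)
    {
        int sum = 0;
        for (int i = 0; i < list.Count; ++i)
        {
            sum += list[i];
        }
        return sum;
    }

    // The same loop on a plain array with Length cached in a local.
    public static int SumArray(int[] array)
    {
        int sum = 0;
        int len = array.Length;
        for (int i = 0; i < len; ++i)
        {
            sum += array[i];
        }
        return sum;
    }
}
```

If the data already lives in a List<T>, list.ToArray() gives an array to loop over, but it allocates a copy, so that only makes sense when the copy can be made once and reused.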
Finally, there’s IEnumerable<T>. It should be ruled out from the outset because of the GC allocation required to get its enumerator for foreach, the only loop option. Beyond that, its loop is the slowest because every iteration requires two interface function calls: one to get the current value and one to advance the enumerator. It should be used very sparingly.
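One way to see where that allocation can come from: List<T>.GetEnumerator() returns a struct, but asking for the enumerator through IEnumerable<T> returns it as an IEnumerator<T> reference, so the struct has to be boxed onto the heap. The sketch below checks this with a hypothetical helper (EnumeratorAllocationSketch is not from the article), and it’s meant to be called with a non-empty list.

```csharp
using System.Collections.Generic;

static class EnumeratorAllocationSketch
{
    // Returns true when the enumerator obtained through IEnumerable<int>
    // is a value type behind an interface reference, i.e. a boxed struct.
    public static bool InterfaceEnumeratorIsBoxedStruct(List<int> list)
    {
        IEnumerable<int> enumerable = list;
        using (IEnumerator<int> e = enumerable.GetEnumerator())
        {
            return e.GetType().IsValueType;
        }
    }
}
```

Other IEnumerable<T> implementations may return a class instance instead, which is an allocation just the same.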
#1 by VVEthan on March 5th, 2018
First, thank you for these awesome in-depth investigations!
Secondly, I feel it might be worth mentioning the cases where optimal looping is critical (Update() methods, or anything else that runs each frame) and where it’s not (e.g. doing a foreach on a collection of objects once when a user taps a button.)
I only bring this up because I feel like one of the hardest, ongoing challenges of software is understanding how to take focused lessons like this article and use them productively in the context of writing a full blown application/game.
(I recently had some confusing discussions on Unity forums about how Coroutines are bad and never had a good reason to be used… except for the slew of scenarios in which they are useful and have no meaningful impact on performance.)
#2 by jackson on March 5th, 2018
This is an excellent point. If you’re going to loop over five things once every ten seconds then it really doesn’t matter what kind of loop you use. Even a foreach loop over an IEnumerable<T> might be acceptable and it creates garbage!
This investigation is to arm you with the knowledge you need to make this kind of decision. It’s highly game-specific and even within the game, situation-dependent. Once you know the costs of each loop you’re able to judge the tradeoffs in terms of performance, readability, and other factors and decide on the appropriate type of collection, loop, IL2CPP settings, etc.
So far I haven’t discussed this much in the IL2CPP articles other than adding a “when performance matters” disclaimer to my conclusions, but perhaps I should call this out explicitly.
#3 by rick on November 30th, 2021
Very interesting article! I would love to see this revisited in 2021 (or 2022?) because both C# and IL2CPP have evolved a lot since the time of writing.