Today we continue the series by looking at a pair of powerful, related features in C# 7.3: ref return values and ref local variables. These enable some great optimizations, so let’s look at the IL2CPP output for them to make sure it’s as good as it looks.

Simple Local ref Variables

Let’s start off the ref enhancements with local variables that are references to other local variables.

static class TestClass
{
    static int TestRefLocal(int x, int y)
    {
        ref int r = ref x;
        r = 30;
        r = ref y;
        r = 40;
        return x + y;
    }
}

This code creates r as a reference to x, not a copy of it. To do this, we declare r as ref int instead of just int and assign it to ref x instead of just x. Omitting either ref is a compiler error, so there’s no need to worry about accidentally making a copy.

We can then assign to the ref local variable, which changes the variable it refers to. In this case we start by assigning 30 to r, which refers to x, so x is effectively changed to 30.

Then we reassign r to reference another local variable, y. To do this we simply assign it to ref y. There’s no need to add ref before the r this time and doing so is a compiler error. After this we once again assign to r, but this time it changes y because r now refers to that variable.
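C++ references can't be reseated once bound, which is presumably why IL2CPP represents a reassignable ref local as a pointer. Here's a hand-written C++ sketch of what TestRefLocal does. It is not IL2CPP's actual output.

```cpp
#include <cassert>
#include <cstdint>

// Hand-written analogue of TestRefLocal. The reassignable C# ref local
// is modeled as a pointer because a C++ reference cannot be rebound.
int32_t TestRefLocal(int32_t x, int32_t y)
{
    int32_t* r = &x; // ref int r = ref x;
    *r = 30;         // r = 30;    writes to x
    r = &y;          // r = ref y; rebinds r to y
    *r = 40;         // r = 40;    writes to y
    return x + y;    // always 30 + 40
}
```

Whatever arguments are passed, both parameters are overwritten before the addition, so TestRefLocal(1, 2) returns 70.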

Now let’s see the C++ that IL2CPP generates for this in Unity 2018.3.0f2:

extern "C" IL2CPP_METHOD_ATTR int32_t TestClass_TestRefLocal_m1B99E0AA6BD471E95C4DF28A0918787003C3D23A (int32_t ___x0, int32_t ___y1, const RuntimeMethod* method)
{
    {
        *((int32_t*)(&___x0)) = (int32_t)((int32_t)30);
        *((int32_t*)(&___y1)) = (int32_t)((int32_t)40);
        int32_t L_0 = ___x0;
        int32_t L_1 = ___y1;
        return ((int32_t)il2cpp_codegen_add((int32_t)L_0, (int32_t)L_1));
    }
}

The first two lines are rather complex, so let’s unpack them one bit at a time. On the right side of the assignment we see the int literals 30 and 40 cast to int32_t twice. Since int32_t is the same as int on current platforms, this effectively does nothing.

We read the left side of the assignment from right to left. First, we see &___x0 and &___y1. These get the address of the x and y parameters. Then we see (int32_t*) to cast them to pointers, which is unnecessary because they’re both int32_t so the address of them is already an int32_t* pointer. Finally, we see a * that dereferences the pointers so the assignment is to the memory location they point to.

These could have been simplified to just ___x0 = 30 and ___y1 = 40, but it looks like IL2CPP is preserving some of the ref from C#.

At the end of the function is a call to il2cpp_codegen_add to effectively add together two (unnecessary) copies of the parameters.
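To check that the extra casts really are no-ops, here's a small C++ sketch that places the generated style next to the hand-simplified version. Two assumptions apply. First, int32_t is a typedef for int on the target, as the article states for current platforms. Second, plain + stands in for il2cpp_codegen_add. The parameter names are also shortened from ___x0 and ___y1.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Assumption: int32_t is int, so casting an int literal to int32_t changes nothing.
static_assert(std::is_same<int32_t, int>::value, "int32_t is int on this platform");

// Mirrors the generated style: assign through a cast, dereferenced address.
int32_t GeneratedStyle(int32_t x, int32_t y)
{
    // &x already has type int32_t*, so the (int32_t*) cast below is redundant.
    static_assert(std::is_same<decltype(&x), int32_t*>::value, "&x is already int32_t*");
    *((int32_t*)(&x)) = (int32_t)((int32_t)30);
    *((int32_t*)(&y)) = (int32_t)((int32_t)40);
    return x + y;
}

// Hand-simplified equivalent: plain assignments to the parameters.
int32_t SimplifiedStyle(int32_t x, int32_t y)
{
    x = 30;
    y = 40;
    return x + y;
}
```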

Now let’s look at the assembly that Xcode 10.1 generates for this C++ in a 32-bit ARM (ARMv7) iOS release build. There’s no need to understand it deeply as I’ll explain after each snippet of assembly code in this article. This particular example is quite simple:

movs  r0, #70
bx    lr

This is a great result for this function! The compiler has correctly realized that the return value is a constant 70, so all it does is return that.

Conclusion: Simple ref local variables result in sub-optimal C++ but the C++ compiler can fix this and emit great machine code.

Local ref Variables From the Ternary Operator

Next, let’s add some complication and conditionally assign to a ref local variable:

static class TestClass
{
    static int TestRefLocalFromTernary(int x, int y)
    {
        ref int r = ref (x > y ? ref x : ref y);
        r = -1;
        return x + y;
    }
}

Here we again make a ref int r, but this time we assign to it based on the result of a ternary operator expression. This adds even more ref keywords to the line since we now must specify ref before the expression as well as ref x, ref y, and ref int r. Omitting any of these will cause a compiler error. After that we can use the ref local just like before by assigning to it.

Now let’s check the C++ for this:

extern "C" IL2CPP_METHOD_ATTR int32_t TestClass_TestRefLocalFromTernary_m99B93CDACECA931AC66C6274AF62995FD2BA83BE (int32_t ___x0, int32_t ___y1, const RuntimeMethod* method)
{
    int32_t* G_B3_0 = NULL;
    {
        int32_t L_0 = ___x0;
        int32_t L_1 = ___y1;
        if ((((int32_t)L_0) > ((int32_t)L_1)))
        {
            goto IL_0008;
        }
    }
    {
        G_B3_0 = (&___y1);
        goto IL_000a;
    }
 
IL_0008:
    {
        G_B3_0 = (&___x0);
    }
 
IL_000a:
    {
        *((int32_t*)G_B3_0) = (int32_t)(-1);
        int32_t L_2 = ___x0;
        int32_t L_3 = ___y1;
        return ((int32_t)il2cpp_codegen_add((int32_t)L_2, (int32_t)L_3));
    }
}

This generated a fair bit more code but, goto statements aside, it’s pretty straightforward. At the beginning we see an int32_t* G_B3_0 pointer that’ll come into play shortly. First, copies of the parameters are made and compared. If x is greater than y then we goto a block where G_B3_0 is assigned the address of x. Otherwise, G_B3_0 is assigned the address of y.

At this point we effectively have G_B3_0 as r, so we can proceed to use it. That occurs in the final block of the function which starts with another complex line. On the right we see the literal -1 which is needlessly cast to int32_t. On the left we see G_B3_0/r needlessly cast from int32_t* to int32_t* before being dereferenced with *. So this effectively assigns -1 to the memory that r points to. Then the function ends with the same verbose addition as in the first example.
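Here's a hand-written C++ sketch of the same logic, not IL2CPP output. The pointer that the generated code calls G_B3_0 plays the role of r. C++ also offers a shorter spelling: when both operands are lvalues of the same type, the conditional operator itself yields an lvalue, so you can assign to it directly.

```cpp
#include <cassert>
#include <cstdint>

// Pointer-based analogue of the IL2CPP output: pick an address, then write through it.
int32_t TestRefLocalFromTernary(int32_t x, int32_t y)
{
    int32_t* r = (x > y) ? &x : &y; // ref int r = ref (x > y ? ref x : ref y);
    *r = -1;                        // r = -1;
    return x + y;
}

// Idiomatic C++ alternative: the conditional operator on two int32_t lvalues
// is itself an lvalue, so it can be assigned to directly.
int32_t TestConditionalLvalue(int32_t x, int32_t y)
{
    (x > y ? x : y) = -1;
    return x + y;
}
```

With (5, 3), x is overwritten, giving -1 + 3 = 2. With (4, 4), x > y is false, so y is overwritten, giving 4 + -1 = 3.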

Let’s move on and see what this compiles to:

sub    sp, #8
mov    r2, sp
add    r3, sp, #4
strd   r1, r0, [sp]
cmp    r0, r1
it     le
movle  r3, r2
mov.w  r0, #-1
str    r0, [r3]
ldrd   r1, r0, [sp]
add    r0, r1
add    sp, #8
bx     lr

This is a fairly literal translation of the C++ with only one nice change. The ternary operator (?:) in C# that became an if plus goto in C++ has been transformed again to a conditional move of either ref x or ref y to r. All of the pointless blocks, redundant casting, and unconditional goto statements have been kindly removed by the C++ compiler.

Conclusion: Assigning to a ref local from a ternary operator expression generates straightforward machine code with only one conditional move.

Returning ref Values

Now let’s try returning a ref value from a function:

static class TestClass
{
    static ref int TestReturnRef(int[] a)
    {
        return ref a[0];
    }
 
    static int TestCallReturnRef(int[] a)
    {
        ref int r = ref TestReturnRef(a);
        r = 10;
        return r;
    }
}

TestReturnRef returns a reference to the first element of the managed array a rather than a copy of it.

TestCallReturnRef creates a local ref variable and assigns it from ref TestReturnRef(a) rather than from a reference to a local variable or parameter, as we’ve seen before.
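As a rough C++ analogue of these two methods, here's a sketch that assumes a plain C++ array in place of the managed int[]. It deliberately leaves out the null and bounds checks that IL2CPP adds for managed arrays.

```cpp
#include <cassert>
#include <cstdint>

// Analogue of TestReturnRef: return the address of the first element, not a copy.
// Assumption: a is a non-null plain array with at least one element.
int32_t* TestReturnRef(int32_t* a)
{
    return &a[0];
}

// Analogue of TestCallReturnRef: write through the returned pointer.
int32_t TestCallReturnRef(int32_t* a)
{
    int32_t* r = TestReturnRef(a); // ref int r = ref TestReturnRef(a);
    *r = 10;                       // r = 10; modifies a[0] itself
    return *r;
}
```

After calling TestCallReturnRef on an array, its first element is 10, which shows that the write went to the array rather than to a copy.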

Let’s see what C++ is generated for TestReturnRef:

extern "C" IL2CPP_METHOD_ATTR int32_t* TestClass_TestReturnRef_mD5D5F618F5DF6DCF2B0F2655901BF24670268F77 (Int32U5BU5D_t2B9E4FDDDB9F0A00EC0AC631BA2DA915EB1ECF83* ___a0, const RuntimeMethod* method)
{
    {
        Int32U5BU5D_t2B9E4FDDDB9F0A00EC0AC631BA2DA915EB1ECF83* L_0 = ___a0;
        NullCheck(L_0);
        return (int32_t*)(((L_0)->GetAddressAt(static_cast<il2cpp_array_size_t>(0))));
    }
}

This just has two effective elements: a null check and a call to an accessor that gets a pointer to the element at a given index. The former is simple, but the latter includes some unnecessary casting so it’s a bit harder to read. Let’s look at GetAddressAt:

inline int32_t* GetAddressAt(il2cpp_array_size_t index)
{
    IL2CPP_ARRAY_BOUNDS_CHECK(index, (uint32_t)(this)->max_length);
    return m_Items + index;
}

This just offsets the pointer to the first element (m_Items) by index elements. It includes a bounds check, though, so let’s look at that:

// Performance optimization as detailed here: http://blogs.msdn.com/b/clrcodegeneration/archive/2009/08/13/array-bounds-check-elimination-in-the-clr.aspx
// Since array size is a signed int32_t, a single unsigned check can be performed to determine if index is less than array size.
// Negative indices will map to a unsigned number greater than or equal to 2^31 which is larger than allowed for a valid array.
#define IL2CPP_ARRAY_BOUNDS_CHECK(index, length) \
    do { \
        if (((uint32_t)(index)) >= ((uint32_t)length)) il2cpp::vm::Exception::Raise (il2cpp::vm::Exception::GetIndexOutOfRangeException()); \
    } while (0)

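As the comment notes, this macro does just one if check and throws an exception if the index is out of bounds. The do-while loop doesn’t really loop, since its condition is 0. Wrapping a macro body in do { ... } while (0) is a common idiom that makes the macro behave like a single statement, for example after an if without braces.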
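The single unsigned comparison is what lets one branch reject both negative and too-large indices. Here's a minimal sketch of the idea. ToyIntArray is a hypothetical, simplified stand-in for an IL2CPP managed array, not its real layout. std::out_of_range stands in for IndexOutOfRangeException.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for a managed int[]; field names echo the article only.
struct ToyIntArray
{
    uint32_t max_length;
    int32_t* m_Items;

    // One unsigned compare: a negative index such as -1 becomes 4294967295
    // as a uint32_t, which is >= any valid length, so it fails the same test
    // as an index that is simply too large.
    int32_t* GetAddressAt(int32_t index)
    {
        if (static_cast<uint32_t>(index) >= max_length)
        {
            throw std::out_of_range("index out of range");
        }
        return m_Items + index;
    }
};
```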

Here’s how TestReturnRef ends up looking in assembly:

    push    {r4, r7, lr}
    add     r7, sp, #4
    mov     r4, r0
    cbnz    r4, LBB40_2
    movs    r0, #0
    bl      __ZN6il2cpp2vm9Exception27RaiseNullReferenceExceptionEP19Il2CppSequencePoint
LBB40_2:
    ldr     r0, [r4, #12]
    cbnz    r0, LBB40_4
    bl      __ZN6il2cpp2vm9Exception27GetIndexOutOfRangeExceptionEv
    movs    r1, #0
    movs    r2, #0
    bl      __ZN6il2cpp2vm9Exception5RaiseEP15Il2CppExceptionP19Il2CppSequencePointP10MethodInfo
LBB40_4:
    add.w   r0, r4, #16
    pop     {r4, r7, pc}

This is nearly all error-checking. First we see the null check and NullReferenceException, then we see the bounds check and IndexOutOfRangeException. Only at the very end is the actual work: one add instruction. To remove these checks, use IL2CPP's Il2CppSetOption attribute with Option.NullChecks and Option.ArrayBoundsChecks set to false.

Next, let’s see the C++ for TestCallReturnRef:

extern "C" IL2CPP_METHOD_ATTR int32_t TestClass_TestCallReturnRef_mD752CDE161171917868820ABCD46C50F4A021471 (Int32U5BU5D_t2B9E4FDDDB9F0A00EC0AC631BA2DA915EB1ECF83* ___a0, const RuntimeMethod* method)
{
    {
        Int32U5BU5D_t2B9E4FDDDB9F0A00EC0AC631BA2DA915EB1ECF83* L_0 = ___a0;
        int32_t* L_1 = TestClass_TestReturnRef_mD5D5F618F5DF6DCF2B0F2655901BF24670268F77(L_0, /*hidden argument*/NULL);
        int32_t* L_2 = L_1;
        *((int32_t*)L_2) = (int32_t)((int32_t)10);
        int32_t L_3 = *((int32_t*)L_2);
        return L_3;
    }
}

This calls the C++ function for TestReturnRef and stores the return value in a local int32_t* pointer. Then we see the usual overcomplicated assignment of 10 through the pointer. Finally, the pointer is dereferenced to get the return value.

Here’s what this looks like in the assembly output:

push   {r7, lr}
mov    r7, sp
bl     _TestClass_TestReturnRef_mD5D5F618F5DF6DCF2B0F2655901BF24670268F77
movs   r1, #10
str    r1, [r0]
movs   r0, #10
pop    {r7, pc}

This calls TestReturnRef, which hasn’t been inlined. It writes 10 to the memory pointed to by the returned pointer and returns 10. This is a nice optimization that skips dereferencing the pointer since the result is already known to be 10.

Conclusion: Returning ref values is also implemented in a straightforward way but, as always, watch out for null- and bounds-checks with managed arrays.

Readonly ref

Closely related to returning ref values is returning readonly ref values. These indicate to the caller that they may not change the memory pointed to by the returned reference. Here's how it looks:

class TestClass
{
    static ref readonly int TestReturnReadonlyRef(int[] a)
    {
        return ref a[0];
    }
 
    static int TestCallReturnReadonlyRef(int[] a)
    {
        ref readonly int r = ref TestReturnReadonlyRef(a);
        return r;
    }
}

The changes here are to add readonly after the ref keyword in the function definition and at the call site. It can't be placed before ref in either location. The caller also can't omit readonly from its ref local, because the function requires it. However, the caller can opt to use ref readonly with a non-readonly ref return value if they never intend to write through the reference. Note that ref parameters, which have been in the language since C# 1.0, offer no such caller-side option.
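A loose C++ analogy for these rules is a pointer to const. The sketch below is only an analogy. IL2CPP's output for TestReturnReadonlyRef still returns a plain int32_t*, so the readonly guarantee comes from the C# compiler, not from the generated C++ types. As in the earlier sketches, a plain C++ array stands in for the managed int[].

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Analogy for `ref readonly int`: return a pointer to const.
const int32_t* TestReturnReadonlyRef(int32_t* a)
{
    return &a[0];
}

// Analogy for plain `ref int`: return a writable pointer.
int32_t* TestReturnRef(int32_t* a)
{
    return &a[0];
}

int32_t TestCallReturnReadonlyRef(int32_t* a)
{
    const int32_t* r = TestReturnReadonlyRef(a); // ref readonly int r = ref ...;
    // *r = 10; would not compile, much as r = 10; is a C# compiler error.
    return *r;
}

int32_t TestCallerOptsIntoReadonly(int32_t* a)
{
    // A caller may view a writable reference as read-only...
    const int32_t* r = TestReturnRef(a);
    // ...but int32_t* w = TestReturnReadonlyRef(a); would not compile,
    // mirroring C#'s refusal to let the caller drop readonly.
    return *r;
}

static_assert(std::is_const<std::remove_pointer<decltype(TestReturnReadonlyRef(nullptr))>::type>::value,
              "the readonly analogue points to const");
```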

Let's see how TestReturnReadonlyRef translates to C++:

extern "C" IL2CPP_METHOD_ATTR int32_t* TestClass_TestReturnReadonlyRef_mBA7C632040BADB6B0AE690DCDB463E0EF2F30F55 (Int32U5BU5D_t2B9E4FDDDB9F0A00EC0AC631BA2DA915EB1ECF83* ___a0, const RuntimeMethod* method)
{
    {
        Int32U5BU5D_t2B9E4FDDDB9F0A00EC0AC631BA2DA915EB1ECF83* L_0 = ___a0;
        NullCheck(L_0);
        return ((L_0)->GetAddressAt(static_cast<il2cpp_array_size_t>(0)));
    }
}

This is identical to TestReturnRef except that a redundant cast at the end has been omitted. Let's check to make sure the assembly is the same, too:

    push    {r4, r7, lr}
    add     r7, sp, #4
    mov     r4, r0
    cbnz    r4, LBB41_2
    movs    r0, #0
    bl      __ZN6il2cpp2vm9Exception27RaiseNullReferenceExceptionEP19Il2CppSequencePoint
LBB41_2:
    ldr     r0, [r4, #12]
    cbnz    r0, LBB41_4
    bl      __ZN6il2cpp2vm9Exception27GetIndexOutOfRangeExceptionEv
    movs    r1, #0
    movs    r2, #0
    bl      __ZN6il2cpp2vm9Exception5RaiseEP15Il2CppExceptionP19Il2CppSequencePointP10MethodInfo
LBB41_4:
    add.w   r0, r4, #16
    pop     {r4, r7, pc}

Yes, this is identical to the assembly for TestReturnRef so let's move on to the C++ for TestCallReturnReadonlyRef:

extern "C" IL2CPP_METHOD_ATTR int32_t TestClass_TestCallReturnReadonlyRef_m516535EE352B6440998A00B5D9F21E6480EDA215 (Int32U5BU5D_t2B9E4FDDDB9F0A00EC0AC631BA2DA915EB1ECF83* ___a0, const RuntimeMethod* method)
{
    {
        Int32U5BU5D_t2B9E4FDDDB9F0A00EC0AC631BA2DA915EB1ECF83* L_0 = ___a0;
        int32_t* L_1 = TestClass_TestReturnReadonlyRef_mBA7C632040BADB6B0AE690DCDB463E0EF2F30F55(L_0, /*hidden argument*/NULL);
        int32_t L_2 = *((int32_t*)L_1);
        return L_2;
    }
}

This function is slightly shorter because it has no r = 10; line. r is now readonly, so that assignment would be a C# compiler error. Let's see what impact that has on the assembly:

push   {r7, lr}
mov    r7, sp
bl     _TestClass_TestReturnReadonlyRef_mBA7C632040BADB6B0AE690DCDB463E0EF2F30F55
ldr    r0, [r0]
pop    {r7, pc}

This calls TestReturnReadonlyRef and then dereferences the pointer to get the return value. The dereferencing is necessary here because, unlike in TestCallReturnRef, we hadn't just assigned 10 to r so the compiler doesn't already know the value.

Conclusion: Returning readonly ref or declaring local ref variables readonly provides some nice error-checking against inadvertently changing the memory pointed to by ref variables while not incurring any runtime overhead.

Storing ref Returns By Value

In today's final example, let's try omitting the ref keyword in the local variable that stores the return value of a function returning ref.

static class TestClass
{
    static int TestCallReturnRefByValue(int[] a)
    {
        int r = TestReturnRef(a);
        return r;
    }
}

Here's the IL2CPP output:

extern "C" IL2CPP_METHOD_ATTR int32_t TestClass_TestCallReturnRefByValue_m79A8BC0DCD1104C33F93B2BE5B24B9DBFD926B18 (Int32U5BU5D_t2B9E4FDDDB9F0A00EC0AC631BA2DA915EB1ECF83* ___a0, const RuntimeMethod* method)
{
    {
        Int32U5BU5D_t2B9E4FDDDB9F0A00EC0AC631BA2DA915EB1ECF83* L_0 = ___a0;
        int32_t* L_1 = TestClass_TestReturnRef_mD5D5F618F5DF6DCF2B0F2655901BF24670268F77(L_0, /*hidden argument*/NULL);
        int32_t L_2 = *((int32_t*)L_1);
        return L_2;
    }
}

This looks just like the C++ for TestCallReturnReadonlyRef. The return value is still stored in an int32_t* and then dereferenced into just an int32_t. This is simply syntactic sugar for writing ref int r = ref TestReturnRef(a); and then int r2 = r;. Since this C++ is the same, the assembly should be the same too:

push   {r7, lr}
mov    r7, sp
bl     _TestClass_TestReturnRef_mD5D5F618F5DF6DCF2B0F2655901BF24670268F77
ldr    r0, [r0]
pop    {r7, pc}

Indeed, the generated assembly is identical.

Conclusion: Storing the return value of a function that returns ref into a non-ref local variable is convenient syntactic sugar for storing it in a ref variable and then dereferencing it into a non-ref variable. It is also error-prone. Forgetting ref silently produces a copy, so later writes to r won't change the array element.
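The difference between the two forms is easy to demonstrate with a pointer analogy in C++. In the sketch below, a plain array again stands in for the managed int[]. CopyThenWrite and RefThenWrite are hypothetical helper names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

int32_t* TestReturnRef(int32_t* a)
{
    return &a[0];
}

// Like `int r = TestReturnRef(a);`: dereference into a copy.
int32_t CopyThenWrite(int32_t* a)
{
    int32_t r = *TestReturnRef(a); // copies a[0]
    r = 99;                        // changes only the copy
    return r;
}

// Like `ref int r = ref TestReturnRef(a);`: alias the element.
int32_t RefThenWrite(int32_t* a)
{
    int32_t* r = TestReturnRef(a);
    *r = 99;                       // changes a[0] itself
    return *r;
}
```

Both helpers return 99, but only RefThenWrite leaves 99 in the array.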

Conclusion

ref local variables and return values, including ref readonly, are implemented well without any nasty overhead such as method initialization. The resulting assembly was efficient in every case we looked at. This is great because, unlike features like tuples and pattern matching, this truly adds new capability to the language. We now have the ability to write faster code without resorting to unsafe language features like pointers.

As an example of this, consider the following code:

class World
{
    public int ExpensiveGet(int id) { /* ... */ }
    public void ExpensiveSet(int id, int value) { /* ... */ }
}
 
int value = world.ExpensiveGet(myId);
world.ExpensiveSet(myId, value + 1);

This code simply wants to increment some value that can only be found through an expensive (i.e. slow) process. The slow process is conducted, then the results of that search are discarded and only the value is returned. To then set the value, the whole expensive search must be conducted again just to get back to the same point.

Alternatives in previous versions of C# have revolved around passing a delegate:

class World
{
    public void ExpensiveChange(int id, Func<int, int> f) { /* ... */ }
}
 
world.ExpensiveChange(myId, val => val + 1);

This conducts the expensive search and, once it's found what it's looking for, calls a delegate with the found value and uses the delegate's return value to set the new value. This avoids a second expensive search, but incurs the cost of invoking a delegate and, for lambdas that capture variables, a heap allocation.

Now with C# 7.3 we can make use of ref returns and ref local variables to avoid both the cost of double searches and that of delegates:

class World
{
    public ref int ExpensiveGet(int id) { /* ... */ }
}
 
ref int r = ref world.ExpensiveGet(myId);
r++;

And, from what we've seen in today's C++ and assembly analysis, we can rest assured that the machine code the CPU actually executes for this will be quite fast.
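The C# World snippets elide their method bodies, so they can't be run as written. Here's a rough C++ analogue that counts searches so the three approaches can be compared. It rests on several made-up assumptions. World is a hypothetical type backed by two std::vector<int32_t>s. A linear search stands in for the expensive process. std::function stands in for the C# delegate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical World: every lookup goes through Find, which counts searches.
struct World
{
    std::vector<int32_t> ids;
    std::vector<int32_t> values;
    int searches = 0;

    // The "expensive" part: locate the slot for an id.
    // Returns ids.size() on a miss, which makes values.at() below throw.
    std::size_t Find(int32_t id)
    {
        ++searches;
        for (std::size_t i = 0; i < ids.size(); ++i)
        {
            if (ids[i] == id) return i;
        }
        return ids.size();
    }

    // Get/set style: two searches to increment a value.
    int32_t ExpensiveGet(int32_t id) { return values.at(Find(id)); }
    void ExpensiveSet(int32_t id, int32_t value) { values.at(Find(id)) = value; }

    // Delegate style: one search, plus a call through std::function.
    void ExpensiveChange(int32_t id, const std::function<int32_t(int32_t)>& f)
    {
        int32_t& slot = values.at(Find(id));
        slot = f(slot);
    }

    // ref-return style: one search, and the caller modifies the slot directly.
    int32_t& ExpensiveGetRef(int32_t id) { return values.at(Find(id)); }
};
```

C++ can't overload on return type, so the ref-returning version is named ExpensiveGetRef here. Unlike a C# ref into a managed array, the int32_t& it returns is invalidated if the vector later reallocates.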