The story usually has three parts. First, find the highest CPU cost functions in a profiler. Second, look at the corresponding C++ code that IL2CPP generated from C#. Third, stop using more parts of C#. Today’s article explores some more IL2CPP output and discovers some more areas of C# that are shockingly expensive to use.

String Literals

The first question that comes to mind here is “how can string literals possibly be expensive?” Let’s see! Here’s a simple C# function:

static class TestClass
{
	static string StringLiteral()
	{
		return "some string";
	}
}

Now here’s the C++ IL2CPP generates:

// System.String TestClass::StringLiteral()
extern "C"  String_t* TestClass_StringLiteral_m3609919544 (RuntimeObject * __this /* static, unused */, const RuntimeMethod* method)
{
	static bool s_Il2CppMethodInitialized;
	if (!s_Il2CppMethodInitialized)
	{
		il2cpp_codegen_initialize_method (TestClass_StringLiteral_m3609919544_MetadataUsageId);
		s_Il2CppMethodInitialized = true;
	}
	{
		return _stringLiteral1751653916;
	}
}

The first part of the function is the expensive part. Static local variables like s_Il2CppMethodInitialized are unlike non-static local variables. Normal, non-static local variables are often mapped to a CPU register, and reads from and writes to CPU registers are essentially “free” operations. In contrast, static local variables live in memory, just like the stack and heap do. Memory accesses may “hit” or “miss” at each level of the CPU’s cache: L1, L2, L3, etc. Each level of cache is roughly an order of magnitude more expensive to access than the one before it, and RAM is roughly an order of magnitude more expensive than the last cache level.
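To make that pattern concrete, here’s a minimal C++ sketch with the same shape as the generated function. InitializeMethodMetadata, g_initCalls, and the other names are hypothetical stand-ins rather than anything IL2CPP actually emits. The point is only that the flag persists in memory and is tested on every call, while the slow path runs once.

```cpp
#include <cassert>

// Hypothetical counter: how many times the slow initialization path ran.
static int g_initCalls = 0;

// Hypothetical stand-in for il2cpp_codegen_initialize_method.
static void InitializeMethodMetadata()
{
	++g_initCalls;
}

// Same shape as the generated function. The flag is a static local, so it
// persists between calls and is tested on every call.
const char* StringLiteralLike()
{
	static bool s_initialized = false;
	if (!s_initialized)
	{
		InitializeMethodMetadata();
		s_initialized = true;
	}
	return "some string";
}

// Reports how often the slow path has run so far.
int InitCallCount()
{
	return g_initCalls;
}

// Usage: the flag is tested on all five calls, but the slow path runs once.
void CallFiveTimes()
{
	for (int i = 0; i < 5; ++i)
	{
		StringLiteralLike();
	}
	assert(InitCallCount() == 1);
}
```

The one-time initialization is not the concern here. The recurring cost is the load and test of s_initialized at the top of every call.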

The question becomes “how likely is it that s_Il2CppMethodInitialized is in CPU cache?” Since CPU caches contain the most recently used areas of memory, will s_Il2CppMethodInitialized have been recently used? Since it’s a static local variable, it can only be accessed by this function. That makes the question “how much memory was accessed between two consecutive calls to the function?” Any memory accessed between one call to this function and the next uses up CPU cache and, eventually, s_Il2CppMethodInitialized will fall out of L1, then L2, then L3.

So if this function isn’t frequently called then it’s likely that the static local variable will need to be read from RAM at a cost of about 100 nanoseconds. That’s about enough time to calculate 14 square roots on a 2 GHz ARM chip. Think of all the times we use Vector3.sqrMagnitude to avoid just one square root!

The other cost is due to the if, which must also be evaluated each time the function is called. This may throw off the CPU’s branch predictor, leading to wasted work. It’s hard to measure exactly, but the branch instructions generated by an if are definitely not free and add even more overhead to the function.
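One way to see how often that check is paid is to count it. The sketch below is a hypothetical, instrumented C++ version of the same pattern, with names that are assumptions and not IL2CPP output. It compares calling the function on every loop iteration with calling it once and reusing the result.

```cpp
#include <cassert>

// Hypothetical counter of how many times the initialization flag is tested.
static int g_flagChecks = 0;

// Same shape as the generated function, instrumented to count flag tests.
const char* GetLabel()
{
	static bool s_initialized = false;
	++g_flagChecks;
	if (!s_initialized)
	{
		s_initialized = true;
	}
	return "some string";
}

// Reports how many flag tests have happened so far.
int FlagChecks()
{
	return g_flagChecks;
}

// Calls the function on every iteration: one flag test per iteration.
int SumPerIteration(int iterations)
{
	int sum = 0;
	for (int i = 0; i < iterations; ++i)
	{
		sum += GetLabel()[0];
	}
	return sum;
}

// Calls the function once and reuses the result: one flag test in total.
int SumHoisted(int iterations)
{
	const char* label = GetLabel();
	int sum = 0;
	for (int i = 0; i < iterations; ++i)
	{
		sum += label[0];
	}
	return sum;
}

// Usage: same result either way, but 1000 flag tests versus 1.
void CompareFlagTests()
{
	int before = FlagChecks();
	int perIteration = SumPerIteration(1000);
	assert(FlagChecks() - before == 1000);

	before = FlagChecks();
	int hoisted = SumHoisted(1000);
	assert(FlagChecks() - before == 1);
	assert(perIteration == hoisted);
}
```

Whether hoisting like this is worthwhile in real C# code depends on the call site, so treat it as an illustration of where the check occurs, not as a measured optimization.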

Exceptions

Throwing exceptions should be extremely unusual, so we shouldn’t focus too much on the cost of that. However, we should focus on the cost of code that could throw exceptions but doesn’t actually throw. This is very common, especially in .NET classes like List<T>, which bounds-checks every index operation so that it can throw an ArgumentOutOfRangeException instead of an IndexOutOfRangeException.
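As a rough illustration of “could throw but doesn’t,” here’s a C++ sketch of a checked container. CheckedList is a hypothetical class loosely modeled on the behavior described above. It is not how List<T> or IL2CPP implement it.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical container: every indexed read is validated before the
// element is returned.
class CheckedList
{
public:
	explicit CheckedList(std::vector<int> items) : items_(std::move(items)) {}

	int Get(std::size_t index) const
	{
		// This comparison runs on every call, even though it almost never fails.
		if (index >= items_.size())
		{
			throw std::out_of_range("index");
		}
		return items_[index];
	}

private:
	std::vector<int> items_;
};

// Usage: an in-range read succeeds and an out-of-range read throws.
void ReadInRangeAndOutOfRange()
{
	CheckedList list(std::vector<int>{10, 20, 30});
	assert(list.Get(1) == 20);

	bool threw = false;
	try
	{
		list.Get(3);
	}
	catch (const std::out_of_range&)
	{
		threw = true;
	}
	assert(threw);
}
```

The in-range path never throws, yet it still pays for the comparison and branch on every access.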

With that in mind, let’s look at another simple C# function:

static class TestClass
{
	static void ThrowException()
	{
		throw new Exception();
	}
}

And here’s the IL2CPP output:

// System.Void TestClass::ThrowException()
extern "C"  void TestClass_ThrowException_m3269612699 (RuntimeObject * __this /* static, unused */, const RuntimeMethod* method)
{
	static bool s_Il2CppMethodInitialized;
	if (!s_Il2CppMethodInitialized)
	{
		il2cpp_codegen_initialize_method (TestClass_ThrowException_m3269612699_MetadataUsageId);
		s_Il2CppMethodInitialized = true;
	}
	{
		Exception_t * L_0 = (Exception_t *)il2cpp_codegen_object_new(Exception_t_il2cpp_TypeInfo_var);
		Exception__ctor_m213470898(L_0, /*hidden argument*/NULL);
		IL2CPP_RAISE_MANAGED_EXCEPTION(L_0);
	}
}

Again, we have the same static local variable as with the string literal. It’ll take 1-100 nanoseconds to access depending on how much memory was touched between one call to the function and the next. Put another way, it costs as little as 0.14 square roots and as much as 14 square roots. On the low end for a function that doesn’t do much and gets called over and over, it’s not a big deal. On the high end for a function that does a lot or doesn’t get called repeatedly, it’s a very high cost!
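The square-root figures are simple arithmetic. The sketch below assumes the 2 GHz clock mentioned earlier and infers a cost of roughly 14.3 cycles per square root from “14 square roots in 100 nanoseconds.” That per-operation figure is an inference from the article’s own numbers, not a measurement, and the function names are hypothetical.

```cpp
#include <cassert>

// Clock cycles that elapse in the given time: 1 GHz is 1 cycle per nanosecond.
constexpr double CyclesIn(double nanoseconds, double clockGHz)
{
	return nanoseconds * clockGHz;
}

// Cycles per square root implied by "14 square roots in 100 ns at 2 GHz".
// Inferred from the article's numbers, not measured.
constexpr double ImpliedCyclesPerSqrt()
{
	return CyclesIn(100.0, 2.0) / 14.0;
}

// How many square roots fit in the given time under that assumption.
constexpr double SquareRootsIn(double nanoseconds, double clockGHz)
{
	return CyclesIn(nanoseconds, clockGHz) / ImpliedCyclesPerSqrt();
}

// Usage: a 100 ns RAM access is 200 cycles, and a 1 ns cache hit is
// about 0.14 square roots.
void CheckArticleNumbers()
{
	assert(CyclesIn(100.0, 2.0) == 200.0);
	assert(SquareRootsIn(1.0, 2.0) > 0.139 && SquareRootsIn(1.0, 2.0) < 0.141);
}
```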

Generics

Let’s define a couple of C# classes: one doesn’t use generics and one does:

class NonGenericClass
{
	public int Field;
	public void NonGenericMethod() { }
	public void GenericMethod<TMethod>() { }
}
 
class GenericClass<TClass>
{
	public int Field;
	public void NonGenericMethod() { }
	public void NonGenericMethodUsingClassParam(TClass t) { }
	public void GenericMethod<TMethod>() { }
}

Now let’s try accessing the Field of each one:

static class TestClass
{
	static int UseFieldOfNonGenericClass(NonGenericClass x)
	{
		return x.Field;
	}
 
	static int UseFieldOfGenericClass(GenericClass<int> x)
	{
		return x.Field;
	}
}

Here’s the IL2CPP output:

// System.Int32 TestClass::UseFieldOfNonGenericClass(NonGenericClass)
extern "C"  int32_t TestClass_UseFieldOfNonGenericClass_m950261640 (RuntimeObject * __this /* static, unused */, NonGenericClass_t1883130525 * ___x0, const RuntimeMethod* method)
{
	{
		NonGenericClass_t1883130525 * L_0 = ___x0;
		NullCheck(L_0);
		int32_t L_1 = L_0->get_Field_0();
		return L_1;
	}
}
 
// System.Int32 TestClass::UseFieldOfGenericClass(GenericClass`1<System.Int32>)
extern "C"  int32_t TestClass_UseFieldOfGenericClass_m2357377390 (RuntimeObject * __this /* static, unused */, GenericClass_1_t3595952750 * ___x0, const RuntimeMethod* method)
{
	{
		GenericClass_1_t3595952750 * L_0 = ___x0;
		NullCheck(L_0);
		int32_t L_1 = L_0->get_Field_0();
		return L_1;
	}
}

The IL2CPP output is identical and minimal except for the unnecessary local pointer variable L_0, which will probably be optimized out by the C++ compiler. The null check can be disabled in C# by adding the [Il2CppSetOption(Option.NullChecks, false)] attribute.
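To show what the null check adds to a field read, here’s a small C++ sketch. NullCheckLike and Widget are hypothetical stand-ins. This is not IL2CPP’s actual NullCheck implementation, only the general shape of a check that raises an error on a null pointer.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a managed object with one int field.
struct Widget
{
	int field;
};

// Hypothetical stand-in for NullCheck: throws when given a null pointer.
inline void NullCheckLike(const void* p)
{
	if (p == nullptr)
	{
		throw std::runtime_error("null reference");
	}
}

// With the check: one extra compare-and-branch before the load.
int ReadFieldChecked(const Widget* w)
{
	NullCheckLike(w);
	return w->field;
}

// Without the check: just the load. Passing null here is undefined behavior.
int ReadFieldUnchecked(const Widget* w)
{
	return w->field;
}

// Usage: both versions agree on valid input, and only one guards against null.
void CompareReads()
{
	Widget w{42};
	assert(ReadFieldChecked(&w) == 42);
	assert(ReadFieldUnchecked(&w) == 42);

	bool threw = false;
	try
	{
		ReadFieldChecked(nullptr);
	}
	catch (const std::runtime_error&)
	{
		threw = true;
	}
	assert(threw);
}
```

Removing the check trades a small per-access cost for the loss of a clean error on null, so it only makes sense where the pointer is known to be valid.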

Now let’s call a non-generic method of both classes:

static class TestClass
{
	static void CallNonGenericMethodOfNonGenericClass(NonGenericClass x)
	{
		x.NonGenericMethod();
	}
 
	static void CallNonGenericMethodOfGenericClass(GenericClass<int> x)
	{
		x.NonGenericMethod();
	}
}

Here’s the IL2CPP output for these functions:

// System.Void TestClass::CallNonGenericMethodOfNonGenericClass(NonGenericClass)
extern "C"  void TestClass_CallNonGenericMethodOfNonGenericClass_m2295610642 (RuntimeObject * __this /* static, unused */, NonGenericClass_t1883130525 * ___x0, const RuntimeMethod* method)
{
	{
		NonGenericClass_t1883130525 * L_0 = ___x0;
		NullCheck(L_0);
		NonGenericClass_NonGenericMethod_m1663704318(L_0, /*hidden argument*/NULL);
		return;
	}
}
 
// System.Void TestClass::CallNonGenericMethodOfGenericClass(GenericClass`1<System.Int32>)
extern "C"  void TestClass_CallNonGenericMethodOfGenericClass_m348649891 (RuntimeObject * __this /* static, unused */, GenericClass_1_t3595952750 * ___x0, const RuntimeMethod* method)
{
	static bool s_Il2CppMethodInitialized;
	if (!s_Il2CppMethodInitialized)
	{
		il2cpp_codegen_initialize_method (TestClass_CallNonGenericMethodOfGenericClass_m348649891_MetadataUsageId);
		s_Il2CppMethodInitialized = true;
	}
	{
		GenericClass_1_t3595952750 * L_0 = ___x0;
		NullCheck(L_0);
		GenericClass_1_NonGenericMethod_m2518718539(L_0, /*hidden argument*/GenericClass_1_NonGenericMethod_m2518718539_RuntimeMethod_var);
		return;
	}
}

The non-generic version is minimal, but the version calling a method of a generic class gets the same method initialization code involving a static local variable. That means we could suffer up to the equivalent of 14 square roots in overhead.

Finally, let’s call a generic method of both classes:

static class TestClass
{
	static void CallGenericMethodOfNonGenericClass(NonGenericClass x)
	{
		x.GenericMethod<int>();
	}
 
	static void CallGenericMethodOfGenericClass(GenericClass<int> x)
	{
		x.GenericMethod<int>();
	}
}

Here’s the IL2CPP output:

// System.Void TestClass::CallGenericMethodOfNonGenericClass(NonGenericClass)
extern "C"  void TestClass_CallGenericMethodOfNonGenericClass_m461336270 (RuntimeObject * __this /* static, unused */, NonGenericClass_t1883130525 * ___x0, const RuntimeMethod* method)
{
	static bool s_Il2CppMethodInitialized;
	if (!s_Il2CppMethodInitialized)
	{
		il2cpp_codegen_initialize_method (TestClass_CallGenericMethodOfNonGenericClass_m461336270_MetadataUsageId);
		s_Il2CppMethodInitialized = true;
	}
	{
		NonGenericClass_t1883130525 * L_0 = ___x0;
		NullCheck(L_0);
		NonGenericClass_GenericMethod_TisInt32_t2950945753_m4017529516(L_0, /*hidden argument*/NonGenericClass_GenericMethod_TisInt32_t2950945753_m4017529516_RuntimeMethod_var);
		return;
	}
}
 
// System.Void TestClass::CallGenericMethodOfGenericClass(GenericClass`1<System.Int32>)
extern "C"  void TestClass_CallGenericMethodOfGenericClass_m3349609793 (RuntimeObject * __this /* static, unused */, GenericClass_1_t3595952750 * ___x0, const RuntimeMethod* method)
{
	static bool s_Il2CppMethodInitialized;
	if (!s_Il2CppMethodInitialized)
	{
		il2cpp_codegen_initialize_method (TestClass_CallGenericMethodOfGenericClass_m3349609793_MetadataUsageId);
		s_Il2CppMethodInitialized = true;
	}
	{
		GenericClass_1_t3595952750 * L_0 = ___x0;
		NullCheck(L_0);
		GenericClass_1_GenericMethod_TisInt32_t2950945753_m621858916(L_0, /*hidden argument*/GenericClass_1_GenericMethod_TisInt32_t2950945753_m621858916_RuntimeMethod_var);
		return;
	}
}

The two functions are identical in structure, and both contain the same method initialization code with a static local variable.
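The six cases in this section can be condensed into one small predicate. This is only a summary of the outputs shown in this article, written as a hypothetical C++ helper. It is not a rule taken from IL2CPP itself, and other IL2CPP versions may behave differently.

```cpp
#include <cassert>

// What the calling function does with the object.
enum class Access { Field, Method };

// Hypothetical summary of the outputs shown above: does the calling
// function get the method initialization code?
constexpr bool CallerGetsMethodInit(Access access, bool classIsGeneric, bool methodIsGeneric)
{
	// Field reads got no initialization code for either class.
	// Method calls got it whenever the class or the method was generic.
	return access == Access::Method && (classIsGeneric || methodIsGeneric);
}

// Usage: the only method call without the overhead is the fully
// non-generic one.
void CheckObservedCases()
{
	assert(!CallerGetsMethodInit(Access::Field, true, false));
	assert(!CallerGetsMethodInit(Access::Method, false, false));
	assert(CallerGetsMethodInit(Access::Method, true, false));
	assert(CallerGetsMethodInit(Access::Method, false, true));
	assert(CallerGetsMethodInit(Access::Method, true, true));
}
```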

Conclusion

Avoid string literals, exceptions, generic types, and generic methods in any code you want to run fast. Alternative: write your scripts in C++ and none of these problems will apply to you.