Most programmers write code for an abstract computer. The thing is, your code runs on a real computer that works in a specific way. Even if your game is going to run on a wide range of devices, knowing a few features that are common to nearly all of them can speed up your code 10x or more. Today’s article shows you how!

Broadly, all modern computers have the same basic configuration: a CPU, a GPU, and RAM. The CPU also has two or three levels of cache (L1, L2, and L3) that exist to speed up memory access, because going all the way out to RAM is slow. Exact numbers vary by device, but according to this site, here are the rough latencies:

  • L1 cache access: 1 nanosecond
  • L2 cache access: 4 nanoseconds
  • RAM access: 100 nanoseconds

This means there’s roughly a 25-100x advantage to reading memory that’s already in the CPU’s caches compared to memory that has to come from RAM.

Unfortunately, most programmers never think about this. For example, consider a language like C# where almost everything is a class allocated on the heap by the VM. Your class reference is essentially a pointer to a random location in RAM, so an array of class instances is really an array of pointers. In RAM it looks like this:

49887 32704 21426 53253 11753 71510 9790 40224 88383

Each of these entries is a pointer: a memory address. When you loop over these class instances and access them, you’re essentially visiting random locations in memory. What are the odds that a given random location is already in one of the CPU’s caches? Virtually zero.

You see variations of this code all the time. Almost everything is some kind of class, and almost every container (List, Dictionary) is full of references to instances of those classes. It’s rare in C# programs to see any code that acknowledges the existence of the CPU cache.
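To make that concrete, here’s a minimal sketch of the everyday pattern. The Enemy type and MoveEnemies method are made up for this example; the point is that every element of the List is a reference, so each iteration starts by following a pointer to wherever that object happens to live on the heap.

using System.Collections.Generic;
using UnityEngine;

class Enemy
{
	public Vector3 Position;
	public Vector3 Velocity;
}

static class EnemyMover
{
	public static void MoveEnemies(List<Enemy> enemies, float dt)
	{
		for (int i = 0; i < enemies.Count; ++i)
		{
			// enemies[i] is a reference; following it may be a cache miss
			Enemy enemy = enemies[i];
			enemy.Position += enemy.Velocity * dt;
		}
	}
}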

So what if you wanted to take advantage of the CPU cache? I figure this alone would put your code in the top 10% in terms of performance, purely based on my experience reading a variety of code throughout my programming days. Fortunately, C# is friendlier in this respect than languages like Java or AS3. It often comes down to using struct instead of class.

Say you have an array of structs instead of an array of classes. A struct is just its contents: there’s no pointer to chase like there is with a class. So an array of structs is just those contents laid out back-to-back. When you access the first element, the CPU loads a chunk of memory (a cache line) into its caches starting at that struct. That means when you access the next struct in the array, it’s often already in the cache.
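Here’s a rough sketch of that layout, assuming UnityEngine.Vector3 is three 4-byte floats (so each struct below is 24 bytes) and assuming a typical 64-byte cache line:

using UnityEngine;

struct Projectile
{
	public Vector3 Position; // 12 bytes
	public Vector3 Velocity; // 12 bytes
}

static class ProjectileLayout
{
	// The elements sit back-to-back: element i starts 24 * i bytes from the
	// start of the array's data. Reading projectiles[0] pulls in a cache line
	// that already contains projectiles[1] and part of projectiles[2].
	public static readonly Projectile[] projectiles = new Projectile[1024];
}

Compare that to the class version, where the array only stores 1024 references and the 24-byte payloads are scattered across the heap.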

Let’s see how this looks in practice. The following script has a ProjectileStruct and a ProjectileClass, each with a Vector3 Position and a Vector3 Velocity. It makes two arrays of them, shuffles them, and loops over them calling UpdateProjectile for each one. That function just moves the projectile’s position according to a linear velocity: projectile.Position += projectile.Velocity * time;.

Note that this is a test similar to the one in this article, but with several important differences.

using System;
 
using UnityEngine;
 
class TestScript : MonoBehaviour
{
	struct ProjectileStruct
	{
		public Vector3 Position;
		public Vector3 Velocity;
	}
 
	class ProjectileClass
	{
		public Vector3 Position;
		public Vector3 Velocity;
	}
 
	void Start()
	{
		const int count = 10000000;
		ProjectileStruct[] projectileStructs = new ProjectileStruct[count];
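		// The structs live directly inside this array as one contiguous block; no per-element allocation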
		ProjectileClass[] projectileClasses = new ProjectileClass[count];
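		// Each class instance is a separate heap allocation; the array only stores references to them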
		for (int i = 0; i < count; ++i)
		{
			projectileClasses[i] = new ProjectileClass();
		}
		Shuffle(projectileStructs);
		Shuffle(projectileClasses);
 
		System.Diagnostics.Stopwatch sw = System.Diagnostics.Stopwatch.StartNew();
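		// Update every struct in order: the array's contents are contiguous, so sequential access stays in the cache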
		for (int i = 0; i < count; ++i)
		{
			UpdateProjectile(ref projectileStructs[i], 0.5f);
		}
		long structTime = sw.ElapsedMilliseconds;
 
		sw.Reset();
		sw.Start();
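		// Update every class instance: each array element is a reference that must be followed to an effectively random heap address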
		for (int i = 0; i < count; ++i)
		{
			UpdateProjectile(projectileClasses[i], 0.5f);
		}
		long classTime = sw.ElapsedMilliseconds;
 
		string report = string.Format(
			"Type,Time\n" +
			"Struct,{0}\n" +
			"Class,{1}\n",
			structTime,
			classTime
		);
		Debug.Log(report);
	}
 
	void UpdateProjectile(ref ProjectileStruct projectile, float time)
	{
		projectile.Position += projectile.Velocity * time;
	}
 
	void UpdateProjectile(ProjectileClass projectile, float time)
	{
		projectile.Position += projectile.Velocity * time;
	}
 
	public static void Shuffle<T>(T[] list)  
	{
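		// Fisher-Yates shuffle: walk backward through the array, swapping each element with a randomly chosen earlier (or same) element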
		System.Random random = new System.Random();
		for (int n = list.Length; n > 1; )
		{
			n--;
			int k = random.Next(n + 1);
			T value = list[k];
			list[k] = list[n];
			list[n] = value;
		}
	}
}

If you want to try out the test yourself, paste the above code into a TestScript.cs file in your Unity project’s Assets directory and attach it to the main camera game object in a new, empty project. Then build in non-development mode, ideally with IL2CPP. I ran it that way on this device:

  • LG Nexus 5X
  • Android 7.1.2
  • Unity 5.6.0f3, IL2CPP

And here are the results I got:

Type    Time (ms)
Struct  250
Class   3365

[Graph: struct vs. class performance]

As roughly predicted by the CPU cache and RAM access times, the struct version runs 13.46x faster than the class version (3365 ms / 250 ms). That’s a huge win! Who wouldn’t like their code to run an order of magnitude faster? By organizing your data in cache-friendly ways, you can realize huge performance gains over the default, class-oriented style that almost every C# programmer sticks to.

It’s worth noting that making good use of the CPU cache still leaves a ton of performance on the table. For example, you’ll need multiple threads to take advantage of all the CPU’s cores, and you’ll need to max out the GPU, too. Games already lean on the GPU for graphics rendering, but there’s increasingly more general-purpose work to offload to it with the likes of compute shaders. Those are topics for another day.