As we know, foreach loops create garbage when used with a List&lt;T&gt;: not just the first time you iterate over one, but every time thereafter, too. A comment on that article shared a link to a class called FastList that was written expressly to solve the GC issue. Does it? And how does its performance compare to plain old List? Today’s article puts it to the test to find out!
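
As a quick refresher on why that happens (this is my summary, so treat the mechanics as an assumption rather than something shown in this test): under Unity’s old Mono compiler, List&lt;T&gt;’s struct enumerator gets boxed in the compiler-generated finally block so that Dispose can be called through IDisposable, and that box is the roughly 40 bytes of garbage per loop:

var list = new List<int> { 1, 2, 3 };
var sum = 0;
foreach (var i in list) // compiled roughly as: var e = list.GetEnumerator();
{                       //   try { while (e.MoveNext()) { /* body */ } }
	sum += i;           //   finally { ((IDisposable)e).Dispose(); } <- boxes the struct
}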

Reading through the FastList source code, it quickly became apparent to me that it’s a wrapper around the regular System.Collections.Generic.List&lt;T&gt; class. It actually derives from List&lt;T&gt; and inherits almost all of its functionality. The one part it customizes is at the heart of foreach loops: GetEnumerator.

The customized GetEnumerator reuses an enumerator cached in a private field, if one is available. This means that subsequent foreach loops reuse the cached enumerator rather than creating new garbage… mostly. Since an enumerator can only serve one foreach loop at a time, a new enumerator still has to be created whenever more than one loop is in progress at once. That can happen when multiple threads loop over the same FastList, or when GetEnumerator is used directly rather than through simple foreach loops.

What happens on the first call to GetEnumerator? FastList has a nested CustomEnumerator class that it instantiates. This creates garbage, but only on that first call thanks to the caching described above. CustomEnumerator implements IEnumerator and IEnumerator&lt;T&gt;, which makes it a valid return value for GetEnumerator. It also collaborates with FastList so it can be reset (via MyReset) and fetch the real enumerator from List (via GetBaseEnumerator). Other than that, it’s a wrapper class too: it just forwards the IEnumerator interface functions to the enumerator it got from List.
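
To make that concrete, here’s a minimal sketch of the pattern as I read it. The MyReset and GetBaseEnumerator names come from the description above; the InUse flag and the rest of the bookkeeping are my own simplification, so treat this as an illustration rather than the actual FastList source:

using System.Collections;
using System.Collections.Generic;

public class FastList<T> : List<T>
{
	private CustomEnumerator cached;

	public new IEnumerator<T> GetEnumerator()
	{
		// Reuse the cached enumerator unless another loop is still using it
		if (cached != null && !cached.InUse)
		{
			cached.MyReset();
			return cached;
		}
		// Allocates, but only on the first call or during overlapping loops
		cached = new CustomEnumerator(this);
		return cached;
	}

	private List<T>.Enumerator GetBaseEnumerator()
	{
		return base.GetEnumerator();
	}

	private class CustomEnumerator : IEnumerator<T>
	{
		private readonly FastList<T> owner;
		private List<T>.Enumerator inner;

		public bool InUse { get; private set; }

		public CustomEnumerator(FastList<T> owner)
		{
			this.owner = owner;
			MyReset();
		}

		public void MyReset()
		{
			// Grab a fresh enumerator from the underlying List<T>
			inner = owner.GetBaseEnumerator();
			InUse = true;
		}

		// Everything else just forwards to the wrapped List<T> enumerator
		public T Current { get { return inner.Current; } }
		object IEnumerator.Current { get { return Current; } }
		public bool MoveNext() { return inner.MoveNext(); }
		public void Reset() { MyReset(); }
		public void Dispose() { InUse = false; } // free it for the next loop
	}
}

The trade-off is baked right in: the first call pays for a CustomEnumerator allocation, and any overlapping loop falls back to allocating a fresh one.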

So does it deliver on the promise of not creating garbage? To find out, I plugged it into the same test app from the original article and watched the Unity Profiler in “deep” mode. Here’s what I found:

FastList First Run GC Allocations

FastList Second Run GC Allocations

As you can see, the first foreach loop causes 80 bytes to be allocated. You might remember that List allocated only 40 bytes, so FastList starts off at a disadvantage. However, the second foreach loop uses the cached enumerator and doesn’t create any garbage at all, while List creates another 40 bytes. So after just two loops the score is even at 80 bytes each, and every loop after that costs List yet another 40 bytes while FastList stays at zero.

That’s a pretty nice win for FastList, but does it come at a performance price? To find out, I started with the performance test in this article, modified it so that all the test loops use foreach, and added FastList to the lineup. Here’s the final test script:

using System.Collections.Generic;
using System.Runtime;
 
using UnityEngine;
 
public class TestScript : MonoBehaviour
{
	private string report;
 
	void Start()
	{
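		// Keep the GC out of the way while the loops are being timed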
		GCSettings.LatencyMode = GCLatencyMode.LowLatency;
		report = string.Join(
			"\n",
			new string[]{
				"Test,Array Time,List Time,LinkedList Time,FastList Time",
				Test(1000000),
				Test(2000000),
				Test(3000000),
				Test(4000000),
				Test(5000000),
				Test(6000000),
				Test(7000000),
				Test(8000000),
				Test(9000000),
				Test(10000000)
			}
		);
	}
 
	string Test(int length)
	{
		var stopwatch = new System.Diagnostics.Stopwatch();
		var array = new int[length];
		var list = new List<int>();
		var linkedList = new LinkedList<int>();
		var fastList = new FastList<int>();
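		// Fill all four collections with the same ascending values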
		for (var i = 0; i < length; ++i)
		{
			array[i] = i;
			list.Add(i);
			linkedList.AddLast(i);
			fastList.Add(i);
		}
 
		int sum;
 
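		// Time a foreach over the plain array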
		stopwatch.Reset();
		stopwatch.Start();
		sum = 0;
		foreach (var i in array)
		{
			sum += i;
		}
		var arrayTime = stopwatch.ElapsedMilliseconds;
 
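		// Time a foreach over the List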
		stopwatch.Reset();
		stopwatch.Start();
		sum = 0;
		foreach (var i in list)
		{
			sum += i;
		}
		var listTime = stopwatch.ElapsedMilliseconds;
 
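		// Time a foreach over the LinkedList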
		stopwatch.Reset();
		stopwatch.Start();
		sum = 0;
		foreach (var i in linkedList)
		{
			sum += i;
		}
		var linkedListTime = stopwatch.ElapsedMilliseconds;
 
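		// Time a foreach over the FastList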
		stopwatch.Reset();
		stopwatch.Start();
		sum = 0;
		foreach (var i in fastList)
		{
			sum += i;
		}
		var fastListTime = stopwatch.ElapsedMilliseconds;
 
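		// Emit one CSV row: element count plus the four times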
		return string.Join(
			",",
			new string[] {
				length.ToString(),
				arrayTime.ToString(),
				listTime.ToString(),
				linkedListTime.ToString(),
				fastListTime.ToString()
			}
		);
	}
 
	void OnGUI()
	{
		GUI.TextArea(new Rect(0, 0, Screen.width, Screen.height), report);
	}
}

If you want to try the test yourself, simply paste the above code into a TestScript.cs file in the Assets directory of a new, empty Unity project and attach it to the main camera game object. Then build in non-development mode for 64-bit processors and run it windowed at 640×480 with the “Fastest” graphics quality level. I ran it that way on this machine:

  • 2.3 GHz Intel Core i7-3615QM
  • Mac OS X 10.11.3
  • Unity 5.3.2f1, Mac OS X Standalone, x86_64, non-development
  • 640×480, Fastest, Windowed

And here are the results I got:

    Test  Array Time (ms)  List Time (ms)  LinkedList Time (ms)  FastList Time (ms)
 1000000                3               8                     9                  13
 2000000                5              16                    17                  21
 3000000               16              23                    25                  30
 4000000               11              32                    43                  39
 5000000               16              35                    42                  51
 6000000               30              40                    50                  69
 7000000               16              47                    63                  73
 8000000               26              57                    78                  90
 9000000               33              67                    76                  92
10000000               25              71                    92                 102

FastList Performance Graph

Charts like this will always have some noise, but the overall trend is pretty clear. The three list classes (List, FastList, and LinkedList) all scale linearly with the number of elements. Array is linear too, just with a much lower constant factor: its foreach loop compiles to simple index-based access that produces no garbage and marches sequentially through memory, which the CPU cache loves. FastList was the slowest of the bunch, but not by a tremendous margin. It probably suffers from the extra function call overhead of the wrapping code that makes it work.
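
To see why the array case is so cheap, here’s roughly what the compiler generates in place of an enumerator when you foreach over an array:

// What the compiler effectively generates for: foreach (var i in array) { sum += i; }
for (var index = 0; index < array.Length; ++index)
{
	var i = array[index];
	sum += i;
}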

In the end it looks like we have a solid class that delivers the benefits of List with much less garbage creation, provided we loop over it with foreach at least twice. The performance is nearly as good, too. Is that good enough to use it in your project? Let me know in the comments if you’re tempted!