Today’s article takes a break from the iterator series to investigate an interesting anomaly with the List.ForEach function: it’s surprisingly fast! So fast that it’s actually competitive with regular old for, foreach, and while functions. How can it be so fast when it has to call a delegate that you pass it for every single loop iteration? Read on for to find out!

Roger writes in his comment that he added List.ForEach to the comparison in the previous article with surprisingly fast results. In the extreme case, his List.ForEach using a lambda for the delegate parameter actually outperformed any other loop type with a list of 1000 elements!

If you’re not familiar with List.ForEach, here’s a quick example of how to use it:

var fruits = new List<string>{ "apple", "banana", "cherry" };
fruits.ForEach(fruit => Debug.Log("fruit is: " + fruit));
// prints:
//   fruit is: apple
//   fruit is: banana
//   fruit is: cherry

I ran a quick test in the same environment (computer, Unity version, etc.) as the original test and largely reproduced his results. I couldn’t ever beat the fastest raw loop (while) with any list length, but the results were still quite surprising.

I decided to expand a bit on this finding. First, the times for using a lambda delegate versus a method delegate were essentially the same so I removed the method delegate. Second, I added a cached delegate as described in the iterator series. This avoids creating a new delegate for each run of the loop and also avoids any garbage creation. Third, I added a ForEach extension method for arrays to compare against List.ForEach.

Here’s my test script after these three changes:

using System;
using System.Collections.Generic;
using UnityEngine;
 
public static class ArrayExtensions
{
	public static void ForEach<T>(this T[] array, Action<T> callback)
	{
		for (int i = 0, len = array.Length; i < len; ++i)
		{
			callback(array[i]);
		}
	}
}
 
public class TestScript : MonoBehaviour
{
	private const int NumIterations = 100000;
	private static int delegateSum;
	private static Action<Counter> ForEachDelegate = c => delegateSum += c.Value;
 
	private class Counter
	{
		public int Value;
	}
 
	private string report;
 
	void Start()
	{
		report = string.Join(
			",",
			new[]
			{
				"Size",
				"Array For",
				"Array While",
				"Array Foreach",
				"Array ForEach (Uncached)",
				"Array ForEach (Cached)",
				"List For",
				"List While",
				"List foreach",
				"List Foreach (Uncached)",
				"List ForEach (Cached)"
			}
		) + "\n";
		var sizes = new []{ 10, 100, 1000 };
		foreach (var size in sizes)
		{
			report += Test(size);
		}
	}
 
	private string Test(int size)
	{
		var stopwatch = new System.Diagnostics.Stopwatch();
		var sum = 0;
		var array = new Counter[size];
		for (int i = 0; i < size; ++i)
		{
			array[i] = new Counter() { Value = i };
		}
 
		stopwatch.Reset();
		stopwatch.Start();
		for (var iteration = 0; iteration < NumIterations; ++iteration)
		{
			for (int i = 0, len = array.Length; i < len; ++i)
			{
				sum += array[i].Value;
			}
		}
		var arrayForTime = stopwatch.ElapsedMilliseconds;
 
		stopwatch.Reset();
		stopwatch.Start();
		for (var iteration = 0; iteration < NumIterations; ++iteration)
		{
			var i = 0;
			var len = array.Length;
			while (i < len)
			{
				sum += array[i].Value;
				++i;
			}
		}
		var arrayWhileTime = stopwatch.ElapsedMilliseconds;
 
		stopwatch.Reset();
		stopwatch.Start();
		for (var iteration = 0; iteration < NumIterations; ++iteration)
		{
			foreach (var cur in array)
			{
				sum += cur.Value;
			}
		}
		var arrayForeachTime = stopwatch.ElapsedMilliseconds;
 
		stopwatch.Reset();
		stopwatch.Start();
		for (var iteration = 0; iteration < NumIterations; ++iteration)
		{
			array.ForEach(c => sum += c.Value);
		}
		var arrayForEachUncachedTime = stopwatch.ElapsedMilliseconds;
 
		stopwatch.Reset();
		stopwatch.Start();
		for (var iteration = 0; iteration < NumIterations; ++iteration)
		{
			array.ForEach(ForEachDelegate);
		}
		var arrayForEachCachedTime = stopwatch.ElapsedMilliseconds;
 
		var list = new List<Counter>(size);
		list.AddRange(array);
 
		stopwatch.Reset();
		stopwatch.Start();
		for (var iteration = 0; iteration < NumIterations; ++iteration)
		{
			for (int i = 0, len = list.Count; i < len; ++i)
			{
				sum += list[i].Value;
			}
		}
		var listForTime = stopwatch.ElapsedMilliseconds;
 
		stopwatch.Reset();
		stopwatch.Start();
		for (var iteration = 0; iteration < NumIterations; ++iteration)
		{
			var i = 0;
			var len = list.Count;
			while (i < len)
			{
				sum += list[i].Value;
				++i;
			}
		}
		var listWhileTime = stopwatch.ElapsedMilliseconds;
 
		stopwatch.Reset();
		stopwatch.Start();
		for (var iteration = 0; iteration < NumIterations; ++iteration)
		{
			foreach (var cur in list)
			{
				sum += cur.Value;
			}
		}
		var listForeachTime = stopwatch.ElapsedMilliseconds;
 
		stopwatch.Reset();
		stopwatch.Start();
		for (var iteration = 0; iteration < NumIterations; ++iteration)
		{
			list.ForEach(c => sum += c.Value);
		}
		var listForEachUncachedTime = stopwatch.ElapsedMilliseconds;
 
		stopwatch.Reset();
		stopwatch.Start();
		for (var iteration = 0; iteration < NumIterations; ++iteration)
		{
			list.ForEach(ForEachDelegate);
		}
		var listForEachCachedTime = stopwatch.ElapsedMilliseconds;
 
		return size + ","
			+ arrayForTime + ","
			+ arrayWhileTime + ","
			+ arrayForeachTime + ","
			+ arrayForEachUncachedTime + ","
			+ arrayForEachCachedTime + ","
			+ listForTime + ","
			+ listWhileTime + ","
			+ listForeachTime + ","
			+ listForEachUncachedTime + ","
			+ listForEachCachedTime + "\n";
	}
 
	void OnGUI()
	{
		GUI.TextArea(new Rect(0, 0, Screen.width, Screen.height), report);
	}
}

If you want to try out the test yourself, simply paste the above code into a TestScript.cs file in your Unity project’s Assets directory and attach it to the main camera game object in a new, empty project. Then build in non-development mode for 64-bit processors and run it windowed at 640×480 with fastest graphics. I ran it that way on this machine:

  • 2.3 Ghz Intel Core i7-3615QM
  • Mac OS X 10.11.5
  • Unity 5.3.4f1, Mac OS X Standalone, x86_64, non-development
  • 640×480, Fastest, Windowed

And here are the results I got:

Size Array For Array While Array Foreach Array ForEach (Uncached) Array ForEach (Cached) List For List While List foreach List Foreach (Uncached) List ForEach (Cached)
10 4 3 3 20 4 4 3 15 20 5
100 31 29 28 53 35 45 42 99 56 42
1000 291 297 265 345 339 430 407 915 437 429

Loop Performance (array of 10)

Loop Performance (array of 100)

Loop Performance (array of 1000)

Loop Performance (list of 10)

Loop Performance (list of 100)

Loop Performance (list of 1000)

Since the ForEach extension method for arrays itself uses a for loop, the cost of calling the delegate is only additive. It’s always slower than just a for loop, but does allow for a more functional style consistent with List.ForEach.

One huge difference is between the cached and uncached versions of the delegates. The uncached version simply passes a lambda: array.ForEach(c => sum += c.Value);. The cached version stores the lambda in a static delegate variable so it’s not created every time the loop is run. This pays off hugely for short 10 element arrays, less for medium 100 element arrays, and only a little bit for long 1000 element arrays. Clearly, this is an overhead cost rather than one applied per-element.

Next up are lists, the reason for this article. Caching delegates is of immense value here, too. Without caching, the overhead of delegate creation is far too great at short list lengths. For medium and long lists the penalty is not so bad.

Assuming you’re using a cached delegate with List.ForEach on a long (1000 element) list, performance actually edges out a raw for loop. It doesn’t beat the while loop, but comes in a close second. Compared to foreach—a loop built into the language!—this class method is over twice as fast.

How on earth is this possible? Shouldn’t the performance be similar to the extension method that I made for arrays? Shouldn’t calling the delegate parameter for each iteration just be an extra cost on top of some other loop type? To find out, I opened up ILSpy and decompiled the mscorlib.dll that Unity’s Mono uses. Here’s how List.ForEach looks:

public void ForEach(Action<T> action)
{
	if (action == null)
	{
		throw new ArgumentNullException("action");
	}
	for (int i = 0; i < this._size; i++)
	{
		action(this._items[i]);
	}
}

That is just about identical to the extension method that I made for arrays, but there are some differences. First, mine is an extension method rather than an actual instance method. Second, this version checks to make sure the lambda isn’t null. Third, this version uses a cached _size variable rather than using the Array.Length property.

None of these factors should matter much at all. If we saw an unsafe loop, that might be a reason for it to be quicker. Likewise, we could see a call into native code, but we don’t.

The really impressive part here is that the repeated delegate calls aren’t costing much on top of the regular for loops that both ForEach functions are using. There must be some VM optimization going on to perhaps eliminate these function calls altogether. In any case, it seems that you’re free to use List.ForEach or my extension method for arrays without hardly any performance penalty! That’s an exciting find for fans of functional-style programming and a big win for the ForEach implementation in the iterator series.

So after all of this analysis of loops, let me know in the comments which do you end up using in your projects!