Collection types like List<T> and Dictionary<TKey, TValue> are fundamental tools in C#. Sadly, I keep seeing the same misuses of them in codebase after codebase. Today we’ll look at the top 5 problems and learn how to easily avoid them!

Double Dictionary Lookups

Let’s start off with a very common—and very wasteful—one. Here’s how it looks:

if (myDictionary.Contains(myKey))
{
    int value = myDictionary[myKey];
    // ... use value
}

The issue here is that Contains performs the lookup to find the key, returns true, and then the indexer (i.e. myDictionary[myKey]) immediately does the exact same lookup to return the corresponding value. Two lookups are performed when really just one is necessary:

int value;
if (myDictionary.TryGetValue(myKey, out value))
{
    // ... use value
}

TryGetValue does the lookup and sets the out param if found, otherwise it sets the default for that type. It then returns whether the key was found. If it was, there’s no need to look up the key again since the value has already been obtained.

In C# 7, starting in Unity 2018.3, we have a new language feature that makes this even simpler:

if (myDictionary.TryGetValue(myKey, out int value))
{
    // ... use value
}

Either way, we’ve reduced two lookups to just one for a quick performance boost.

No Initial Capacity

Collection types like List<T> and Dictionary<TKey, TValue> have constructors that allow us to set an initial capacity. They also have default constructors that cede that control to the implementation of the class. Let’s look at a decompilation of mscorlib.dll in Unity 2018.3 to see what List<T> does:

private static readonly T[] EmptyArray = new T[0];
 
public List()
{
    _items = EmptyArray;
}

Here there’s no allocation at all. This is great if the intention is to never add elements to the List, but that’s usually not the case. When elements are inevitably added, this code executes:

public void Add(T item)
{
    if (_size == _items.Length)
    {
        GrowIfNeeded(1);
    }
    _items[_size++] = item;
    _version++;
}
 
private void GrowIfNeeded(int newCount)
{
    int num = _size + newCount;
    if (num > _items.Length)
    {
        Capacity = Math.Max(Math.Max(Capacity * 2, 4), num);
    }
}

The result of adding the first element is that an array of 4 elements is allocated. If the intention was to only ever store less than 4, memory is wasted. If it was to store many more than 4, a lot of growing will be required. For example, adding 100 elements means allocating arrays for 4, 8, 16, 32, 64, and 128 with a final waste of 28 elements. If we’d simply passed an initial capacity of 100 to the constructor we could have avoided all but one allocation and all the copying to successively larger arrays. We don’t always know the right size beforehand, but a good guess can at least save most of the reallocation and copying.

As for Dictionary<TKey, TValue>, here’s its default constructor:

public Dictionary()
{
    Init(10, null);
}

Here we see an initial capacity of 10. Unlike List, an initial allocation will be made. Just like List, this might be too much causing waste or too little causing many growth operations and potentially still more waste. The same advice applies: pass an initial capacity value to the constructor.

Boxing Structs

Structs are a great way to avoid the GC, but they can actually be the cause of GC sometimes. This happens in generic classes like List<T> and Dictionary<TKey, TValue> because boxing is sometimes required by C#. Boxing is when the struct needs to be put inside a newly-allocated class so that a virtual method such as Object.Equals can be called on it.

By way of example, consider TestList and TestDictionary in this test script:

using System.Collections.Generic;
using UnityEngine;
 
class TestScript : MonoBehaviour
{
    struct IntStruct
    {
        public int Value;
    }
 
    bool TestList(List<IntStruct> list)
    {
        return list.Contains(new IntStruct { Value = 1 });
    }
 
    int TestDictionary(Dictionary<IntStruct, int> dict)
    {
        return dict[new IntStruct { Value = 1 }];
    }
 
    void Start()
    {
        const int size = 1000;
        List<IntStruct> list = new List<IntStruct>(size);
        Dictionary<IntStruct, int> dict = new Dictionary<IntStruct, int>(size);
        for (int i = 0; i < size; ++i)
        {
            list.Add(new IntStruct { Value = i });
            dict[new IntStruct { Value = i }] = i;
        }
        TestList(list);
        TestDictionary(dict);
    }
}

The TestList and TestDictionary functions don’t seem like they should cause any GC allocations since IntStruct is obviously a struct. Unfortunately, they do:

Struct Keys GC

List.Contains needs to compare elements to see if the given element is contained and it does this by using an ObjectEqualityComparer that must box the IntStruct to an Object so it can call Object.Equals on it. Likewise, the indexer for Dictionary has to check the hash codes of its stored keys against the hash code of the given key. It also uses an ObjectEqualityComparer and boxes the IntStruct to Object so it can call Object.GetHashCode.

To avoid this boxing and the GC allocation that comes along with it, avoid calling functions on List and Dictionary that will require boxing when you’re using struct elements. Unity’s profiler can, as shown above, be used to determine whether this is the case or not. Alternatively, don’t use struct elements or use alternative collections such as NativeList and NativeHashMap that are currently in Preview.

Using LINQ Instead of Built-In Methods

LINQ extension methods provided by System.Linq often have similar names to built-in names provided by collection types like List and Dictionary. For example, consider these confusingly-similar lines of code:

int len = list.Count;   // Built-in Count property get
int len = list.Count(); // LINQ extension method named Count()
 
bool HasOne(List<int> list)
{
    return list.Contains(1); // Built-in Contains(T) method
}
 
bool HasOne(IEnumerable<int> list)
{
    return list.Contains(1); // LINQ extension method named Contains(T)
}
 
bool isEmpty = list.Count == 0; // Built-in Count property get
bool isEmpty = list.Empty();    // LINQ extension method named Empty()

It’s easy to miss the difference between a lightning-fast call to the Count property and a slow, GC-provoking call to a LINQ extension method. This can be avoided by not adding using System.Linq;, but sometimes that needs to be present for other parts of the file that are less performance-sensitive. In that case, consider splitting up the code into one file for regular code and one file for performance-critical code so the using can be added to just the regular code file.

Passing interfaces

Collection types implement a lot of interfaces. List<T> alone implements ICollection<T>, IEnumerable<T>, IList<T>, IReadOnlyCollection<T>, IReadOnlyList<T>, ICollection, IEnumerable, and IList. This means functions taking any of these interface types can accept a List<T> parameter due to polymorphism.

void TakeList(List<int> list)
{
}
 
void TakeInterface(IList<int> list)
{
}
 
List<int> list = new List<int>();
TakeList(list);      // Pass class
TakeInterface(list); // Pass interface (polymorphism)

Unfortunately, this often leads to worse performance because it prevents devirtualization. So a call to Add on an IList<T> parameter results in a virtual method call where the same call to Add on a List<T> parameter doesn’t. The performance difference can be dramatic as the IL2CPP differences are large.

void TakeList(List<int> list)
{
    list.Add(1); // Non-virtual method call. Faster.
}
 
void TakeInterface(IList<int> list)
{
    list.Add(1); // Virtual method call. Slower.
}

Perhaps the worst effect of using an interface type like IList<T> instead of a class type like List<T> is that it causes foreach loops to create garbage for the GC to later collect. This is because a foreach loop boils down to a call to IEnumerator<T> IEnumerable<T>.GetEnumerator() for an IList<T>. The IEnumerator<T> type is an interface, which means the returned type must either be a class or a struct that is boxed. Either way will result in a GC allocation.

When the type is known, as in the case of a List<T> variable, the foreach loop instead results in a call to List<T>.Enumerator List<T>.GetEnumerator(). This returns a List<T>.Enumerator which is a struct that involves no GC allocation. Additionally, calls to its methods are devirtualized so each iteration of the loop can call the non-virtual MoveNext method and Current property get. The result is faster loops that don’t feed the GC.

int TakeList(List<int> list)
{
    int sum = 0;
    foreach (int x in list) // Calls GetEnumerator(). Returns struct. No GC.
    {
        sum += x;
    }
    return sum;
}
 
int TakeInterface(IList<int> list)
{
    int sum = 0;
    foreach (int x in list) // Calls GetEnumerator(). Returns class. GC alloc!
    {
        sum += x;
    }
    return sum;
}

Unfortunately, the advice from some is to intentionally use the least-specialized interface possible. Often this is IList<T>, IDictionary<TKey, TValue>, or even IEnumerable<T>. This will trigger all of the devirtualization disadvantages and also prevent the use of specialized methods provided by List<T> that do the same job as equivalent LINQ functions but much faster and usually without any GC allocations. The main argument in favor of this abstraction is that it decouples the code from the concrete implementation of the IEnumerable<T> being passed in.

While this is true to some extent, it comes at a major cost. It’s worth considering the tradeoff given what’s important to the game. Is it likely that the game will ever actually use a different implementation of an IList<T> or IDictionary<TKey, TValue>? How much work would it be to switch the List<T> types to MyList<T> types to fix the compiler errors? Usually the answers are “not likely” and “a few minutes” which leads to the decision to avoid passing interfaces, especially given the efficiency gains of devirtualization.