JacksonDunstan.com | 5 Common Misuses of Collections

5 Common Misuses of Collections

April 1, 2019 Tags: boxing, collection, dictionary, GC, interface, linq, list, virtual function

Collection types like List<T> and Dictionary<TKey, TValue> are fundamental tools in C#. Sadly, I keep seeing the same misuses of them in codebase after codebase. Today we’ll look at the top 5 problems and learn how to easily avoid them!

Double Dictionary Lookups

Let’s start off with a very common—and very wasteful—one. Here’s how it looks:

if (myDictionary.Contains(myKey))
{
    int value = myDictionary[myKey];
    // ... use value
}

The issue here is that Contains performs the lookup to find the key, returns true, and then the indexer (i.e. myDictionary[myKey]) immediately does the exact same lookup to return the corresponding value. Two lookups are performed when really just one is necessary:

int value;
if (myDictionary.TryGetValue(myKey, out value))
{
    // ... use value
}

TryGetValue does the lookup and sets the out param if found, otherwise it sets the default for that type. It then returns whether the key was found. If it was, there’s no need to look up the key again since the value has already been obtained.

In C# 7, starting in Unity 2018.3, we have a new language feature that makes this even simpler:

if (myDictionary.TryGetValue(myKey, out int value))
{
    // ... use value
}

Either way, we’ve reduced two lookups to just one for a quick performance boost.

No Initial Capacity

Collection types like List<T> and Dictionary<TKey, TValue> have constructors that allow us to set an initial capacity. They also have default constructors that cede that control to the implementation of the class. Let’s look at a decompilation of mscorlib.dll in Unity 2018.3 to see what List<T> does:

private static readonly T[] EmptyArray = new T[0];
 
public List()
{
    _items = EmptyArray;
}

Here there’s no allocation at all. This is great if the intention is to never add elements to the List, but that’s usually not the case. When elements are inevitably added, this code executes:

public void Add(T item)
{
    if (_size == _items.Length)
    {
        GrowIfNeeded(1);
    }
    _items[_size++] = item;
    _version++;
}
 
private void GrowIfNeeded(int newCount)
{
    int num = _size + newCount;
    if (num > _items.Length)
    {
        Capacity = Math.Max(Math.Max(Capacity * 2, 4), num);
    }
}

The result of adding the first element is that an array of 4 elements is allocated. If the intention was to only ever store less than 4, memory is wasted. If it was to store many more than 4, a lot of growing will be required. For example, adding 100 elements means allocating arrays for 4, 8, 16, 32, 64, and 128 with a final waste of 28 elements. If we’d simply passed an initial capacity of 100 to the constructor we could have avoided all but one allocation and all the copying to successively larger arrays. We don’t always know the right size beforehand, but a good guess can at least save most of the reallocation and copying.

As for Dictionary<TKey, TValue>, here’s its default constructor:

public Dictionary()
{
    Init(10, null);
}

Here we see an initial capacity of 10. Unlike List, an initial allocation will be made. Just like List, this might be too much causing waste or too little causing many growth operations and potentially still more waste. The same advice applies: pass an initial capacity value to the constructor.

Boxing Structs

Structs are a great way to avoid the GC, but they can actually be the cause of GC sometimes. This happens in generic classes like List<T> and Dictionary<TKey, TValue> because boxing is sometimes required by C#. Boxing is when the struct needs to be put inside a newly-allocated class so that a virtual method such as Object.Equals can be called on it.

By way of example, consider TestList and TestDictionary in this test script:

using System.Collections.Generic;
using UnityEngine;
 
class TestScript : MonoBehaviour
{
    struct IntStruct
    {
        public int Value;
    }
 
    bool TestList(List<IntStruct> list)
    {
        return list.Contains(new IntStruct { Value = 1 });
    }
 
    int TestDictionary(Dictionary<IntStruct, int> dict)
    {
        return dict[new IntStruct { Value = 1 }];
    }
 
    void Start()
    {
        const int size = 1000;
        List<IntStruct> list = new List<IntStruct>(size);
        Dictionary<IntStruct, int> dict = new Dictionary<IntStruct, int>(size);
        for (int i = 0; i < size; ++i)
        {
            list.Add(new IntStruct { Value = i });
            dict[new IntStruct { Value = i }] = i;
        }
        TestList(list);
        TestDictionary(dict);
    }
}

The TestList and TestDictionary functions don’t seem like they should cause any GC allocations since IntStruct is obviously a struct. Unfortunately, they do:

Struct Keys GC

List.Contains needs to compare elements to see if the given element is contained and it does this by using an ObjectEqualityComparer that must box the IntStruct to an Object so it can call Object.Equals on it. Likewise, the indexer for Dictionary has to check the hash codes of its stored keys against the hash code of the given key. It also uses an ObjectEqualityComparer and boxes the IntStruct to Object so it can call Object.GetHashCode.

To avoid this boxing and the GC allocation that comes along with it, avoid calling functions on List and Dictionary that will require boxing when you’re using struct elements. Unity’s profiler can, as shown above, be used to determine whether this is the case or not. Alternatively, don’t use struct elements or use alternative collections such as NativeList and NativeHashMap that are currently in Preview.

Using LINQ Instead of Built-In Methods

LINQ extension methods provided by System.Linq often have similar names to built-in names provided by collection types like List and Dictionary. For example, consider these confusingly-similar lines of code:

int len = list.Count;   // Built-in Count property get
int len = list.Count(); // LINQ extension method named Count()
 
bool HasOne(List<int> list)
{
    return list.Contains(1); // Built-in Contains(T) method
}
 
bool HasOne(IEnumerable<int> list)
{
    return list.Contains(1); // LINQ extension method named Contains(T)
}
 
bool isEmpty = list.Count == 0; // Built-in Count property get
bool isEmpty = list.Empty();    // LINQ extension method named Empty()

It’s easy to miss the difference between a lightning-fast call to the Count property and a slow, GC-provoking call to a LINQ extension method. This can be avoided by not adding using System.Linq;, but sometimes that needs to be present for other parts of the file that are less performance-sensitive. In that case, consider splitting up the code into one file for regular code and one file for performance-critical code so the using can be added to just the regular code file.

Passing interfaces

Collection types implement a lot of interfaces. List<T> alone implements ICollection<T>, IEnumerable<T>, IList<T>, IReadOnlyCollection<T>, IReadOnlyList<T>, ICollection, IEnumerable, and IList. This means functions taking any of these interface types can accept a List<T> parameter due to polymorphism.

void TakeList(List<int> list)
{
}
 
void TakeInterface(IList<int> list)
{
}
 
List<int> list = new List<int>();
TakeList(list);      // Pass class
TakeInterface(list); // Pass interface (polymorphism)

Unfortunately, this often leads to worse performance because it prevents devirtualization. So a call to Add on an IList<T> parameter results in a virtual method call where the same call to Add on a List<T> parameter doesn’t. The performance difference can be dramatic as the IL2CPP differences are large.

void TakeList(List<int> list)
{
    list.Add(1); // Non-virtual method call. Faster.
}
 
void TakeInterface(IList<int> list)
{
    list.Add(1); // Virtual method call. Slower.
}

Perhaps the worst effect of using an interface type like IList<T> instead of a class type like List<T> is that it causes foreach loops to create garbage for the GC to later collect. This is because a foreach loop boils down to a call to IEnumerator<T> IEnumerable<T>.GetEnumerator() for an IList<T>. The IEnumerator<T> type is an interface, which means the returned type must either be a class or a struct that is boxed. Either way will result in a GC allocation.

When the type is known, as in the case of a List<T> variable, the foreach loop instead results in a call to List<T>.Enumerator List<T>.GetEnumerator(). This returns a List<T>.Enumerator which is a struct that involves no GC allocation. Additionally, calls to its methods are devirtualized so each iteration of the loop can call the non-virtual MoveNext method and Current property get. The result is faster loops that don’t feed the GC.

int TakeList(List<int> list)
{
    int sum = 0;
    foreach (int x in list) // Calls GetEnumerator(). Returns struct. No GC.
    {
        sum += x;
    }
    return sum;
}
 
int TakeInterface(IList<int> list)
{
    int sum = 0;
    foreach (int x in list) // Calls GetEnumerator(). Returns class. GC alloc!
    {
        sum += x;
    }
    return sum;
}

Unfortunately, the advice from some is to intentionally use the least-specialized interface possible. Often this is IList<T>, IDictionary<TKey, TValue>, or even IEnumerable<T>. This will trigger all of the devirtualization disadvantages and also prevent the use of specialized methods provided by List<T> that do the same job as equivalent LINQ functions but much faster and usually without any GC allocations. The main argument in favor of this abstraction is that it decouples the code from the concrete implementation of the IEnumerable<T> being passed in.

While this is true to some extent, it comes at a major cost. It’s worth considering the tradeoff given what’s important to the game. Is it likely that the game will ever actually use a different implementation of an IList<T> or IDictionary<TKey, TValue>? How much work would it be to switch the List<T> types to MyList<T> types to fix the compiler errors? Usually the answers are “not likely” and “a few minutes” which leads to the decision to avoid passing interfaces, especially given the efficiency gains of devirtualization.

Comments

#1 by Khalid on April 1st, 2019 · Reply

Thanks for sharing this valuable information
#2 by Yilmaz Kiymaz on April 1st, 2019 · Reply

Regarding the last 2 points, I’ve recently found LinqFaster (https://github.com/jackmott/LinqFaster) to be a great replacement for LINQ when working with Lists.
- #3 by jackson on April 1st, 2019 · Reply
  
  Good suggestion! LinqFaster works by following the article’s advice to not pass interfaces:
  
  LinqFaster provides specialized versions of the Linq extensions methods optimized for working with arrays and lists.
  
  The library provides overloads that take T[], List<T>, and Span<T> parameters instead of IEnumerable<T> which prevents devirtualization for these types. Unfortunately there aren’t overloads for other common collection types like Dictionary<TKey, TValue>.
#4 by FH on April 1st, 2019 · Reply

Regarding Struct Boxing, you should override GetHashCode(). In this case, GetHashCode() would just return Value. This should reduce the GC Allocations.
- #5 by jackson on April 1st, 2019 · Reply
  
  I added Equals and GetHashCode like so:
  
  struct IntStruct { public int Value; public override bool Equals(object obj) { return obj is IntStruct && Value == ((IntStruct)obj).Value; } public override int GetHashCode() { return Value; } }
  
  Then ran TestScript again and got these results:
  
  TestList: 40 bytes (article was 60 bytes)
  TestDictionary: 20 bytes (article was 80 bytes)
  
  So you’re correct: this will reduce the GC allocations. Unfortunately, it won’t eliminate them so it’s still best to avoid struct keys and elements to truly eliminate GC allocations.
  - #6 by FH on April 2nd, 2019 · Reply
    
    If you want both TestList() and TestDictionary() down to 0 bytes, you should also implement System.IEquatable.
    
    struct IntStruct : System.IEquatable<IntStruct> { public int Value; public override bool Equals(object obj) { return obj is IntStruct && Value == ((IntStruct)obj).Value; } public override int GetHashCode() { return Value; } public bool Equals(IntStruct other) { return Value == other.Value; } }
    - #7 by jackson on April 2nd, 2019 · Reply
      
      The comment system sometimes mangles code, so I’m not sure exactly what the original looked like but as-is this doesn’t compile. System.IEquatable requires a generic type (e.g. System.IEquatable<IntStruct>) which I could re-add for you but then it wouldn’t compile since there’s no bool Equals(IntStruct obj) method. Or I could remove the : System.IEquatable, but you said that was the change you were trying to make.
      
      Regardless, override bool Equals(object obj) (i.e. from System.Object) isn’t actually necessary to eliminate the GC allocations. Instead, bool Equals(IntStruct obj) (i.e. from System.IEquatable<IntStruct>) in addition to GetHashCode will eliminate the GC allocations:
      
      struct IntStruct : System.IEquatable<IntStruct> { public int Value; public bool IEquatable(IntStruct obj) { return Value == obj.Value; } public override int GetHashCode() { return Value; } public bool Equals(IntStruct other) { return Value == other.Value; } }
      - #8 by FH on April 2nd, 2019 · Reply
        
        Yes, your comment system is pretty unusable for code. Obviously you found out what I originally wrote. I just copied your code and added the “: System.IEquatable < IntStruct >” and the public bool Equals(IntStruct other) { }. So I don’t know why you claim “then it wouldnâ€™t compile since thereâ€™s no bool Equals(IntStruct obj) method”, as I clearly added this method. See above.
        
        #9 by jackson on April 2nd, 2019 · Reply
        
        The version you posted has a Equals method that takes an object, but not an Equals method that takes an IntStruct. The version taking object isn’t part of the System.IEquatable<IntStruct> interface, but that’s OK since you provided the override keyword and all structs implicitly derive from object. The version taking IntStruct is required by System.IEquatable<IntStruct>, but the code you posted doesn’t implement one. Also, the override keyword shouldn’t be used with the IntStruct version since there’s nothing to override.
        
        #10 by FH on April 3rd, 2019 · Reply
        
        I know it is unimportant, but really, get back to comment #6. You will see that I clearly added the Equals method that takes an IntStruct. Without the override keyword.
        
        No need to lecture me. :P
        
        #11 by FH on April 3rd, 2019 · Reply
        
        Here’s a screenshot, if needed: https://imgur.com/w4QUjXv
        
        It’s really only missing the “<IntStruct>” part because your comment software allows tags, and it interpreted it as one, which is why it was removed.
        
        #12 by jackson on April 3rd, 2019 · Reply
        
        Thanks for pointing it out for me; I see the overload now. I didn’t mean to lecture, I just didn’t see it before. I’ve updated the comment to add the missing <IntStruct> and will look into improving the comments system to avoid similar issues in the future.
#13 by Shannon Rowe on June 8th, 2020 · Reply

Love that last one. Rider is always nagging me to switch my method parameters to an interface like IEnumerable, and it always bugged me so now I have a solid reason to switch off that inspection tip. :)