Burst Function Pointers vs. Switch Statements
A couple of weeks ago we took a look at the performance of function pointers in Burst. In doing so, we left out an alternative: good old switch statements. Today we’ll put those to the test to see how they stack up next to Burst’s newfangled function pointers!
When we compared performance before, we wrote jobs that either invoked a function pointer or decided which operation to do and did it directly. There is another approach though: decide which operation to do outside of the job, store the decision in an integer, and switch on that integer inside the job.
This approach is similar to the function pointer approach because the decisions are made outside of the job and then stored in data. That data takes the form of simple integers in this approach rather than function pointer addresses, but this is conceptually similar. The stored decisions may be reused as many times as we want, leading to improved performance if those decisions are relatively expensive.
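To make the similarity concrete, here is a minimal sketch in plain C#, outside of Unity. It assumes ordinary delegates as a stand-in for Burst function pointers, and the names `DispatchSketch` and `ApplySwitch` are hypothetical. In both variants the decision is made once up front and stored as data, and the loop only consumes it.

```csharp
using System;

// Hypothetical names throughout. Plain C# delegates stand in for Burst's
// FunctionPointer<T>, so this illustrates the idea rather than Burst itself.
public static class DispatchSketch
{
    public static float Add(float a, float b) { return a + b; }
    public static float Mul(float a, float b) { return a * b; }

    // Integer variant: the loop only switches on a previously stored code.
    public static float ApplySwitch(int opType, float a, float b)
    {
        switch (opType)
        {
            case 0: return a + b;
            case 1: return a * b;
            default: throw new ArgumentOutOfRangeException(nameof(opType));
        }
    }

    public static void Main()
    {
        // Decisions are made once, outside the loop, and stored as data:
        // integers for the switch variant, delegates for the pointer variant.
        int[] opTypes = { 0, 1, 0 };
        Func<float, float, float>[] fps = { Add, Mul, Add };

        float[] a = { 1f, 2f, 3f };
        float[] b = { 4f, 5f, 6f };

        for (int i = 0; i < a.Length; ++i)
        {
            float viaSwitch = ApplySwitch(opTypes[i], a[i], b[i]);
            float viaDelegate = fps[i](a[i], b[i]);
            Console.WriteLine(viaSwitch + " " + viaDelegate);
        }
    }
}
```

Both columns print the same value on every row, since the two arrays encode the same decisions in different forms.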
So this week we’ll use a function pointer job like this:
    delegate float Fp(float a, float b);

    [BurstCompile(CompileSynchronously = true)]
    struct FpJob : IJob
    {
        [ReadOnly] public NativeArray<FunctionPointer<Fp>> Fps;
        [ReadOnly] public NativeArray<float2> Input;
        [WriteOnly] public NativeArray<float> Output;

        public void Execute()
        {
            for (int i = 0; i < Input.Length; ++i)
            {
                Output[i] = Fps[i].Invoke(Input[i].x, Input[i].y);
            }
        }
    }
Here are the functions that we could compile function pointers from:
    [BurstCompile(CompileSynchronously = true)]
    static class Funcs
    {
        [BurstCompile(CompileSynchronously = true)]
        public static float Add(float a, float b) { return a + b; }

        [BurstCompile(CompileSynchronously = true)]
        public static float Sub(float a, float b) { return a - b; }

        [BurstCompile(CompileSynchronously = true)]
        public static float Mul(float a, float b) { return a * b; }

        [BurstCompile(CompileSynchronously = true)]
        public static float Div(float a, float b) { return a / b; }

        [BurstCompile(CompileSynchronously = true)]
        public static float Pow(float a, float b) { return math.pow(a, b); }

        [BurstCompile(CompileSynchronously = true)]
        public static float Atan2(float a, float b) { return math.atan2(a, b); }

        [BurstCompile(CompileSynchronously = true)]
        public static float Max(float a, float b) { return math.max(a, b); }

        [BurstCompile(CompileSynchronously = true)]
        public static float Min(float a, float b) { return math.min(a, b); }

        [BurstCompile(CompileSynchronously = true)]
        public static float Mod(float a, float b) { return math.fmod(a, b); }
    }
And we’ll compare the function pointer job against a switch job that looks like this:
    [BurstCompile(CompileSynchronously = true)]
    struct Switch9Job : IJob
    {
        [ReadOnly] public NativeArray<int> OpTypes;
        [ReadOnly] public NativeArray<float2> Input;
        [WriteOnly] public NativeArray<float> Output;

        public void Execute()
        {
            for (int i = 0; i < Input.Length; ++i)
            {
                switch (OpTypes[i])
                {
                    case 0: Output[i] = Input[i].x + Input[i].y; break;
                    case 1: Output[i] = Input[i].x - Input[i].y; break;
                    case 2: Output[i] = Input[i].x * Input[i].y; break;
                    case 3: Output[i] = Input[i].x / Input[i].y; break;
                    case 4: Output[i] = math.pow(Input[i].x, Input[i].y); break;
                    case 5: Output[i] = math.atan2(Input[i].x, Input[i].y); break;
                    case 6: Output[i] = math.max(Input[i].x, Input[i].y); break;
                    case 7: Output[i] = math.min(Input[i].x, Input[i].y); break;
                    case 8: Output[i] = math.fmod(Input[i].x, Input[i].y); break;
                }
            }
        }
    }
Note that the operations themselves are the same. In the case of the switch job, the operation to perform is taken in as NativeArray<int> OpTypes and the operation is performed directly after using switch to decide which to execute.
For completeness, we’ll also run Switch8Job, Switch7Job, Switch6Job, Switch5Job, Switch4Job, Switch3Job, and Switch2Job to see the performance when there are fewer case labels to handle in the switch.
Here’s the source for the full test script:
    using System.Diagnostics;
    using System.Text;
    using UnityEngine;
    using Unity.Burst;
    using Unity.Collections;
    using Unity.Jobs;
    using Unity.Mathematics;
    using Random = UnityEngine.Random;

    delegate float Fp(float a, float b);

    [BurstCompile(CompileSynchronously = true)]
    struct Switch2Job : IJob
    {
        [ReadOnly] public NativeArray<int> OpTypes;
        [ReadOnly] public NativeArray<float2> Input;
        [WriteOnly] public NativeArray<float> Output;

        public void Execute()
        {
            for (int i = 0; i < Input.Length; ++i)
            {
                switch (OpTypes[i])
                {
                    case 0: Output[i] = Input[i].x + Input[i].y; break;
                    case 1: Output[i] = Input[i].x - Input[i].y; break;
                }
            }
        }
    }

    [BurstCompile(CompileSynchronously = true)]
    struct Switch3Job : IJob
    {
        [ReadOnly] public NativeArray<int> OpTypes;
        [ReadOnly] public NativeArray<float2> Input;
        [WriteOnly] public NativeArray<float> Output;

        public void Execute()
        {
            for (int i = 0; i < Input.Length; ++i)
            {
                switch (OpTypes[i])
                {
                    case 0: Output[i] = Input[i].x + Input[i].y; break;
                    case 1: Output[i] = Input[i].x - Input[i].y; break;
                    case 2: Output[i] = Input[i].x * Input[i].y; break;
                }
            }
        }
    }

    [BurstCompile(CompileSynchronously = true)]
    struct Switch4Job : IJob
    {
        [ReadOnly] public NativeArray<int> OpTypes;
        [ReadOnly] public NativeArray<float2> Input;
        [WriteOnly] public NativeArray<float> Output;

        public void Execute()
        {
            for (int i = 0; i < Input.Length; ++i)
            {
                switch (OpTypes[i])
                {
                    case 0: Output[i] = Input[i].x + Input[i].y; break;
                    case 1: Output[i] = Input[i].x - Input[i].y; break;
                    case 2: Output[i] = Input[i].x * Input[i].y; break;
                    case 3: Output[i] = Input[i].x / Input[i].y; break;
                }
            }
        }
    }

    [BurstCompile(CompileSynchronously = true)]
    struct Switch5Job : IJob
    {
        [ReadOnly] public NativeArray<int> OpTypes;
        [ReadOnly] public NativeArray<float2> Input;
        [WriteOnly] public NativeArray<float> Output;

        public void Execute()
        {
            for (int i = 0; i < Input.Length; ++i)
            {
                switch (OpTypes[i])
                {
                    case 0: Output[i] = Input[i].x + Input[i].y; break;
                    case 1: Output[i] = Input[i].x - Input[i].y; break;
                    case 2: Output[i] = Input[i].x * Input[i].y; break;
                    case 3: Output[i] = Input[i].x / Input[i].y; break;
                    case 4: Output[i] = math.pow(Input[i].x, Input[i].y); break;
                }
            }
        }
    }

    [BurstCompile(CompileSynchronously = true)]
    struct Switch6Job : IJob
    {
        [ReadOnly] public NativeArray<int> OpTypes;
        [ReadOnly] public NativeArray<float2> Input;
        [WriteOnly] public NativeArray<float> Output;

        public void Execute()
        {
            for (int i = 0; i < Input.Length; ++i)
            {
                switch (OpTypes[i])
                {
                    case 0: Output[i] = Input[i].x + Input[i].y; break;
                    case 1: Output[i] = Input[i].x - Input[i].y; break;
                    case 2: Output[i] = Input[i].x * Input[i].y; break;
                    case 3: Output[i] = Input[i].x / Input[i].y; break;
                    case 4: Output[i] = math.pow(Input[i].x, Input[i].y); break;
                    case 5: Output[i] = math.atan2(Input[i].x, Input[i].y); break;
                }
            }
        }
    }

    [BurstCompile(CompileSynchronously = true)]
    struct Switch7Job : IJob
    {
        [ReadOnly] public NativeArray<int> OpTypes;
        [ReadOnly] public NativeArray<float2> Input;
        [WriteOnly] public NativeArray<float> Output;

        public void Execute()
        {
            for (int i = 0; i < Input.Length; ++i)
            {
                switch (OpTypes[i])
                {
                    case 0: Output[i] = Input[i].x + Input[i].y; break;
                    case 1: Output[i] = Input[i].x - Input[i].y; break;
                    case 2: Output[i] = Input[i].x * Input[i].y; break;
                    case 3: Output[i] = Input[i].x / Input[i].y; break;
                    case 4: Output[i] = math.pow(Input[i].x, Input[i].y); break;
                    case 5: Output[i] = math.atan2(Input[i].x, Input[i].y); break;
                    case 6: Output[i] = math.max(Input[i].x, Input[i].y); break;
                }
            }
        }
    }

    [BurstCompile(CompileSynchronously = true)]
    struct Switch8Job : IJob
    {
        [ReadOnly] public NativeArray<int> OpTypes;
        [ReadOnly] public NativeArray<float2> Input;
        [WriteOnly] public NativeArray<float> Output;

        public void Execute()
        {
            for (int i = 0; i < Input.Length; ++i)
            {
                switch (OpTypes[i])
                {
                    case 0: Output[i] = Input[i].x + Input[i].y; break;
                    case 1: Output[i] = Input[i].x - Input[i].y; break;
                    case 2: Output[i] = Input[i].x * Input[i].y; break;
                    case 3: Output[i] = Input[i].x / Input[i].y; break;
                    case 4: Output[i] = math.pow(Input[i].x, Input[i].y); break;
                    case 5: Output[i] = math.atan2(Input[i].x, Input[i].y); break;
                    case 6: Output[i] = math.max(Input[i].x, Input[i].y); break;
                    case 7: Output[i] = math.min(Input[i].x, Input[i].y); break;
                }
            }
        }
    }

    [BurstCompile(CompileSynchronously = true)]
    struct Switch9Job : IJob
    {
        [ReadOnly] public NativeArray<int> OpTypes;
        [ReadOnly] public NativeArray<float2> Input;
        [WriteOnly] public NativeArray<float> Output;

        public void Execute()
        {
            for (int i = 0; i < Input.Length; ++i)
            {
                switch (OpTypes[i])
                {
                    case 0: Output[i] = Input[i].x + Input[i].y; break;
                    case 1: Output[i] = Input[i].x - Input[i].y; break;
                    case 2: Output[i] = Input[i].x * Input[i].y; break;
                    case 3: Output[i] = Input[i].x / Input[i].y; break;
                    case 4: Output[i] = math.pow(Input[i].x, Input[i].y); break;
                    case 5: Output[i] = math.atan2(Input[i].x, Input[i].y); break;
                    case 6: Output[i] = math.max(Input[i].x, Input[i].y); break;
                    case 7: Output[i] = math.min(Input[i].x, Input[i].y); break;
                    case 8: Output[i] = math.fmod(Input[i].x, Input[i].y); break;
                }
            }
        }
    }

    [BurstCompile(CompileSynchronously = true)]
    struct FpJob : IJob
    {
        [ReadOnly] public NativeArray<FunctionPointer<Fp>> Fps;
        [ReadOnly] public NativeArray<float2> Input;
        [WriteOnly] public NativeArray<float> Output;

        public void Execute()
        {
            for (int i = 0; i < Input.Length; ++i)
            {
                Output[i] = Fps[i].Invoke(Input[i].x, Input[i].y);
            }
        }
    }

    [BurstCompile(CompileSynchronously = true)]
    static class Funcs
    {
        [BurstCompile(CompileSynchronously = true)]
        public static float Add(float a, float b) { return a + b; }

        [BurstCompile(CompileSynchronously = true)]
        public static float Sub(float a, float b) { return a - b; }

        [BurstCompile(CompileSynchronously = true)]
        public static float Mul(float a, float b) { return a * b; }

        [BurstCompile(CompileSynchronously = true)]
        public static float Div(float a, float b) { return a / b; }

        [BurstCompile(CompileSynchronously = true)]
        public static float Pow(float a, float b) { return math.pow(a, b); }

        [BurstCompile(CompileSynchronously = true)]
        public static float Atan2(float a, float b) { return math.atan2(a, b); }

        [BurstCompile(CompileSynchronously = true)]
        public static float Max(float a, float b) { return math.max(a, b); }

        [BurstCompile(CompileSynchronously = true)]
        public static float Min(float a, float b) { return math.min(a, b); }

        [BurstCompile(CompileSynchronously = true)]
        public static float Mod(float a, float b) { return math.fmod(a, b); }
    }

    class TestScript : MonoBehaviour
    {
        void Start()
        {
            const int numOps = 9;
            const int len = 1000;
            var sb = new StringBuilder(1024);
            sb.Append("Num Ops,Switch Ticks,Function Pointer Ticks\n");
            var fps = new NativeArray<FunctionPointer<Fp>>(len, Allocator.TempJob);
            var opTypes = new NativeArray<int>(len, Allocator.TempJob);
            var input = new NativeArray<float2>(len, Allocator.TempJob);
            var output = new NativeArray<float>(len, Allocator.TempJob);
            var sw = new Stopwatch();
            for (int op = 2; op <= numOps; ++op)
            {
                // Pick a random operation per element and store the decision
                // both as an integer and as a compiled function pointer.
                for (int i = 0; i < len; ++i)
                {
                    // Note: the int overload of Random.Range excludes its max,
                    // so this yields values from 0 to op-2.
                    var opType = Random.Range(0, op-1);
                    opTypes[i] = opType;
                    switch (opType)
                    {
                        case 0:
                            fps[i] = BurstCompiler.CompileFunctionPointer<Fp>(Funcs.Add);
                            break;
                        case 1:
                            fps[i] = BurstCompiler.CompileFunctionPointer<Fp>(Funcs.Sub);
                            break;
                        case 2:
                            fps[i] = BurstCompiler.CompileFunctionPointer<Fp>(Funcs.Mul);
                            break;
                        case 3:
                            fps[i] = BurstCompiler.CompileFunctionPointer<Fp>(Funcs.Div);
                            break;
                        case 4:
                            fps[i] = BurstCompiler.CompileFunctionPointer<Fp>(Funcs.Pow);
                            break;
                        case 5:
                            fps[i] = BurstCompiler.CompileFunctionPointer<Fp>(Funcs.Atan2);
                            break;
                        case 6:
                            fps[i] = BurstCompiler.CompileFunctionPointer<Fp>(Funcs.Max);
                            break;
                        case 7:
                            fps[i] = BurstCompiler.CompileFunctionPointer<Fp>(Funcs.Min);
                            break;
                        case 8:
                            fps[i] = BurstCompiler.CompileFunctionPointer<Fp>(Funcs.Mod);
                            break;
                    }
                }

                // Each job runs once untimed as a warm-up, then once timed.
                long switchTicks = 0;
                switch (op)
                {
                    case 2:
                    {
                        var switchJob = new Switch2Job { OpTypes = opTypes, Input = input, Output = output };
                        switchJob.Run();
                        sw.Restart();
                        switchJob.Run();
                        switchTicks = sw.ElapsedTicks;
                        break;
                    }
                    case 3:
                    {
                        var switchJob = new Switch3Job { OpTypes = opTypes, Input = input, Output = output };
                        switchJob.Run();
                        sw.Restart();
                        switchJob.Run();
                        switchTicks = sw.ElapsedTicks;
                        break;
                    }
                    case 4:
                    {
                        var switchJob = new Switch4Job { OpTypes = opTypes, Input = input, Output = output };
                        switchJob.Run();
                        sw.Restart();
                        switchJob.Run();
                        switchTicks = sw.ElapsedTicks;
                        break;
                    }
                    case 5:
                    {
                        var switchJob = new Switch5Job { OpTypes = opTypes, Input = input, Output = output };
                        switchJob.Run();
                        sw.Restart();
                        switchJob.Run();
                        switchTicks = sw.ElapsedTicks;
                        break;
                    }
                    case 6:
                    {
                        var switchJob = new Switch6Job { OpTypes = opTypes, Input = input, Output = output };
                        switchJob.Run();
                        sw.Restart();
                        switchJob.Run();
                        switchTicks = sw.ElapsedTicks;
                        break;
                    }
                    case 7:
                    {
                        var switchJob = new Switch7Job { OpTypes = opTypes, Input = input, Output = output };
                        switchJob.Run();
                        sw.Restart();
                        switchJob.Run();
                        switchTicks = sw.ElapsedTicks;
                        break;
                    }
                    case 8:
                    {
                        var switchJob = new Switch8Job { OpTypes = opTypes, Input = input, Output = output };
                        switchJob.Run();
                        sw.Restart();
                        switchJob.Run();
                        switchTicks = sw.ElapsedTicks;
                        break;
                    }
                    case 9:
                    {
                        var switchJob = new Switch9Job { OpTypes = opTypes, Input = input, Output = output };
                        switchJob.Run();
                        sw.Restart();
                        switchJob.Run();
                        switchTicks = sw.ElapsedTicks;
                        break;
                    }
                }

                var fpJob = new FpJob { Fps = fps, Input = input, Output = output };
                fpJob.Run();
                sw.Restart();
                fpJob.Run();
                long fpTicks = sw.ElapsedTicks;

                sb.Append(op)
                    .Append(',')
                    .Append(switchTicks)
                    .Append(',')
                    .Append(fpTicks)
                    .Append('\n');
            }
            print(sb.ToString());
            fps.Dispose();
            opTypes.Dispose();
            input.Dispose();
            output.Dispose();
        }
    }
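The measurement pattern used throughout the script (one untimed run, then a timed run read back via ElapsedTicks) can be isolated as a small helper. This is a sketch in plain C# with hypothetical names (`TimingSketch`, `TimeSecondRun`, `TicksToMicroseconds`). It also shows how Stopwatch ticks could be converted to microseconds, since the length of a tick depends on `Stopwatch.Frequency` on the machine running the test.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper names; not part of the test script.
public static class TimingSketch
{
    // Runs the work once untimed as a warm-up, then times a second run.
    public static long TimeSecondRun(Action work)
    {
        work();
        var sw = Stopwatch.StartNew();
        work();
        sw.Stop();
        return sw.ElapsedTicks;
    }

    // Stopwatch ticks are counted in units of 1 / Stopwatch.Frequency seconds.
    public static double TicksToMicroseconds(long ticks)
    {
        return ticks * 1000000.0 / Stopwatch.Frequency;
    }

    public static void Main()
    {
        var data = new float[1000];
        long ticks = TimeSecondRun(() =>
        {
            for (int i = 0; i < data.Length; ++i)
            {
                data[i] = i * 0.5f;
            }
        });
        Console.WriteLine(ticks + " ticks = " + TicksToMicroseconds(ticks) + " us");
    }
}
```

A single timed run is noisy, so a helper like this would usually be called several times with the results averaged or the minimum taken.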
I ran the test in this environment:
- 2.7 GHz Intel Core i7-6820HQ
- macOS 10.15.3
- Unity 2019.3.5f1
- macOS Standalone
- .NET 4.x scripting runtime version and API compatibility level
- IL2CPP
- Non-development
- 640×480, Fastest, Windowed
And here are the results I got:
Num Ops | Switch Ticks | Function Pointer Ticks |
---|---|---|
2 | 23 | 32 |
3 | 46 | 69 |
4 | 74 | 75 |
5 | 103 | 99 |
6 | 277 | 263 |
7 | 241 | 236 |
8 | 261 | 294 |
9 | 319 | 309 |
The overall shape of the results is very similar between both approaches. Both jobs generally take longer as more types of operations are added.
This shows that function pointers neither consistently speed up the job nor slow it down: the switch job was faster with 2, 3, and 8 operations, the function pointer job was slightly faster with 5, 6, 7, and 9, and the two were effectively tied with 4. So the advantages and disadvantages lie not with performance, but rather with more human factors. One might prefer the function pointer approach since the job wouldn’t need to be modified if new operation types were added. Another might prefer the switch approach because the operations being performed are explicitly listed in the job.
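For a quick way to compare the two columns of the results table, the ratio of switch ticks to function pointer ticks can be computed per row. The sketch below is plain C# with the measurements copied from the table; the class and method names are hypothetical.

```csharp
using System;
using System.Globalization;

// Hypothetical names; the numbers are copied from the results table.
public static class RatioSketch
{
    public static readonly int[] NumOps = { 2, 3, 4, 5, 6, 7, 8, 9 };
    public static readonly long[] SwitchTicks = { 23, 46, 74, 103, 277, 241, 261, 319 };
    public static readonly long[] FpTicks = { 32, 69, 75, 99, 263, 236, 294, 309 };

    // A ratio below 1 means the switch job was faster for that row.
    public static double Ratio(int row)
    {
        return (double)SwitchTicks[row] / FpTicks[row];
    }

    public static void Main()
    {
        for (int i = 0; i < NumOps.Length; ++i)
        {
            string ratio = Ratio(i).ToString("F2", CultureInfo.InvariantCulture);
            Console.WriteLine(NumOps[i] + " ops: switch/fp = " + ratio);
        }
    }
}
```

The ratios range from roughly 0.67 to 1.05. Each row is a single timed run, so small differences are likely within measurement noise.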
Which do you prefer? Let me know in the comments.
#1 by Henke37 on March 23rd, 2020 ·
I like switch statements, since that is traditionally how you do state machines. Which, at this point, this is starting to become.
#2 by Stephen Hodgson on March 23rd, 2020 ·
The very first burst functions you’re adding in each when it should be multiplied, divided, and subtracted.
Might have skewed the numbers.
#3 by jackson on March 23rd, 2020 ·
Thanks for pointing this out! I corrected the typos and ran the test again in the same environment. I got the exact same results.
#4 by Simon Lebettre on September 28th, 2021 ·
I cant find it now, but I remember reading somewhere that burst will optimize switch statements into a lookup table if all the cases and default: are present and there is no throw.