P/Invoke in Burst: No Safety Net
Calling into native code like C++ from C# is a powerful interoperability tool in Unity. As we move more and more code out of Mono and IL2CPP and into Burst, will we still have this tool available? Today we’ll find out!
A Simple Call
Let’s start by setting up a simple P/Invoke call. To do that, we’ll need a native library to call into. We’ll use macOS today, so open up Xcode and create a new project named NativeLib with the Bundle configuration. Add a C file with these contents:
float Square(float val) { return val * val; }
Now build it and copy Build/Products/Debug/NativeLib.bundle to the Unity project’s Assets directory.
With that in place, we’re ready to call it. So add the usual extern function declaration in C#:
using System.Runtime.InteropServices;

static class NativeLib
{
    [DllImport("NativeLib", CallingConvention = CallingConvention.Cdecl)]
    public static extern float Square(float val);
}
Now we can create a Burst-compiled job that calls NativeLib.Square:
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

[BurstCompile]
struct SquareJob : IJob
{
    public NativeArray<float> Arr;

    public void Execute()
    {
        float val = Arr[0];
        val = NativeLib.Square(val);
        Arr[0] = val;
    }
}
Then we can write a little code to run the job and get the square of 2:
NativeArray<float> arr = new NativeArray<float>(1, Allocator.TempJob);
arr[0] = 2;
new SquareJob {Arr = arr}.Run();
print(arr[0]);
arr.Dispose();
Running it in the editor or a standalone build gives the expected result:
4
Now let’s look at Burst Inspector and see what code was generated for the job:
    .text
    .intel_syntax noprefix
    .file "main"
    .globl "Unity.Jobs.IJobExtensions.JobStruct`1<SquareJob>.Execute(ref SquareJob data, System.IntPtr additionalPtr, System.IntPtr bufferRangePatchData, ref Unity.Jobs.LowLevel.Unsafe.JobRanges ranges, int jobIndex)_83AC4021C611672D"
    .p2align 4, 0x90
    .type "Unity.Jobs.IJobExtensions.JobStruct`1<SquareJob>.Execute(ref SquareJob data, System.IntPtr additionalPtr, System.IntPtr bufferRangePatchData, ref Unity.Jobs.LowLevel.Unsafe.JobRanges ranges, int jobIndex)_83AC4021C611672D",@function
"Unity.Jobs.IJobExtensions.JobStruct`1<SquareJob>.Execute(ref SquareJob data, System.IntPtr additionalPtr, System.IntPtr bufferRangePatchData, ref Unity.Jobs.LowLevel.Unsafe.JobRanges ranges, int jobIndex)_83AC4021C611672D":
    .cfi_startproc
    push rbx
    .cfi_def_cfa_offset 16
    .cfi_offset rbx, -16
    mov rbx, qword ptr [rdi]
    movss xmm0, dword ptr [rbx]
    movabs rax, offset ".LNativeLib::Square_Ptr"
    call qword ptr [rax]
    movss dword ptr [rbx], xmm0
    pop rbx
    ret
.Lfunc_end0:
    .size "Unity.Jobs.IJobExtensions.JobStruct`1<SquareJob>.Execute(ref SquareJob data, System.IntPtr additionalPtr, System.IntPtr bufferRangePatchData, ref Unity.Jobs.LowLevel.Unsafe.JobRanges ranges, int jobIndex)_83AC4021C611672D", .Lfunc_end0-"Unity.Jobs.IJobExtensions.JobStruct`1<SquareJob>.Execute(ref SquareJob data, System.IntPtr additionalPtr, System.IntPtr bufferRangePatchData, ref Unity.Jobs.LowLevel.Unsafe.JobRanges ranges, int jobIndex)_83AC4021C611672D"
    .cfi_endproc

    .globl burst.initialize
    .p2align 4, 0x90
    .type burst.initialize,@function
burst.initialize:
    .cfi_startproc
    push rax
    .cfi_def_cfa_offset 16
    mov rax, rdi
    movabs rdi, offset ".LNativeLib::Square.function.string"
    call rax
    movabs rcx, offset ".LNativeLib::Square_Ptr"
    mov qword ptr [rcx], rax
    pop rax
    ret
.Lfunc_end1:
    .size burst.initialize, .Lfunc_end1-burst.initialize
    .cfi_endproc

    .type ".LNativeLib::Square_Ptr",@object
    .local ".LNativeLib::Square_Ptr"
    .comm ".LNativeLib::Square_Ptr",8,8
    .type ".LNativeLib::Square.function.string",@object
    .section .rodata,"a",@progbits
".LNativeLib::Square.function.string":
    .asciz "#dllimport:NativeLib|Square"
    .size ".LNativeLib::Square.function.string", 28

    .section ".note.GNU-stack","",@progbits
These are the two key lines of the job’s Execute:
movabs rax, offset ".LNativeLib::Square_Ptr"
call qword ptr [rax]
This shows that there’s a global function pointer variable named .LNativeLib::Square_Ptr that’s being called. We see it at the bottom of the Burst output. We also see a burst.initialize block that initializes the variable.
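The same pattern, a global function pointer that an initializer fills in once and the job then calls through, can be sketched in plain C. This is only an illustration of the shape of the generated code. `Resolver`, `FakeResolve`, `BurstInitialize`, and `Execute` are hypothetical names, and the lookup that Unity actually passes to burst.initialize isn't visible in the Burst output.

```c
#include <stddef.h>
#include <string.h>

/* Generic function pointer type, so no function pointer is ever
   converted to void*. */
typedef void (*GenericFn)(void);

/* Hypothetical resolver: maps a lookup string to a native function. */
typedef GenericFn (*Resolver)(const char* name);

typedef float (*SquareFn)(float);

/* Analogue of the global ".LNativeLib::Square_Ptr" variable. */
SquareFn Square_Ptr = NULL;

/* Stand-in for the function exported by NativeLib. */
float Square(float val) { return val * val; }

/* Stand-in for whatever lookup Unity really performs (an assumption). */
GenericFn FakeResolve(const char* name)
{
    if (strcmp(name, "#dllimport:NativeLib|Square") == 0)
        return (GenericFn)Square;
    return NULL;
}

/* Analogue of burst.initialize: resolve once and store the pointer. */
void BurstInitialize(Resolver resolve)
{
    Square_Ptr = (SquareFn)resolve("#dllimport:NativeLib|Square");
}

/* Analogue of the job's Execute: load the pointer and call through it. */
float Execute(float val)
{
    return Square_Ptr(val);
}
```

Calling `BurstInitialize(FakeResolve)` once and then `Execute(2.0f)` goes through the stored pointer, which is the indirect call the two key lines perform.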
Passing Structs
The Burst manual says:
For all DllImport and internal calls, only primitive types (including pointers) are supported. Passing a struct by value is not supported, you need to pass it through a pointer/reference.
Let’s see what happens when we violate this rule and pass a struct to a native function. To do so, let’s define a type and a function in the C code:
#include <math.h>

struct MyVector
{
    float X;
    float Y;
    float Z;
};

float Magnitude(struct MyVector vec)
{
    return sqrt(vec.X*vec.X + vec.Y*vec.Y + vec.Z*vec.Z);
}
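Now let’s add the C# extern function and a struct that mirrors the C one:
// Mirrors the layout of the C struct
struct MyVector { public float X; public float Y; public float Z; }

static class NativeLib
{
    [DllImport("NativeLib", CallingConvention = CallingConvention.Cdecl)]
    public static extern float Magnitude(MyVector vec);
}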
Then we can add a Burst-compiled job to call it:
[BurstCompile]
struct MagnitudeJob : IJob
{
    [ReadOnly] public NativeArray<MyVector> In;
    [WriteOnly] public NativeArray<float> Out;

    public void Execute()
    {
        MyVector val = In[0];
        float mag = NativeLib.Magnitude(val);
        Out[0] = mag;
    }
}
And finally we can run the job:
NativeArray<MyVector> i = new NativeArray<MyVector>(1, Allocator.TempJob);
NativeArray<float> o = new NativeArray<float>(1, Allocator.TempJob);
i[0] = new MyVector {X = 1, Y = 2, Z = 3};
new MagnitudeJob {In = i, Out = o}.Run();
print(o[0]);
i.Dispose();
o.Dispose();
Running this in the editor and a standalone build, we see the wrong output: about 2.236.
That’s about the square root of 5, not the square root of the 1*1 + 2*2 + 3*3 = 14 that we meant to pass to sqrt in the C code. It’s as though just two of the three components were counted. The value we should have seen is about 3.742.
To find out why, let’s look at the Burst Inspector:
    .text
    .intel_syntax noprefix
    .file "main"
    .globl "Unity.Jobs.IJobExtensions.JobStruct`1<MagnitudeJob>.Execute(ref MagnitudeJob data, System.IntPtr additionalPtr, System.IntPtr bufferRangePatchData, ref Unity.Jobs.LowLevel.Unsafe.JobRanges ranges, int jobIndex)_6A1C9A29C3D453DB"
    .p2align 4, 0x90
    .type "Unity.Jobs.IJobExtensions.JobStruct`1<MagnitudeJob>.Execute(ref MagnitudeJob data, System.IntPtr additionalPtr, System.IntPtr bufferRangePatchData, ref Unity.Jobs.LowLevel.Unsafe.JobRanges ranges, int jobIndex)_6A1C9A29C3D453DB",@function
"Unity.Jobs.IJobExtensions.JobStruct`1<MagnitudeJob>.Execute(ref MagnitudeJob data, System.IntPtr additionalPtr, System.IntPtr bufferRangePatchData, ref Unity.Jobs.LowLevel.Unsafe.JobRanges ranges, int jobIndex)_6A1C9A29C3D453DB":
    .cfi_startproc
    push rbx
    .cfi_def_cfa_offset 16
    .cfi_offset rbx, -16
    mov rbx, rdi
    mov rax, qword ptr [rbx]
    movss xmm0, dword ptr [rax]
    movss xmm1, dword ptr [rax + 4]
    movss xmm2, dword ptr [rax + 8]
    movabs rax, offset ".LNativeLib::Magnitude_Ptr"
    call qword ptr [rax]
    mov rax, qword ptr [rbx + 56]
    movss dword ptr [rax], xmm0
    pop rbx
    ret
.Lfunc_end0:
    .size "Unity.Jobs.IJobExtensions.JobStruct`1<MagnitudeJob>.Execute(ref MagnitudeJob data, System.IntPtr additionalPtr, System.IntPtr bufferRangePatchData, ref Unity.Jobs.LowLevel.Unsafe.JobRanges ranges, int jobIndex)_6A1C9A29C3D453DB", .Lfunc_end0-"Unity.Jobs.IJobExtensions.JobStruct`1<MagnitudeJob>.Execute(ref MagnitudeJob data, System.IntPtr additionalPtr, System.IntPtr bufferRangePatchData, ref Unity.Jobs.LowLevel.Unsafe.JobRanges ranges, int jobIndex)_6A1C9A29C3D453DB"
    .cfi_endproc

    .globl burst.initialize
    .p2align 4, 0x90
    .type burst.initialize,@function
burst.initialize:
    .cfi_startproc
    push rax
    .cfi_def_cfa_offset 16
    mov rax, rdi
    movabs rdi, offset ".LNativeLib::Magnitude.function.string"
    call rax
    movabs rcx, offset ".LNativeLib::Magnitude_Ptr"
    mov qword ptr [rcx], rax
    pop rax
    ret
.Lfunc_end1:
    .size burst.initialize, .Lfunc_end1-burst.initialize
    .cfi_endproc

    .type ".LNativeLib::Magnitude_Ptr",@object
    .local ".LNativeLib::Magnitude_Ptr"
    .comm ".LNativeLib::Magnitude_Ptr",8,8
    .type ".LNativeLib::Magnitude.function.string",@object
    .section .rodata,"a",@progbits
".LNativeLib::Magnitude.function.string":
    .asciz "#dllimport:NativeLib|Magnitude"
    .size ".LNativeLib::Magnitude.function.string", 31

    .section ".note.GNU-stack","",@progbits
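Burst compiled without any warnings or errors. If we weren’t looking closely, we could have easily used the job and gotten the wrong outputs. The listing shows what went wrong: Burst loads X, Y, and Z into xmm0, xmm1, and xmm2 as if they were three separate float parameters. The System V calling convention used on 64-bit macOS instead packs a 12-byte struct of floats into two registers, with X and Y in xmm0 and Z in xmm1. So the C function most likely sees X = 1, Y = 0, and Z = 2, and 1*1 + 0*0 + 2*2 = 5.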
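One way to read the listing above is as a disagreement about registers. Burst hands over the three floats in xmm0, xmm1, and xmm2, while a C compiler following the System V x86-64 calling convention (which clang uses on 64-bit macOS) expects a 12-byte struct of floats packed into xmm0 and xmm1. The sketch below is a toy model of that mismatch in plain C, not real register access. `Xmm`, `BurstStyleArgs`, `CalleeView`, and `SumOfSquares` are names made up for illustration.

```c
#include <string.h>

struct MyVector
{
    float X;
    float Y;
    float Z;
};

/* Toy model of one 128-bit SSE register as four float lanes. */
struct Xmm
{
    float lane[4];
};

/* What the Burst-generated caller does: one movss per field. A movss
   load from memory writes lane 0 and zeroes the other lanes. */
void BurstStyleArgs(const struct MyVector* v, struct Xmm regs[3])
{
    memset(regs, 0, 3 * sizeof(struct Xmm));
    regs[0].lane[0] = v->X;
    regs[1].lane[0] = v->Y;
    regs[2].lane[0] = v->Z;
}

/* What a System V callee expects for this struct: X and Y in lanes 0
   and 1 of xmm0, Z in lane 0 of xmm1. xmm2 is never read. */
struct MyVector CalleeView(const struct Xmm regs[3])
{
    struct MyVector seen;
    seen.X = regs[0].lane[0];
    seen.Y = regs[0].lane[1];
    seen.Z = regs[1].lane[0];
    return seen;
}

/* The value that Magnitude passes to sqrt. */
float SumOfSquares(struct MyVector v)
{
    return v.X * v.X + v.Y * v.Y + v.Z * v.Z;
}
```

Feeding {1, 2, 3} through `BurstStyleArgs` and then `CalleeView` yields X = 1, Y = 0, Z = 2 in this model, so `SumOfSquares` returns 5 rather than 14. That is consistent with the square root of 5 observed above.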
Returning Structs
Let’s finish up for today by trying to return a struct from C code rather than passing one as a parameter. Here’s the C function:
struct MyVector MakeVec(float x, float y, float z)
{
    struct MyVector vec;
    vec.X = x;
    vec.Y = y;
    vec.Z = z;
    return vec;
}
And here’s the extern function in C#:
static class NativeLib
{
    [DllImport("NativeLib", CallingConvention = CallingConvention.Cdecl)]
    public static extern MyVector MakeVec(float x, float y, float z);
}
Now here’s a job that calls it:
[BurstCompile]
struct MakeVecJob : IJob
{
    [ReadOnly] public NativeArray<float> X;
    [ReadOnly] public NativeArray<float> Y;
    [ReadOnly] public NativeArray<float> Z;
    [WriteOnly] public NativeArray<MyVector> Out;

    public void Execute()
    {
        float x = X[0];
        float y = Y[0];
        float z = Z[0];
        MyVector vec = NativeLib.MakeVec(x, y, z);
        Out[0] = vec;
    }
}
And lastly, some test code to use the job:
NativeArray<float> x = new NativeArray<float>(1, Allocator.TempJob);
NativeArray<float> y = new NativeArray<float>(1, Allocator.TempJob);
NativeArray<float> z = new NativeArray<float>(1, Allocator.TempJob);
NativeArray<MyVector> o = new NativeArray<MyVector>(1, Allocator.TempJob);
x[0] = 1;
y[0] = 2;
z[0] = 3;
new MakeVecJob {X = x, Y = y, Z = z, Out = o}.Run();
print(o[0].X + ", " + o[0].Y + ", " + o[0].Z);
x.Dispose();
y.Dispose();
z.Dispose();
o.Dispose();
Again, we get the wrong result in both the editor and standalone builds:
1, 3, NaN
We expected to get:
1, 2, 3
To find out why, let’s again look at the Burst Inspector:
    .text
    .intel_syntax noprefix
    .file "main"
    .globl "Unity.Jobs.IJobExtensions.JobStruct`1<MakeVecJob>.Execute(ref MakeVecJob data, System.IntPtr additionalPtr, System.IntPtr bufferRangePatchData, ref Unity.Jobs.LowLevel.Unsafe.JobRanges ranges, int jobIndex)_23A1043CCC0D044E"
    .p2align 4, 0x90
    .type "Unity.Jobs.IJobExtensions.JobStruct`1<MakeVecJob>.Execute(ref MakeVecJob data, System.IntPtr additionalPtr, System.IntPtr bufferRangePatchData, ref Unity.Jobs.LowLevel.Unsafe.JobRanges ranges, int jobIndex)_23A1043CCC0D044E",@function
"Unity.Jobs.IJobExtensions.JobStruct`1<MakeVecJob>.Execute(ref MakeVecJob data, System.IntPtr additionalPtr, System.IntPtr bufferRangePatchData, ref Unity.Jobs.LowLevel.Unsafe.JobRanges ranges, int jobIndex)_23A1043CCC0D044E":
    .cfi_startproc
    push rbx
    .cfi_def_cfa_offset 16
    .cfi_offset rbx, -16
    mov rbx, rdi
    mov rax, qword ptr [rbx]
    mov rcx, qword ptr [rbx + 56]
    movss xmm0, dword ptr [rax]
    movss xmm1, dword ptr [rcx]
    mov rax, qword ptr [rbx + 112]
    movss xmm2, dword ptr [rax]
    movabs rax, offset ".LNativeLib::MakeVec_Ptr"
    call qword ptr [rax]
    mov rax, qword ptr [rbx + 168]
    movss dword ptr [rax], xmm0
    movss dword ptr [rax + 4], xmm1
    fstp dword ptr [rax + 8]
    pop rbx
    ret
.Lfunc_end0:
    .size "Unity.Jobs.IJobExtensions.JobStruct`1<MakeVecJob>.Execute(ref MakeVecJob data, System.IntPtr additionalPtr, System.IntPtr bufferRangePatchData, ref Unity.Jobs.LowLevel.Unsafe.JobRanges ranges, int jobIndex)_23A1043CCC0D044E", .Lfunc_end0-"Unity.Jobs.IJobExtensions.JobStruct`1<MakeVecJob>.Execute(ref MakeVecJob data, System.IntPtr additionalPtr, System.IntPtr bufferRangePatchData, ref Unity.Jobs.LowLevel.Unsafe.JobRanges ranges, int jobIndex)_23A1043CCC0D044E"
    .cfi_endproc

    .globl burst.initialize
    .p2align 4, 0x90
    .type burst.initialize,@function
burst.initialize:
    .cfi_startproc
    push rax
    .cfi_def_cfa_offset 16
    mov rax, rdi
    movabs rdi, offset ".LNativeLib::MakeVec.function.string"
    call rax
    movabs rcx, offset ".LNativeLib::MakeVec_Ptr"
    mov qword ptr [rcx], rax
    pop rax
    ret
.Lfunc_end1:
    .size burst.initialize, .Lfunc_end1-burst.initialize
    .cfi_endproc

    .type ".LNativeLib::MakeVec_Ptr",@object
    .local ".LNativeLib::MakeVec_Ptr"
    .comm ".LNativeLib::MakeVec_Ptr",8,8
    .type ".LNativeLib::MakeVec.function.string",@object
    .section .rodata,"a",@progbits
".LNativeLib::MakeVec.function.string":
    .asciz "#dllimport:NativeLib|MakeVec"
    .size ".LNativeLib::MakeVec.function.string", 29

    .section ".note.GNU-stack","",@progbits
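Again, Burst didn’t produce any warnings or errors. Instead, it generated code that produces the wrong results. After the call, it reads X from xmm0, Y from xmm1, and Z from the x87 floating-point stack with fstp. Under the System V calling convention used on 64-bit macOS, the C function returns X and Y packed into xmm0 and Z in xmm1. So Burst stores Z where Y belongs and pops a value the C function never pushed, which is the likely source of the NaN.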
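The wrong return value can be modeled the same way as a disagreement about where the result lives. The sketch below is a toy model in plain C under the assumption that the native library follows the System V x86-64 calling convention, which returns a 12-byte struct of floats in xmm0 and xmm1. `Xmm`, `CalleeReturn`, and `BurstStyleRead` are hypothetical names, and the empty x87 stack is simply modeled as a NaN passed in by the caller.

```c
#include <math.h>   /* NAN macro */
#include <string.h>

struct MyVector
{
    float X;
    float Y;
    float Z;
};

/* Toy model of one 128-bit SSE register as four float lanes. */
struct Xmm
{
    float lane[4];
};

/* Callee side: X and Y go into lanes 0 and 1 of xmm0, Z into lane 0
   of xmm1. */
void CalleeReturn(struct MyVector v, struct Xmm regs[2])
{
    memset(regs, 0, 2 * sizeof(struct Xmm));
    regs[0].lane[0] = v.X;
    regs[0].lane[1] = v.Y;
    regs[1].lane[0] = v.Z;
}

/* Caller side as generated by Burst: X from lane 0 of xmm0, Y from
   lane 0 of xmm1, Z popped from the x87 stack by fstp. */
struct MyVector BurstStyleRead(const struct Xmm regs[2], float x87Top)
{
    struct MyVector got;
    got.X = regs[0].lane[0];
    got.Y = regs[1].lane[0];
    got.Z = x87Top;
    return got;
}
```

With {1, 2, 3} written by `CalleeReturn` and `NAN` passed as `x87Top`, `BurstStyleRead` produces 1, 3, NaN in this model, matching the output above.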
ARM and 32-bit
Given that we were able to violate Burst’s rules regarding structs, let’s see what happens when we violate its other rules. First, let’s use Burst Inspector and set the CPU to x86_sse4. This is a 32-bit CPU, which is unsupported:
DllImport is not available on 32bit platforms and on ARM platforms
Here’s the output:
    .text
    .intel_syntax noprefix
    .file "main"
    .globl "Unity.Jobs.IJobExtensions.JobStruct`1<SquareJob>.Execute(ref SquareJob data, System.IntPtr additionalPtr, System.IntPtr bufferRangePatchData, ref Unity.Jobs.LowLevel.Unsafe.JobRanges ranges, int jobIndex)_83AC4021C611672D"
    .p2align 4, 0x90
    .type "Unity.Jobs.IJobExtensions.JobStruct`1<SquareJob>.Execute(ref SquareJob data, System.IntPtr additionalPtr, System.IntPtr bufferRangePatchData, ref Unity.Jobs.LowLevel.Unsafe.JobRanges ranges, int jobIndex)_83AC4021C611672D",@function
"Unity.Jobs.IJobExtensions.JobStruct`1<SquareJob>.Execute(ref SquareJob data, System.IntPtr additionalPtr, System.IntPtr bufferRangePatchData, ref Unity.Jobs.LowLevel.Unsafe.JobRanges ranges, int jobIndex)_83AC4021C611672D":
    .cfi_startproc
    push esi
    .cfi_def_cfa_offset 8
    sub esp, 8
    .cfi_def_cfa_offset 16
    .cfi_offset esi, -8
    mov eax, dword ptr [esp + 16]
    mov esi, dword ptr [eax]
    movss xmm0, dword ptr [esi]
    movss dword ptr [esp], xmm0
    call dword ptr [".LNativeLib::Square_Ptr"]
    fstp dword ptr [esi]
    add esp, 8
    pop esi
    ret
.Lfunc_end0:
    .size "Unity.Jobs.IJobExtensions.JobStruct`1<SquareJob>.Execute(ref SquareJob data, System.IntPtr additionalPtr, System.IntPtr bufferRangePatchData, ref Unity.Jobs.LowLevel.Unsafe.JobRanges ranges, int jobIndex)_83AC4021C611672D", .Lfunc_end0-"Unity.Jobs.IJobExtensions.JobStruct`1<SquareJob>.Execute(ref SquareJob data, System.IntPtr additionalPtr, System.IntPtr bufferRangePatchData, ref Unity.Jobs.LowLevel.Unsafe.JobRanges ranges, int jobIndex)_83AC4021C611672D"
    .cfi_endproc

    .globl burst.initialize
    .p2align 4, 0x90
    .type burst.initialize,@function
burst.initialize:
    .cfi_startproc
    sub esp, 12
    .cfi_def_cfa_offset 16
    mov dword ptr [esp], offset ".LNativeLib::Square.function.string"
    call dword ptr [esp + 16]
    mov dword ptr [".LNativeLib::Square_Ptr"], eax
    add esp, 12
    ret
.Lfunc_end1:
    .size burst.initialize, .Lfunc_end1-burst.initialize
    .cfi_endproc

    .type ".LNativeLib::Square_Ptr",@object
    .local ".LNativeLib::Square_Ptr"
    .comm ".LNativeLib::Square_Ptr",4,4
    .type ".LNativeLib::Square.function.string",@object
    .section .rodata,"a",@progbits
".LNativeLib::Square.function.string":
    .asciz "#dllimport:NativeLib|Square"
    .size ".LNativeLib::Square.function.string", 28

    .section ".note.GNU-stack","",@progbits
Just like with structs, no error or warning is produced.
Now let’s try switching the CPU to arm8a_aarch64, an unsupported 64-bit ARM CPU. Here’s the output:
    .text
    .file "main"
    .globl "Unity.Jobs.IJobExtensions.JobStruct`1<SquareJob>.Execute(ref SquareJob data, System.IntPtr additionalPtr, System.IntPtr bufferRangePatchData, ref Unity.Jobs.LowLevel.Unsafe.JobRanges ranges, int jobIndex)_83AC4021C611672D"
    .p2align 2
    .type "Unity.Jobs.IJobExtensions.JobStruct`1<SquareJob>.Execute(ref SquareJob data, System.IntPtr additionalPtr, System.IntPtr bufferRangePatchData, ref Unity.Jobs.LowLevel.Unsafe.JobRanges ranges, int jobIndex)_83AC4021C611672D",@function
"Unity.Jobs.IJobExtensions.JobStruct`1<SquareJob>.Execute(ref SquareJob data, System.IntPtr additionalPtr, System.IntPtr bufferRangePatchData, ref Unity.Jobs.LowLevel.Unsafe.JobRanges ranges, int jobIndex)_83AC4021C611672D":
    .cfi_startproc
    stp x19, x30, [sp, #-16]!
    .cfi_def_cfa_offset 16
    .cfi_offset w30, -8
    .cfi_offset w19, -16
    ldr x19, [x0]
    movz x8, #".LNativeLib::Square_Ptr"
    movk x8, #".LNativeLib::Square_Ptr"
    movk x8, #".LNativeLib::Square_Ptr"
    movk x8, #".LNativeLib::Square_Ptr"
    ldr s0, [x19]
    ldr x8, [x8]
    blr x8
    str s0, [x19]
    ldp x19, x30, [sp], #16
    ret
.Lfunc_end0:
    .size "Unity.Jobs.IJobExtensions.JobStruct`1<SquareJob>.Execute(ref SquareJob data, System.IntPtr additionalPtr, System.IntPtr bufferRangePatchData, ref Unity.Jobs.LowLevel.Unsafe.JobRanges ranges, int jobIndex)_83AC4021C611672D", .Lfunc_end0-"Unity.Jobs.IJobExtensions.JobStruct`1<SquareJob>.Execute(ref SquareJob data, System.IntPtr additionalPtr, System.IntPtr bufferRangePatchData, ref Unity.Jobs.LowLevel.Unsafe.JobRanges ranges, int jobIndex)_83AC4021C611672D"
    .cfi_endproc

    .globl burst.initialize
    .p2align 2
    .type burst.initialize,@function
burst.initialize:
    .cfi_startproc
    str x30, [sp, #-16]!
    .cfi_def_cfa_offset 16
    .cfi_offset w30, -16
    mov x8, x0
    movz x0, #".LNativeLib::Square.function.string"
    movk x0, #".LNativeLib::Square.function.string"
    movk x0, #".LNativeLib::Square.function.string"
    movk x0, #".LNativeLib::Square.function.string"
    blr x8
    movz x8, #".LNativeLib::Square_Ptr"
    movk x8, #".LNativeLib::Square_Ptr"
    movk x8, #".LNativeLib::Square_Ptr"
    movk x8, #".LNativeLib::Square_Ptr"
    str x0, [x8]
    ldr x30, [sp], #16
    ret
.Lfunc_end1:
    .size burst.initialize, .Lfunc_end1-burst.initialize
    .cfi_endproc

    .type ".LNativeLib::Square_Ptr",@object
    .local ".LNativeLib::Square_Ptr"
    .comm ".LNativeLib::Square_Ptr",8,8
    .type ".LNativeLib::Square.function.string",@object
    .section .rodata,"a",@progbits
".LNativeLib::Square.function.string":
    .asciz "#dllimport:NativeLib|Square"
    .size ".LNativeLib::Square.function.string", 28

    .section ".note.GNU-stack","",@progbits
This too produced no error or warning from Burst.
Burst can call into native code with several caveats:
- The CPU must be x86, not ARM
- The CPU must be 64-bit
- Structs can’t be passed or returned by value
Unfortunately, Burst enforces none of these requirements with compiler errors. It doesn’t even provide a warning to indicate that we’ve violated its rules. Instead, it produces code that doesn’t work properly. So, at least at this point in Burst’s lifetime, we must be extra diligent to make sure we’re playing by Burst’s rules as there is no safety net.
#1 by Qman on August 12th, 2019 ·
I’ve heard there’s some overhead when calling P/Invoke functions from C#, does burst perform better/faster?
#2 by jackson on August 12th, 2019 ·
The calls from Burst are a lot more direct than from IL2CPP, which I’ve covered before. At least that’s currently the case with 64-bit x86. If and when ARM and 32-bit CPUs are supported, we’ll have to take another look to see how that implementation looks.
#3 by David Wu on August 15th, 2019 ·
It looks like for small structs, burst passes values in registers which is a nice optimization.
I think that for cdecl calling convention, structs must be passed on the stack, regardless of their size.
#4 by Vladislav Dmitrievich Turbanov on September 4th, 2019 ·
Did you file a bug report yet? =)
Nice catches.