The smallest a C# type can be is one byte. The `byte` type and and an empty struct are examples of this. But what if we want to store data in less than a byte to improve performance such as load times and CPU cache utilization? Today’s article does just this by packing at the bit level!

Data Layout

The key realization today is that a single piece of data can’t take up less than a byte, but many pieces of data can take up less than one byte as long as they’re stored in something that takes up at least one byte. Consider an array of four 4-bit integers `a`, `b`, `c`, and `d`:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
a a a a b b b b c c c c d d d d

While there’s no 4-bit type in C#, we can still pack these into 16 bits (2 bytes) because the overall array is at least one byte large. The trick is in reading and writing the packed data, which involves a some bitwise math to extract the right bits. This can be tricky when a piece of data straddles a byte boundary. For example, consider an array of five 3-bit integers `a`, `b`, `c`, `d`, and `e`:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
a a a b b b c c c d d d e e e

When reading `c`, we need to read the first byte (bits 0-7) to get its first two bits (at 6 and 7) then read the second byte (bits 8-15) to get its third bit (at 8). Then we need to mask off the unwanted bits of both bytes (0-5 and 9-15), shift the bits of both bytes into place (0-7 shift left 1, 8 shift right 7), and finally combine them with a bitwise OR.

Code Design

There are three main goals for the code that operates on these bits:

• Efficiency: Only the minimum number of memory accesses, branches, and bitwise operations should be performed.
• Burst Support: All accesses should be able to be compiled by Burst.
• Code Reuse: The container of the bits should be abstracted from the code that accesses the bits.

The goal of efficiency precludes the simpler approach of repeatedly reading a byte and masking off all but one bit until we have read enough. This approach reads from memory too many times, performs too many branches for the loop, and does an unnecessary number of bitwise operations. Instead, we’ll read exactly once per byte containing the bits we need, perform just enough branches to figure out how many times that is, and use bitwise operations on runs of bits instead of just one.

Burst support means we’ll be omitting anything it doesn’t support. This may have included managed arrays, classes, and raw C# interfaces. So we’ll abstract the collection holding the bits instead of using a managed array, use structs instead of classes, and use generics to give Burst the compile-time knowledge of exactly which interfaces we’re using.

Code reuse can be achieved using the Burst-compatible generic algorithms approach of combining an interface with generics. Instead of just taking an interface (`IFoo`), add a type parameter (`TFoo`), take a `TFoo`, and use a `where TFoo : IFoo` constraint. This means the code isn’t using an `IFoo` determined at runtime, but rather a `TFoo` determined at compile time. In this case, we’ll have an interface that accesses the bits using a simple index to read and write.

For today, we’ll just design support for reading bits. The core of today’s code is a struct named `BitStream` containing the current bit index, functions to read various numbers of bits, and an interface to abstract the bits:

```public struct BitStream
{
public interface IByteStream
{
}

public int BitIndex;

public byte ReadUpTo8<TByteStream>(in TByteStream stream, int num)
where TByteStream : IByteStream
{
// ...
}

public ushort ReadUpTo16<TByteStream>(in TByteStream stream, int num)
where TByteStream : IByteStream
{
// ...
}

public uint ReadUpTo32<TByteStream>(in TByteStream stream, int num)
where TByteStream : IByteStream
{
// ...
}

public ulong ReadUpTo64<TByteStream>(in TByteStream stream, int num)
where TByteStream : IByteStream
{
// ...
}
}```
Implementation

`ReadUpTo8` contains the trickiest bitwise work. Functions for reading more bits (`ReadUpTo16`, `ReadUpTo32`, and `ReadUpTo64`) are all written in terms of `ReadUpTo8`. This allows the others to be quite simple and easy to read. Here is the full file:

```/// <summary>
/// Reader and writer of a stream of bits, which can be stored in any indexed
/// container type. This allows for accessing the bits regardless of their size
/// (up to 64 bits) even if the accesses cross one or more byte boundaries. This
/// can be used to efficiently pack bits such that individual values can be
/// stored in less than one bit. For example, a boolean can take up only one bit
/// and a 3-bit integer is possible.
/// </summary>
///
/// <author>
/// Jackson Dunstan, https://JacksonDunstan.com/articles/5426
/// </author>
///
/// MIT
public struct BitStream
{
/// <summary>
/// Abstraction of a stream of bytes
/// </summary>
public interface IByteStream
{
/// <summary>
/// Read a byte from the stream at a given index
/// </summary>
///
/// <param name="index">
/// Index of the byte to read.
/// </param>
///
/// <returns>
/// The byte at the given index.
/// </returns>
}

/// <summary>
/// Index of the next bit to access
/// </summary>
public int BitIndex;

/// <summary>
/// Read up to 8 bits starting at <see cref="BitIndex"/>.
/// </summary>
///
/// <param name="stream">
/// Byte stream to read from.
/// </param>
///
/// <param name="num">
/// Number of bits to read.
/// </param>
///
/// <typeparam name="TByteStream">
/// Type of byte stream to read with.
/// </typeparam>
///
/// <returns>
/// The read value, stored in the least-significant bits.
/// </returns>
public byte ReadUpTo8<TByteStream>(in TByteStream stream, int num)
where TByteStream : IByteStream
{
// Calculate where we are in the stream and advance
int bitIndex = BitIndex;
int localByteIndex = bitIndex / 8;
int localBitIndex = bitIndex % 8;
BitIndex = bitIndex + num;

// Read the byte with the high bits and decide if that's the only byte
int numHighBitsAvailable = 8 - localBitIndex;
int excessHighBitsAvailable = numHighBitsAvailable - num;
if (excessHighBitsAvailable >= 0)
{
highMask = ((1 << num) - 1) << excessHighBitsAvailable;
return (byte)((high & highMask) >> excessHighBitsAvailable);
}

// Read the low byte and combine with the high byte
highMask = (1 << numHighBitsAvailable) - 1;
int numLowBits = num - numHighBitsAvailable;
byte low = stream.Read(localByteIndex + 1);
int lowShift = 8-numLowBits;
int lowMask = ((1 << numLowBits) - 1) << lowShift;
int highPart = (high & highMask) << numLowBits;
int lowPart = (low & lowMask) >> lowShift;
int result = highPart | lowPart;
return (byte)result;
}

/// <summary>
/// Read up to 16 bits starting at <see cref="BitIndex"/>.
/// </summary>
///
/// <param name="stream">
/// Byte stream to read from.
/// </param>
///
/// <param name="num">
/// Number of bits to read.
/// </param>
///
/// <typeparam name="TByteStream">
/// Type of byte stream to read with.
/// </typeparam>
///
/// <returns>
/// The read value, stored in the least-significant bits.
/// </returns>
public ushort ReadUpTo16<TByteStream>(in TByteStream stream, int num)
where TByteStream : IByteStream
{
if (num <= 8)
{
}
int numLowBits = num - 8;
return (ushort)((high << numLowBits) | low);
}

/// <summary>
/// Read up to 32 bits starting at <see cref="BitIndex"/>.
/// </summary>
///
/// <param name="stream">
/// Byte stream to read from.
/// </param>
///
/// <param name="num">
/// Number of bits to read.
/// </param>
///
/// <typeparam name="TByteStream">
/// Type of byte stream to read with.
/// </typeparam>
///
/// <returns>
/// The read value, stored in the least-significant bits.
/// </returns>
public uint ReadUpTo32<TByteStream>(in TByteStream stream, int num)
where TByteStream : IByteStream
{
if (num <= 16)
{
}
int numLowBits = num - 16;
return (high << numLowBits) | low;
}

/// <summary>
/// Read up to 64 bits starting at <see cref="BitIndex"/>.
/// </summary>
///
/// <param name="stream">
/// Byte stream to read from.
/// </param>
///
/// <param name="num">
/// Number of bits to read.
/// </param>
///
/// <typeparam name="TByteStream">
/// Type of byte stream to read with.
/// </typeparam>
///
/// <returns>
/// The read value, stored in the least-significant bits.
/// </returns>
public ulong ReadUpTo64<TByteStream>(in TByteStream stream, int num)
where TByteStream : IByteStream
{
if (num <= 32)
{
}
int numLowBits = num - 32;
return (high << numLowBits) | low;
}
}```

Along with the runtime code, we have unit tests to exhaustively run through every combination of bit offset and number of bits to read:

```using NUnit.Framework;
using Unity.Collections;

/// <summary>
/// Unit tests for <see cref="BitStream"/>.
/// </summary>
///
/// <author>
/// Jackson Dunstan, https://JacksonDunstan.com/articles/5426
/// </author>
///
/// MIT
public class BitStreamTests
{
private struct NativeArrayByteStream : BitStream.IByteStream
{
public NativeArray<byte> Array;

{
return Array[index];
}
}

private static NativeArray<byte> CreateArray()
{
NativeArray<byte> array = new NativeArray<byte>(9, Allocator.Temp);
for (int i = 0; i < array.Length; ++i)
{
array[i] = 0b10101010;
}
return array;
}

private static BitStream CreateStream(int bitIndex)
{
return new BitStream { BitIndex = bitIndex };
}

[Test]
{
using (NativeArray<byte> array = CreateArray())
{
NativeArrayByteStream byteStream = new NativeArrayByteStream
{
Array = array
};
for (int bitIndex = 0; bitIndex < 8; ++bitIndex)
{
int expected = 0;
int next = bitIndex & 1;
for (int numBits = 1; numBits <= 8; ++numBits)
{
next = ~next & 1;
expected = (expected << 1) | next;
BitStream stream = CreateStream(bitIndex);
Assert.That(
Is.EqualTo(expected),
\$"Bit index: {bitIndex}. Num bits: {numBits}");
}
}
}
}

[Test]
{
using (NativeArray<byte> array = CreateArray())
{
NativeArrayByteStream byteStream = new NativeArrayByteStream
{
Array = array
};
for (ushort bitIndex = 0; bitIndex < 8; ++bitIndex)
{
ushort expected = 0;
ushort next = (ushort)(bitIndex & 1);
for (int numBits = 1; numBits <= 16; ++numBits)
{
next = (ushort)(~next & 1);
expected = (ushort)((expected << 1) | next);
BitStream stream = CreateStream(bitIndex);
Assert.That(
Is.EqualTo(expected),
\$"Bit index: {bitIndex}. Num bits: {numBits}");
}
}
}
}

[Test]
{
using (NativeArray<byte> array = CreateArray())
{
NativeArrayByteStream byteStream = new NativeArrayByteStream
{
Array = array
};
for (int bitIndex = 0; bitIndex < 8; ++bitIndex)
{
uint expected = 0;
uint next = (uint)bitIndex & 1;
for (int numBits = 1; numBits <= 32; ++numBits)
{
next = ~next & 1;
expected = (expected << 1) | next;
BitStream stream = CreateStream(bitIndex);
Assert.That(
actual,
Is.EqualTo(expected),
\$"Bit index: {bitIndex}. Num bits: {numBits}.");
}
}
}
}

[Test]
{
using (NativeArray<byte> array = CreateArray())
{
NativeArrayByteStream byteStream = new NativeArrayByteStream
{
Array = array
};
for (int bitIndex = 0; bitIndex < 8; ++bitIndex)
{
ulong expected = 0;
ulong next = (ulong)bitIndex & 1;
for (int numBits = 1; numBits <= 64; ++numBits)
{
next = ~next & 1;
expected = (expected << 1) | next;
BitStream stream = CreateStream(bitIndex);
Assert.That(
actual,
Is.EqualTo(expected),
\$"Bit index: {bitIndex}. Num bits: {numBits}.");
}
}
}
}
}```
Usage

Using `BitStream` begins by defining a struct implementing `IByteStream`. This is trivial for array-like types such as `NativeArray<T>`:

```struct NativeArrayByteStream : BitStream.IByteStream
{
public NativeArray<byte> Array;

{
return Array[index];
}
}```

Next, let’s define a Burst-compiled job to prove that we’ve achieved that goal:

```[BurstCompile]
struct BitStreamJob : IJob
{
public NativeArray<byte> Input;
public NativeArray<uint> Output;

public void Execute()
{
new NativeArrayByteStream { Array = Input },
32);
}
}```

Finally, let’s execute that job from a `MonoBehaviour`:

```class TestScript : MonoBehaviour
{
void Start()
{
using (NativeArray<byte> input = new NativeArray<byte>(4, Allocator.TempJob))
{
NativeArray<byte> inputAlias = input;
inputAlias[0] = 0x12;
inputAlias[1] = 0x34;
inputAlias[2] = 0x56;
inputAlias[3] = 0x78;
using (NativeArray<uint> output = new NativeArray<uint>(1, Allocator.TempJob))
{
new BitStreamJob { Input = input, Output = output }.Run();
print(Convert.ToString(output[0], 16));
}
}
}
}```

As expected, this prints `12345678` as it reads 32 bits across the four bytes in the input array into the output array. That’s a relatively easy test case, but the hard cases have already been proven by the unit tests.

Using Burst Inspector, we can verify that this job is indeed being compiled by Burst to x86 assembly:

```mov     rax, qword ptr [rdi]
mov     rcx, qword ptr [rdi + 56]
mov     eax, dword ptr [rax]
bswap   eax
mov     dword ptr [rcx], eax
ret```

Burst has nicely taken constants like the bit length of `32` and bit index of `0` and optimized away the branches and bitwise calculations. In general there will be more instructions and cost to reading from a `BitStream`, but this at least shows that we’ve achieved Burst compatibility.

Conclusion

While C# doesn’t allow for a single value to be stored in less than a byte, we can pack multiple sub-byte values into a bit stream as long as the total size of the stream is at least one byte. That’s a very minor concession that allows us to greatly reduce data sizes for purposes such as saving disk space, decreasing load times, and improving CPU cache utilization. Now we can store boolean values in a single bit, define fixed-point numbers such as `1.3`, and even use weird sizes like 9-bit integers without wasting a bunch of bits.

Next week we’ll expand `BitStream` to support writing bits. Stay tuned!

Continue to part two