Steal Some Bits
With a bit of understanding and some C# trickery, we can exploit how float
works to cram in a few more bits and make some big performance gains. Today we’ll see how to steal some of the bits from a float
!
The IEEE 754 standard describes how float
values are stored. Here’s how it looks:
1
bit: Sign8
bits: Exponent23
bits: Fraction
The Wikipedia page goes way more detail, if you’re interested. For now, we’ll use this knowledge to exploit the format and steal some bits.
Here’s the idea. There are a lot of fraction bits: 23. How often do we really need all that precision? For example, if we’re just storing an angle, do we need precision to 0.00000011920929
degrees? Usually not. So what if we stored something else in the least-significant fraction bits?
We can do this in C# pretty easily using a union. We can even measure the impact it’ll have on our precision. To do so, we’ll start with the float
for 1.0
:
Sign | Exponent | Fraction |
---|---|---|
0 | 01111111 | 00000000000000000000000 |
Then we’ll change more and most of the least-significant digits to 1
:
Sign | Exponent | Fraction |
---|---|---|
0 | 01111111 | 00000000000000000000001 |
0 | 01111111 | 00000000000000000000011 |
0 | 01111111 | 00000000000000000000111 |
0 | 01111111 | 00000000000000000001111 |
To see the result, here’s a tiny script that changes the bits starting at none and gradually changing them all:
using System; using System.Runtime.InteropServices; using System.Text; using UnityEngine; [StructLayout(LayoutKind.Explicit)] struct UintFloatUnion { [FieldOffset(0)] public uint Uint; [FieldOffset(0)] public float Float; } class TestScript : MonoBehaviour { void Start() { StringBuilder sb = new StringBuilder(); for (int i = 0; i <= 23; ++i) { uint u = 0b00111111100000000000000000000000; // == 1.0 u |= (1u << i) - 1; // Change the least significant 'i' bits to 1 // Reinterpret as float UintFloatUnion ufu; ufu.Float = 0; ufu.Uint = u; float f = ufu.Float; if (i < 10) { sb.Append(' '); } sb.Append(i); sb.Append(": "); sb.Append(f.ToString("0." + new string('0', 30))); sb.Append('n'); } print(sb.ToString()); } }
When we run this, we get these results:
0: 1.000000000000000000000000000000 1: 1.000000000000000000000000000000 2: 1.000000000000000000000000000000 3: 1.000001000000000000000000000000 4: 1.000002000000000000000000000000 5: 1.000004000000000000000000000000 6: 1.000008000000000000000000000000 7: 1.000015000000000000000000000000 8: 1.000030000000000000000000000000 9: 1.000061000000000000000000000000 10: 1.000122000000000000000000000000 11: 1.000244000000000000000000000000 12: 1.000488000000000000000000000000 13: 1.000976000000000000000000000000 14: 1.001953000000000000000000000000 15: 1.003906000000000000000000000000 16: 1.007812000000000000000000000000 17: 1.015625000000000000000000000000 18: 1.031250000000000000000000000000 19: 1.062500000000000000000000000000 20: 1.125000000000000000000000000000 21: 1.250000000000000000000000000000 22: 1.500000000000000000000000000000 23: 2.000000000000000000000000000000
Unfortunately, float.ToString
isn’t giving us enough precision to gauge the impact. Most of the digits are simply 0
. So let’s re-write this script in a tiny C++ command line program:
#include <iostream> #include <iomanip> int main() { std::cout.precision(30); for (unsigned int i = 0; i <= 23; ++i) { unsigned int u = 0b00111111100000000000000000000000; // == 1.0 u |= (1u << i) - 1; // Change the least significant 'i' bits to 1 float f = *(float*)(&u); // Reinterpret as float std::cout << std::setw(2) << i << ": " << std::fixed << f << 'n'; } }
Now here’s what we get:
0: 1.000000000000000000000000000000 1: 1.000000119209289550781250000000 2: 1.000000357627868652343750000000 3: 1.000000834465026855468750000000 4: 1.000001788139343261718750000000 5: 1.000003695487976074218750000000 6: 1.000007510185241699218750000000 7: 1.000015139579772949218750000000 8: 1.000030398368835449218750000000 9: 1.000060915946960449218750000000 10: 1.000121951103210449218750000000 11: 1.000244021415710449218750000000 12: 1.000488162040710449218750000000 13: 1.000976443290710449218750000000 14: 1.001953005790710449218750000000 15: 1.003906130790710449218750000000 16: 1.007812380790710449218750000000 17: 1.015624880790710449218750000000 18: 1.031249880790710449218750000000 19: 1.062499880790710449218750000000 20: 1.124999880790710449218750000000 21: 1.249999880790710449218750000000 22: 1.499999880790710449218750000000 23: 1.999999880790710449218750000000
That’s a lot more precision! We can now see the impact of using more and more of the digits.
Exactly how many digits to use is really dependent on the data. If the float
is just going to translate to a percentage with a couple of decimal points, it’s possible to steal about 8-9 bits. Nearly all float
values can tolerate stealing 1-3 bits, which is good enough to store a boolean value or a small integer. However, in some cases there’s a need for incredible accuracy and this technique isn’t useful at all.
There’s no way we’re going to do tons of bit twiddling just to use float
values, so we’ll need some utility functions to do it for us. Here are the three we’re going to crete:
ReadStolen
: Read the integer value stored in the least-signficant bitsReadFloatCleared
: Read thefloat
with the value stored in the least-signficant bits cleared to zeroWrite
Write a new float value without changing the value stored in the least-signficant bits
Also, note that the float
can be used directly for maximum performance but any 1
bits set in the least-signficant bits will increase its value.
Here’s how to use these functions:
// Read the integer value stored in the least-signficant 3 bits float myVal = ...; uint readVal = FloatBitStealing.ReadStolen(myVal, 3); // Read the float value with the least-signficant 3 bits set to zero float myVal = ...; float myValCleared = FloatBitStealing.ReadFloatCleared(myVal, 3); // Write srcVal to destVal without changing the least-significant 3 bits float srcVal = ...; float destVal = ...; float writtenVal = FloatBitStealing.Write(srcVal, destVal, 3);
Here’s the source code for FloatBitStealing
:
/// <summary> /// Utility functions for working with <see cref="float"/> values that have some /// of their least-signficant bits "stolen" by storing another value there. /// </summary> /// /// <author> /// Jackson Dunstan, https://JacksonDunstan.com/articles/5365 /// </author> /// /// <license> /// MIT /// </license> public static class FloatBitStealing { /// <summary> /// Utility to change the type of a <see cref="float"/> to a /// <see cref="uint"/> or visa versa. /// </summary> [System.Runtime.InteropServices.StructLayout( System.Runtime.InteropServices.LayoutKind.Explicit)] private struct UintFloatUnion { /// <summary> /// <see cref="uint"/> value of the memory /// </summary> [System.Runtime.InteropServices.FieldOffset(0)] public uint Uint; /// <summary> /// <see cref="float"/> value of the memory /// </summary> [System.Runtime.InteropServices.FieldOffset(0)] public float Float; } /// <summary> /// Read the value with the least-signficant bits cleared to zero /// </summary> /// /// <param name="f"> /// Value to read from /// </param> /// /// <param name="numStolenBits"> /// Number of least-signficant bits to clear to zero /// </param> /// /// <returns> /// The given value with the given number of least-signficant bits cleared /// to zero /// </returns> public static float ReadFloatCleared(float f, int numStolenBits) { // 1s for all stolen bits uint stolenMask = (1u << numStolenBits) - 1; UintFloatUnion ufu; ufu.Uint = 0; ufu.Float = f; ufu.Uint = ufu.Uint & ~stolenMask; return ufu.Float; } /// <summary> /// Read the least-signficant bits /// </summary> /// /// <param name="f"> /// Value to read from /// </param> /// /// <param name="numStolenBits"> /// Number of least-signficant bits to read /// </param> /// /// <returns> /// The given number of least-signficant bits of the given value /// </returns> public static uint ReadStolen(float f, int numStolenBits) { // 1s for all stolen bits uint stolenMask = (1u << numStolenBits) - 1; UintFloatUnion ufu; ufu.Uint = 0; ufu.Float = f; return ufu.Uint & stolenMask; } /// <summary> /// Write a value to an existing value without overwriting its /// least-signficant bits /// </summary> /// /// <param name="src"> /// Value to write /// </param> /// /// <param name="dest"> /// Value to write to /// </param> /// /// <param name="numStolenBits"> /// Number of least-signficant bits to not overwrite /// </param> /// /// <returns> /// The most-signficant <paramref name="numStolenBits"/> bits of the given /// <paramref name="src"/> value combined with the least-signficant bits of /// the given <paramref name="dest"/> value. /// </returns> public static float Write(float src, float dest, int numStolenBits) { // 1s for all stolen bits uint stolenMask = (1u << numStolenBits) - 1; // Get the not-stolen bits of 'src', which are the most-signficant bits UintFloatUnion ufu; ufu.Uint = 0; ufu.Float = src; uint srcNotStolen = ufu.Uint & ~stolenMask; // Get the stolen bits of 'dest', which are the least-significant bits ufu.Uint = 0; ufu.Float = dest; uint destStolen = ufu.Uint & stolenMask; // Combine them ufu.Uint = srcNotStolen | destStolen; return ufu.Float; } }
This technique and the above utility functions are particularly useful to fit the most float
values in the CPU’s data cache. For example, a struct
with a float
and a bool
takes up 8
bytes and so only 8
of them will fit in a 64
byte cache line. Alternatively, stealing just one bit of the float
with this technique means that there’s no need for the bool
and suddenly twice as many values will fit in a cache line. So keep this technique in mind next time you’re trying to fit more data in a cache line and have a float
that doesn’t necessarily need all of its precision.