JacksonDunstan.com

JSON is incredibly bloated, but what do you use instead? Many games have some huge configuration file with lots of data about how the game should be run. Think of the items in a shop or the layout of a saga map and you’ll get the picture. This is often a JSON file that will take forever to parse, hog up a bunch of memory, and create a ton of garbage for the GC to collect. Enter CDB: the Constant Database. Unlike other databases, CDB is a simple, read-only, key-value store that’s been around over 20 years! Today’s article introduces the format and provides a one-file script you can drop into your projects and start gaining the many advantages that CDB has to offer.

CDB was created in 1996 by D.J Bernstein. The official page describes the file format and tools and has links to a few ports. The file format is really simple. It’s literally a one-page description, if that. It basically describes a read-only Hashtable that’s stored on disk. So you can’t use it to replace PlayerPrefs or SQLite, but you can use it to replace JSON configuration files.

For example, say you had the following JSON file to describe players’ high scores:

{
	"scores": [
		{
			"name": "Mario",
			"score": 10000
		},
		{
			"name": "Luigi",
			"difficulty": 5000
		},
		// ... more scores
	]
}

When you pass a scores file with 1000 scores to a JSON deserializer it’s going to load the entire thing into memory all on one frame and create 2002 managed objects that will eventually get collected by the GC:

1 Scores
1 Score[] with capacity for as many scores as you have: N
N Score objects to fill in that array
N string objects for the player names, one per score

This is slow, will fragment managed memory, will consume a lot of memory, and if you ever release these references will eventually cause a frame hitch when the GC runs.

So how do we fix this with CDB? Well, it’s basically a Hashtable so we need to come up with key-value pairs. CDB supports multiple values per key, so we can use that to our advantage to make a sort of array. More importantly, we need to adjust our thinking from the JSON way—load the whole file into memory all at once—and toward the database way: load what we need on-demand. This greatly helps with scaling to larger size files. It might not be practical to load every single level in the game or every single item in the shop.

The database way in this case is to ask “what part of the file do I need right now?” You’re probably only showing the player one screen full of high scores, so you don’t need to load all the scores in the file that you’re not showing on the screen. You just need to load the ones that are actually on screen.

Say the player is on the first page of scores. You’d be interested in the first 20 scores. So you need to look them up by index. While the CDB is a key-value map, you can add an index easily:

0 10000Mario
1 5000Luigi

CDB keys and values are all byte[] so you can use whatever you want. In this case the key could literally be the four bytes of an int instead of a text representation like in JSON. Likewise, the first four bytes of the value can be the score and the rest of it the name. You have a lot of flexibility in how you want to store your keys and values. You can literally put whatever bytes you want there!

Now for the script to use a CDB file. It’s MIT-licensed for easy inclusion into a wide variety of projects. Simply drop this one 243 line file into your Unity project and you’re good to go.

using System.IO;
 
/// <summary>
/// Reader for CDB (constant database) files. Information about the format can be found here:
///   http://cr.yp.to/cdb.html
/// Note that this class is not thread-safe.
/// </summary>
/// <author>Jackson Dunstan, http://JacksonDunstan.com/articles/3842</author>
/// <license>MIT</license>
public class CdbFile
{
	private static byte[] table;
	private readonly Stream stream;
	private readonly int[] slots;
 
	/// <summary>
	/// Prepare for querying by reading the CDB file header. Note that this function may allocate a
	/// byte[2048] and always allocates an int[512].
	/// </summary>
	/// <param name="stream">Stream to read from. Should be the start of CDB file data. If the
	/// stream has zero length then this is a no-op and all queries created with
	/// <see cref="CreateQuery"/> will find no values.</param>
	public CdbFile(Stream stream)
	{
		unchecked
		{
			this.stream = stream;
			if (stream.Length > 0)
			{
				if (table == null)
				{
					table = new byte[2048];
				}
				stream.Read(table, 0, 2048);
				slots = new int[512];
				for (int si = 0, ti = 0; si < 256; ++si, ti += 8)
				{
					slots[si << 1] = table[ti]
						| (table[ti+1] << 8)
						| (table[ti+2] << 16)
						| (table[ti+3] << 24);
					slots[(si << 1) + 1] = table[ti+4]
						| (table[ti+5] << 8)
						| (table[ti+6] << 16)
						| (table[ti+7] << 24);
				}
			}
		}
	}
 
	/// <summary>
	/// Create a query used to find values corresponding with a given key. This uses the same
	/// stream given to the constructor. Note that this function may allocate a new byte[] with the
	/// greater of the key length and 256.
	/// </summary>
	/// <returns>A query used to find values corresponding with the given key</returns>
	/// <param name="key">Key that the query should find values corresponding with</param>
	public CdbQuery CreateQuery(byte[] key)
	{
		return new CdbQuery(key, stream, slots);
	}
}
 
/// <summary>
/// A query to find values corresponding with keys in a CDB file. Use
/// <see cref="CdbFile.CreateQuery"/> to create this. Afterward, it may be freely copy constructed
/// to find the same values from the point at which the copy is made. If default constructed, no
/// values will be found. Note that this struct's functions are not thread-safe.
/// </summary>
/// <author>Jackson Dunstan, http://JacksonDunstan.com/articles/3842</author>
/// <license>MIT</license>
public struct CdbQuery
{
	private static byte[] ByteBuffer;
	private readonly byte[] Key;
	private readonly int KeyLength;
	private readonly Stream Stream;
	private readonly int KeyHash;
	private readonly int QueryLength;
	private readonly int HashTableIndex;
	private int QueryIndex;
	private int SlotPos;
 
	/// <summary>
	/// This is not meant for public usage. Use <see cref="CdbFile.CreateQuery"/> instead.
	/// </summary>
	public CdbQuery(byte[] key, Stream stream, int[] slots)
	{
		unchecked
		{
			Key = key;
			KeyLength = key.Length;
			int requiredByteBufferLength = KeyLength > 256 ? KeyLength : 256;
			if (ByteBuffer == null || ByteBuffer.Length < requiredByteBufferLength)
			{
				ByteBuffer = new byte[requiredByteBufferLength];
			}
			Stream = stream;
 
			// No slots means the CDB file is empty
			if (slots == null)
			{
				KeyHash = 0;
				QueryLength = 0;
				HashTableIndex = 0;
				QueryIndex = 0;
				SlotPos = 0;
				return;
			}
 
			// Hash the key
			long hash = 5381;
			for (int i = 0; i < KeyLength; ++i)
			{
				hash =  (((hash + ((hash << 5) & 0xffffffff)) & 0xffffffff))
					^ (key[i] + 0x100) & 0xff;
			}
			hash = hash & 0xffffffff;
			KeyHash = (int)(hash);
 
			// Get the number of hash slots for this key
			int slot = KeyHash & 0xff;
			QueryLength = slots[(slot << 1) + 1];
			if (QueryLength == 0)
			{
				HashTableIndex = 0;
				QueryIndex = int.MaxValue;
				SlotPos = 0;
				return;
			}
 
			// Get the index into the hash table for this key
			HashTableIndex = slots[slot << 1];
 
			// Start on the first index
			QueryIndex = 0;
 
			// Get the slot for this key
			SlotPos = HashTableIndex + (int)(((int)((uint)hash >> 8) % QueryLength) << 3);
		}
	}
 
	/// <summary>
	/// Get the next value corresponding to the key for this query. The value is placed in the
	/// buffer parameter at the index specified by bufferIndex.
	/// </summary>
	/// <returns>A code indicating one of three conditions. If zero, there are no more values
	/// corresponding to the key for this query and buffer is unchanged. If positive, a value has
	/// been found and stored in buffer at bufferIndex. The return value is the number of bytes
	/// stored in buffer in this case. If negative, the buffer does not have enough capacity to
	/// store the value and is unchanged. Call this function again with a buffer that has more
	/// capacity to read the same value.</returns>
	/// <param name="buffer">Buffer to store the value in</param>
	/// <param name="bufferIndex">Index into the buffer to store the value at</param>
	public int GetNextValue(byte[] buffer, int bufferIndex = 0)
	{
		unchecked
		{
			// We may have to skip hash slots, so loop until we either get to the end, find a
			// corresponding value, or discover that the given buffer is too small for the
			// corresponding value.
			while (QueryIndex < QueryLength)
			{
				// Read this hash slot's key hash and entry position
				Stream.Seek(SlotPos, SeekOrigin.Begin);
				int hash;
				int entryPosition;
				ReadTwoInts(out hash, out entryPosition);
				if (entryPosition == 0)
				{
					return 0;
				}
 
				// Skip hash mismatches
				if (hash != KeyHash)
				{
					GoToNextEntry();
					continue;
				}
 
				// Seek to the entry
				Stream.Seek(entryPosition, SeekOrigin.Begin);
 
				// Read the lengths of the key and value
				int keyLength;
				int valueLength;
				ReadTwoInts(out keyLength, out valueLength);
 
				// Key must be the same length or it can't be a match
				if (keyLength != KeyLength)
				{
					GoToNextEntry();
					continue;
				}
 
				// Buffer must have enough capacity to hold the value
				if (valueLength > buffer.Length - bufferIndex)
				{
					return -valueLength;
				}
 
				// Read this entry's key
				Stream.Read(ByteBuffer, 0, keyLength);
				for (int i = 0; i < keyLength; ++i)
				{
					if (ByteBuffer[i] != Key[i])
					{
						GoToNextEntry();
						goto end_of_loop;
					}
				}
 
				// Read and return the value
				Stream.Read(buffer, bufferIndex, valueLength);
				GoToNextEntry();
				return valueLength;
 
				end_of_loop:;
			}
 
			// No more data values for this key.
			return 0;
		}
	}
 
	private void ReadTwoInts(out int one, out int two)
	{
		Stream.Read(ByteBuffer, 0, 8);
		one = ByteBuffer[0] | (ByteBuffer[1] << 8) | (ByteBuffer[2] << 16) | (ByteBuffer[3] << 24);
		two = ByteBuffer[4] | (ByteBuffer[5] << 8) | (ByteBuffer[6] << 16) | (ByteBuffer[7] << 24);
	}
 
	private void GoToNextEntry()
	{
		// Go to the next entry and wrap around
		QueryIndex++;
		SlotPos += 8;
		if (SlotPos == (HashTableIndex + (QueryLength << 3)))
		{
			SlotPos = HashTableIndex;
		}
	}
}

There are two types in this file: CdbFile and CdbQuery. You start by creating a Stream for your CDB file. This will probably be a FileStream, but you could use a MemoryStream for faster access to small files. Then create a CdbFile and pass in the Stream. Hold on to this Stream and CdbFile as long as you’re using the CDB file.

Next you call CreateQuery on the CdbFile and pass in the key you want to find values for. This returns you a CdbQuery struct. Then call GetNextValue on the CdbQuery to iterate over the values. Each time you call GetNextValue you’ll pass in a byte[] where the value will be put. An int will be returned with zero if there are no more values, a positive number of bytes if a value was stored in the buffer parameter, and a negative number of bytes if the buffer was too small. Negate that negative value and you’ll get the required size.

As an example, consider this CDB file:

one oneval1
one oneval2
two twoval

Now here’s a little script that loads and prints the values for the “one” key:

void Test()
{
	const string path = "/path/to/file.cdb";
	using (Stream stream = new FileStream(path, FileMode.Open, FileAccess.Read))
	{
		CdbFile cdbFile = new CdbFile(stream);
		byte[] key = Encoding.UTF8.GetBytes("one");
		byte[] value = new byte[1];
		CdbQuery query = cdbFile.CreateQuery(key);
		do
		{
			int valueLen = query.GetNextValue(value);
			if (valueLen == 0)
			{
				Debug.Log("done");
				break;
			}
			else if (valueLen < 0)
			{
				Debug.LogFormat("resize buffer to {0}", -valueLen);
				value = new byte[-valueLen];
			}
			else
			{
				Debug.LogFormat(
					"found {0} bytes: {1}",
					valueLen,
					Encoding.UTF8.GetString(value)
				);
			}
		}
		while (true);
	}
}

When run, this prints:

resize buffer to 7
found 7 bytes: oneval1
found 7 bytes: oneval2
done

The CDB script tries hard to minimize the amount of garbage created. Usually it’ll create about 2KB per CdbFile and then nothing after that. Initially some static buffers will get initialized, but that’s a one-time cost. Even CdbQuery is a struct and copying it is fine.

While this script doesn’t create CDB files, there are many apps that will. For example, the official CDB version has the cdbmake program and there are others like TinyCDB. Many fast and flexible options are out there to suit all sorts of content creation pipelines.

So next time you’re considering adding a huge JSON document to your game, think about CDB as an alternative. If you can format your data as key-value pairs, you could have a big win in many ways.

CDB: A JSON Alternative

Comments