Free Lists
The garbage collector (GC) in Flash Player 9 and 10 is a notorious source of performance problems. This is caught in the Flex Builder profiler as [mark] and [reap] steps as the GC marks objects to free and then actually frees them. Additionally, the actual allocation of new objects is slow. Today’s question asks “can we improve performance by managing some of our own memory?” This article shows one way to take a little control of memory management. (SPOILER: this technique can yield a 2100% speedup!)
One useful technique in managing your own memory is called a free list. When you are done with an object, put it on a list of free objects and then when you want a new object, take it off the free list. If the free list is empty, allocate a new one as would normally via the new operator.
AS3 programmers should be wary of any technique that proposes to shift work from the Flash Player’s native code to AS3’s interpreted (or JITed) code. This technique proposes to do just that: take the work that the GC is doing in native code and shift it to free list functionality written in AS3. The advantage the programmer has here over the GC is that he or she knows more about the unique circumstances with with the program is actually dealing. The AS3 programmer can remove some guarantees that the GC makes but the programmer does not actually need. In the case of the free list, the programmer does not start over with a brand new object each time but instead must re-initialize it. Finally, the work of calling utility functions to allocate and deallocate the object rather than directly using the new operator and letting the GC deal with the dereferencing will add more work for AS3 to do and further slow down the free list implementation.
The above pros and cons make it unclear whether a free list implementation will provide any speed up. Below are two free list implementations written to test whether a speedup can be achieved. The first implementation uses a simple singly linked list to hold the freed objects that make up the free list. The second implementation uses a Vector instead. These approaches were tacked on to two identical classes: Color4LinkedList and Color4Vector. Color4LinkedList has static fields to manage the allocation and deallocation of Color4LinkedList objects. Color4Vector has static fields to manage the allocation and deallocation of Color4Vector objects.
The test program works like this:
- Allocate a bunch of objects via the linked list technique, the vector technique, and the new operator. These objects are not freed.
- Pre-allocate a bunch of objects via the linked list and vector techniques and free them so that the free list is full.
- Allocate a bunch of objects via the linked list technique, the vector technique, and the new operator. These objects are not freed.
The first and third steps are the tests and the times to allocate all the objects by each technique is recorded. Here is the test program:
package { import flash.display.*; import flash.events.*; import flash.geom.*; import flash.system.*; import flash.text.*; import flash.utils.*; public class FreeListTest extends Sprite { private var __logger:TextField = new TextField(); public function FreeListTest() { __logger.autoSize = TextFieldAutoSize.LEFT; addChild(__logger); const NUM_COLORS:int = 10000000; log("------Unallocated------"); test(NUM_COLORS); preAllocate(NUM_COLORS); log("-----Pre-allocated-----"); test(NUM_COLORS); } private function preAllocate(numColors:int): void { var i:int; var tempLL:Vector.<Color4LinkedList> = new Vector.<Color4LinkedList>(numColors); for (i = 0; i < numColors; ++i) { tempLL[i] = Color4LinkedList.getColor(); } for (i = 0; i < numColors; ++i) { Color4LinkedList.freeColor(tempLL[i]); } var tempV:Vector.<Color4Vector> = new Vector.<Color4Vector>(numColors); for (i = 0; i < numColors; ++i) { tempV[i] = Color4Vector.getColor(); } for (i = 0; i < numColors; ++i) { Color4Vector.freeColor(tempV[i]); } } private function test(numColors:int): void { var cll:Color4LinkedList; var cv:Color4Vector; var i:int; // Test with pre-allocated linked list-based free list var beforeTime:int = getTimer(); for (i = 0; i < numColors; ++i) { cll = Color4LinkedList.getColor(); } var afterTime:int = getTimer(); log("Allocation from linked list-based free list: " + (afterTime - beforeTime)); // Test with vector-based free list beforeTime = getTimer(); for (i = 0; i < numColors; ++i) { cv = Color4Vector.getColor(); } afterTime = getTimer(); log("Allocation from vector-based free list: " + (afterTime - beforeTime)); // Test with direct allocation beforeTime = getTimer(); for (i = 0; i < numColors; ++i) { cll = new Color4LinkedList(); } afterTime = getTimer(); log("Direct allocation: " + (afterTime - beforeTime)); } private function log(msg:*): void { __logger.appendText(msg + "\n"); } } } class Color4LinkedListNode { public var c:Color4LinkedList; public var next:Color4LinkedListNode; } class Color4LinkedList { private static var __freeList:Color4LinkedListNode; public var a:Number; public var r:Number; public var g:Number; public var b:Number; public function Color4LinkedList(a:Number = 0.0, r:Number = 0.0, g:Number = 0.0, b:Number = 0.0) { this.a = a; this.r = r; this.g = g; this.b = b; } public static function getColor(a:Number = 0.0, r:Number = 0.0, g:Number = 0.0, b:Number = 0.0): Color4LinkedList { // Already have one, take the head if (__freeList) { var ret:Color4LinkedList = __freeList.c; ret.a = a; ret.r = r; ret.g = g; ret.b = b; __freeList = __freeList.next; return ret; } // Have none, make new return new Color4LinkedList(a, r, g, b); } public static function freeColor(c:Color4LinkedList): void { // Add to the head var node:Color4LinkedListNode = new Color4LinkedListNode(); node.c = c; node.next = __freeList; __freeList = node; } } class Color4Vector { private static var __freeList:Vector.<Color4Vector> = new Vector.<Color4Vector>(); public var a:Number; public var r:Number; public var g:Number; public var b:Number; public function Color4Vector(a:Number = 0.0, r:Number = 0.0, g:Number = 0.0, b:Number = 0.0) { this.a = a; this.r = r; this.g = g; this.b = b; } public static function getColor(a:Number = 0.0, r:Number = 0.0, g:Number = 0.0, b:Number = 0.0): Color4Vector { // Already have one, take the head var freeList:Vector.<Color4Vector> = __freeList; if (freeList.length > 0) { var endIndex:int = freeList.length-1; var ret:Color4Vector = freeList[endIndex]; freeList.length = endIndex; ret.a = a; ret.r = r; ret.g = g; ret.b = b; return ret; } // Have none, make new return new Color4Vector(a, r, g, b); } public static function freeColor(p:Color4Vector): void { __freeList.push(p); } }
The results I get on a 3.0Ghz Intel Core 2 Duo on Windows XP SP3 are:
------Unallocated------ Allocation from linked list-based free list: 83 Allocation from vector-based free list: 88 Direct allocation: 74 -----Pre-allocated----- Allocation from linked list-based free list: 6 Allocation from vector-based free list: 12 Direct allocation: 126
As we can see from the test numbers, the linked list approach is faster than the vector approach in all cases, especially when actually making use of pre-allocated objects. So we can safely discard the vector approach. The linked list approach is, on average, 12% slower than the direct approach when the objects are not already allocated. The linked list approach is, on average, 2100% faster when the objects are already allocated. This is a huge win! This is nearly 175 times as much a win as a loss. You can use this as a rule of thumb: if you expect to “allocate” out of the free list more than 1 of every 175 allocations, you will have a performance gain by using the free list.
You may have noticed that, paradoxically, the free list approach to limit memory allocation actually introduces a memory allocation when freeing. The speed of freeing was not tested, but clearly the performance will be somewhat given back when it is included into the mix. Happily, we can make a minor modification to the test program to get rid of even this allocation, at least after things have settled. When getColor() returns objects off of the free list, the node that was holding the object is simply discarded. Why don’t we keep it around in a list of free nodes? The terminology can be a bit mind bending, but avoiding allocation is definitely worth it. Below is a modified version of the above test app that incorporates this optimization. I have removed the vector-based free list approach and the direct allocation approach since this is an optimization to the linked list-based approach only and has no bearing on the other two approaches. The result is a nice example app showing only the overall most efficient approach.
package { import flash.display.*; import flash.events.*; import flash.geom.*; import flash.system.*; import flash.text.*; import flash.utils.*; public class FreeListTest extends Sprite { private var __logger:TextField = new TextField(); public function FreeListTest() { __logger.autoSize = TextFieldAutoSize.LEFT; addChild(__logger); const NUM_COLORS:int = 10000000; log("------Unallocated------"); test(NUM_COLORS); preAllocate(NUM_COLORS); log("-----Pre-allocated-----"); test(NUM_COLORS); } private function preAllocate(numColors:int): void { var i:int; var temp:Vector.<Color4LinkedList> = new Vector.<Color4LinkedList>(numColors); for (i = 0; i < numColors; ++i) { temp[i] = Color4LinkedList.getColor(); } for (i = 0; i < numColors; ++i) { Color4LinkedList.freeColor(temp[i]); } temp = null; } private function test(numColors:int): void { var cll:Color4LinkedList; var i:int; // Test with linked list-based free list var beforeTime:int = getTimer(); for (i = 0; i < numColors; ++i) { cll = Color4LinkedList.getColor(); } var afterTime:int = getTimer(); log("Allocation from linked list-based free list: " + (afterTime - beforeTime)); } private function log(msg:*): void { __logger.appendText(msg + "\n"); } } } class Color4LinkedListNode { public var c:Color4LinkedList; public var next:Color4LinkedListNode; } class Color4LinkedList { /** Head of the list of already-freed colors */ private static var __freeColorsHead:Color4LinkedListNode; /** Head of the list of already-freed nodes */ private static var __freeNodesHead:Color4LinkedListNode; public var a:Number; public var r:Number; public var g:Number; public var b:Number; public function Color4LinkedList(a:Number = 0.0, r:Number = 0.0, g:Number = 0.0, b:Number = 0.0) { this.a = a; this.r = r; this.g = g; this.b = b; } public static function getColor(a:Number = 0.0, r:Number = 0.0, g:Number = 0.0, b:Number = 0.0): Color4LinkedList { // Already have one, take the head if (__freeColorsHead) { var headNode:Color4LinkedListNode = __freeColorsHead; __freeColorsHead = __freeColorsHead.next; var color:Color4LinkedList = headNode.c; color.a = a; color.r = r; color.g = g; color.b = b; headNode.c = null; headNode.next = __freeNodesHead; __freeNodesHead = headNode; return color; } // Have none, make new return new Color4LinkedList(a, r, g, b); } public static function freeColor(c:Color4LinkedList): void { // Get a node to put on the free list var node:Color4LinkedListNode; if (__freeNodesHead) { node = __freeNodesHead; __freeNodesHead = __freeNodesHead.next; } else { node = new Color4LinkedListNode(); } node.c = c; // Add to the head node.next = __freeColorsHead; __freeColorsHead = node; } }
The results for this on the same computer are:
------Unallocated------ Allocation from linked list-based free list: 88 -----Pre-allocated----- Allocation from linked list-based free list: 33
As we can see, this approach adds quite a bit of work to the pre-allocated time due to keeping around the nodes rather than simply discarding them. Overall, it is still far faster than any other approach as it will not trigger undue allocation and garbage collection. It is hard to test the exact performance penalty levied by the GC though, as it cannot be simply wrapped in calls to getTimer().
Here are the outputs of both test programs merged into one table:
Approach | Unallocated | Pre-allocated |
---|---|---|
Vector | 88 | 12 |
Direct | 74 | 126 |
Linked List | 83 | 6 |
Linked List (recycling nodes) | 88 | 33 |
One possible criticism of this approach is that memory is never released to the operating system since the GC never collects the allocated memory. Globally, this would be valid if the GC actually ever released any memory to the operating system once it had collected it. However, it does not do this. You may verify this by simply allocating a lot of memory, dereferencing it, and watching Task Manager or Activity Monitor to see if your web browser’s memory usage goes down. It will not until you exit the Flash app. Locally, however, this is something of a problem. Memory deallocated by the GC is available for any other purpose. However, our free list approach reserves this memory for only further allocation of the specific type of object using the free list approach. For this reason, it may be a good idea to implement a clear() function to clear out free lists.
Another possible criticism is that we must add this free list code to each class that uses it. This is mostly true. Since the free list makes use of static variables and functions to do its work, we cannot simply have a base class or interface/helper class pair to do this work. We could use some of Flash’s more dynamic features to attempt an “FreeList” class, presumably able to control the allocation of many objects. However, its interface would necessarily involve taking and returning Objects or untyped (*) variables that would need to be casted, thus destroying the performance benefits.
A final criticism is that the programmer must do some extra memory management work by using a utility method to allocate and deallocate objects rather than simply using the new operator and dereferencing objects. This means that your program will be more error prone. However, what are the consequences of forgetting to use the free list utility methods? If a programmer uses the new operator, the speed benefits are simply not realized. If the programmer does not use the utility method for freeing the object, the same is true although there could be a buildup of available nodes that causes a memory leak. Therefore, this is also a valid criticism. Taken together, these three criticisms are enough to relegate the free list technique to advanced users and only in areas where allocation/deallocation time is a demonstrable performance problem.
The applications for this technique seem to many. Obvious examples are any place that lots of tiny objects are used. For example, a particle engine may allocate a thousand instances of a Particle class per second as it spews them from an emitter. As they die out, due to a range or fade or some other reason, the programmer will dereference them and the GC will do its collection work. Instead, why no use a linked list-based free list to gain a 2100% speedup in the memory allocation/deallocation work?
#1 by Troy Gilbert on September 16th, 2009 ·
Would using an array instead of a linked-list keep from having to use the intermediate free list node to wrap your allocated object? Would that allow for a compromise between the “recycling” linked-list and linked-list performances?
I’m thinking along these lines: treat the array like a stack, push freed (or recycled) objects onto it, pop them off to allocate.
#2 by jackson on September 16th, 2009 ·
Thanks for your comment. I think using an array would be just like using a vector, only slower. Did you have something in mind about arrays that would make them faster in this instance than vectors? If so, I’d love to hear it. Thanks again.
#3 by Promethe on June 15th, 2010 ·
Hello jackson!
I noticed that in Flash 10.1, the Flash player actually releases the garbage collected memory.
It works in both the desktop player and in the browser. And Sytem.gc() is now working perfectly fine!
Regards,
#4 by jackson on June 15th, 2010 ·
Cool! This is one I’m definitely going to re-test with Flash Player 10.1.
#5 by Henke37 on September 15th, 2010 ·
Toss out the list requirement and you are describing an object pool.
#6 by jackson on September 15th, 2010 ·
Yes, but you have to hold the objects somehow. This is why I tried two linked list approaches as well as
Vector
. Do you have an alternate collection to suggest?