Array Performance
Today’s article is inspired by Jean-Philippe Auclair’s excellent article on arrays and the comment he recently posted on my article about map performance. In it he discusses how the Array class is implemented using a densely-packed C/C++ array (a contiguous block of memory) and, after the first undefined element, a hash table. This got me thinking of three questions, which together are the subject of today’s article. First, how fast is splitting an Array’s densely-packed portion with the delete operator? Second, can it be re-joined? Third, how fast is the rejoining?
As Jean-Philippe shows in his article, reading from the dense portion of the Array is 2-3x faster; I confirm that observation with code later in this article. This means that if you want your Array accesses to be fast, you should deal with the dense portion as much as possible and minimize your use of the sparse (hash table) portion. Jean-Philippe reaches the same conclusion, and I concur. He also notes that when the dense portion of the Array is broken up using the delete operator, everything after the deleted element is moved from the C/C++ array to the hash table. That makes this a very expensive line of code:
delete array[0];
The above single, innocuous-looking line actually results in a huge C/C++ array resize and hash table population. It is therefore vastly slower than setting the array element to a value treated as empty:
array[0] = -1;
The value should be defined as a compile-time constant or AS3 const and chosen so that it is always invalid and never occurs naturally in the data. This may be a non-issue if you never read from that element again, or a huge issue if it forces you to do lots of error checking. Nevertheless, it is an option for avoiding the cost of splitting apart the dense portion of the Array.
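For example, here is a minimal sketch of the sentinel approach; the EMPTY constant and the removeAt and isEmptyAt helpers are hypothetical names chosen for illustration:

// A minimal sketch: use a sentinel instead of delete so the Array stays
// dense. EMPTY, removeAt, and isEmptyAt are hypothetical names; EMPTY must
// never occur naturally in the data.
const EMPTY:int = -1;

// "Remove" an element without splitting the dense portion
function removeAt(array:Array, index:int): void
{
	array[index] = EMPTY; // no delete, so no dense/sparse split
}

// Callers must check for the sentinel before using a value
function isEmptyAt(array:Array, index:int): Boolean
{
	return array[index] == EMPTY;
}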
Secondly, what does it take to re-join the dense portion once it is split? Consider these three attempts and try to guess which of them results in an empty sparse hash table and a full dense C/C++ array:
const SIZE:int = 10;
const VALUE:int = 33;
var array:Array = [];
for (var i:int = 0; i < SIZE; ++i)
{
	array[i] = VALUE;
}

// Split the dense portion at its beginning
delete array[0];

// Step A: Try to restore by just plugging the gap
array[0] = VALUE;

// Step B: Try to restore by plugging the gap with an overlap
array[0] = VALUE;
array[1] = VALUE;

// Step C: Try to restore by filling up the whole array again
for (i = 0; i < SIZE; ++i)
{
	array[i] = VALUE;
}
So which step fully rejoins the dense portion? Step A, B, or C? Let’s look at the full test program and results to see:
package
{
	import flash.display.*;
	import flash.utils.*;
	import flash.text.*;

	/**
	 * An app to test Array dense/sparse performance
	 * @author Jackson Dunstan
	 */
	public class ArrayPerformanceTest extends Sprite
	{
		public function ArrayPerformanceTest()
		{
			var logger:TextField = new TextField();
			logger.autoSize = TextFieldAutoSize.LEFT;
			addChild(logger);
			function log(msg:*): void { logger.appendText(msg + "\n"); }

			var array:Array = [];
			var i:int;
			const SIZE:int = 10000000;
			var beforeTime:int;
			const VALUE:int = 33;

			for (i = 0; i < SIZE; ++i)
			{
				array[i] = VALUE;
			}

			beforeTime = getTimer();
			for (i = 0; i < SIZE; ++i)
			{
				array[i];
			}
			log("Dense read: " + (getTimer()-beforeTime));

			// Break up the dense array
			beforeTime = getTimer();
			delete array[0];
			log("Break dense array: " + (getTimer()-beforeTime));

			beforeTime = getTimer();
			for (i = 0; i < SIZE; ++i)
			{
				array[i];
			}
			log("Sparse read: " + (getTimer()-beforeTime));

			// Try to restore by just plugging the gap
			array[0] = VALUE;
			beforeTime = getTimer();
			for (i = 0; i < SIZE; ++i)
			{
				array[i];
			}
			log("Plug the gap, sparse read: " + (getTimer()-beforeTime));

			// Try to restore by plugging the gap with an overlap
			array[0] = VALUE;
			array[1] = VALUE;
			beforeTime = getTimer();
			for (i = 0; i < SIZE; ++i)
			{
				array[i];
			}
			log("Plug the gap with overlap, sparse read: " + (getTimer()-beforeTime));

			// Try to restore by filling up the whole array again
			beforeTime = getTimer();
			for (i = 0; i < SIZE; ++i)
			{
				array[i] = VALUE;
			}
			log("Full restore time: " + (getTimer()-beforeTime));

			beforeTime = getTimer();
			for (i = 0; i < SIZE; ++i)
			{
				array[i];
			}
			log("Full restore, dense read: " + (getTimer()-beforeTime));
		}
	}
}
The results (all times in milliseconds) may surprise you:
Environment | Dense Read | Break Dense Array | Sparse Read | Plug Gap | Plug Gap w/ Overlap | Full Restore | Post-Restore Read
---|---|---|---|---|---|---|---
2.2 GHz Intel Core 2 Duo, 2 GB RAM, Mac OS X 10.6 | 122 | 932 | 363 | 362 | 362 | 422 | 123
The worst-case scenario seems to have come true. First, it is tremendously expensive to break up the dense array: in this case it took nearly 8x longer than iterating over the whole densely-packed Array! The suggestion above for avoiding this breakup can be genuinely useful when you’re dealing with a large Array. Second, step C is actually required to rejoin the dense and sparse portions of the Array; you have to go so far as to re-fill every element in order to re-establish the dense portion. Lastly, that re-filling is very expensive: it takes about 3.5x longer than iterating over the whole densely-packed Array. These are more good reasons to avoid ever breaking up the dense portion.
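If element order doesn’t matter in your Array, there is another option not covered above: overwrite the removed slot with the last element and shrink the Array, so no index is ever left undefined. A minimal sketch, using a hypothetical removeUnordered helper:

// A minimal sketch, assuming element order doesn't matter: move the last
// element into the removed slot and truncate the Array by one. No index is
// ever left undefined, so the dense portion is never split.
function removeUnordered(array:Array, index:int): void
{
	var lastIndex:int = array.length - 1;
	array[index] = array[lastIndex]; // fill the would-be gap
	array.length = lastIndex;        // truncate the duplicated last element
}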
My thanks to Jean-Philippe for the excellent and inspiring article. If you’ve read all of this article and still haven’t read his, what are you waiting for? :)
#1 by jpauclair on December 14th, 2009 ·
Thanks a lot for adding to my article. I appreciate it.
Though I’m not sure about your conclusion. Why is step C required? Are you talking about an array with multiple undefined values?
You only need to fill the “undefined” values. So when you remove the first element, you only need to refill that one to get back the whole dense array. Tamarin does the iterative work to transfer all values (up to the first undefined) to the dense part.
As for the “-1” value, it can also be “null”. Since null is a valid atom in Tamarin, you can use null for filling holes.
If you have an array with 75% of valid values in it, I would suggest filling all others with “null” values to make your array faster.
If you read my next article you’ll see Flash allocates memory in blocks of 4K, so filling empty values or null-initializing values is not heavy on memory; it will only take some more processing at the beginning. http://jpauclair.wordpress.com/2009/12/05/tamarin-part-ii-more-on-array-and-vector/
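Here is a minimal sketch of that null-filling suggestion, assuming holes read back as undefined and that null never occurs naturally in the data:

// A minimal sketch of the null-filling suggestion. Assumes holes read back
// as undefined and that null never occurs naturally in the data.
function fillHolesWithNull(array:Array): void
{
	for (var i:int = 0; i < array.length; ++i)
	{
		if (array[i] === undefined)
		{
			array[i] = null; // plug the hole instead of leaving it undefined
		}
	}
}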
#2 by jackson on December 14th, 2009 ·
The test app’s timing results show that step C is required. Iterating is initially fast (dense read). I then delete element 0 and iterating becomes slow (sparse read). I then fill the undefined value at 0 with a non-undefined value (33), but iterating is still slow (indicating sparse). I then plug with an overlap by filling in elements 0 and 1, but iterating is still slow (sparse). I then fill in all the elements again and iterating is fast again (dense restored).
Good point about the use of null. That would be a good choice for an Array where null is not a naturally-occurring element. For an Array of class instances, though, null just might occur naturally. Perhaps in that case it would be better to use -1, and for an Array of int, Number, or the like, to use null. Thanks for the suggestion!
#3 by jpauclair on December 14th, 2009 ·
Hey! The result you got isn’t consistent with what I read in the Tamarin source.
Check this out!
http://jpauclair.wordpress.com/2009/12/14/tamarin-part-i-i-as3-array-bug/
#4 by jackson on December 14th, 2009 ·
Crazy! When you get that Adobe bug filed, reply here and we can vote it up. Thanks for finding this.
#5 by jpauclair on December 15th, 2009 ·
I just added it: https://bugs.adobe.com/jira/browse/FP-3477