Loop Speed
AS3 gives you a good number of potential ways you can loop over collections. When Flash Player 10 first came out, I went ahead and tested out the new Vector class in a variety of ways. One of them was to pit it against the collections available in Flash Player 9: Array, Object, Dictionary, ByteArray, and even BitmapData. Below I’ll show you my test and discuss its results.
First, have a look at the test app:
package { import flash.display.*; import flash.utils.*; import flash.text.*; public class MyApp extends Sprite { public function MyApp() { const SIZE:uint = 6000; const REPS:uint = 2000; var i:int; var s:String; var a:int; var b:String; var rep:int; var res:String = "For-each times:\n"; var before:uint; var after:uint; var arr:Array = new Array(SIZE); for (i = 0; i < SIZE; ++i) { arr[i] = int(Math.random()*1000); } var vecFixed:Vector.<int> = new Vector.<int>(SIZE, true); for (i = 0; i < SIZE; ++i) { vecFixed[i] = arr[i]; } var vecVariable:Vector.<int> = new Vector.<int>(SIZE, false); for (i = 0; i < SIZE; ++i) { vecVariable[i] = arr[i]; } var bmdAlpha:BitmapData = new BitmapData(SIZE, 1, true); for (i = 0; i < SIZE; ++i) { bmdAlpha.setPixel32(i, 0, arr[i]); } var bmdNoAlpha:BitmapData = new BitmapData(SIZE, 1, false); for (i = 0; i < SIZE; ++i) { bmdNoAlpha.setPixel32(i, 0, arr[i]); } var obj:Object = {}; for (i = 0; i < SIZE; ++i) { obj[i] = i; } var dictStrong:Dictionary = new Dictionary(); for (i = 0; i < SIZE; ++i) { dictStrong[i] = i; } var dictWeak:Dictionary = new Dictionary(true); for (i = 0; i < SIZE; ++i) { dictWeak[i] = i; } var ba:ByteArray = new ByteArray(); ba.length = SIZE; for (i = 0; i < SIZE; ++i) { ba[i] = i; } before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for each (i in arr) { a = i; } } after = getTimer(); res += "\tArray: " + (after-before) + "\n"; before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for each (i in vecFixed) { a = i; } } after = getTimer(); res += "\tFixed vec: " + (after-before) + "\n"; before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for each (i in vecVariable) { a = i; } } after = getTimer(); res += "\tVariable vec: " + (after-before) + "\n"; before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for each (i in obj) { a = i; } } after = getTimer(); res += "\tObject: " + (after-before) + "\n"; before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for each (i in dictStrong) { a = i; } } after = getTimer(); res += "\tDictionary (strong keys): " + (after-before) + "\n"; before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for each (i in dictWeak) { a = i; } } after = getTimer(); res += "\tDictionary (weak keys): " + (after-before) + "\n"; res += "For-in times:\n"; before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (s in arr) { a = arr[s]; } } after = getTimer(); res += "\tArray: " + (after-before) + "\n"; before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (s in vecFixed) { a = vecFixed[s]; } } after = getTimer(); res += "\tFixed vec: " + (after-before) + "\n"; before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (s in vecVariable) { a = vecVariable[s]; } } after = getTimer(); res += "\tVariable vec: " + (after-before) + "\n"; before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (s in obj) { a = obj[s]; } } after = getTimer(); res += "\tObject: " + (after-before) + "\n"; before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (s in dictStrong) { a = dictStrong[s]; } } after = getTimer(); res += "\tDictionary (strong keys): " + (after-before) + "\n"; before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (s in dictWeak) { a = dictWeak[s]; } } after = getTimer(); res += "\tDictionary (weak keys): " + (after-before) + "\n"; res += "For times:\n"; before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (i = 0; i < SIZE; ++i) { a = arr[i]; } } after = getTimer(); res += "\tArray: " + (after-before) + "\n"; before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (i = 0; i < SIZE; ++i) { a = vecFixed[i]; } } after = getTimer(); res += "\tFixed vec: " + (after-before) + "\n"; before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (i = 0; i < SIZE; ++i) { a = vecVariable[i]; } } after = getTimer(); res += "\tVariable vec: " + (after-before) + "\n"; before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (i = 0; i < SIZE; ++i) { a = bmdAlpha.getPixel32(i, 0); } } after = getTimer(); res += "\tBMD w/ alpha getPixel32: " + (after-before) + "\n"; before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (i = 0; i < SIZE; ++i) { a = bmdNoAlpha.getPixel32(i, 0); } } after = getTimer(); res += "\tBMD w/o alpha getPixel32: " + (after-before) + "\n"; before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (i = 0; i < SIZE; ++i) { a = bmdAlpha.getPixel(i, 0); } } after = getTimer(); res += "\tBMD w/ alpha getPixel: " + (after-before) + "\n"; before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (i = 0; i < SIZE; ++i) { a = bmdNoAlpha.getPixel(i, 0); } } after = getTimer(); res += "\tBMD w/o alpha getPixel: " + (after-before) + "\n"; before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (i = 0; i < SIZE; ++i) { a = obj[i]; } } after = getTimer(); res += "\tObject: " + (after-before) + "\n"; before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (i = 0; i < SIZE; ++i) { a = dictStrong[i]; } } after = getTimer(); res += "\tDictionary (strong keys): " + (after-before) + "\n"; before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (i = 0; i < SIZE; ++i) { a = dictWeak[i]; } } after = getTimer(); res += "\tDictionary (weak keys): " + (after-before) + "\n"; before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (i = 0; i < SIZE; ++i) { a = ba[i]; } } after = getTimer(); res += "\tByteArray: " + (after-before) + "\n"; var tf:TextField = new TextField(); tf.autoSize = TextFieldAutoSize.LEFT; tf.text = res; addChild(tf); } } }
And the results on a 2.2 Ghz Intel Core 2 Duo with 2 GB of RAM on Mac OS X 10.6:
For-each times: Array: 377 Fixed vec: 455 Variable vec: 448 Object: 721 Dictionary (strong keys): 652 Dictionary (weak keys): 666 For-in times: Array: 8730 Fixed vec: 8983 Variable vec: 8940 Object: 8805 Dictionary (strong keys): 8816 Dictionary (weak keys): 8884 For times: Array: 203 Fixed vec: 105 Variable vec: 104 BMD w/ alpha getPixel32: 290 BMD w/o alpha getPixel32: 222 BMD w/ alpha getPixel: 300 BMD w/o alpha getPixel: 228 Object: 592 Dictionary (strong keys): 487 Dictionary (weak keys): 487 ByteArray: 244
Wow, what a range! Depending on your choice of collection and looping strategy, you could see results as low as 104 milliseconds or as high as 8940 milliseconds! That’s over 85x slower!
First things first, for-in loops are just plain slow. Sure, there is conversion to a String if the collection is not keyed by Strings. This is true for Dictionaries where you can use any object for keys as well as Arrays and Vectors which use integers as keys. However it doesn’t seem to make much difference as all collections perform just about equally badly. Bottom line here: stay away from for-in loops unless you really need the keys of an Object or Dictionary.
Once we get away from the awful slowness of for-in loops things get a lot closer together. There is still quite a range though! Consider the for-each loops where the range is from 377 milliseconds to 721 milliseconds. It is about 1.91x faster to iterate over an Array with a for-each loop than it is to iterate over an Object. Dictionaries are also on the slow side and Vectors are on the fast side. It’s quite disappointing and unexpected to see Vectors not be the fastest here. But what if we turn to the lowly and plain for loop?
Here we find our performance champion: Vectors iterated over by a plain old for loop. They are twice as fast as iterating over an Array with a plain old for loop, which is the runner up for speed. This is clearly what Adobe had in mind when they introduced Vectors for increased speed. The Objects and Dictionaries also don’t fare so badly; this is the fastest loop to iterate over them with as well. Likewise, for loops are the only way to iterate over a ByteArray, but doing so is slower than a regular Array so there isn’t much point unless you already have a ByteArray for other reasons.
The for loop is the only way to iterate over the oddball collection here: BitmapData. BitmapData is supposed to hold the pixels of a raster image, but you could choose to use it to store any 24- or 32-bit integers you wanted to in some limited capacity, such as image dimensions. You also have to deal with pre-multiplying and the second (Y) dimension of storage if you want to hold more elements in storage than Flash allows for the width of a BitmapData. But this is all mostly moot as it seems that BitmapData can’t seem to beat out Arrays and Vectors for the speed crown, although it does come close. It’s also the only way to iterate over a ByteArray, but that’s comparably slow.
Here are the above results in a tabular format. I’ve also bolded the fastest choice for each type of loop.
Collection | For-Each | For-In | For |
---|---|---|---|
Array | 377 | 8730 | 203 |
Fixed Vector | 455 | 8983 | 105 |
Variable Vector | 448 | 8940 | 104 |
BitmapData w/ alpha getPixel32 | n/a | n/a | 290 |
BitmapData w/o alpha getPixel32 | n/a | n/a | 222 |
BitmapData w/ alpha getPixel | n/a | n/a | 300 |
BitmapData w/o alpha getPixel | n/a | n/a | 228 |
Object | 721 | 8805 | 592 |
Dictionary (strong keys) | 652 | 8816 | 487 |
Dictionary (weak keys) | 666 | 8884 | 487 |
ByteArray | n/a | n/a | 244 |
In conclusion, if you want speed you want a for loop.
#1 by Jonnie on October 7th, 2009 ·
I bet after all that work, you’re ready to add while and do while loops to the mix. ;)
#2 by jackson on October 7th, 2009 ·
I sure hope those are equally fast as regular for loops. Knowing Flash, I should probably do the test anyhow…
#3 by Jonnie on October 8th, 2009 ·
I do remember the speeds varying on a case-by-case basis, so you might want to.
#4 by jackson on October 8th, 2009 ·
I tried out the while and do-while loops, but don’t see anything statistically different from for loops. Here are my numbers on an 3.0 Ghz Intel Core 2 Duo with 2GB of RAM on Windows XP:
#5 by Li on October 8th, 2009 ·
Excellent test. I’ve seen similar ones but this is so clear and effective. Thanks!
#6 by jwopitz on September 22nd, 2010 ·
Have you found any performance drop when using complex object types for the Vectors. I can’t remember where I heard this but accessing the following:
for.. Vector
is much slower than
for.. Vector
#7 by jwopitz on September 22nd, 2010 ·
Sorry the Vector type brackets didn’t show. I meant:
Vector (ComplexObject)
vs
Vector (int or String or other simple types)
#8 by jackson on September 22nd, 2010 ·
I’ve actually got a two-part series of articles that covers this: Array vs. Vector. In there I explicitly test with different types of elements:
int
,uint
,Number
,Boolean
, andString
. Check it out for the full performance differences as well as the followup article testing performance with Flash Player 10.1.#9 by jardilio on September 22nd, 2010 ·
A friend just referred me to this page and I think that the test scenario in play here is getting optimized by the compiler as you are not really doing anything with the values in the array for the for loop case. I condensed your test to just working with the for and for each for an array and the only thing I modified was change “a = ?” to “a += ?”. I ended up getting very different results where the foreach was actually much faster. I think what was happening in your test case was that the value from the array was not actually getting retrieved in the for loop as it was getting optimized by the compiler.
#10 by jackson on September 22nd, 2010 ·
I opened the test in nemo440 and looked at the bytecode the compiler had generated and it definitely hasn’t optimized any of the test out. Actually, it virtually never optimizes out any of your code. Take this example of both dead code and constant folding I just whipped up:
Clearly this does nothing, but here’s what the compiler (MXMLC 4.1) generates:
As you can see, it didn’t do either optimization.
Still, your results actually don’t differ from mine in the first place, at least if we compensate for CPU speed. Your
for
loop time is 2.42x slower than yourfor-each
time. If you refer to my Flash Player 10.1 performance update of this article, you’ll see that myfor
loop time is 2.44x slower than myfor-each
time.#11 by erik on June 7th, 2011 ·
I am curious if these findings from WIN 10,3,181,22 are worth considering?
uint < uint 153
int < int 105
int < uint 164
uint < int 142
int < (int)uint 111
It suggests to me that when incrementing through an array/vector, casting vector.length to an int, and also counting up with an int is a worthwhile extra step.
#12 by jackson on June 7th, 2011 ·
It may be. I’ll investigate some more of the
int
/uint
effects. In the meantime, check out my more general article on operator speed. As noted elsewhere, a more focused test on just the<
operator is needed to really see the effects ofint
/uint
.Thanks for the tip!
#13 by skyboy on June 9th, 2011 ·
I think this is related to my findings of uints being converted to floats in conditionals and brackets (
++u; array[u]
faster thanarray[++u]
). The problem is entirely the compiler, and it’s strange how Adobe has completely missed these problems.Also, in a further modified operator speed test I’m going to get around to finishing (1000 operations per loop using Apparat to help. Still a lot of manual work) I noticed that the special opcode for inc/dec Number/int are 2x slower than the 3 op code for get/inc/set and at the same time and, when not used in the above areas, boolean operators are O(1) operations on all numeric datatypes and on all mixtures of numeric datatypes.
#14 by jackson on June 10th, 2011 ·
Can’t wait to read that article! :)