Flash Player 10.2 Performance: Part 1
From a performance perspective, a lot changed in Flash Player 10.1 (see parts 1, 2, 3, 4, 5, and 6 of that series). Flash Player 10.2 was officially released last week, so it’s time to update this site’s many performance tests for the new player. This time around I’ll cover more tests in each part of the series, so hopefully everything will be updated a lot quicker than last time. Read on for the updates!
Test Environment
All tests in this performance update use the same environment:
- Flex SDK (MXMLC) 4.1.0.16076, compiling in release mode (no debugging or verbose stack traces)
- Release version of Flash Player 10.1.102.64 or 10.2.152.26
- 2.8 GHz Intel Xeon W3530
- Windows 7
Free Lists
Flash Player 10.1 Performance:
Approach | Unallocated | Pre-allocated |
---|---|---|
Vector | 32 | 16 |
Direct | 15 | 47 |
Linked List | 31 | 15 |
Linked List (recycling nodes) | 31 | 15 |
Flash Player 10.2 Performance:
Approach | Unallocated | Pre-allocated |
---|---|---|
Vector | 22 | 9 |
Direct | 21 | 50 |
Linked List | 22 | 5 |
Linked List (recycling nodes) | 22 | 27 |
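Direct allocation seems to have taken a performance hit in the unallocated case (15 to 21), which is a real shame because it happens all the time. On the plus side, we can make up for it with free lists: the pre-allocated Vector and linked list approaches are both much faster in 10.2. Recycling nodes, though, is now counterproductive, as the pre-allocated linked list takes 27 with recycling versus 5 without.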
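For anyone who hasn’t seen the technique, here is a minimal sketch of a linked-list free list. It is not the test code behind the numbers above; `Node`, `NodePool`, and `preallocate()` are names made up for illustration, and treating "pre-allocated" as "the pool was filled up front" is my assumption about what that column means.

```actionscript
package
{
    import flash.display.Sprite;

    public class FreeListSketch extends Sprite
    {
        public function FreeListSketch()
        {
            var pool:NodePool = new NodePool();
            pool.preallocate(1000);      // fill the pool up front

            var n:Node = pool.acquire(); // reuses an idle node
            n.value = 42;
            pool.release(n);             // hand it back instead of dropping it
        }
    }
}

// Hypothetical pooled type; "next" links idle nodes together
class Node
{
    public var value:int;
    public var next:Node;
}

// Free list kept as a singly linked stack of idle nodes
class NodePool
{
    private var head:Node;

    public function preallocate(count:int):void
    {
        for (var i:int = 0; i < count; ++i)
        {
            release(new Node());
        }
    }

    public function acquire():Node
    {
        var n:Node = head;
        if (n == null)
        {
            return new Node(); // pool is empty: fall back to direct allocation
        }
        head = n.next;
        n.next = null;
        return n;
    }

    public function release(n:Node):void
    {
        n.next = head;
        head = n;
    }
}
```

The point of the design is that `acquire()` only touches `new` when the pool is empty, so a pre-filled pool avoids allocation on the hot path.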
Namespaces As Function Pointers
Player | Explicit Namespace | Namespace Variable | No Namespace |
---|---|---|---|
Flash Player 10.1 | 109 | 7046 | 100 |
Flash Player 10.2 | 103 | 6191 | 103 |
There’s not much change here, but calls through a namespace variable are about 12% faster (7046 to 6191). They are still roughly 60x slower than an explicit namespace.
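As a reminder of what the first two columns are comparing, here is a small sketch of calling a method through an explicit namespace versus through a `Namespace` variable. The namespaces `fast` and `safe` and the `run()` methods are hypothetical, and this is not the benchmark code itself.

```actionscript
package
{
    import flash.display.Sprite;

    public class NamespaceSketch extends Sprite
    {
        // Hypothetical namespaces, each selecting one implementation of run()
        public namespace fast;
        public namespace safe;

        fast function run():String { return "fast path"; }
        safe function run():String { return "safe path"; }

        public function NamespaceSketch()
        {
            // Explicit namespace: the qualifier is known at compile time
            trace(fast::run());

            // Namespace variable: the qualifier is only known at runtime,
            // which is the case the table shows as far slower
            var ns:Namespace = safe;
            trace(this.ns::run());
        }
    }
}
```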
XOR Swap
Player | Assign Swap | XOR Swap |
---|---|---|
Flash Player 10.1 | 233 | 302 |
Flash Player 10.2 | 233 | 302 |
No change here: XOR swap is still slower and less readable.
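For reference, the two swaps being compared look roughly like this (a sketch with made-up variable names, not the timed loop from the test):

```actionscript
package
{
    import flash.display.Sprite;

    public class SwapSketch extends Sprite
    {
        public function SwapSketch()
        {
            var a:int = 3;
            var b:int = 5;

            // Assign swap: one temporary, three assignments
            var temp:int = a;
            a = b;
            b = temp;

            // XOR swap: no temporary, three XOR-assignments
            a ^= b;
            b ^= a;
            a ^= b;

            // Two swaps in a row put the values back where they started
            trace(a, b); // 3 5
        }
    }
}
```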
Runnables as Function Pointers
Player | Function Object | Runnable | Direct |
---|---|---|---|
Flash Player 10.1 | 163 | 50 | 57 |
Flash Player 10.2 | 204 | 54 | 54 |
Method call speed is pretty much unchanged, but calls through Function objects are now 25% slower. This is a real bummer since they are the basis of most callback and signal/event systems (except TurboSignals, which uses runnables).
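To make the distinction concrete, here is a minimal sketch of the two callback styles. The `Runnable` interface and `PrintTask` class are illustrative names of my own; this is not how TurboSignals or the benchmark is actually written.

```actionscript
package
{
    import flash.display.Sprite;

    public class RunnableSketch extends Sprite
    {
        public function RunnableSketch()
        {
            // Function object: flexible, but the call is resolved dynamically
            var callback:Function = function():void { trace("function object"); };
            callback();

            // Runnable: an ordinary typed method call through an interface
            var runnable:Runnable = new PrintTask();
            runnable.run();
        }
    }
}

// Hypothetical single-method callback interface
interface Runnable
{
    function run():void;
}

class PrintTask implements Runnable
{
    public function run():void { trace("runnable"); }
}
```

The trade-off is a little more boilerplate (one class per callback) in exchange for a call the compiler can type-check.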
Loop Speed
Flash Player 10.1 Performance:
Collection | For-each | For-in | For |
---|---|---|---|
Array | 190 | 4259 | 85 |
Fixed Vector | 255 | 4241 | 69 |
Variable Vector | 256 | 4280 | 70 |
Object | 506 | 4323 | 234 |
Dictionary (strong keys) | 511 | 4504 | 287 |
Dictionary (weak keys) | 579 | 4579 | 270 |
BMD w/ alpha getPixel32 | n/a | n/a | 183 |
BMD w/o alpha getPixel32 | n/a | n/a | 167 |
BMD w/ alpha getPixel | n/a | n/a | 182 |
BMD w/o alpha getPixel | n/a | n/a | 171 |
ByteArray | n/a | n/a | 109 |
Flash Player 10.2 Performance:
Collection | For-each | For-in | For |
---|---|---|---|
Array | 206 | 4564 | 69 |
Fixed Vector | 251 | 4709 | 70 |
Variable Vector | 254 | 4706 | 67 |
Object | 532 | 4630 | 253 |
Dictionary (strong keys) | 568 | 4850 | 261 |
Dictionary (weak keys) | 568 | 4901 | 261 |
BMD w/ alpha getPixel32 | n/a | n/a | 185 |
BMD w/o alpha getPixel32 | n/a | n/a | 168 |
BMD w/ alpha getPixel | n/a | n/a | 184 |
BMD w/o alpha getPixel | n/a | n/a | 172 |
ByteArray | n/a | n/a | 95 |
There are a lot of figures here and they vary a little from test to test, but overall not much has changed. One notable exception is that for-in loops are slower across the board by roughly 7-11%.
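The three loop styles in the column headers, sketched over a small `Array` (the data and variable names are just for illustration):

```actionscript
package
{
    import flash.display.Sprite;

    public class LoopSketch extends Sprite
    {
        public function LoopSketch()
        {
            var items:Array = [10, 20, 30];
            var total:int = 0;

            // for-each: iterates the values
            for each (var value:int in items)
            {
                total += value;
            }

            // for-in: iterates the keys, then looks each value up
            for (var key:String in items)
            {
                total += items[key];
            }

            // for: plain index loop with the length cached
            for (var i:int = 0, len:int = items.length; i < len; ++i)
            {
                total += items[i];
            }

            trace(total); // 180 (each loop adds 60)
        }
    }
}
```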
Try/Catch Slowdown
Player | Try/Catch | No Try/Catch |
---|---|---|
Flash Player 10.1 | 735 | 704 |
Flash Player 10.2 | 770 | 735 |
With or without the try/catch, the test is about 5% slower in 10.2, so the relative cost of the try/catch block itself (about 4-5%) is unchanged.
Building XML
Player | XML Class | String Class |
---|---|---|
Flash Player 10.1 | 57 | 1 |
Flash Player 10.2 | 48 | 1 |
XML is now about 19% faster, but still nearly 50x slower than just using a String.
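Here is a rough sketch of the two approaches: building a document with the `XML` class (E4X) versus concatenating a `String`. The element names and item count are made up, and the test itself may build a different document.

```actionscript
package
{
    import flash.display.Sprite;

    public class BuildXMLSketch extends Sprite
    {
        public function BuildXMLSketch()
        {
            var count:int = 3;
            var i:int;

            // XML class: every child is a real XML object appended to the tree
            var root:XML = <items/>;
            for (i = 0; i < count; ++i)
            {
                root.appendChild(<item id={i}/>);
            }
            trace(root.toXMLString());

            // String class: plain concatenation, no tree is ever built
            var str:String = "<items>";
            for (i = 0; i < count; ++i)
            {
                str += '<item id="' + i + '"/>';
            }
            str += "</items>";
            trace(str);
        }
    }
}
```

The `String` version only makes sense when the text is the end product; if you need to query the result, you would still have to parse it.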
Shape vs. Sprite
Player | Shape FPS | Sprite FPS | Shape Memory | Sprite Memory |
---|---|---|---|---|
Flash Player 10.1 | 60 | 60 | 34524 | 50908 |
Flash Player 10.2 | 60 | 60 | 35776 | 55228 |
There’s no change in the performance as it was already capped at 60 FPS. As for memory, Shape is using about the same amount (roughly 4% more) and Sprite is using about 8% more.
Function Performance
Player | Plain | Local | Var | Method | Static | Override | super | Interface Direct | Interface via Interface | Interface via Class |
---|---|---|---|---|---|---|---|---|---|---|
Flash Player 10.1 | 259 | 215 | 216 | 54 | 62 | 52 | 57 | 54 | 118 | 54 |
Flash Player 10.2 | 321 | 257 | 271 | 53 | 60 | 56 | 57 | 55 | 51 | 53 |
As we saw in the runnables test above, Function objects (plain, local, var) are slower in 10.2. On the plus side, calling an interface method via an interface object no longer carries a 2x performance slowdown and is now just as fast as an interface method call directly or via a class instance. This is a big win for anyone who uses a lot of interfaces!
Simple Regular Expressions
Player | String.lastIndexOf() | String.indexOf() | RegExp.test() | RegExp.exec() |
---|---|---|---|---|
Flash Player 10.1 | 3 | 3 | 97 | 95 |
Flash Player 10.2 | 3 | 3 | 94 | 91 |
There may be a very slight boost to regular expression speed here, but it may also just be statistical variance.
Beware of Getters and Setters
Player | Sprite | Point | MySprite | MyPoint |
---|---|---|---|---|
Flash Player 10.1 | 183 | 18 | 32 | 78 |
Flash Player 10.2 | 182 | 26 | 27 | 70 |
These are strange results! The non-getter field access (Point.x) got slower by 44% and the getter field access (MyPoint.x) got faster by 11%. The 4.3x performance boost for using variables instead of getters/setters has narrowed to only 2.7x, which is a shame, as there is now less to gain from swapping getters/setters for plain variables.
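The difference between the two kinds of field access looks like this. `FieldPoint` and `AccessorPoint` are hypothetical stand-ins, not the `Point` and `MyPoint` classes used in the test.

```actionscript
package
{
    import flash.display.Sprite;

    public class AccessorSketch extends Sprite
    {
        public function AccessorSketch()
        {
            var plain:FieldPoint = new FieldPoint();
            plain.x = 5;            // direct slot write
            trace(plain.x);

            var wrapped:AccessorPoint = new AccessorPoint();
            wrapped.x = 5;          // same syntax, but calls the setter
            trace(wrapped.x);       // calls the getter
        }
    }
}

// Plain public variable: no function call on access
class FieldPoint
{
    public var x:Number = 0;
}

// Getter/setter pair: every access is a method call
class AccessorPoint
{
    private var _x:Number = 0;

    public function get x():Number { return _x; }
    public function set x(value:Number):void { _x = value; }
}
```

Both classes read identically at the call site, which is exactly why the cost of the second one is easy to miss.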
Var Args Is Slow
Player | Pre-Allocated Array | Dynamically-Allocated Array | Var Args |
---|---|---|---|
Flash Player 10.1 | 16 | 109 | 109 |
Flash Player 10.2 | 12 | 170 | 7 |
Wow, var args has been amazingly optimized in Flash Player 10.2! It’s now even faster than a pre-allocated Array, meaning it’s probably not even using an Array behind the scenes anymore. This is great news for any fan of var args! Note that passing a dynamically-allocated Array went the other way, from 109 to 170.
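Here is a sketch of the three call styles from the table. The `sumArray()` and `sumVarArgs()` helpers are made up for illustration, and my reading of "pre-allocated" as "one Array reused across calls" is an assumption.

```actionscript
package
{
    import flash.display.Sprite;

    public class VarArgsSketch extends Sprite
    {
        // Reused across calls: the "pre-allocated" style
        private var scratch:Array = [0, 0, 0];

        public function VarArgsSketch()
        {
            // Pre-allocated Array: overwrite the elements, no new allocation
            scratch[0] = 1;
            scratch[1] = 2;
            scratch[2] = 3;
            trace(sumArray(scratch));   // 6

            // Dynamically-allocated Array: a fresh Array literal per call
            trace(sumArray([1, 2, 3])); // 6

            // Var args: the arguments are collected by the ...rest parameter
            trace(sumVarArgs(1, 2, 3)); // 6
        }

        private function sumArray(values:Array):int
        {
            var total:int = 0;
            for (var i:int = 0; i < values.length; ++i)
            {
                total += values[i];
            }
            return total;
        }

        private function sumVarArgs(...values):int
        {
            // Inside the function, "values" is still visible as an Array
            return sumArray(values);
        }
    }
}
```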
Faster isNaN()
Since the Faster isNaN() article has been superseded by a followup article, I won’t be updating its test anymore.
Inlining Math Functions
Function | Player 10.1 | Player 10.2 |
---|---|---|
abs | 10 inline, 15 Math | 9 inline, 14 Math |
ceil | 14 inline, 18 Math | 13 inline, 17 Math |
floor | 13 inline, 18 Math | 13 inline, 18 Math |
max | 258 inline, 46 Math | 61 inline, 46 Math |
min | 249 inline, 46 Math | 60 inline, 47 Math |
max2 | 14 inline, 18 Math | 14 inline, 21 Math |
min2 | 14 inline, 16 Math | 14 inline, 20 Math |
The only real change here is the big speedups for the inlined version of min and max. They’re still slower than the regular Math versions, so there’s not much point to using them.
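By "inline" I mean replacing the `Math` call with an equivalent expression at the call site. The sketch below shows the general idea for `abs` and a two-argument `max`; these are the obvious textbook forms and not necessarily the exact expressions used in the test.

```actionscript
package
{
    import flash.display.Sprite;

    public class InlineMathSketch extends Sprite
    {
        public function InlineMathSketch()
        {
            var x:Number = -2.5;
            var a:Number = 1;
            var b:Number = 2;

            // abs: library call versus a conditional expression
            var absViaMath:Number = Math.abs(x);
            var absInlined:Number = x < 0 ? -x : x;
            trace(absViaMath, absInlined); // 2.5 2.5

            // two-argument max: library call versus a comparison
            var maxViaMath:Number = Math.max(a, b);
            var maxInlined:Number = a > b ? a : b;
            trace(maxViaMath, maxInlined); // 2 2
        }
    }
}
```

One caveat with the inlined forms: they do not treat `NaN` the same way the `Math` functions do, so they are only drop-in replacements when the inputs are known to be ordinary numbers.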
Map Performance
Class | Player 10.1 | Player 10.2 |
---|---|---|
Array | 47 hit, 222 miss | 44 hit, 247 miss |
Vector Dynamic | 42 hit, 7090 miss | 42 hit, 6470 miss |
Vector Fixed | 41 hit, 7106 miss | 43 hit, 6455 miss |
Object | 150 hit, 182 miss | 137 hit, 206 miss |
Dictionary Strong | 141 hit, 242 miss | 148 hit, 276 miss |
Dictionary Weak | 141 hit, 249 miss | 146 hit, 278 miss |
BitmapData no alpha, getPixel | 93 hit, 78 miss | 96 hit, 75 miss |
BitmapData no alpha, getPixel32 | 94 hit, 94 miss | 93 hit, 90 miss |
BitmapData alpha, getPixel | 93 hit, 76 miss | 92 hit, 74 miss |
BitmapData alpha, getPixel32 | 85 hit, 67 miss | 90 hit, 73 miss |
ByteArray | 57 hit, 50 miss | 53 hit, 49 miss |
The performance penalty for missing on a Vector, due to the Error that gets thrown, has been reduced by about 9%. Otherwise, nothing much has changed.
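The reason a `Vector` miss is so much more expensive than an `Array` miss is visible in a few lines. This is a sketch with made-up sizes and indices, not the benchmark code.

```actionscript
package
{
    import flash.display.Sprite;

    public class MissSketch extends Sprite
    {
        public function MissSketch()
        {
            var arr:Array = [1, 2, 3];
            var vec:Vector.<int> = new Vector.<int>(3);

            // Array miss: quietly yields undefined
            trace(arr[10]);

            // Vector miss: reading past the end throws a RangeError,
            // and creating and catching that error is what costs so much
            try
            {
                trace(vec[10]);
            }
            catch (e:RangeError)
            {
                trace("miss: " + e.message);
            }
        }
    }
}
```

If misses are common, checking `index < vec.length` before reading avoids the exception entirely.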
More To Come
I’ll reserve any general conclusions until the series has concluded, but for now the performance is quite mixed. Stay tuned for part two!
#1 by Jeff on February 14th, 2011 ·
Well just in reading through this list, it seems like 10.2 is… slower (?) Which is surprising and disappointing. Thanks for the article and look forward to more!
#2 by Matan Uberstein on February 14th, 2011 ·
Thanks for the insight! This was really unexpected! Like Jeff says, it seems like 10.2 is mostly slower than 10.1 – really weird!
#3 by Marvin Blase on February 14th, 2011 ·
Great overview, thanks a lot.
#4 by as3isolib on February 14th, 2011 ·
What I am getting from these player updates is that while you can performance tweak to a particular player RC, there are no guarantees that those performance gains will persist in future releases. Any performance-sensitive code’s life span is only as good as the worst performing player (relatively speaking of course :)
At first reading about the interface boost I was excited to maybe refactor some code, but who in the hell knows if this will remain on FP 10.3? Ugggggg.
#5 by jackson on February 14th, 2011 ·
It’s true that performance changes from player version to player version (hence this series), but you have options (not an exhaustive list):
So, hopefully, you can pin down the performance of the player versions you actually support and then tune your app to those.
#6 by as3isolib on February 14th, 2011 ·
While I agree in theory, something such as architecting a performance-intensive platform around something as unstable as the performance of Flash Player is rather restrictive.
Case in point: the as3isolib.v2 core class structure – I decided to bypass using interfaces in favor of (at the time) more performant base implementation classes that would be subclassed by the developers. Architecting a platform that utilizes RT player detection can become unwieldy.
Your suggestion of limiting the player version support is probably the most pragmatic and simplest thing one could do and I just might do that for as3isolib.v2.
Again Jackson, you provide an abundance of great (and often frustrating) info. Thank you!
I hope Adobe is listening and taking notes.
#7 by jonathanasdf on February 15th, 2011 ·
Hmm… I wasn’t aware that 4.1.0.16076 could target 10.2!
Anyways, thanks for releasing this series. The information really is interesting. Though it seems like 10.2 is slower from the data so far, if you carefully look through the data the speed improvements are all in important places (interface methods, varargs) and the parts that are significantly slower aren’t used as often or have faster workarounds (though it sucks that Function objects are slower, but then again one wouldn’t have been using Function objects in any speed critical code in earlier versions either, so there really should be no problems from that).
And yeah varargs really surprised me. This could mean that a lot of built in methods (eg. array push, splice, etc) are going to get a major speed increase. Guess we’ll see whether or not this is the case in a future article.
#8 by skyboy on February 15th, 2011 ·
splice might be slightly faster, but push/unshift and similar methods will not get a performance boost. They are implemented in C++ and short-circuit typical varargs logic and simply concat the passed Array to the end of the current Array. However, this does mean that my push1 and unshift1 methods for DenseMap may be completely obsolete.
#9 by ben w on February 16th, 2011 ·
just a quick question, have you benched access of complex objects vs say numbers for arrays/vectors/dics/lists?
be interesting to see.
also wondered about read speed of a Vector of Interfaces vs the read speed of the more complex implementers.
might be no difference at all but I have been wondering..might try it out.
#10 by jackson on February 16th, 2011 ·
I have some tests like that. Stay tuned for more articles in this series (the next one will be out on Monday).