Flash Player 10.2 Performance: Part 3
Today’s article is the conclusion of the series updating my performance tests in light of the newly-released Flash Player 10.2. If you haven’t read part one or part two, that would be a good place to start. If you already have, read on for the conclusion!
Test Environment
All tests in this performance update use the same environment:
- Flex SDK (MXMLC) 4.1.0.16076, compiling in release mode (no debugging or verbose stack traces)
- Release version of Flash Player 10.1.102.64 or 10.2.152.26
- 2.8 GHz Intel Xeon W3530
- Windows 7
Declaring Vectors
Environment | Cast | New | Scratch (Single Push) | Scratch (Many Push) | Scratch (Index) |
---|---|---|---|---|---|
Flash Player 10.1 | 780 | 265 | 530 | 281 | 281 |
Flash Player 10.2 | 718 | 263 | 534 | 285 | 258 |
Two of these, “cast” and “scratch index”, are significantly faster in 10.2 (roughly 8% less time each). I find that the “cast” approach (i.e. v = Vector.<T>(array)) is the most common approach for code written before Flex 4 was released with the new syntax (v = new <T>[...]), so legacy code should get a good speedup.
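As a rough illustration of the five declaration styles named in the table, here is a minimal sketch. The element type (int), the values, and the class and variable names are assumptions for illustration only; the mapping of each style to its column heading is inferred from the names, not taken from the original test code.

```actionscript
package
{
    public class VectorDeclarations
    {
        public static function build():void
        {
            // "Cast": convert an Array literal with the Vector.<T>() global function
            var viaCast:Vector.<int> = Vector.<int>([1, 2, 3]);

            // "New": the vector literal syntax introduced with Flex 4
            var viaLiteral:Vector.<int> = new <int>[1, 2, 3];

            // "Scratch (single push)": start empty, one push() call per element
            var singlePush:Vector.<int> = new Vector.<int>();
            singlePush.push(1);
            singlePush.push(2);
            singlePush.push(3);

            // "Scratch (many push)": start empty, one push() call for all elements
            var manyPush:Vector.<int> = new Vector.<int>();
            manyPush.push(1, 2, 3);

            // "Scratch (index)": pre-size the vector, then assign by index
            var viaIndex:Vector.<int> = new Vector.<int>(3);
            viaIndex[0] = 1;
            viaIndex[1] = 2;
            viaIndex[2] = 3;
        }
    }
}
```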
Functional Methods
Every (all pass)
Environment | Method (Vector) | Manual (Vector) | Inline (Vector) | Method (Array) | Manual (Array) | Inline (Array) |
---|---|---|---|---|---|---|
Flash Player 10.1 | 1321 | 519 | 98 | 1180 | 651 | 99 |
Flash Player 10.2 | 1479 | 488 | 100 | 1348 | 552 | 97 |
Every (none pass)
Environment | Method (Vector) | Manual (Vector) | Inline (Vector) | Method (Array) | Manual (Array) | Inline (Array) |
---|---|---|---|---|---|---|
Flash Player 10.1 | 2 | 0 | 0 | 1 | 0 | 0 |
Flash Player 10.2 | 2 | 0 | 0 | 1 | 0 | 0 |
Filter (all pass)
Environment | Method (Vector) | Manual (Vector) | Inline (Vector) | Method (Array) | Manual (Array) | Inline (Array) |
---|---|---|---|---|---|---|
Flash Player 10.1 | 1922 | 1760 | 1429 | 2559 | 2263 | 1792 |
Flash Player 10.2 | 2311 | 2029 | 1648 | 2552 | 1918 | 1491 |
Filter (none pass)
Environment | Method (Vector) | Manual (Vector) | Inline (Vector) | Method (Array) | Manual (Array) | Inline (Array) |
---|---|---|---|---|---|---|
Flash Player 10.1 | 1423 | 443 | 0 | 1237 | 573 | 2 |
Flash Player 10.2 | 1547 | 446 | 1 | 1420 | 551 | 1 |
ForEach
Environment | Method (Vector) | Manual (Vector) | Inline (Vector) | Method (Array) | Manual (Array) | Inline (Array) |
---|---|---|---|---|---|---|
Flash Player 10.1 | 1319 | 463 | 99 | 1176 | 585 | 93 |
Flash Player 10.2 | 1535 | 467 | 97 | 1411 | 557 | 103 |
Map
Environment | Method (Vector) | Manual (Vector) | Inline (Vector) | Method (Array) | Manual (Array) | Inline (Array) |
---|---|---|---|---|---|---|
Flash Player 10.1 | 1604 | 587 | 229 | 2564 | 1870 | 1264 |
Flash Player 10.2 | 1913 | 805 | 418 | 2492 | 1736 | 1175 |
Some (all pass)
Environment | Method (Vector) | Manual (Vector) | Inline (Vector) | Method (Array) | Manual (Array) | Inline (Array) |
---|---|---|---|---|---|---|
Flash Player 10.1 | 1337 | 497 | 94 | 1174 | 663 | 92 |
Flash Player 10.2 | 1534 | 482 | 99 | 1351 | 561 | 98 |
Some (none pass)
Environment | Method (Vector) | Manual (Vector) | Inline (Vector) | Method (Array) | Manual (Array) | Inline (Array) |
---|---|---|---|---|---|---|
Flash Player 10.1 | 0 | 0 | 0 | 0 | 0 | 0 |
Flash Player 10.2 | 1 | 0 | 0 | 1 | 0 | 0 |
There is a lot of data in this test, but not much of it has changed dramatically. Across the board, it’s still really important for performance that you don’t use the functional methods of Array and Vector, but instead write your own loops. If you need a reference, check back to this article and look at the “inline” versions.
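To make the “method”, “manual”, and “inline” columns concrete, here is a hedged sketch of the three styles for every(). The class name, the isPositive predicate, and the reading of “manual” as a hand-written loop that still calls a function per element are assumptions, not the original test code.

```actionscript
package
{
    public class EveryStyles
    {
        // Predicate with the callback signature that Vector.every() expects
        private static function isPositive(item:int, index:int, vec:Vector.<int>):Boolean
        {
            return item > 0;
        }

        // "Method": use the built-in functional method
        public static function everyMethod(v:Vector.<int>):Boolean
        {
            return v.every(isPositive);
        }

        // "Manual": hand-written loop that still pays for one function call per element
        public static function everyManual(v:Vector.<int>):Boolean
        {
            var len:int = v.length;
            for (var i:int = 0; i < len; ++i)
            {
                if (!isPositive(v[i], i, v))
                {
                    return false;
                }
            }
            return true;
        }

        // "Inline": hand-written loop with the test written directly in the loop body
        public static function everyInline(v:Vector.<int>):Boolean
        {
            var len:int = v.length;
            for (var i:int = 0; i < len; ++i)
            {
                if (v[i] <= 0)
                {
                    return false;
                }
            }
            return true;
        }
    }
}
```

All three return the same result for the same input; only the per-element overhead differs.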
Conditionals Test
Environment | If-Else (0) | If-Else (1) | If-Else (2) | If-Else (3) | If-Else (4) | Ternary (0) | Ternary (1) | Ternary (2) | Ternary (3) | Ternary (4) | Switch (0) | Switch (1) | Switch (2) | Switch (3) | Switch (4) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Flash Player 10.1 | 360 | 388 | 427 | 483 | 484 | 351 | 391 | 439 | 474 | 475 | 424 | 454 | 483 | 519 | 542 |
Flash Player 10.2 | 310 | 330 | 343 | 358 | 359 | 309 | 327 | 328 | 375 | 375 | 409 | 410 | 484 | 501 | 521 |
There are some big speedups here! At the 4 level, if-else is 33% faster, ternary is 27% faster, and switch is 4% faster. Assuming that performance-critical code is already not using switch statements, these two larger gains are a big win!
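For reference, the three conditional styles compared above might look like the following sketch. The class name, the function names, and the returned values are hypothetical; the assumption is that levels 0 through 4 in the table correspond to which branch matches.

```actionscript
package
{
    public class ConditionalStyles
    {
        // If-else ladder: later values pay for every earlier comparison
        public static function viaIfElse(val:int):int
        {
            if (val == 0) return 10;
            else if (val == 1) return 11;
            else if (val == 2) return 12;
            else if (val == 3) return 13;
            else if (val == 4) return 14;
            else return -1;
        }

        // Nested ternary operators expressing the same ladder
        public static function viaTernary(val:int):int
        {
            return val == 0 ? 10
                 : val == 1 ? 11
                 : val == 2 ? 12
                 : val == 3 ? 13
                 : val == 4 ? 14
                 : -1;
        }

        // Switch statement over the same values
        public static function viaSwitch(val:int):int
        {
            switch (val)
            {
                case 0: return 10;
                case 1: return 11;
                case 2: return 12;
                case 3: return 13;
                case 4: return 14;
                default: return -1;
            }
        }
    }
}
```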
Object Creation: Part II
Environment | Empty: Curly Braces | Empty: New Operator | Empty: Object Cast | 5 Properties: Curly Braces | 5 Properties: New Operator (assign) | 5 Properties: Object Cast (assign) | 5 Properties: New Operator (with) | 5 Properties: Object Cast (with) |
---|---|---|---|---|---|---|---|---|
Flash Player 10.1 | 17 | 16 | 15 | 42 | 62 | 62 | 352 | 340 |
Flash Player 10.2 | 17 | 12 | 12 | 42 | 57 | 59 | 473 | 469 |
The already-slow with block is now even slower in Flash Player 10.2. On the bright side, everything else seems marginally faster!
Logical Operator Performance
Environment | All True: Compound | All True: Chain | All False: Compound | All False: Chain |
---|---|---|---|---|
Flash Player 10.1 | 817 | 421 | 564 | 293 |
Flash Player 10.2 | 655 | 392 | 501 | 274 |
All of these execute quicker in 10.2 than they did in 10.1. This is another big win since conditionals are, obviously, extremely commonplace. The biggest winner, with a 24% boost, is the “all true, compound” test, which was most desperately in need of a speedup. It’s still, unfortunately, slower than the equivalent test using the “chain” method, but at least a move in the right direction.
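A minimal sketch of the two styles follows, under the assumption that “compound” means a single condition joined with && and “chain” means the same test written as nested if statements. The class and function names are hypothetical.

```actionscript
package
{
    public class LogicalStyles
    {
        // "Compound" (assumed meaning): one condition joined with &&
        public static function compound(a:Boolean, b:Boolean, c:Boolean):Boolean
        {
            if (a && b && c)
            {
                return true;
            }
            return false;
        }

        // "Chain" (assumed meaning): the same test as nested if statements
        public static function chain(a:Boolean, b:Boolean, c:Boolean):Boolean
        {
            if (a)
            {
                if (b)
                {
                    if (c)
                    {
                        return true;
                    }
                }
            }
            return false;
        }
    }
}
```

Both functions return the same result for every input, so the choice between them is purely a performance and readability trade-off.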
Cast Speed
Environment | Cast Succeeds: Function Call Style | Cast Succeeds: As Keyword | Cast Fails: Function Call Style | Cast Fails: As Keyword |
---|---|---|---|---|
Flash Player 10.1 | 29 | 34 | 7233 | 33 |
Flash Player 10.2 | 32 | 36 | 6056 | 35 |
While the slowest (by far) method is quicker due to quicker exceptions in 10.2, the others seem to have taken a minor speed hit.
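The gap between the two failure columns comes from how each cast reports failure: the function-call style throws a TypeError, while the as keyword evaluates to null. A hedged sketch, with Sprite chosen arbitrarily as the target type and the class and function names invented for illustration:

```actionscript
package
{
    import flash.display.Sprite;

    public class CastStyles
    {
        // Function-call style: throws a TypeError when obj is not a Sprite,
        // so a failed cast pays for creating and catching an exception
        public static function functionCallStyle(obj:Object):Sprite
        {
            try
            {
                return Sprite(obj);
            }
            catch (err:TypeError)
            {
                return null;
            }
            return null;
        }

        // As keyword: evaluates to null when obj is not a Sprite, no exception
        public static function asKeyword(obj:Object):Sprite
        {
            return obj as Sprite;
        }
    }
}
```

Passing a String to either function yields null, but only the first one goes through the exception machinery to get there.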
Operator Speed
Due to some issues with the original test pointed out by skyboy, I’ll be doing an update article separately.
Accessing Objects
Flash Player 10.1
Operator | Array Hit | Array Miss | Vector Hit | Vector Miss | Object Hit | Object Miss | Dictionary Hit | Dictionary Miss | Instance Hit | Instance Miss |
---|---|---|---|---|---|---|---|---|---|---|
In | 89 | 173 | 103 | 146 | 68 | 81 | 83 | 109 | 48 | 91 |
Index | 10 | 51 | 8 | 1597 | 72 | 77 | 78 | 96 | 60 | 1643 |
Dot | n/a | n/a | n/a | n/a | 28 | 38 | 28 | 49 | 6 | n/a |
hasOwnProperty | 79 | 96 | 89 | 91 | 80 | 77 | 92 | 95 | 74 | 70 |
Flash Player 10.2
Operator | Array Hit | Array Miss | Vector Hit | Vector Miss | Object Hit | Object Miss | Dictionary Hit | Dictionary Miss | Instance Hit | Instance Miss |
---|---|---|---|---|---|---|---|---|---|---|
In | 10 | 26 | 10 | 10 | 70 | 87 | 79 | 118 | 47 | 100 |
Index | 8 | 45 | 9 | 1271 | 69 | 81 | 78 | 105 | 61 | 1329 |
Dot | n/a | n/a | n/a | n/a | 26 | 41 | 31 | 57 | 6 | n/a |
hasOwnProperty | 98 | 91 | 95 | 86 | 85 | 78 | 106 | 84 | 82 | 69 |
The in operator is massively faster on Array and Vector in 10.2! Even when missing (i.e. the key doesn’t exist) there is a 6-15x performance boost. Aside from that, there are some other minor changes (perhaps most notably the slowdown for hasOwnProperty on an Array), but nothing much.
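For readers unfamiliar with the four access styles in the tables, here is a small sketch. The class name, property names, and values are made up for illustration.

```actionscript
package
{
    public class AccessStyles
    {
        public static function lookups():void
        {
            var obj:Object = {score: 10};

            // Hits: the key exists
            var hitIn:Boolean = "score" in obj;
            var hitIndex:* = obj["score"];
            var hitDot:* = obj.score;
            var hitOwn:Boolean = obj.hasOwnProperty("score");

            // Misses: the key does not exist
            var missIn:Boolean = "lives" in obj;    // false
            var missIndex:* = obj["lives"];         // undefined on a dynamic Object

            // The in operator also accepts numeric indexes on Array and Vector
            var list:Vector.<int> = new <int>[5, 6, 7];
            var inRange:Boolean = 1 in list;
            var outOfRange:Boolean = 9 in list;

            // Reading list[9] directly would throw a RangeError instead,
            // which is consistent with the very slow Vector "Index, Miss" column
        }
    }
}
```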
Holding DisplayObjects
Flash Player 10.1
Collection | Index | Search (best case) | Search (worst case) |
---|---|---|---|
Array | 6 | 20 | 2571 |
Vector | 6 | 20 | 2495 |
Sprite | 23 | 23 | 2448 |
MovieClip | 26 | 24 | 2369 |
QuickSpriteArray | 6 | 22 | 2610 |
QuickSpriteVector | 5 | 19 | 2520 |
Flash Player 10.2
Collection | Index | Search (best case) | Search (worst case) |
---|---|---|---|
Array | 5 | 18 | 2376 |
Vector | 6 | 18 | 2383 |
Sprite | 24 | 24 | 2327 |
MovieClip | 23 | 24 | 2303 |
QuickSpriteArray | 6 | 19 | 2373 |
QuickSpriteVector | 5 | 20 | 2372 |
Searching seems faster across the board, but not by much. Other measures are unchanged since 10.1.
The Const Keyword
Environment | Var | Const | Define |
---|---|---|---|
Flash Player 10.1 | 218 | 218 | 238 |
Flash Player 10.2 | 205 | 205 | 238 |
Local variable access (var or const) is faster by about 6%, but literal value access (in this case via compile-time defines) is unchanged.
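The three kinds of access can be sketched as follows. The define name CONFIG::VALUE, its value, and the class and function names are hypothetical; the define has to be supplied to MXMLC on the command line, for example with -define=CONFIG::VALUE,33.

```actionscript
package
{
    public class ConstStyles
    {
        public static function sum(iterations:int):int
        {
            var viaVar:int = 33;      // "Var": ordinary local variable
            const viaConst:int = 33;  // "Const": local constant

            var total:int = 0;
            for (var i:int = 0; i < iterations; ++i)
            {
                total += viaVar;
                total += viaConst;
                // "Define": replaced with a literal at compile time
                // (hypothetical name; requires -define=CONFIG::VALUE,33)
                total += CONFIG::VALUE;
            }
            return total;
        }
    }
}
```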
Calling Functions
Flash Player 10.1
Approach | Function Object | Method | Static Method |
---|---|---|---|
() | 214 | 60 | 63 |
call(null) | 254 | 623 | 624 |
call(this) | 252 | 909 | 902 |
apply(null) | 322 | 687 | 692 |
apply(this) | 318 | 980 | 996 |
Flash Player 10.2
Approach | Function Object | Method | Static Method |
---|---|---|---|
() | 258 | 63 | 72 |
call(null) | 313 | 689 | 702 |
call(this) | 311 | 983 | 999 |
apply(null) | 385 | 733 | 748 |
apply(this) | 357 | 1077 | 1061 |
Here we see all three types of functions are slower in 10.2. The relative order has not changed, though: for regular calls, methods are still fastest, static methods follow them, and Function objects are by far the slowest. As soon as you turn to the call or apply methods, Function objects suddenly dominate by a factor of about 2-3x.
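The rows and columns of these tables correspond to call sites like the ones below. This is an illustrative sketch only: the class, member names, and function bodies are assumptions.

```actionscript
package
{
    public class CallStyles
    {
        // "Function Object": an anonymous function stored in a variable
        private var functionObject:Function = function(x:int):int { return x + 1; };

        // "Method": an ordinary instance method
        private function method(x:int):int
        {
            return x + 1;
        }

        // "Static Method"
        private static function staticMethod(x:int):int
        {
            return x + 1;
        }

        public function run():int
        {
            var sum:int = 0;

            // "()": regular calls
            sum += functionObject(1);
            sum += method(1);
            sum += staticMethod(1);

            // call(): arguments are passed individually after the "this" argument
            sum += functionObject.call(null, 1);
            sum += method.call(this, 1);
            sum += staticMethod.call(null, 1);

            // apply(): arguments are passed as an Array after the "this" argument
            sum += functionObject.apply(null, [1]);
            sum += method.apply(this, [1]);
            sum += staticMethod.apply(null, [1]);

            return sum;
        }
    }
}
```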
Implicit Type Conversion
Flash Player 10.1
Assignment | int | uint | Number | Boolean |
---|---|---|---|---|
int=… | 0 | 0 | 219 | n/a |
uint=… | 5 | 4 | 244 | n/a |
Number=… | 0 | 84 | 0 | n/a |
Boolean=… | 36 | 33 | 366 | 0 |
Flash Player 10.2
Assignment | int | uint | Number | Boolean |
---|---|---|---|---|
int=… | 0 | 0 | 228 | n/a |
uint=… | 0 | 1 | 230 | n/a |
Number=… | 3 | 98 | 0 | n/a |
Boolean=… | 29 | 30 | 241 | 0 |
Boolean = Number has had its performance improved somewhat, but otherwise not much has changed in 10.2.
Definitive isNaN()
Flash Player 10.1
Algorithm | Function Call | Function Call | Inline | Inline |
---|---|---|---|---|
isNaN | 1519 | 1516 | n/a | n/a |
Old Algorithm | 893 | 889 | 203 | 206 |
New Algorithm | 787 | 781 | 207 | 209 |
Flash Player 10.2
Algorithm | Function Call | Function Call | Inline | Inline |
---|---|---|---|---|
isNaN | 210 | 199 | n/a | n/a |
Old Algorithm | 847 | 841 | 203 | 187 |
New Algorithm | 764 | 765 | 218 | 203 |
The built-in isNaN function has been sped up to such an amazing degree that it’s now on par with inlining the “new algorithm”! There is definitely no need to do your own version of isNaN anymore: simply opt for the more readable built-in version.
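For comparison, here is a sketch of the built-in check next to the self-comparison check (val != val) mentioned in the summary table at the end of this article. The class and function names are hypothetical, and this is not the original test code for the “old” or “new” algorithm.

```actionscript
package
{
    public class NaNChecks
    {
        // Built-in top-level function: readable, and fast as of 10.2
        public static function builtIn(val:Number):Boolean
        {
            return isNaN(val);
        }

        // Self-comparison: NaN is the only Number value that is not equal to itself.
        // Shown as a function for clarity; the "inline" timings refer to writing
        // the comparison directly at the point of use.
        public static function selfCompare(val:Number):Boolean
        {
            return val != val;
        }
    }
}
```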
Default Arguments
Flash Player 10.1
Num Params | Required | Default |
---|---|---|
1 | 613 | 582 |
2 | 575 | 566 |
3 | 600 | 641 |
4 | 650 | 699 |
5 | 670 | 728 |
6 | 701 | 790 |
7 | 761 | 996 |
8 | 772 | 1033 |
9 | 881 | 1161 |
10 | 856 | 1106 |
Flash Player 10.2
Num Params | Required | Default |
---|---|---|
1 | 590 | 582 |
2 | 590 | 614 |
3 | 624 | 580 |
4 | 657 | 712 |
5 | 666 | 777 |
6 | 754 | 871 |
7 | 740 | 937 |
8 | 772 | 1018 |
9 | 828 | 1032 |
10 | 869 | 1164 |
Unlike the huge var args speedup, there doesn’t seem to have been any speedup regarding default arguments: they are still slower than regular arguments, especially at high numbers.
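The two columns compare function declarations along these lines. This is a two-parameter sketch with made-up names and default values, not the original test functions.

```actionscript
package
{
    public class ArgumentStyles
    {
        // "Required": every argument must be supplied by the caller
        public static function required(a:int, b:int):int
        {
            return a + b;
        }

        // "Default": arguments may be omitted and fall back to their defaults
        public static function withDefaults(a:int = 0, b:int = 0):int
        {
            return a + b;
        }

        public static function run():int
        {
            // Both call styles for the default-argument version
            return required(1, 2) + withDefaults(1, 2) + withDefaults();
        }
    }
}
```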
Typecasting: Part 3
Flash Player 10.1
Type | AS3 Success | AS3 Failure | Native Success | Native Failure |
---|---|---|---|---|
Function Call | 11 | 7384 | 116 | 7485 |
As Keyword | 12 | 10 | 11 | 12 |
Flash Player 10.2
Type | AS3 Success | AS3 Failure | Native Success | Native Failure |
---|---|---|---|---|
Function Call | 12 | 6065 | 120 | 6228 |
As Keyword | 10 | 0 | 10 | 0 |
The only significant change here is that the as keyword is now almost instantaneous when it fails. It is still the recommended cast if there is even a remote chance that the cast can fail and always the recommended cast for built-in classes implemented in native code.
Conclusion
This concludes the performance comparisons between Flash Player 10.1 and Flash Player 10.2. I’ve updated 26 articles’ worth of tests that run the gamut of Flash Player functionality. Everything from Shape and Sprite to local variables and function calls has been tested in this three-part series. So how does Flash Player 10.2 perform overall? Well, as we’ve seen from all of these tests, there have been some performance increases, some performance decreases, and quite a few with no change at all. Let’s break it down into a table of rough performance change categories:
Method | Speed Change | Importance | Notes |
---|---|---|---|
Object allocation | 30% slower | Major | Free lists still 3x faster |
Function Objects | 25% slower | Major | Affects as3signals, TurboSignals, callbacks, etc. |
Field Access | 44% slower | Major | |
Logical Operators | 4-25% faster | Major | |
Conditionals | 4-33% faster | Major | |
Calling Functions | 10-30% slower | Major | |
For-in Loops | 10% slower | Average | |
XML | 19% faster | Average | |
Interface Methods | 2x faster | Average | |
Getters and Setters | 11% faster | Average | |
Var Args | 15-30x faster | Average | |
Declaring Vectors | 10% faster | Average | |
Local Variables | 6% faster | Average | |
Namespaces | 15% faster | Minor | |
Sorting Vectors | 20% faster | Minor | Array.sortOn still the fastest by far |
Deleting Object Properties | 25% faster | Minor | |
Local Variables Declared Last | 5% faster | Minor | |
With Blocks | 30% slower | Minor | |
in Operator | 6-15x faster | Minor | |
Built-in isNaN Function | 7x faster | Minor | Equivalent speed available by inline (val!=val) |
Among the tests I marked “major”, four are slower and two are faster, each by amounts in the double digits. In the “average” category, six are faster and one is slower. So is Flash Player 10.2 faster or slower than Flash Player 10.1? With results as varied as these, the answer is highly dependent on your particular application. Is your bottleneck on the speed of calling Function objects (e.g. as3signals)? Your code will now run slower. Is your bottleneck in more nuts-and-bolts conditionals and logical operators? Your code will now run faster. For most AS3 programmers, I’d guess that overall they will see a net performance loss with Flash Player 10.2.
#1 by Vic on February 28th, 2011 ·
I love you man.
#2 by Elliot Chong on February 28th, 2011 ·
Great insights, thank you for compiling such an extensive benchmark!
#3 by skyboy on February 28th, 2011 ·
The isNaN speedup seems like a bit of foul play to me: if function calls can be so fast they execute in the same amount of time as !=, then why don’t all function calls execute this fast? However, the conditionals and as speedups are nice to see.
Though in the last couple days I’ve split off DenseMap’s sorting routines into a new class with a couple generic-ized versions that can sort any Object that has numeric indexes and a length property as an alternative for Array::sortOn. While normally slower when randomly sorted with no null values, pre-sorted is often the same or faster on machines with smaller cache sizes. The larger the cache, the better Adobe’s method performs, while the opposite is true for my code; so odd how they’re different.
However, Adobe’s method takes major performance hits when you introduce null elements (and possibly elements that are all the same number), getting as far as performing at O(n^2) (which can take up to several hours to complete, locking Flash Player at the same time) while my method performs at O(n log n).
As an example to reinforce how bad it is:
The first result for Array is the sortOn method, and the remaining tests are a call to fastSort, which doubles as both sortOn and sort based on the second parameter’s type.
#4 by jackson on February 28th, 2011 ·
I too thought it a little fishy how fast isNaN runs now. Perhaps I should test some other top-level functions to see if this optimization has been done for them…
Glad to see your DenseMap and sorting is going well. Those results are truly fast! It’s great to have such an alternative for heavy sorting loads (3D engine polygon sorting?). Not to get too off-topic, but do you think you’ll ever set up a source code repository on Google Code, GitHub, or the like?
#5 by skyboy on February 28th, 2011 ·
The only problem with the results is that they’re only so competitive on systems where 3D is virtually impossible to have at high frame rates (or is this a plus?). Systems better suited, and with larger cache sizes, show dramatically improved performance for Adobe’s method (excluding nulls, those still kill it): unsorted runs only 18% longer than sorted, while my method is 30% slower on sorted and 60% slower on unsorted.
My method also doesn’t return results exactly the same as Array’s method all the time either: on numeric sort, it’s a perfect match, but on string sort, nulls, NaN and undefined aren’t in the same order as how Array returns them. Personally, I think Array should be consistent with those values like my method is (they’re actually pre-sorted to not only get them in the right order, but to avoid the above O(n^2) running time).
And I have indeed set up a GitHub repository for the various things I create: https://github.com/skyboy/AS3-Utilities/tree/master/skyboy/
Though, not many of the classes are as useful as I’d like. Hopefully with time, I’ll get more useful classes like DenseMap.
#6 by skyboy on March 4th, 2011 ·
An idea occurred to me related to switch statements.
Testing a haXe compiled switch with the others, or an Apparat switch, and combining it with Apparat’s macro inlining to increase the number of tests per loop while keeping it legible. The speedup of not using if statements and just the lookupswitch command could be more dramatic in an unrolled loop.
Especially around 15-ish cases, perhaps to the point where not using it in those cases would almost be a joke.
#7 by jackson on March 4th, 2011 ·
Really, MXMLC should just detect when a lookupswitch can be used directly and do it rather than always doing some kind of pointless hybrid mode. It’s good to hear that raw lookupswitch is fast though since that means a future MXMLC can do just that optimization. Can you post bytecode for your haXe and Apparat versions?
#8 by skyboy on March 4th, 2011 ·
Unfortunately the only place I have either is in my JSON class, and the sheer amount of bytecode in it causes nearly every decompiler to crash (even Nemo440). The best I can manage currently is some Apparat ASM: