ASC 2.0 Conditionals Performance
Surprisingly, some interesting things have been happening with conditionals like if-else
in AS3. First, a brand new AS3 compiler—ASC 2.0—has been released with the promise that it’ll generate more efficient bytecode. Second, some readers have pointed out the existence of a new (to me) technique: the “if-else tree”. Today’s article takes a look at just what that is and tests it against the classic options: if-else
, the ternary (? :
) operator, and the switch
statement. Which will be fastest?
First, let’s talk about the newcomer: “if-else trees”. The idea is to replace “if-else ladders” that do N checks for N possible values with a version that does only log(N) checks. Here’s the classic “if-else ladder” for 0-3:
if (val == 0) { // code for 0 } else if (val == 1) { // code for 1 } else if (val == 2) { // code for 2 } else if (val == 3) { // code for 3 }
The worst-case scenario is when val
is the last case you check for. For N values, that takes N-1 if
statements. For 4 values like above, that’s 3 if
statements. For 20 values, that’s 19 if
statements. In contrast, let’s look at an “if-else tree” for 0-3:
if (val < 2) { if (val == 0) { // code for 0 } else { // code for 1 } } else { if (val == 2) { // code for 2 } else { // code for 3 } }
If you have more values to check for, you simply nest more if-else
statements. For 4 values, there’s only a maximum of 2 if
statements that need to execute. That’s a 33% savings on branching instructions, but it gets much better if you have more values. For 20 values, you only have a maximum of 5 if
statements instead of 19
. That adds up to a huge savings!
So with that in mind, here’s an updated version of the test in Conditionals Performance Revisited that incorporates “if-else trees”:
package { import flash.text.TextField; import flash.utils.getTimer; import flash.display.Sprite; public class IfElseTrees extends Sprite { public function IfElseTrees() { test(); } private function test(): void { var beforeTime:int; var afterTime:int; var i:int; const ITERATIONS:int = 10000000; var results:TextField = new TextField(); results.width = stage.stageWidth; results.height = stage.stageHeight; results.text = "Iterations,If-Else Ladder,If-Else Tree,Ternary,Switch\n"; for (var val:int = 0; val < 20; ++val) { results.appendText(val+","); beforeTime = getTimer(); for (i = 0; i < ITERATIONS; ++i) { if (val == 0) func0(); else if (val == 1) func1(); else if (val == 2) func2(); else if (val == 3) func3(); else if (val == 4) func4(); else if (val == 5) func5(); else if (val == 6) func6(); else if (val == 7) func7(); else if (val == 8) func8(); else if (val == 9) func9(); else if (val == 10) func10(); else if (val == 11) func11(); else if (val == 12) func12(); else if (val == 13) func13(); else if (val == 14) func14(); else if (val == 15) func15(); else if (val == 16) func16(); else if (val == 17) func17(); else if (val == 18) func18(); else func19(); } afterTime = getTimer(); results.appendText((afterTime-beforeTime)+","); beforeTime = getTimer(); for (i = 0; i < ITERATIONS; ++i) { if (val < 10) { if (val < 5) { if (val < 2) { if (val == 0) { func0(); } else { func1(); } } else { if (val == 2) { func2(); } else if (val == 3) { func3(); } else { func4(); } } } else { if (val < 7) { if (val == 5) { func5(); } else { func6(); } } else { if (val == 7) { func7(); } else if (val == 8) { func8(); } else { func9(); } } } } else { if (val < 15) { if (val < 12) { if (val == 10) { func10(); } else { func11(); } } else { if (val == 12) { func12(); } else if (val == 13) { func13(); } else { func14(); } } } else { if (val < 17) { if (val == 15) { func15(); } else { func16(); } } else { if (val == 17) { func17(); } else if (val == 18) { func18(); } else { func19(); } } } } } afterTime = getTimer(); results.appendText((afterTime-beforeTime)+","); beforeTime = getTimer(); for (i = 0; i < ITERATIONS; ++i) { val == 0 ? func0() : val == 1 ? func1() : val == 2 ? func2() : val == 3 ? func3() : val == 4 ? func4() : val == 5 ? func5() : val == 6 ? func6() : val == 7 ? func7() : val == 8 ? func8() : val == 9 ? func9() : val == 10 ? func10() : val == 11 ? func11() : val == 12 ? func12() : val == 13 ? func13() : val == 14 ? func14() : val == 15 ? func15() : val == 16 ? func16() : val == 17 ? func17() : val == 18 ? func18() : func19(); } afterTime = getTimer(); results.appendText((afterTime-beforeTime)+","); beforeTime = getTimer(); for (i = 0; i < ITERATIONS; ++i) { switch (val) { case 0: func0(); break; case 1: func1(); break; case 2: func2(); break; case 3: func3(); break; case 4: func4(); break; case 5: func5(); break; case 6: func6(); break; case 7: func7(); break; case 8: func8(); break; case 9: func9(); break; case 10: func10(); break; case 11: func11(); break; case 12: func12(); break; case 13: func13(); break; case 14: func14(); break; case 15: func15(); break; case 16: func16(); break; case 17: func17(); break; case 18: func18(); break; default: func19(); } } afterTime = getTimer(); results.appendText((afterTime-beforeTime)+"\n"); } addChild(results); } private function func0(): void{} private function func1(): void{} private function func2(): void{} private function func3(): void{} private function func4(): void{} private function func5(): void{} private function func6(): void{} private function func7(): void{} private function func8(): void{} private function func9(): void{} private function func10(): void{} private function func11(): void{} private function func12(): void{} private function func13(): void{} private function func14(): void{} private function func15(): void{} private function func16(): void{} private function func17(): void{} private function func18(): void{} private function func19(): void{} } }
I ran this test in the following environment:
- Release version of Flash Player 11.6.602.180
- 2.3 Ghz Intel Core i7
- Mac OS X 10.8.2
With two compilers, both in release mode (no debugging or verbose stack traces):
- ASC 1.0: Flex SDK 4.6.0.23201
- ASC 2.0: build 352231
And here are the results I got:
In ASC 1.0, the new “if-else tree” approach beats the competition out by taking a relatively constant amount of time regardless of what the value being checked for was. In contrast, “if-else ladders”, ternary (? :
) operators, and even switch
statements all scaled up in time linearly as the value was checked for later and later in each. This is a clear win for the “if-else tree” to the extent that it’s about twice as fast when the value is 20.
In ASC 2.0, things change dramatically. The ternary operator is horrifically slow now and taking well over 700 milliseconds when the value is 20 compared to only 200 milliseconds with ASC 1.0. It’s so slow that it dominates the graph and I needed to cut it out to see what’s happening with the others. If you look at the second ASC 2.0 chart (with ternary removed), you can see some details emerge. The “if-else ladder” is still performing linearly as it uses more and more time as the value gets toward the end of the if
checks. On the other hand, “if-else ladders” are still performing in roughly constant time regardless of the value. The switch
statement has also changed its performance profile and is now also running in roughly constant time in sharp contrast to the linear time it used to run in.
So, if you want a constant-time approach regardless of compiler you should use an “if-else tree”. If you’re only using ASC 2.0, it’s OK to use a switch
statement instead. Actually, most people would probably consider it a “cleaner” option for code readability and it certainly requires less typing. However, you’ll suffer massively on performance if your code ever gets compiled with ASC 1.0.
So why is the switch
statement so much faster now? The answer lies, of course, in the bytecode generated by each compiler. Here’s what ASC 1.0 puts out:
474 pushbyte 0 475 setlocal3 476 jump bb199 bb135 succs=[bb156] 477 label 478 jump bb156 bb136 succs=[bb198] 479 label 480 getlocal0 481 callpropvoid 482 jump bb198 bb137 succs=[bb198] 483 label 484 getlocal0 485 callpropvoid 486 jump bb198 bb138 succs=[bb198] 487 label 488 getlocal0 489 callpropvoid 490 jump bb198 bb139 succs=[bb198] 491 label 492 getlocal0 493 callpropvoid 494 jump bb198 bb140 succs=[bb198] 495 label 496 getlocal0 497 callpropvoid 498 jump bb198 bb141 succs=[bb198] 499 label 500 getlocal0 501 callpropvoid 502 jump bb198 bb142 succs=[bb198] 503 label 504 getlocal0 505 callpropvoid 506 jump bb198 bb143 succs=[bb198] 507 label 508 getlocal0 509 callpropvoid 510 jump bb198 bb144 succs=[bb198] 511 label 512 getlocal0 513 callpropvoid 514 jump bb198 bb145 succs=[bb198] 515 label 516 getlocal0 517 callpropvoid 518 jump bb198 bb146 succs=[bb198] 519 label 520 getlocal0 521 callpropvoid 522 jump bb198 bb147 succs=[bb198] 523 label 524 getlocal0 525 callpropvoid 526 jump bb198 bb148 succs=[bb198] 527 label 528 getlocal0 529 callpropvoid 530 jump bb198 bb149 succs=[bb198] 531 label 532 getlocal0 533 callpropvoid 534 jump bb198 bb150 succs=[bb198] 535 label 536 getlocal0 537 callpropvoid 538 jump bb198 bb151 succs=[bb198] 539 label 540 getlocal0 541 callpropvoid 542 jump bb198 bb152 succs=[bb198] 543 label 544 getlocal0 545 callpropvoid 546 jump bb198 bb153 succs=[bb198] 547 label 548 getlocal0 549 callpropvoid 550 jump bb198 bb154 succs=[bb198] 551 label 552 getlocal0 553 callpropvoid 554 jump bb198 bb155 succs=[bb198] 555 label 556 getlocal0 557 callpropvoid 558 jump bb198 bb156 succs=[bb158,bb157] 559 getlocal 6 560 setlocal 7 561 pushbyte 0 562 getlocal 7 563 ifstrictne bb158 bb157 succs=[bb197] 564 pushbyte 0 565 jump bb197 bb158 succs=[bb160,bb159] 566 pushbyte 1 567 getlocal 7 568 ifstrictne bb160 bb159 succs=[bb197] 569 pushbyte 1 570 jump bb197 bb160 succs=[bb161,bb162] 571 pushbyte 2 572 getlocal 7 573 ifstrictne bb162 bb161 succs=[bb197] 574 pushbyte 2 575 jump bb197 bb162 succs=[bb163,bb164] 576 pushbyte 3 577 getlocal 7 578 ifstrictne bb164 bb163 succs=[bb197] 579 pushbyte 3 580 jump bb197 bb164 succs=[bb166,bb165] 581 pushbyte 4 582 getlocal 7 583 ifstrictne bb166 bb165 succs=[bb197] 584 pushbyte 4 585 jump bb197 bb166 succs=[bb167,bb168] 586 pushbyte 5 587 getlocal 7 588 ifstrictne bb168 bb167 succs=[bb197] 589 pushbyte 5 590 jump bb197 bb168 succs=[bb169,bb170] 591 pushbyte 6 592 getlocal 7 593 ifstrictne bb170 bb169 succs=[bb197] 594 pushbyte 6 595 jump bb197 bb170 succs=[bb171,bb172] 596 pushbyte 7 597 getlocal 7 598 ifstrictne bb172 bb171 succs=[bb197] 599 pushbyte 7 600 jump bb197 bb172 succs=[bb173,bb174] 601 pushbyte 8 602 getlocal 7 603 ifstrictne bb174 bb173 succs=[bb197] 604 pushbyte 8 605 jump bb197 bb174 succs=[bb176,bb175] 606 pushbyte 9 607 getlocal 7 608 ifstrictne bb176 bb175 succs=[bb197] 609 pushbyte 9 610 jump bb197 bb176 succs=[bb177,bb178] 611 pushbyte 10 612 getlocal 7 613 ifstrictne bb178 bb177 succs=[bb197] 614 pushbyte 10 615 jump bb197 bb178 succs=[bb180,bb179] 616 pushbyte 11 617 getlocal 7 618 ifstrictne bb180 bb179 succs=[bb197] 619 pushbyte 11 620 jump bb197 bb180 succs=[bb182,bb181] 621 pushbyte 12 622 getlocal 7 623 ifstrictne bb182 bb181 succs=[bb197] 624 pushbyte 12 625 jump bb197 bb182 succs=[bb183,bb184] 626 pushbyte 13 627 getlocal 7 628 ifstrictne bb184 bb183 succs=[bb197] 629 pushbyte 13 630 jump bb197 bb184 succs=[bb186,bb185] 631 pushbyte 14 632 getlocal 7 633 ifstrictne bb186 bb185 succs=[bb197] 634 pushbyte 14 635 jump bb197 bb186 succs=[bb188,bb187] 636 pushbyte 15 637 getlocal 7 638 ifstrictne bb188 bb187 succs=[bb197] 639 pushbyte 15 640 jump bb197 bb188 succs=[bb190,bb189] 641 pushbyte 16 642 getlocal 7 643 ifstrictne bb190 bb189 succs=[bb197] 644 pushbyte 16 645 jump bb197 bb190 succs=[bb192,bb191] 646 pushbyte 17 647 getlocal 7 648 ifstrictne bb192 bb191 succs=[bb197] 649 pushbyte 17 650 jump bb197 bb192 succs=[bb193,bb194] 651 pushbyte 18 652 getlocal 7 653 ifstrictne bb194 bb193 succs=[bb197] 654 pushbyte 18 655 jump bb197 bb194 succs=[bb196] 656 jump bb196 bb195 succs=[bb197] 657 pushbyte 19 658 jump bb197 bb196 succs=[bb197] 659 pushbyte 19 bb197 succs=[bb138,bb141,bb198,bb152,bb151,bb147,bb143,bb148,bb153,bb139,bb136,bb140,bb145,bb137,bb142,bb150,bb146,bb155,bb154,bb149,bb144] 660 kill 7 661 lookupswitch default: bb155 maxcase: 20 bb136 bb137 bb138 bb139 bb140 bb141 bb142 bb143 bb144 bb145 bb146 bb147 bb148 bb149 bb150 bb151 bb152 bb153 bb154 bb155
Notice that it essentially implements the switch
statement with an “if-else ladder”: a series of ifstrictle
opcodes, one per case
label. The lookupswitch
opcode comes at the end after all of these, essentially negating it as a way to improve performance. Now let’s look at the bytecode generated by ASC 2.0:
486 pushbyte 0 487 setlocal 6 488 jump bb175 bb153 succs=[bb155,bb159,bb164,bb169,bb163,bb158,bb160,bb170,bb162,bb173,bb156,bb166,bb168,bb165,bb154,bb171,bb167,bb172,bb161,bb157] 489 label 490 getlocal1 491 convert_i 492 lookupswitch default: bb173 maxcase: 19 bb154 bb155 bb156 bb157 bb158 bb159 bb160 bb161 bb162 bb163 bb164 bb165 bb166 bb167 bb168 bb169 bb170 bb171 bb172 bb154 succs=[bb174] 493 findpropstrict func0 494 callpropvoid 495 jump bb174 bb155 succs=[bb174] 496 findpropstrict func1 497 callpropvoid 498 jump bb174 bb156 succs=[bb174] 499 findpropstrict func2 500 callpropvoid 501 jump bb174 bb157 succs=[bb174] 502 findpropstrict func3 503 callpropvoid 504 jump bb174 bb158 succs=[bb174] 505 findpropstrict func4 506 callpropvoid 507 jump bb174 bb159 succs=[bb174] 508 findpropstrict func5 509 callpropvoid 510 jump bb174 bb160 succs=[bb174] 511 findpropstrict func6 512 callpropvoid 513 jump bb174 bb161 succs=[bb174] 514 findpropstrict func7 515 callpropvoid 516 jump bb174 bb162 succs=[bb174] 517 findpropstrict func8 518 callpropvoid 519 jump bb174 bb163 succs=[bb174] 520 findpropstrict func9 521 callpropvoid 522 jump bb174 bb164 succs=[bb174] 523 findpropstrict func10 524 callpropvoid 525 jump bb174 bb165 succs=[bb174] 526 findpropstrict func11 527 callpropvoid 528 jump bb174 bb166 succs=[bb174] 529 findpropstrict func12 530 callpropvoid 531 jump bb174 bb167 succs=[bb174] 532 findpropstrict func13 533 callpropvoid 534 jump bb174 bb168 succs=[bb174] 535 findpropstrict func14 536 callpropvoid 537 jump bb174 bb169 succs=[bb174] 538 findpropstrict func15 539 callpropvoid 540 jump bb174 bb170 succs=[bb174] 541 findpropstrict func16 542 callpropvoid 543 jump bb174 bb171 succs=[bb174] 544 findpropstrict func17 545 callpropvoid 546 jump bb174 bb172 succs=[bb174] 547 findpropstrict func18 548 callpropvoid 549 jump bb174 bb173 succs=[bb174] 550 findpropstrict func19 551 callpropvoid bb174 succs=[bb175] 552 inclocal_i 6 bb175 succs=[bb153,bb176] 553 getlocal 6 554 pushint 555 iflt bb153 bb176
Here the lookupswitch
opcode comes at the beginning and jumps straight away to the appropriate block of code. This is exactly what the opcode is intended for: a quick jump depending on what the value of val
is. This yields the constant-time performance we see in the graph above for ASC 2.0 and restores switch
to its rightful place as the fastest and (arguably) cleanest approach available.
Spot a bug? Have a question or suggestion? Post a comment!
#1 by Ajay on March 18th, 2013 ·
in following line
So, if you want a constant-time approach regardless of compiler you should use an “if-else ladderâ€
Shouldn’t it be “if-else tree”, or am I missing something?
#2 by jackson on March 18th, 2013 ·
Thanks for pointing this out. It was a typo and I’ve now fixed it in the article
#3 by henke37 on March 18th, 2013 ·
So, does this hold true even if you have a non constant expression for the case label?
#4 by jackson on March 18th, 2013 ·
I don’t know, but it sounds like an interesting follow-up test.
#5 by Pierre on March 18th, 2013 ·
Very nice performance tip! Never thought about this.
If you were checking some other type, like… Strings for example, could some similar “if-else tree” technique be used on that? That would probably involve deciding how to split up the String values, either alphabetically and/or by String length, I imagine?
Example:
//Ladder technique:
if(theString==”alpha”) { func0(); }
else if(theString==”beta”) { func1(); }
…
else if(theString==”zebra”) { func25(); }
//Tree technique:
if(theString.charCodeAt(0) < 108) {// char "L" in lowercase, sorta halfway point…
// … either break it down into smaller branches or do sub-ladders…
} else {
// … either break it down into smaller branches or do sub-ladders…
}
Could that be one approach basically?
#6 by jackson on March 18th, 2013 ·
You might be able to adapt it to
String
that way, but you’d need to be very careful about function calls. ThecharCodeAt
calls are going to be very expensive, so there’s probably a sizable overhead to this approach. Still, it’s probably worth trying out to see some objective numbers and know for sure.#7 by skyboy on March 18th, 2013 ·
The charCodeAt calls are actually extremely cheap, compared to a string comparison; the JIT may inline it or perform some other boost, as it’s just a bounds check and an array index operation: constant time.
That particular method is what I use in my JSON decoder, which even with ASC1.0 can outperform the native decoder: whether or not it will be a performance hit there is a complete tossup, and only benchmarking can be used to decide on the best option.
#8 by ben w on March 19th, 2013 ·
didn’t know that! learn something every day, goes to show that one should NEVER assume.
#9 by jackson on March 19th, 2013 ·
I wouldn’t have expected that at all. I would think
charCodeAt
would be just as expensive as any other function call. Is it getting preferential treatment by the compiler, the JIT, or the VM? This definitely warrants a follow-up test.#10 by skyboy on March 19th, 2013 ·
It would have to be the JIT if it is getting inlined, the compiler treats it as a normal function call as far as bytecode goes, and the VM obeys the bytecode passed to it. Though being a static call, it’s pretty fast on top of being very simple: the biggest advantage is the string equality/comparison code is very generic, and thus slow.
For the specific use case in my decoder, the playing field is leveled by the switch conditions in the native code being very sparse: the C++ compiler may be making it a hash index or if/else comparison instead of an array index in preference of reducing memory usage for a relatively minor speed penalty. When the data being parsed is very small, the native code very likely beats mine, but the execution happens too fast to be measurable, while on large data sets the minor penalty adds up.
I hate to keep plugging my JSON class, but I’ve applied literally every speed tweak I can to it over time, to significant effect (and in excess of 1,000 lines of code).
#11 by Stephen Cheung on March 18th, 2013 ·
Great article! Though ternary is the worst in ASC 2.0, but I think ppl seldom use ternary in this way.
#12 by Peter Henry on March 18th, 2013 ·
Perhaps a little outside the goals of the test but I’d be very interested to see the comparison against using a map. Perhaps and Array/Vector for int indices specifically and/or Object/Dictionary for strings. eg.
var functionMap:Object = {
0 : func0,
1 : func1,
// …
19 : func19,
}
for (var val:int = 0; val < 20; ++val)
{
for (i = 0; i < ITERATIONS; ++i)
{
functionMap[val]();
}
}
#13 by jackson on March 18th, 2013 ·
That’s a very good idea for a test. It’s certainly a more limiting approach than an
Object
or aDictionary
that can handle any key, but perhaps there’s a performance advantage for certain cases. I’ll check it out.#14 by ben w on March 19th, 2013 ·
Hi Jackson, thanks for sharing you interesting finds once again! I really think with these tests though you should expose us a button that we can click to run the tests on our machines right here in the browser then maybe spit the results into a HTML div or something though the ExternalInterface api.
That way we can really easily feedback similar results, or if we find something different due do out varying hardware/software. (be sure to make it obvious to those using debug players if you end up doing something like that)
:)
#15 by jackson on March 19th, 2013 ·
I usually only do this when there’s more interesting results than a huge table of data, but I can see how it would save you some time downloading the code and compiling it. I added a “Launch the test app” link to the article just under the code and will try to do the same in the future. As for warning debug players, I try to keep the tests as simple as possible to make them easier to understand and remove any possible side-effects of other features. That said, I’ll still think about adding this as a standard feature since it does skew the results quite badly in most cases.
Thanks for the ideas!
#16 by ryzed on March 19th, 2013 ·
Hi, again.
Want ask to make more variants, like 50-60, if you will test it again.
I’m sure, “nested ifs” will beat “switch” with this changes :)
#17 by JNT on April 2nd, 2013 ·
Thanks for the almost scientific analysis! I really appreciate the performance tips on this site, but isn’t it Adobe’s homework to build strong compilers and VMs (compare JS and also Java number crunching performance e.g.) instead of forcing the devs to search for every little speed hack? Ultimately, this leads to bad programming style, such as using these ‘if-else-trees’ that shouldn’t exist at all, because that’s what the switch statement is for when comparing a single value. And nesting (!) so many ternary operators seems strange to me, at least in AS3. Whatever, thanks for revealing the (still huge?) potential regarding compiler optimizations.
#18 by jackson on April 2nd, 2013 ·
It would definitely be nice if fewer hacks were necessary to get maximum performance, but the last time I tested AS3 vs. JavaScript was in November 2011 and Flash Player’s performance was still on-par with the best of the browsers, so AS3 can hold its own on that front at least. Java and native code (C/C++/ObjC) are another story and one that I haven’t directly tested yet.