ASC 2.0 Conditionals Performance
Surprisingly, some interesting things have been happening with conditionals like if-else in AS3. First, a brand new AS3 compiler—ASC 2.0—has been released with the promise that it’ll generate more efficient bytecode. Second, some readers have pointed out the existence of a new (to me) technique: the “if-else tree”. Today’s article takes a look at just what that is and tests it against the classic options: if-else, the ternary (? :) operator, and the switch statement. Which will be fastest?
First, let’s talk about the newcomer: “if-else trees”. The idea is to replace “if-else ladders” that do N checks for N possible values with a version that does only log(N) checks. Here’s the classic “if-else ladder” for 0-3:
if (val == 0) { // code for 0 } else if (val == 1) { // code for 1 } else if (val == 2) { // code for 2 } else if (val == 3) { // code for 3 }
The worst-case scenario is when val is the last case you check for. For N values, that takes N-1 if statements. For 4 values like above, that’s 3 if statements. For 20 values, that’s 19 if statements. In contrast, let’s look at an “if-else tree” for 0-3:
if (val < 2) { if (val == 0) { // code for 0 } else { // code for 1 } } else { if (val == 2) { // code for 2 } else { // code for 3 } }
If you have more values to check for, you simply nest more if-else statements. For 4 values, there’s only a maximum of 2 if statements that need to execute. That’s a 33% savings on branching instructions, but it gets much better if you have more values. For 20 values, you only have a maximum of 5 if statements instead of 19. That adds up to a huge savings!
So with that in mind, here’s an updated version of the test in Conditionals Performance Revisited that incorporates “if-else trees”:
package { import flash.text.TextField; import flash.utils.getTimer; import flash.display.Sprite; public class IfElseTrees extends Sprite { public function IfElseTrees() { test(); } private function test(): void { var beforeTime:int; var afterTime:int; var i:int; const ITERATIONS:int = 10000000; var results:TextField = new TextField(); results.width = stage.stageWidth; results.height = stage.stageHeight; results.text = "Iterations,If-Else Ladder,If-Else Tree,Ternary,Switch\n"; for (var val:int = 0; val < 20; ++val) { results.appendText(val+","); beforeTime = getTimer(); for (i = 0; i < ITERATIONS; ++i) { if (val == 0) func0(); else if (val == 1) func1(); else if (val == 2) func2(); else if (val == 3) func3(); else if (val == 4) func4(); else if (val == 5) func5(); else if (val == 6) func6(); else if (val == 7) func7(); else if (val == 8) func8(); else if (val == 9) func9(); else if (val == 10) func10(); else if (val == 11) func11(); else if (val == 12) func12(); else if (val == 13) func13(); else if (val == 14) func14(); else if (val == 15) func15(); else if (val == 16) func16(); else if (val == 17) func17(); else if (val == 18) func18(); else func19(); } afterTime = getTimer(); results.appendText((afterTime-beforeTime)+","); beforeTime = getTimer(); for (i = 0; i < ITERATIONS; ++i) { if (val < 10) { if (val < 5) { if (val < 2) { if (val == 0) { func0(); } else { func1(); } } else { if (val == 2) { func2(); } else if (val == 3) { func3(); } else { func4(); } } } else { if (val < 7) { if (val == 5) { func5(); } else { func6(); } } else { if (val == 7) { func7(); } else if (val == 8) { func8(); } else { func9(); } } } } else { if (val < 15) { if (val < 12) { if (val == 10) { func10(); } else { func11(); } } else { if (val == 12) { func12(); } else if (val == 13) { func13(); } else { func14(); } } } else { if (val < 17) { if (val == 15) { func15(); } else { func16(); } } else { if (val == 17) { func17(); } else if (val == 18) { func18(); } else { func19(); } } } } } afterTime = getTimer(); results.appendText((afterTime-beforeTime)+","); beforeTime = getTimer(); for (i = 0; i < ITERATIONS; ++i) { val == 0 ? func0() : val == 1 ? func1() : val == 2 ? func2() : val == 3 ? func3() : val == 4 ? func4() : val == 5 ? func5() : val == 6 ? func6() : val == 7 ? func7() : val == 8 ? func8() : val == 9 ? func9() : val == 10 ? func10() : val == 11 ? func11() : val == 12 ? func12() : val == 13 ? func13() : val == 14 ? func14() : val == 15 ? func15() : val == 16 ? func16() : val == 17 ? func17() : val == 18 ? func18() : func19(); } afterTime = getTimer(); results.appendText((afterTime-beforeTime)+","); beforeTime = getTimer(); for (i = 0; i < ITERATIONS; ++i) { switch (val) { case 0: func0(); break; case 1: func1(); break; case 2: func2(); break; case 3: func3(); break; case 4: func4(); break; case 5: func5(); break; case 6: func6(); break; case 7: func7(); break; case 8: func8(); break; case 9: func9(); break; case 10: func10(); break; case 11: func11(); break; case 12: func12(); break; case 13: func13(); break; case 14: func14(); break; case 15: func15(); break; case 16: func16(); break; case 17: func17(); break; case 18: func18(); break; default: func19(); } } afterTime = getTimer(); results.appendText((afterTime-beforeTime)+"\n"); } addChild(results); } private function func0(): void{} private function func1(): void{} private function func2(): void{} private function func3(): void{} private function func4(): void{} private function func5(): void{} private function func6(): void{} private function func7(): void{} private function func8(): void{} private function func9(): void{} private function func10(): void{} private function func11(): void{} private function func12(): void{} private function func13(): void{} private function func14(): void{} private function func15(): void{} private function func16(): void{} private function func17(): void{} private function func18(): void{} private function func19(): void{} } }
I ran this test in the following environment:
- Release version of Flash Player 11.6.602.180
- 2.3 Ghz Intel Core i7
- Mac OS X 10.8.2
With two compilers, both in release mode (no debugging or verbose stack traces):
- ASC 1.0: Flex SDK 4.6.0.23201
- ASC 2.0: build 352231
And here are the results I got:



In ASC 1.0, the new “if-else tree” approach beats the competition out by taking a relatively constant amount of time regardless of what the value being checked for was. In contrast, “if-else ladders”, ternary (? :) operators, and even switch statements all scaled up in time linearly as the value was checked for later and later in each. This is a clear win for the “if-else tree” to the extent that it’s about twice as fast when the value is 20.
In ASC 2.0, things change dramatically. The ternary operator is horrifically slow now and taking well over 700 milliseconds when the value is 20 compared to only 200 milliseconds with ASC 1.0. It’s so slow that it dominates the graph and I needed to cut it out to see what’s happening with the others. If you look at the second ASC 2.0 chart (with ternary removed), you can see some details emerge. The “if-else ladder” is still performing linearly as it uses more and more time as the value gets toward the end of the if checks. On the other hand, “if-else ladders” are still performing in roughly constant time regardless of the value. The switch statement has also changed its performance profile and is now also running in roughly constant time in sharp contrast to the linear time it used to run in.
So, if you want a constant-time approach regardless of compiler you should use an “if-else tree”. If you’re only using ASC 2.0, it’s OK to use a switch statement instead. Actually, most people would probably consider it a “cleaner” option for code readability and it certainly requires less typing. However, you’ll suffer massively on performance if your code ever gets compiled with ASC 1.0.
So why is the switch statement so much faster now? The answer lies, of course, in the bytecode generated by each compiler. Here’s what ASC 1.0 puts out:
474 pushbyte 0
475 setlocal3
476 jump bb199
bb135
succs=[bb156]
477 label
478 jump bb156
bb136
succs=[bb198]
479 label
480 getlocal0
481 callpropvoid
482 jump bb198
bb137
succs=[bb198]
483 label
484 getlocal0
485 callpropvoid
486 jump bb198
bb138
succs=[bb198]
487 label
488 getlocal0
489 callpropvoid
490 jump bb198
bb139
succs=[bb198]
491 label
492 getlocal0
493 callpropvoid
494 jump bb198
bb140
succs=[bb198]
495 label
496 getlocal0
497 callpropvoid
498 jump bb198
bb141
succs=[bb198]
499 label
500 getlocal0
501 callpropvoid
502 jump bb198
bb142
succs=[bb198]
503 label
504 getlocal0
505 callpropvoid
506 jump bb198
bb143
succs=[bb198]
507 label
508 getlocal0
509 callpropvoid
510 jump bb198
bb144
succs=[bb198]
511 label
512 getlocal0
513 callpropvoid
514 jump bb198
bb145
succs=[bb198]
515 label
516 getlocal0
517 callpropvoid
518 jump bb198
bb146
succs=[bb198]
519 label
520 getlocal0
521 callpropvoid
522 jump bb198
bb147
succs=[bb198]
523 label
524 getlocal0
525 callpropvoid
526 jump bb198
bb148
succs=[bb198]
527 label
528 getlocal0
529 callpropvoid
530 jump bb198
bb149
succs=[bb198]
531 label
532 getlocal0
533 callpropvoid
534 jump bb198
bb150
succs=[bb198]
535 label
536 getlocal0
537 callpropvoid
538 jump bb198
bb151
succs=[bb198]
539 label
540 getlocal0
541 callpropvoid
542 jump bb198
bb152
succs=[bb198]
543 label
544 getlocal0
545 callpropvoid
546 jump bb198
bb153
succs=[bb198]
547 label
548 getlocal0
549 callpropvoid
550 jump bb198
bb154
succs=[bb198]
551 label
552 getlocal0
553 callpropvoid
554 jump bb198
bb155
succs=[bb198]
555 label
556 getlocal0
557 callpropvoid
558 jump bb198
bb156
succs=[bb158,bb157]
559 getlocal 6
560 setlocal 7
561 pushbyte 0
562 getlocal 7
563 ifstrictne bb158
bb157
succs=[bb197]
564 pushbyte 0
565 jump bb197
bb158
succs=[bb160,bb159]
566 pushbyte 1
567 getlocal 7
568 ifstrictne bb160
bb159
succs=[bb197]
569 pushbyte 1
570 jump bb197
bb160
succs=[bb161,bb162]
571 pushbyte 2
572 getlocal 7
573 ifstrictne bb162
bb161
succs=[bb197]
574 pushbyte 2
575 jump bb197
bb162
succs=[bb163,bb164]
576 pushbyte 3
577 getlocal 7
578 ifstrictne bb164
bb163
succs=[bb197]
579 pushbyte 3
580 jump bb197
bb164
succs=[bb166,bb165]
581 pushbyte 4
582 getlocal 7
583 ifstrictne bb166
bb165
succs=[bb197]
584 pushbyte 4
585 jump bb197
bb166
succs=[bb167,bb168]
586 pushbyte 5
587 getlocal 7
588 ifstrictne bb168
bb167
succs=[bb197]
589 pushbyte 5
590 jump bb197
bb168
succs=[bb169,bb170]
591 pushbyte 6
592 getlocal 7
593 ifstrictne bb170
bb169
succs=[bb197]
594 pushbyte 6
595 jump bb197
bb170
succs=[bb171,bb172]
596 pushbyte 7
597 getlocal 7
598 ifstrictne bb172
bb171
succs=[bb197]
599 pushbyte 7
600 jump bb197
bb172
succs=[bb173,bb174]
601 pushbyte 8
602 getlocal 7
603 ifstrictne bb174
bb173
succs=[bb197]
604 pushbyte 8
605 jump bb197
bb174
succs=[bb176,bb175]
606 pushbyte 9
607 getlocal 7
608 ifstrictne bb176
bb175
succs=[bb197]
609 pushbyte 9
610 jump bb197
bb176
succs=[bb177,bb178]
611 pushbyte 10
612 getlocal 7
613 ifstrictne bb178
bb177
succs=[bb197]
614 pushbyte 10
615 jump bb197
bb178
succs=[bb180,bb179]
616 pushbyte 11
617 getlocal 7
618 ifstrictne bb180
bb179
succs=[bb197]
619 pushbyte 11
620 jump bb197
bb180
succs=[bb182,bb181]
621 pushbyte 12
622 getlocal 7
623 ifstrictne bb182
bb181
succs=[bb197]
624 pushbyte 12
625 jump bb197
bb182
succs=[bb183,bb184]
626 pushbyte 13
627 getlocal 7
628 ifstrictne bb184
bb183
succs=[bb197]
629 pushbyte 13
630 jump bb197
bb184
succs=[bb186,bb185]
631 pushbyte 14
632 getlocal 7
633 ifstrictne bb186
bb185
succs=[bb197]
634 pushbyte 14
635 jump bb197
bb186
succs=[bb188,bb187]
636 pushbyte 15
637 getlocal 7
638 ifstrictne bb188
bb187
succs=[bb197]
639 pushbyte 15
640 jump bb197
bb188
succs=[bb190,bb189]
641 pushbyte 16
642 getlocal 7
643 ifstrictne bb190
bb189
succs=[bb197]
644 pushbyte 16
645 jump bb197
bb190
succs=[bb192,bb191]
646 pushbyte 17
647 getlocal 7
648 ifstrictne bb192
bb191
succs=[bb197]
649 pushbyte 17
650 jump bb197
bb192
succs=[bb193,bb194]
651 pushbyte 18
652 getlocal 7
653 ifstrictne bb194
bb193
succs=[bb197]
654 pushbyte 18
655 jump bb197
bb194
succs=[bb196]
656 jump bb196
bb195
succs=[bb197]
657 pushbyte 19
658 jump bb197
bb196
succs=[bb197]
659 pushbyte 19
bb197
succs=[bb138,bb141,bb198,bb152,bb151,bb147,bb143,bb148,bb153,bb139,bb136,bb140,bb145,bb137,bb142,bb150,bb146,bb155,bb154,bb149,bb144]
660 kill 7
661 lookupswitch default: bb155 maxcase: 20 bb136 bb137 bb138 bb139 bb140 bb141 bb142 bb143 bb144 bb145 bb146 bb147 bb148 bb149 bb150 bb151 bb152 bb153 bb154 bb155Notice that it essentially implements the switch statement with an “if-else ladder”: a series of ifstrictle opcodes, one per case label. The lookupswitch opcode comes at the end after all of these, essentially negating it as a way to improve performance. Now let’s look at the bytecode generated by ASC 2.0:
486 pushbyte 0
487 setlocal 6
488 jump bb175
bb153
succs=[bb155,bb159,bb164,bb169,bb163,bb158,bb160,bb170,bb162,bb173,bb156,bb166,bb168,bb165,bb154,bb171,bb167,bb172,bb161,bb157]
489 label
490 getlocal1
491 convert_i
492 lookupswitch default: bb173 maxcase: 19 bb154 bb155 bb156 bb157 bb158 bb159 bb160 bb161 bb162 bb163 bb164 bb165 bb166 bb167 bb168 bb169 bb170 bb171 bb172
bb154
succs=[bb174]
493 findpropstrict func0
494 callpropvoid
495 jump bb174
bb155
succs=[bb174]
496 findpropstrict func1
497 callpropvoid
498 jump bb174
bb156
succs=[bb174]
499 findpropstrict func2
500 callpropvoid
501 jump bb174
bb157
succs=[bb174]
502 findpropstrict func3
503 callpropvoid
504 jump bb174
bb158
succs=[bb174]
505 findpropstrict func4
506 callpropvoid
507 jump bb174
bb159
succs=[bb174]
508 findpropstrict func5
509 callpropvoid
510 jump bb174
bb160
succs=[bb174]
511 findpropstrict func6
512 callpropvoid
513 jump bb174
bb161
succs=[bb174]
514 findpropstrict func7
515 callpropvoid
516 jump bb174
bb162
succs=[bb174]
517 findpropstrict func8
518 callpropvoid
519 jump bb174
bb163
succs=[bb174]
520 findpropstrict func9
521 callpropvoid
522 jump bb174
bb164
succs=[bb174]
523 findpropstrict func10
524 callpropvoid
525 jump bb174
bb165
succs=[bb174]
526 findpropstrict func11
527 callpropvoid
528 jump bb174
bb166
succs=[bb174]
529 findpropstrict func12
530 callpropvoid
531 jump bb174
bb167
succs=[bb174]
532 findpropstrict func13
533 callpropvoid
534 jump bb174
bb168
succs=[bb174]
535 findpropstrict func14
536 callpropvoid
537 jump bb174
bb169
succs=[bb174]
538 findpropstrict func15
539 callpropvoid
540 jump bb174
bb170
succs=[bb174]
541 findpropstrict func16
542 callpropvoid
543 jump bb174
bb171
succs=[bb174]
544 findpropstrict func17
545 callpropvoid
546 jump bb174
bb172
succs=[bb174]
547 findpropstrict func18
548 callpropvoid
549 jump bb174
bb173
succs=[bb174]
550 findpropstrict func19
551 callpropvoid
bb174
succs=[bb175]
552 inclocal_i 6
bb175
succs=[bb153,bb176]
553 getlocal 6
554 pushint
555 iflt bb153
bb176Here the lookupswitch opcode comes at the beginning and jumps straight away to the appropriate block of code. This is exactly what the opcode is intended for: a quick jump depending on what the value of val is. This yields the constant-time performance we see in the graph above for ASC 2.0 and restores switch to its rightful place as the fastest and (arguably) cleanest approach available.
Spot a bug? Have a question or suggestion? Post a comment!
#1 by Ajay on March 18th, 2013
in following line
So, if you want a constant-time approach regardless of compiler you should use an “if-else ladderâ€
Shouldn’t it be “if-else tree”, or am I missing something?
#2 by jackson on March 18th, 2013
Thanks for pointing this out. It was a typo and I’ve now fixed it in the article
#3 by henke37 on March 18th, 2013
So, does this hold true even if you have a non constant expression for the case label?
#4 by jackson on March 18th, 2013
I don’t know, but it sounds like an interesting follow-up test.
#5 by Pierre on March 18th, 2013
Very nice performance tip! Never thought about this.
If you were checking some other type, like… Strings for example, could some similar “if-else tree” technique be used on that? That would probably involve deciding how to split up the String values, either alphabetically and/or by String length, I imagine?
Example:
//Ladder technique:
if(theString==”alpha”) { func0(); }
else if(theString==”beta”) { func1(); }
…
else if(theString==”zebra”) { func25(); }
//Tree technique:
if(theString.charCodeAt(0) < 108) {// char "L" in lowercase, sorta halfway point…
// … either break it down into smaller branches or do sub-ladders…
} else {
// … either break it down into smaller branches or do sub-ladders…
}
Could that be one approach basically?
#6 by jackson on March 18th, 2013
You might be able to adapt it to
Stringthat way, but you’d need to be very careful about function calls. ThecharCodeAtcalls are going to be very expensive, so there’s probably a sizable overhead to this approach. Still, it’s probably worth trying out to see some objective numbers and know for sure.#7 by skyboy on March 18th, 2013
The charCodeAt calls are actually extremely cheap, compared to a string comparison; the JIT may inline it or perform some other boost, as it’s just a bounds check and an array index operation: constant time.
That particular method is what I use in my JSON decoder, which even with ASC1.0 can outperform the native decoder: whether or not it will be a performance hit there is a complete tossup, and only benchmarking can be used to decide on the best option.
#8 by ben w on March 19th, 2013
didn’t know that! learn something every day, goes to show that one should NEVER assume.
#9 by jackson on March 19th, 2013
I wouldn’t have expected that at all. I would think
charCodeAtwould be just as expensive as any other function call. Is it getting preferential treatment by the compiler, the JIT, or the VM? This definitely warrants a follow-up test.#10 by skyboy on March 19th, 2013
It would have to be the JIT if it is getting inlined, the compiler treats it as a normal function call as far as bytecode goes, and the VM obeys the bytecode passed to it. Though being a static call, it’s pretty fast on top of being very simple: the biggest advantage is the string equality/comparison code is very generic, and thus slow.
For the specific use case in my decoder, the playing field is leveled by the switch conditions in the native code being very sparse: the C++ compiler may be making it a hash index or if/else comparison instead of an array index in preference of reducing memory usage for a relatively minor speed penalty. When the data being parsed is very small, the native code very likely beats mine, but the execution happens too fast to be measurable, while on large data sets the minor penalty adds up.
I hate to keep plugging my JSON class, but I’ve applied literally every speed tweak I can to it over time, to significant effect (and in excess of 1,000 lines of code).
#11 by Stephen Cheung on March 18th, 2013
Great article! Though ternary is the worst in ASC 2.0, but I think ppl seldom use ternary in this way.
#12 by Peter Henry on March 18th, 2013
Perhaps a little outside the goals of the test but I’d be very interested to see the comparison against using a map. Perhaps and Array/Vector for int indices specifically and/or Object/Dictionary for strings. eg.
var functionMap:Object = {
0 : func0,
1 : func1,
// …
19 : func19,
}
for (var val:int = 0; val < 20; ++val)
{
for (i = 0; i < ITERATIONS; ++i)
{
functionMap[val]();
}
}
#13 by jackson on March 18th, 2013
That’s a very good idea for a test. It’s certainly a more limiting approach than an
Objector aDictionarythat can handle any key, but perhaps there’s a performance advantage for certain cases. I’ll check it out.#14 by ben w on March 19th, 2013
Hi Jackson, thanks for sharing you interesting finds once again! I really think with these tests though you should expose us a button that we can click to run the tests on our machines right here in the browser then maybe spit the results into a HTML div or something though the ExternalInterface api.
That way we can really easily feedback similar results, or if we find something different due do out varying hardware/software. (be sure to make it obvious to those using debug players if you end up doing something like that)
:)
#15 by jackson on March 19th, 2013
I usually only do this when there’s more interesting results than a huge table of data, but I can see how it would save you some time downloading the code and compiling it. I added a “Launch the test app” link to the article just under the code and will try to do the same in the future. As for warning debug players, I try to keep the tests as simple as possible to make them easier to understand and remove any possible side-effects of other features. That said, I’ll still think about adding this as a standard feature since it does skew the results quite badly in most cases.
Thanks for the ideas!
#16 by ryzed on March 19th, 2013
Hi, again.
Want ask to make more variants, like 50-60, if you will test it again.
I’m sure, “nested ifs” will beat “switch” with this changes :)
#17 by JNT on April 2nd, 2013
Thanks for the almost scientific analysis! I really appreciate the performance tips on this site, but isn’t it Adobe’s homework to build strong compilers and VMs (compare JS and also Java number crunching performance e.g.) instead of forcing the devs to search for every little speed hack? Ultimately, this leads to bad programming style, such as using these ‘if-else-trees’ that shouldn’t exist at all, because that’s what the switch statement is for when comparing a single value. And nesting (!) so many ternary operators seems strange to me, at least in AS3. Whatever, thanks for revealing the (still huge?) potential regarding compiler optimizations.
#18 by jackson on April 2nd, 2013
It would definitely be nice if fewer hacks were necessary to get maximum performance, but the last time I tested AS3 vs. JavaScript was in November 2011 and Flash Player’s performance was still on-par with the best of the browsers, so AS3 can hold its own on that front at least. Java and native code (C/C++/ObjC) are another story and one that I haven’t directly tested yet.