Logical Operator Performance
An absolute fundamental of programming is the concept of logical operators like &&
and ||
. In a recent comment, Chris H pointed out that MXMLC doesn’t do a particularly good job generating bytecode for these operators. Today I’ll look further into the subject and see just how much this impacts performance.
Let’s take a look at a very simple function using the “logical and” (&&
) operator:
private function compound(a:int, b:int, c:int, d:int, e:int): void { if (a == 1 && b == 2 && c == 3 && d == 4 && e == 5) { } }
First note that this function doesn’t really do anything. MXMLC is perfectly free to delete the contents of this function as an optimization since they can have no effect on anything. However MXMLC (at least in version 4.1) doesn’t do this sort of optimization. While bad for performance, it’ll allow us to see the bytecode that would be generated if MXMLC were forced to keep the if
statement around like it would in a real function. So, let’s examine the output bytecode:
function private::compound(int,int,int,int,int):void /* disp_id 0*/ { // local_count=6 max_scope=1 max_stack=2 code_len=53 0 getlocal0 1 pushscope 2 getlocal1 3 pushbyte 1 5 equals 6 dup 7 iffalse L1 11 pop 12 getlocal2 13 pushbyte 2 15 equals L1: 16 dup 17 iffalse L2 21 pop 22 getlocal3 23 pushbyte 3 25 equals L2: 26 dup 27 iffalse L3 31 pop 32 getlocal 4 34 pushbyte 4 36 equals L3: 37 dup 38 iffalse L4 42 pop 43 getlocal 5 45 pushbyte 5 47 equals L4: 48 iffalse L5 L5: 52 returnvoid }
While a bit long, this bytecode is basically doing the same thing over and over as it checks a
, b
, c
, d
, and e
against 1, 2, 3, 4, and 5. The way it does this is through the following series of operations:
getlocal1
– Push the variable to compare to the operand stackpushbyte 1
– Push the literal value to compare to the operand stackequals
– Pop the variable and literal value, compare them, and push the resulting booleandup
– Duplicate the resulting booleaniffalse L1
– Skip second comparison if the comparison failed
While not extremely complicated, this approach has the downside of performing multiple jumps via iffalse
and quite a bit of seemingly-unnecessary stack operations like dup
. Let’s look at an alternative approach in AS3 to see if we can trick the compiler into generating better bytecode whilst still accomplishing the exact same task:
private function chain(a:int, b:int, c:int, d:int, e:int): void { if (a == 1) { if (b == 2) { if (c == 3) { if (d == 4) { if (e == 4) { } } } } } }
This AS3 code is longer since we’ve eschewed the logical operator &&
in favor of equivalent, nested if
blocks. Let’s see if this trick worked by looking at the generated bytecode:
function private::chain(int,int,int,int,int):void /* disp_id 0*/ { // local_count=6 max_scope=1 max_stack=2 code_len=40 0 getlocal0 1 pushscope 2 getlocal1 3 pushbyte 1 5 ifne L1 9 getlocal2 10 pushbyte 2 12 ifne L1 16 getlocal3 17 pushbyte 3 19 ifne L1 23 getlocal 4 25 pushbyte 4 27 ifne L1 31 getlocal 5 33 pushbyte 4 35 ifne L1 L1: 39 returnvoid }
Unlike the AS3, this bytecode is quite a bit shorter and therefore starts out with a natural advantage: smaller code SWF size. Let’s see how it accomplishes the task:
getlocal1
– Push the variable to compare to the operand stackpushbyte 1
– Push the literal value to compare to the operand stackifne L1
– If the variable doesn’t equal the literal, skip to the end
By using ifne
instead of equals
, dup
, and iffalse
to do comparisons, this bytecode is much more compact and taking much better advantage of the available instructions. Further, this version properly skips to the end of the if
chain as soon as a comparison fails rather than simply skipping to a strange mid-way point in the process. This allows the Flash VM to “short circuit” the comparisons properly and hopefully save some execution time. Speaking of execution time, let’s see a performance test:
package { import flash.display.*; import flash.text.*; import flash.utils.*; public class LogicalOperators extends Sprite { public function LogicalOperators() { var logger:TextField = new TextField(); logger.autoSize = TextFieldAutoSize.LEFT; addChild(logger); function log(msg:*): void { logger.appendText(msg+"\n"); } var beforeTime:int; var afterTime:int; var i:int; var a:int, b:int, c:int, d:int, e:int; const REPS:int = 100000000; log("all true"); a = 1; b = 2; c = 3; d = 4; e = 5; beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { if (a == 1 && b == 2 && c == 3 && d == 4 && e == 5) { } } afterTime = getTimer(); log("\tCompound: " + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { if (a == 1) { if (b == 2) { if (c == 3) { if (d == 4) { if (e == 4) { } } } } } } afterTime = getTimer(); log("\tChain: " + (afterTime-beforeTime)); log("all false"); a = 0; b = 0; c = 0; d = 0; e = 0; beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { if (a == 1 && b == 2 && c == 3 && d == 4 && e == 5) { } } afterTime = getTimer(); log("\tCompound: " + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { if (a == 1) { if (b == 2) { if (c == 3) { if (d == 4) { if (e == 4) { } } } } } } afterTime = getTimer(); log("\tChain: " + (afterTime-beforeTime)); } } }
The performance test measures the time it takes to run through these five comparisons via the “compound” approach (using local operators) and via the “chain” approach (using a chain of if
blocks). Here are the performance results I get:
Environment | All True | All False | ||
---|---|---|---|---|
Compound | Chain | Compound | Chain | |
3.0 Ghz Intel Core 2 Duo, Windows XP | 806 | 428 | 565 | 291 |
2.0 Ghz Intel Core 2 Duo, Mac OS X | 1302 | 639 | 844 | 432 |
As you can see, the performance difference is striking: if
chains are about twice as fast regardless of platform or even if any or all of the comparisons are true. If you don’t mind a little extra typing in AS3, you can save a little SWF size and a lot of execution time by using an if
chain instead of logical operators like &&
.
#1 by Simon Richardson on September 6th, 2010 ·
This is really bad, I can’t believe that simple Logical Operator is so badly written in bytecode – shocking!
#2 by Joa Ebert on September 6th, 2010 ·
Great find. WIll add it to Apparat.
#3 by jackson on September 6th, 2010 ·
Cool!
#4 by bwhiting on September 6th, 2010 ·
#5 by Will on September 6th, 2010 ·
I get different results when I try this with HaXe
Testing at 10 000 000 on a 2.5ghz AMD 64×2 w/ 3gb RAM
#6 by jackson on September 6th, 2010 ·
Wow, apparently HaXe does an even worse job at this than MXMLC/AS3. Your processor is comparable to the ones I used in the test and yet the HaXe test running on it is 2-3x slower. Also, it’s just as disturbing to me that there is a ~3x advantage in HaXe with the “compound” method as the ~2x advantage I found in AS3 for the “chain” method. Really, both methods should result in the same bytecode and it should be the fastest bytecode possible. Currently, that’s the bytecode that MXMLC generates for the AS3 “chain” method as shown above.
Thanks for the HaXe figures. HaXe has been very interesting to me for quite some time and it’s always good to see how performance is shaping up with it.
#7 by aaulia on September 6th, 2010 ·
Wow this is interesting, unfortunately for me, this is something that should be done in tool chain level (nice to hear Joa gonna implement this on apparat ^_^), since this is (well for me, at least) break readability :(
#8 by DieTapete on September 12th, 2010 ·
This is sad.. thanks for letting us know!