Activation Objects
Closures are a really nice feature of AS3 (and JavaScript and AS2) and I’ve shown their performance disadvantages compared to regular methods before. Today I’ll discuss a further performance downside to closures that can slow down your code, not just the function call itself.
As a reminder, closures in AS3 look like this:
function normalFunction(): void
{
    /* do normal function stuff */
    function closure(): void
    {
        /* do closure stuff */
    }
    /* do normal function stuff */
}
In other words, you can have a function as a local variable of another function. This is handy, but slow to call as shown before. Now let’s look at the bytecode generated by MXMLC (4.1) for the above function:
function normalFunction():void /* disp_id 0*/
{
  activation {
    var closure:Function /* slot_id 1 */
  }
  // local_count=2 max_scope=2 max_stack=2 code_len=15
  0     getlocal0
  1     pushscope
  2     newactivation
  3     dup
  4     setlocal1
  5     pushscope
  6     getscopeobject 1
  8     newfunction    var undefined():void /* disp_id 0*/
  10    coerce         Function
  12    setslot        1
  14    returnvoid
}
For a function that does nothing but declare an empty function, that’s sure a lot of bytecode! Here’s what it’s doing:
getlocal0, pushscope – Get this and make it the scope
newactivation – Make a new activation object to hold normalFunction’s local variables, which the closure may refer to
dup, setlocal1 – Duplicate the activation object and keep a reference to it in local register 1
pushscope – Push the activation object onto the scope stack
getscopeobject 1 – Get the scope object at index 1, which is the activation object
newfunction – Create a new function object for closure that captures the current scope
coerce Function – Convert the newly-created function to a Function
setslot 1 – Store the function in slot 1 of the activation object, which is the closure variable
Essentially, all of the above is to set up the closure. Let’s see a performance test of some simple functions (they sum the first N integers) so we can see what kind of performance impact closures can have:
package
{
    import flash.text.*;
    import flash.utils.*;
    import flash.display.*;

    /**
    *   A test app to show the performance effects of activation objects
    *   @author Jackson Dunstan
    */
    public class ActivationTest extends Sprite
    {
        private var __logger:TextField = new TextField();
        private function log(msg:*): void { __logger.appendText(msg + "\n"); }

        public function ActivationTest()
        {
            __logger.autoSize = TextFieldAutoSize.LEFT;
            addChild(__logger);

            const NUM:int = 1000000000;
            var beforeTime:int;
            var afterTime:int;

            beforeTime = getTimer();
            testNoActivation(NUM);
            afterTime = getTimer();
            log("No activation: " + (afterTime-beforeTime));

            beforeTime = getTimer();
            testDirectActivation(NUM);
            afterTime = getTimer();
            log("Direct Activation: " + (afterTime-beforeTime));

            beforeTime = getTimer();
            testIndirectActivation(NUM);
            afterTime = getTimer();
            log("Indirect Activation: " + (afterTime-beforeTime));
        }

        private function testNoActivation(num:int): int
        {
            var sum:int = 0;
            for (var i:int = 0; i < num; ++i)
            {
                sum += i;
            }
            return sum;
        }

        private function testDirectActivation(num:int): int
        {
            function foo():void{}

            var sum:int = 0;
            for (var i:int = 0; i < num; ++i)
            {
                sum += i;
            }
            return sum;
        }

        private function testIndirectActivation(num:int): int
        {
            function foo():void{}

            return testNoActivation(num);
        }
    }
}
And the performance results:
Environment | No Activation (ms) | Direct Activation (ms) | Indirect Activation (ms) |
---|---|---|---|
2.4 GHz Intel Core i5, Mac OS X | 2197 | 3548 | 2229 |
3.0 GHz Intel Core 2 Duo, Windows XP | 4268 | 5590 | 4215 |
The above should be shocking. We’re seeing a roughly 60% slowdown on Mac OS X and a roughly 30% slowdown on Windows XP just from having a closure in the function. Keep in mind that the closure is never actually called or referred to in any way, so the slowness of calling it is not the source of this tremendous slowdown. So what is? Let’s look at the bytecode for testNoActivation. It’s all perfectly reasonable, so you shouldn’t see any surprises here:
function private::testNoActivation(int):int /* disp_id 0*/
{
  // local_count=4 max_scope=1 max_stack=2 code_len=28
  0     getlocal0
  1     pushscope
  2     pushbyte 0
  4     setlocal2
  5     pushbyte 0
  7     setlocal3
  8     jump L1

  L2:
  12    label
  13    getlocal2
  14    getlocal3
  15    add
  16    convert_i
  17    setlocal2
  18    inclocal_i 3

  L1:
  20    getlocal3
  21    getlocal1
  22    iflt L2

  26    getlocal2
  27    returnvalue
}
With that as a reference, let’s look at the bytecode for testDirectActivation. See if you can spot the difference:
function private::testDirectActivation(int):int /* disp_id 0*/
{
  activation {
    var num:int /* slot_id 1 */
    var i:int /* slot_id 4 */
    var sum:int /* slot_id 3 */
    var foo:Function /* slot_id 2 */
  }
  // local_count=3 max_scope=2 max_stack=3 code_len=77
  0     getlocal0
  1     pushscope
  2     newactivation
  3     dup
  4     setlocal2
  5     pushscope
  6     getscopeobject 1
  8     getlocal1
  9     setslot 1
  11    getscopeobject 1
  13    newfunction    var undefined():void /* disp_id 0*/
  15    coerce         Function
  17    setslot 2
  19    getscopeobject 1
  21    pushbyte 0
  23    setslot 3
  25    getscopeobject 1
  27    pushbyte 0
  29    setslot 4
  31    jump L1

  L2:
  35    label
  36    getscopeobject 1
  38    getscopeobject 1
  40    getslot 3
  42    getscopeobject 1
  44    getslot 4
  46    add
  47    convert_i
  48    setslot 3
  50    getscopeobject 1
  52    getslot 4
  54    increment_i
  55    getscopeobject 1
  57    swap
  58    setslot 4

  L1:
  60    getscopeobject 1
  62    getslot 4
  64    getscopeobject 1
  66    getslot 1
  68    iflt L2

  72    getscopeobject 1
  74    getslot 3
  76    returnvalue
}
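Firstly, I hope you caught that it’s nearly 3x longer than the version without the unused closure (code_len=77 versus code_len=28). Secondly, I hope you noticed that it’s absolutely littered with getscopeobject, getslot, and setslot operations. Because the function has an activation object, its parameter and local variables (num, sum, and i) now live in slots on that object rather than in local registers, so every read or write has to fetch the activation object first. Even the loop increment, a single inclocal_i before, now takes six operations including a swap. Thirdly, and least surprisingly, you’ll find all the setup code for the (unused) closure at the top, just as it was in the do-nothing function at the start of this article.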
All of the above pointless operations serve to slow down the rest of the function’s work by roughly 30% to 60%, so be extremely careful about using closures in performance-critical code. One simple way around this slowdown is to do as in testIndirectActivation and externalize the performance-intensive part of the function into another method. While you’ll incur the function call overhead, which can be substantial, you stand a very good chance of dwarfing that overhead with the performance gains made by avoiding all the slow operations that would otherwise have infected your fast code. Alternatively, consider eliminating the closure in favor of a method.
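The second option, replacing the closure with a method, might look something like the sketch below. This is a hypothetical example rather than code from the test app: SumReporter, messages, sumWithClosure, sumWithMethod, and reportMessage are all names made up for illustration. Going by the bytecode shown earlier, declaring the nested function in the first version should be enough to give that whole function an activation object, while the second version declares no closure at all. I haven't timed this particular class.

```actionscript
package
{
    /**
    *   Hypothetical illustration (not part of the test app):
    *   the same work written with a closure and with a method
    */
    public class SumReporter
    {
        /** Collected messages, so the example has a visible result */
        public var messages:Array = [];

        // Before: the nested report() is a closure, so this function
        // gets an activation object and num, sum, and i become slots
        public function sumWithClosure(num:int): int
        {
            function report(msg:String): void
            {
                messages.push(msg);
            }

            var sum:int = 0;
            for (var i:int = 0; i < num; ++i)
            {
                sum += i;
            }
            report("sum: " + sum);
            return sum;
        }

        // After: no closure is declared here, so the loop's variables
        // should stay in local registers as in testNoActivation
        public function sumWithMethod(num:int): int
        {
            var sum:int = 0;
            for (var i:int = 0; i < num; ++i)
            {
                sum += i;
            }
            reportMessage("sum: " + sum);
            return sum;
        }

        // The former closure, now an ordinary private method
        private function reportMessage(msg:String): void
        {
            messages.push(msg);
        }
    }
}
```

Both versions return the same sum and push the same message. The only intended difference is where the reporting function lives, which is what decides whether the compiler emits newactivation for the function containing the loop.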
As a final note, I re-tested the final version of my linked list class after moving the log function out to a method as suggested above. The performance of iterating over the list/Array (by far the most intensive test performed within the function with the log closure) increased by a factor of 3!
#1 by Nek on October 11th, 2010 ·
Thanks! That’s really useful for performance optimization which I’m more and more into.