JacksonDunstan.com

Closures are a really nice feature of AS3 (and JavaScript and AS2) and I’ve shown their performance disadvantages compared to regular methods before. Today I’ll discuss a further performance downside: merely declaring a closure can slow down the rest of the enclosing function’s code, not just calls to the closure itself.

As a reminder, closures in AS3 look like this:

function normalFunction(): void
{
	/* do normal function stuff */
	function closure(): void { /* do closure stuff */ }
	/* do normal function stuff */
}

In other words, a function can be declared as a local variable of another function. This is handy, but slow to call, as shown before. Now let’s look at the bytecode generated by MXMLC (4.1) for the above function:

  function normalFunction():void	/* disp_id 0*/
  {
    activation {
      var closure:Function	/* slot_id 1 */
    }
    // local_count=2 max_scope=2 max_stack=2 code_len=15
    0       getlocal0     	
    1       pushscope     	
    2       newactivation 	
    3       dup           	
    4       setlocal1     	
    5       pushscope     	
    6       getscopeobject	1
    8       newfunction   	var undefined():void	/* disp_id 0*/
    10      coerce        	Function
    12      setslot       	1
    14      returnvoid    	
  }

For a function that does nothing but declare an empty function, that’s sure a lot of bytecode! Here’s what it’s doing:

getlocal0, pushscope – Get this and push it onto the scope stack
newactivation – Create the activation object that will hold the function’s locals (here, just closure)
dup, setlocal1 – Duplicate the activation object reference and store it in local register 1
pushscope – Push the activation object onto the scope stack
getscopeobject 1 – Get the scope object at index 1, which is the activation object
newfunction – Create the closure as a new function object that captures the current scope chain
coerce Function – Coerce the newly-created function to Function
setslot 1 – Store the closure in slot 1 of the activation object, i.e. the closure variable
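Why does the compiler need an activation object at all? A closure can read and write the locals of the function that declares it, even after that function has returned, so those locals have to live somewhere that outlasts the call instead of in local registers. Here is a minimal sketch of that behavior; makeCounter, next, and count are hypothetical names for illustration only:

```actionscript
// Hypothetical example: the closure keeps count alive after makeCounter() returns,
// so count must live on the activation object rather than in a local register
function makeCounter(): Function
{
	var count:int = 0;
	function next(): int
	{
		return ++count; // reads and writes the enclosing function's local
	}
	return next;
}

var counter:Function = makeCounter();
trace(counter()); // 1
trace(counter()); // 2
```

Each call to makeCounter gets its own activation object, so two counters created this way would not share a count.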

Essentially, all of that bytecode exists just to set up the closure. Let’s see a performance test of some simple functions (they sum the first N integers) so we can see what kind of performance impact closures can have:

package
{
	import flash.text.*;
	import flash.utils.*;
	import flash.display.*;
 
	/**
	*   A test app to show the performance effects of activation objects
	*   @author Jackson Dunstan
	*/
	public class ActivationTest extends Sprite
	{
		private var __logger:TextField = new TextField();
		private function log(msg:*): void
		{
			__logger.appendText(msg + "\n");
		}
 
		public function ActivationTest()
		{
			__logger.autoSize = TextFieldAutoSize.LEFT;
			addChild(__logger);
 
			const NUM:int = 1000000000;
			var beforeTime:int;
			var afterTime:int;
 
			beforeTime = getTimer();
			testNoActivation(NUM);
			afterTime = getTimer();
			log("No activation: " + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			testDirectActivation(NUM);
			afterTime = getTimer();
			log("Direct Activation: " + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			testIndirectActivation(NUM);
			afterTime = getTimer();
			log("Indirect Activation: " + (afterTime-beforeTime));
		}
 
		private function testNoActivation(num:int): int
		{
			var sum:int = 0;
			for (var i:int = 0; i < num; ++i)
			{
				sum += i;
			}
			return sum;
		}
 
		private function testDirectActivation(num:int): int
		{
			function foo():void{}
 
			var sum:int = 0;
			for (var i:int = 0; i < num; ++i)
			{
				sum += i;
			}
			return sum;
		}
 
		private function testIndirectActivation(num:int): int
		{
			function foo():void{}
 
			return testNoActivation(num);
		}
	}
}

And the performance results:

Environment	No Activation (ms)	Direct Activation (ms)	Indirect Activation (ms)
2.4 GHz Intel Core i5, Mac OS X	2197	3548	2229
3.0 GHz Intel Core 2 Duo, Windows XP	4268	5590	4215

The above should be shocking. We’re seeing a roughly 60% slowdown on Mac OS X (3548 vs. 2197) and a roughly 30% slowdown on Windows XP (5590 vs. 4268) when we have a closure in the function. Keep in mind that the closure is never actually called or referred to in any way, so the slowness of calling it is not the source of this tremendous slowdown. So what is? Let’s look at the bytecode for testNoActivation. It’s all perfectly reasonable, so you shouldn’t see any surprises here:

  function private::testNoActivation(int):int	/* disp_id 0*/
  {
    // local_count=4 max_scope=1 max_stack=2 code_len=28
    0       getlocal0     	
    1       pushscope     	
    2       pushbyte      	0
    4       setlocal2     	
    5       pushbyte      	0
    7       setlocal3     	
    8       jump          	L1
 
 
    L2: 
    12      label         	
    13      getlocal2     	
    14      getlocal3     	
    15      add           	
    16      convert_i     	
    17      setlocal2     	
    18      inclocal_i    	3
 
    L1: 
    20      getlocal3     	
    21      getlocal1     	
    22      iflt          	L2
 
    26      getlocal2     	
    27      returnvalue   	
  }

With that as a reference, let’s look at the bytecode for testDirectActivation. See if you can spot the difference:

  function private::testDirectActivation(int):int	/* disp_id 0*/
  {
    activation {
      var num:int	/* slot_id 1 */
      var i:int	/* slot_id 4 */
      var sum:int	/* slot_id 3 */
      var foo:Function	/* slot_id 2 */
    }
    // local_count=3 max_scope=2 max_stack=3 code_len=77
    0       getlocal0     	
    1       pushscope     	
    2       newactivation 	
    3       dup           	
    4       setlocal2     	
    5       pushscope     	
    6       getscopeobject	1
    8       getlocal1     	
    9       setslot       	1
    11      getscopeobject	1
    13      newfunction   	var undefined():void	/* disp_id 0*/
    15      coerce        	Function
    17      setslot       	2
    19      getscopeobject	1
    21      pushbyte      	0
    23      setslot       	3
    25      getscopeobject	1
    27      pushbyte      	0
    29      setslot       	4
    31      jump          	L1
 
 
    L2: 
    35      label         	
    36      getscopeobject	1
    38      getscopeobject	1
    40      getslot       	3
    42      getscopeobject	1
    44      getslot       	4
    46      add           	
    47      convert_i     	
    48      setslot       	3
    50      getscopeobject	1
    52      getslot       	4
    54      increment_i   	
    55      getscopeobject	1
    57      swap          	
    58      setslot       	4
 
    L1: 
    60      getscopeobject	1
    62      getslot       	4
    64      getscopeobject	1
    66      getslot       	1
    68      iflt          	L2
 
    72      getscopeobject	1
    74      getslot       	3
    76      returnvalue   	
  }

Firstly, I hope you caught that it’s nearly 3x longer than the version without the unused closure (code_len=77 versus code_len=28). Secondly, I hope you noticed that it’s absolutely littered with getscopeobject, getslot, and setslot operations: every read or write of num, sum, and i now goes through the activation object instead of a local register. Even the single inclocal_i has ballooned into a getslot, increment_i, swap, setslot sequence. Thirdly, and least surprisingly, you’ll find all the setup code for the (unused) closure at the top, just as it was in the do-nothing function at the start of this article.
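One way to picture the change, as a rough source-level analogy and not what MXMLC literally emits: it is as if every local had been moved into a field of a helper object, with each read and write going through that object. ActivationAnalogy, Locals, and sumViaLocals are hypothetical names invented for this sketch.

```actionscript
package
{
	public class ActivationAnalogy
	{
		// Analogy only: loop state lives in fields of a instead of plain locals
		public function sumViaLocals(num:int): int
		{
			var a:Locals = new Locals(); // plays the role of newactivation
			a.num = num; // like setslot 1
			a.sum = 0; // like setslot 3
			for (a.i = 0; a.i < a.num; ++a.i) // slot 4 is read and written on every pass
			{
				a.sum += a.i;
			}
			return a.sum;
		}
	}
}

// File-private helper standing in for the activation object's slots
class Locals
{
	public var num:int;
	public var i:int;
	public var sum:int;
}
```

The loop does the same arithmetic as testNoActivation, but no variable access is a plain register read or write any more.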

All of these extra operations slow down the rest of the function’s work by 30% or 60% in the tests above, so be extremely careful about using closures in performance-critical code. One simple way around this slowdown is to do as in testIndirectActivation and externalize the performance-intensive part of the function into another method. While you’ll incur the function call overhead, which can be substantial, you stand a very good chance of dwarfing that overhead with the performance gains made by keeping all those slow operations out of your otherwise-fast code. Alternatively, consider eliminating the closure in favor of a method.
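The second option, replacing the closure with a method, might look like the sketch below. It assumes the closure does not capture anything that cannot be passed as a parameter or stored in a field. NoClosureExample, report, sumWithReport, and __total are hypothetical names, not part of the test app.

```actionscript
package
{
	import flash.display.Sprite;

	// Hypothetical sketch: the helper that would have been a closure is a private method
	public class NoClosureExample extends Sprite
	{
		private var __total:int = 0;

		public function NoClosureExample()
		{
			trace(sumWithReport(10)); // 45
		}

		// Previously this would have been declared inside sumWithReport as a closure
		private function report(value:int): void
		{
			__total = value;
		}

		// No nested function here, so sum and i are not moved onto an activation object
		private function sumWithReport(num:int): int
		{
			var sum:int = 0;
			for (var i:int = 0; i < num; ++i)
			{
				sum += i;
			}
			report(sum);
			return sum;
		}
	}
}
```

Any state the closure used to capture has to move into a field (as with __total here) or be passed in as an argument.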

As a final note, I re-tested the final version of my linked list class by moving the log function out to a method as suggested above. The performance of iterating over the list/Array—by far the most intensive test performed within the function with the log closure—increased by a factor of 3!

