Today’s article is about an unintuitive-yet-simple optimization you can use to hugely increase the speed of reading from Array, Vector, Dictionary, Object, and dynamic classes. Need I say more? Read on for this amazing speedup!

I was recently reading an article on Mark Knol’s site about some of the loop optimizations I’ve discussed before (e.g. removing the length getter) when I saw something I’d never seen before, despite my nine articles on AS3 loops and years of reading other people’s AS3 code. Mark was casting the result of an Array read operation to the type of variable he was assigning it to:

function test(items:Array, index:uint): void
{
	var item:MyItem;
 
	item = items[index]; // normal version
	item = items[index] as MyItem; // Mark's version
}
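The two assignments above are not quite interchangeable, which matters before swapping one for the other. Here is a minimal sketch of the difference, assuming an Array that deliberately holds one element that is not a MyItem; the CoercionDemo class name and the sample data are made up for illustration:

```actionscript
package
{
	import flash.display.Sprite;

	public class CoercionDemo extends Sprite
	{
		public function CoercionDemo()
		{
			// Hypothetical data: the second element is deliberately not a MyItem.
			var items:Array = [new MyItem(), "not a MyItem"];
			var item:MyItem;

			item = items[0];           // implicit coercion: succeeds
			item = items[0] as MyItem; // "as" cast: succeeds too

			item = items[1] as MyItem; // "as" fails quietly
			trace(item);               // null

			try
			{
				item = items[1];       // implicit coercion fails loudly at run time
			}
			catch (e:TypeError)
			{
				trace(e.errorID);      // 1034 (Type Coercion failed)
			}
		}
	}
}
class MyItem {}
```

If your containers only ever hold the expected type, the two forms behave the same; otherwise the as version trades a run-time error for a null you need to check.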

To be sure, you do not need to type as MyItem because the result of indexing an Array is an untyped (*) value that can be assigned to a variable of any type. You don’t even get a compiler warning. The two forms differ only when the value can’t be converted to the target type: the as version simply yields null, while the plain assignment throws a TypeError at run time for class types (or converts to a default such as 0 for primitive types like int). But, since this was an article on loop optimization and I was about to write a comment pointing out that casts can be expensive, I figured I should test my assumption. As it turns out, this cast wasn’t slowing down his version at all. In fact, it was yielding far superior performance to the version without a cast. Shocked, I developed a full performance test with Array, Vector, Dictionary, Object, and dynamic classes to see if this optimization applied elsewhere:

package
{
	import flash.display.*;
	import flash.utils.*;
	import flash.text.*;
 
	public class CastingLookups extends Sprite
	{
		private var __logger:TextField = new TextField();
		private function row(...cols): void
		{
			__logger.appendText(cols.join(",") + "\n");
		}
 
		public function CastingLookups()
		{
			__logger.autoSize = TextFieldAutoSize.LEFT;
			addChild(__logger);
 
			var beforeTime:int;
			var afterTime:int;
			var noCastTime:int;
			var castTime:int;
			var item:MyItem;
			var i:uint;
			var len:uint = 10000000;
			var itemsArray:Array = new Array(len);
			var itemsVector:Vector.<MyItem> = new Vector.<MyItem>(len);
			var itemsDictionary:Dictionary = new Dictionary();
			var itemsObject:Object = new Object();
			var itemsDynClass:Dynamic = new Dynamic();
			for (i = 0; i < len; ++i)
			{
				itemsArray[i] = 
				itemsVector[i] = 
				itemsDictionary[i] = 
				itemsObject[i] = 
				itemsDynClass[i] = new MyItem();
			}
 
			row("Type", "No Cast Time", "Cast Time");
 
			beforeTime = getTimer();
			for (i = 0; i < len; ++i)
			{
				item = itemsArray[i];
			}
			afterTime = getTimer();
			noCastTime = afterTime-beforeTime;
			beforeTime = getTimer();
			for (i = 0; i < len; ++i)
			{
				item = itemsArray[i] as MyItem;
			}
			afterTime = getTimer();
			castTime = afterTime-beforeTime;
			row("Array", noCastTime, castTime);
 
			beforeTime = getTimer();
			for (i = 0; i < len; ++i)
			{
				item = itemsVector[i];
			}
			afterTime = getTimer();
			noCastTime = afterTime-beforeTime;
			beforeTime = getTimer();
			for (i = 0; i < len; ++i)
			{
				item = itemsVector[i] as MyItem;
			}
			afterTime = getTimer();
			castTime = afterTime-beforeTime;
			row("Vector", noCastTime, castTime);
 
			beforeTime = getTimer();
			for (i = 0; i < len; ++i)
			{
				item = itemsDictionary[i];
			}
			afterTime = getTimer();
			noCastTime = afterTime-beforeTime;
			beforeTime = getTimer();
			for (i = 0; i < len; ++i)
			{
				item = itemsDictionary[i] as MyItem;
			}
			afterTime = getTimer();
			castTime = afterTime-beforeTime;
			row("Dictionary", noCastTime, castTime);
 
			beforeTime = getTimer();
			for (i = 0; i < len; ++i)
			{
				item = itemsObject[i];
			}
			afterTime = getTimer();
			noCastTime = afterTime-beforeTime;
			beforeTime = getTimer();
			for (i = 0; i < len; ++i)
			{
				item = itemsObject[i] as MyItem;
			}
			afterTime = getTimer();
			castTime = afterTime-beforeTime;
			row("Object", noCastTime, castTime);
 
			beforeTime = getTimer();
			for (i = 0; i < len; ++i)
			{
				item = itemsDynClass[i];
			}
			afterTime = getTimer();
			noCastTime = afterTime-beforeTime;
			beforeTime = getTimer();
			for (i = 0; i < len; ++i)
			{
				item = itemsDynClass[i] as MyItem;
			}
			afterTime = getTimer();
			castTime = afterTime-beforeTime;
			row("Dynamic Class", noCastTime, castTime);
		}
	}
}
class MyItem{}
dynamic class Dynamic{}

I ran the performance test with the following environment:

  • Flex SDK (MXMLC) 4.5.1.21328, compiling in release mode (no debugging or verbose stack traces)
  • Release version of Flash Player 10.3.181.34
  • 2.4 GHz Intel Core i5
  • Mac OS X 10.6.8

And got these results:

Type            No Cast Time (ms)   Cast Time (ms)
Array           134                 68
Vector          126                 63
Dictionary      340                 270
Object          332                 267
Dynamic Class   331                 270

[Chart: Casting Lookups Performance]

The point is not to compare the various container types (as in Map Performance or Accessing Objects), but to show the huge speedup when the cast is added. For Array and Vector, the cast nearly doubles the speed, cutting the loop time roughly in half! For Object, Dictionary, and dynamic classes, the optimization is less drastic, but still about a 25% speedup (roughly 20% less time).
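To make the arithmetic behind those percentages explicit, here is a small sketch that derives them from the timings in the results table. The SpeedupTable class and its speedup and timeSaved helpers are names invented for this example; only the numbers come from the test run.

```actionscript
package
{
	import flash.display.Sprite;

	public class SpeedupTable extends Sprite
	{
		public function SpeedupTable()
		{
			// [type, no-cast ms, cast ms], copied from the results table
			var results:Array = [
				["Array", 134, 68],
				["Vector", 126, 63],
				["Dictionary", 340, 270],
				["Object", 332, 267],
				["Dynamic Class", 331, 270]
			];
			for each (var r:Array in results)
			{
				trace(r[0] + ": " + speedup(r[1], r[2]).toFixed(2) + "x, "
					+ timeSaved(r[1], r[2]).toFixed(0) + "% less time");
			}
		}

		// How many times faster the cast loop ran than the no-cast loop.
		public static function speedup(noCastMs:Number, castMs:Number):Number
		{
			return noCastMs / castMs;
		}

		// Percentage of the no-cast time that the cast version saved.
		public static function timeSaved(noCastMs:Number, castMs:Number):Number
		{
			return 100 * (noCastMs - castMs) / noCastMs;
		}
	}
}
```

This works out to about 1.97x for Array and 2.00x for Vector, and between 1.23x and 1.26x for Dictionary, Object, and dynamic classes, which is where "nearly doubles" and "about 25%" come from.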

How is this possible? To find out, let’s look at the bytecode generated for the “no cast” version of the Vector test (the annotations are mine):

    221     pushbyte      	0               // push 0 literal value
    223     convert_u     	                // convert 0 to an unsigned int
    224     setlocal      	4               // set 0 to i
    226     jump          	L7              // go to block L7
 
 
    L8: 
    230     label         	
    231     getlocal      	7               // get itemsVector
    233     getlocal      	4               // get i
    235     getproperty   	null            // index the vector
    237     coerce        	private::MyItem // implicit cast the result to a MyItem
    239     setlocal3     	                // set the result to item
    240     getlocal      	4               // get i
    242     increment     	                // i++
    243     convert_u     	                // convert i to an unsigned int
    244     setlocal      	4               // set result to i
 
    L7: 
    246     getlocal      	4               // get i
    248     getlocal      	5               // get len
    250     iflt          	L8              // if i < len, go to block L8

Now let’s look at the version with the cast:

    278     pushbyte      	0               // push 0 literal value
    280     convert_u     	                // convert 0 to an unsigned int
    281     setlocal      	4               // set 0 to i
    283     jump          	L9              // go to block L9
 
 
    L10: 
    287     label         	
    288     getlocal      	7               // get itemsVector
    290     getlocal      	4               // get i
    292     getproperty   	null            // index the vector
    294     getglobalscope	                // get the global scope object (the bottom of the scope stack)
    295     getslot       	2               // get the item at slot 2 in the global scope (i.e. MyItem)
    297     astypelate    	                // "as" cast to MyItem
    298     coerce        	private::MyItem // implicit cast to MyItem (again)
    300     setlocal3     	                // set the result to item
    301     getlocal      	4               // get i
    303     increment     	                // i++
    304     convert_u     	                // convert i to an unsigned int
    305     setlocal      	4               // set result to i
 
    L9: 
    307     getlocal      	4               // get i
    309     getlocal      	5               // get len
    311     iflt          	L10             // if i < len, goto block L10

Notice that the only difference is that the cast version adds the as cast via these three lines:

    294     getglobalscope	                // get the global scope object (the bottom of the scope stack)
    295     getslot       	2               // get the item at slot 2 in the global scope (i.e. MyItem)
    297     astypelate    	                // "as" cast to MyItem

These three lines are the only difference between the “cast” and “no cast” versions of every tested type.

How can adding instructions yield a 2x performance increase? I do not know. I’ve looked over the source code and bytecode at least a dozen times now and am positive that I haven’t switched the order or anything silly like that. If you spot an error, please comment below about it. Barring any mistake though, it looks like we have a way to hugely increase the speed at which we can access Array, Vector, Dictionary, Object, and dynamic classes!
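For reference, this is roughly what the trick looks like in an everyday loop rather than a benchmark. It is only a sketch: the CastInLoop class, the sumValues function, and the value field on MyItem are hypothetical, and since the cause of the speedup is unknown you should measure it on your own compiler and Flash Player version before relying on it.

```actionscript
package
{
	import flash.display.Sprite;

	public class CastInLoop extends Sprite
	{
		public function CastInLoop()
		{
			var items:Array = [new MyItem(1), new MyItem(2), new MyItem(3)];
			trace(sumValues(items)); // 6
		}

		// Hypothetical helper: adds up the value field of every MyItem in the Array.
		private function sumValues(items:Array):int
		{
			var total:int = 0;
			var item:MyItem;
			var len:uint = items.length; // cache the length getter outside the loop
			for (var i:uint = 0; i < len; ++i)
			{
				item = items[i] as MyItem; // explicit "as" cast on the lookup
				if (item) // "as" yields null for anything that is not a MyItem
				{
					total += item.value;
				}
			}
			return total;
		}
	}
}
class MyItem
{
	public var value:int;
	public function MyItem(value:int)
	{
		this.value = value;
	}
}
```

The null check is there because as never throws; if you know the container holds only MyItem instances you could drop it, at the cost of a null reference error if that assumption ever breaks.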