JacksonDunstan.com | Why Static Is Slow

Why Static Is Slow

January 30, 2012 Tags: bytecode, compiler, non-static, performance, static, variable

Using static variables and functions is slow. That was the conclusion of the previous article on statics, but the subject is actually more nuanced than that. Today we’ll explore static in more depth and find out just why it is so slow.

Based on some keen comments (particularly by skyboy), I’ve put together the following test:

// BaseClass.as
package
{
	import flash.display.Sprite;
	public class BaseClass extends Sprite
	{
		protected var superVal:Number = 44;
	}
}
 
// StaticTest2.as
package
{
	import flash.display.*;
	import flash.utils.*;
	import flash.text.*;
 
	public class StaticTest2 extends BaseClass
	{
		private var __logger:TextField = new TextField();
		private function row(...cols): void
		{
			__logger.appendText(cols.join(",")+"\n");
		}
 
		protected var val:Number = 33;
		protected static var staticVal:Number = 33;
 
		public function StaticTest2()
		{
			__logger.autoSize = TextFieldAutoSize.LEFT;
			addChild(__logger);
 
			var beforeTime:int;
			var afterTime:int;
			var REPS:int = 1000000;
			var i:int;
			var c:Class;
			var num:Number;
 
			row("Test", "Time");
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				c = Math;
			}
			afterTime = getTimer();
			row("Get class", (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				num = Math.PI;
			}
			afterTime = getTimer();
			row("Dot access static var", (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				num = Math["PI"];
			}
			afterTime = getTimer();
			row("Index access static var", (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				num = val;
			}
			afterTime = getTimer();
			row("num = val", (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				num = this.val;
			}
			afterTime = getTimer();
			row("num = this.val", (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				num = superVal;
			}
			afterTime = getTimer();
			row("num = superVal", (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				num = this.superVal;
			}
			afterTime = getTimer();
			row("num = this.superVal", (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				num = super.superVal;
			}
			afterTime = getTimer();
			row("num = super.superVal", (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				num = staticVal;
			}
			afterTime = getTimer();
			row("num = staticVal", (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				num = StaticTest2.staticVal;
			}
			afterTime = getTimer();
			row("num = StaticTest2.staticVal", (afterTime-beforeTime));
		}
	}
}

Let’s look at the bytecode generated for each of these in order. First, just getting a class: (annotated by me)

getlex        	:Math  // get the Math class
coerce        	:Class // cast it to a Class
setlocal      	5      // assign it to the c variable

Next is getting a static variable (PI) of a class (Math), which was the focus of the last article:

getlex        	:Math // get the Math class
getproperty   	:PI   // get its PI property
convert_d     	      // convert PI to a Number
setlocal      	6     // assign PI to the num variable

Next we follow skyboy’s suggestion and access the PI property of Math by indexing like so: Math["PI"]

getlex        	:Math // get the Math class
pushstring    	"PI"  // push the "PI" string on the stack
getproperty   	private,StaticTest2,http://adobe.com/AS3/2006/builtin,,flash.text,flash.utils,private,,flash.display,StaticTest2,BaseClass,flash.display:Sprite,flash.display:DisplayObjectContainer,flash.display:InteractiveObject,flash.display:DisplayObject,flash.events:EventDispatcher:null // get the PI property
convert_d     	      // convert PI to a Number
setlocal      	6     // assign PI to the num variable

Wow, that’s a lot of arguments to getproperty! Later on in the article we’ll see what kind of performance impact that has. For now, let’s continue with some comparisons of non-static fields starting with num = val:

getlocal0     	                // get the "this" object
getproperty   	StaticTest2:val // get val, which is defined in StaticTest2
convert_d     	                // convert val to a Number
setlocal      	6               // assign val to the num variable

That was pretty straightforward. If we type a little more (as is common) and use num = this.val, does the compiler generate different bytecode? Let’s see:

getlocal0     	                // get the "this" object
getproperty   	StaticTest2:val // get val, which is defined in StaticTest2
convert_d     	                // convert val to a Number
setlocal      	6               // assign val to the num variable

The answer is clear: no, the generated code is identical whether or not you use the this keyword. Now what about variables defined in our parent class? Let’s start with num = superVal:

getlex        	BaseClass:superVal // get superVal
convert_d     	                   // convert superVal to a Number
setlocal      	6                  // assign superVal to the num variable

This bytecode uses a getlex just like the static accesses did, but it avoids the getproperty instruction. We’ll find out in a little bit what kind of performance impact this has. In the meantime, let’s see the bytecode generated when we use the this keyword to write num = this.superVal:

getlocal0     	                   // get the "this" object
getproperty   	BaseClass:superVal // get superVal
convert_d     	                   // convert superVal to a Number
setlocal      	6                  // assign superVal to num

Using the this keyword totally transforms the bytecode! Rather than using getlex, the bytecode is now identical to the num = this.val bytecode except that getproperty references the parent class (BaseClass) instead of the current class (StaticTest2). Does the compiler do this when we use the super keyword? Let’s look:

getlocal0     	                   // get the "this" object
getsuper      	BaseClass:superVal // get superVal from the base class
convert_d     	                   // convert superVal to a Number
setlocal      	6                  // assign superVal to num

Here we have a third set of bytecode generated for functionally equivalent code. This version gets the this object like the version using the this keyword, but it uses a new instruction—getsuper—to fetch the superVal variable. Now that we have a good set of non-statics to compare against, let’s turn our attention back to statics and check out num = staticVal:

getlex        	StaticTest2:staticVal // get staticVal
convert_d     	                      // convert staticVal to a Number
setlocal      	6                     // assign staticVal to num

This access ties num = superVal for the fewest instructions (3) so far, but are they faster given that they include a getlex? We’ll see in just a bit. Let’s look at the final test with num = StaticTest2.staticVal:

getglobalscope	                      // get the global scope
getslot       	1                     // get the first slot of the global scope
getproperty   	StaticTest2:staticVal // get staticVal from the object in that slot
convert_d     	                      // convert staticVal to a Number
setlocal      	6                     // assign staticVal to num

This has to be the most roundabout way of getting to staticVal, given that it is functionally identical to the previous version, which simply left off the redundant class name.

Now that we’ve inspected the bytecode, let’s look at the results. I ran these tests in the following environment:

Flex SDK (MXMLC) 4.5.1.21328, compiling in release mode (no debugging or verbose stack traces)
Release version of Flash Player 11.1.102.55
2.4 GHz Intel Core i5
Mac OS X 10.7.2

And got these results:

Test	Time (ms)
Get class	3
Dot access static var	8
Index access static var	1360
num = val	2
num = this.val	2
num = superVal	2
num = this.superVal	2
num = super.superVal	2
num = staticVal	2
num = StaticTest2.staticVal	2

Field Performance

Given that the first three tests forced the number of iterations down so far that there was apparently no difference between the rest of them, I decided to make another test class that doesn’t include the first three tests and crank up the iterations to examine the performance differences between the remaining tests:

package
{
	import flash.display.*;
	import flash.utils.*;
	import flash.text.*;
 
	public class StaticTest3 extends BaseClass
	{
		private var __logger:TextField = new TextField();
		private function row(...cols): void
		{
			__logger.appendText(cols.join(",")+"\n");
		}
 
		protected var val:Number = 33;
		protected static var staticVal:Number = 33;
 
		public function StaticTest3()
		{
			__logger.autoSize = TextFieldAutoSize.LEFT;
			addChild(__logger);
 
			var beforeTime:int;
			var afterTime:int;
			var REPS:int = 100000000;
			var i:int;
			var c:Class;
			var num:Number;
 
			row("Test", "Time");
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				num = val;
			}
			afterTime = getTimer();
			row("num = val", (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				num = this.val;
			}
			afterTime = getTimer();
			row("num = this.val", (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				num = superVal;
			}
			afterTime = getTimer();
			row("num = superVal", (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				num = this.superVal;
			}
			afterTime = getTimer();
			row("num = this.superVal", (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				num = super.superVal;
			}
			afterTime = getTimer();
			row("num = super.superVal", (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				num = staticVal;
			}
			afterTime = getTimer();
			row("num = staticVal", (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				num = StaticTest3.staticVal;
			}
			afterTime = getTimer();
			row("num = StaticTest3.staticVal", (afterTime-beforeTime));
		}
	}
}

Here are the results for StaticTest3 in the same environment:

Test	Time (ms)
num = val	215
num = this.val	223
num = superVal	219
num = this.superVal	220
num = super.superVal	220
num = staticVal	222
num = StaticTest3.staticVal	293

Field Performance (Limited Set)

Now that we have the results data, let’s draw some conclusions:

Indexing a static variable (Math["PI"]) is far and away the slowest way you can access a field. Avoid it at all costs in performance-critical code.
Just getting a class (c = Math) is 50% slower than accessing a field (static or not) in the same class or any superclass, and that is before paying for access to any of the class’s fields.
Accessing a static field of another class is about 4x slower than accessing any field in the same class or superclass.
Specifying even the same class (StaticTest3.staticVal) results in bytecode that is slower than any other way of accessing a field of the same class or superclass, static or not.
Despite wildly different bytecode, all other ways of accessing fields of the same class or superclass are roughly the same speed. The JIT is probably producing identical machine code for all of these different sets of bytecode instructions.

The main takeaway here is to remember that static is slow when you’re accessing through a class name. Minimize that (perhaps by caching) and you should be fine.
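
As a minimal sketch of the caching idea (the class name StaticCacheSketch and the cachedPI field are hypothetical and not part of the tests above), a static from another class can be copied once into a local variable or into a static of your own class, so that the hot loop never goes through the class name. This assumes the static's value doesn't change while the cached copy is in use, which holds for Math.PI:

```actionscript
package
{
	import flash.display.Sprite;

	// Hypothetical class for illustration only; it is not one of the timed tests
	public class StaticCacheSketch extends Sprite
	{
		// Copy the other class's static into a static of this class, once
		private static const cachedPI:Number = Math.PI;

		public function StaticCacheSketch()
		{
			const REPS:int = 1000000;
			var i:int;
			var sum:Number = 0;

			// Access through the class name on every iteration
			// (the "Dot access static var" case from the first test)
			for (i = 0; i < REPS; ++i)
			{
				sum += Math.PI;
			}

			// Cached in a local variable: Math is only looked up once,
			// outside the loop
			var localPI:Number = Math.PI;
			for (i = 0; i < REPS; ++i)
			{
				sum += localPI;
			}

			// Cached in a static of this class and read without a class name
			// (the "num = staticVal" case from the tests)
			for (i = 0; i < REPS; ++i)
			{
				sum += cachedPI;
			}

			trace(sum);
		}
	}
}
```

This exact class wasn't part of the timed tests, so treat it as an illustration of the pattern rather than a benchmark. Going by the results above, the second and third loops should avoid the cost seen in the "Dot access static var" row.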

Spot a bug? Have a suggestion? Post a comment!

Comments

#1 by NemoStein on January 30th, 2012

“The main takeaway here is to remember that static is slow when you’re accessing through a class name. Minimize that (perhaps by caching) and you should be fine.”
In other words:

package
{
	public class Stuffer
	{
		// static so that it can be read as Stuffer.someStuff
		public static var someStuff:Object = {};
	}
}

package
{
	public class GetterClass
	{
		// cache the other class's static once, at construction
		public var someStuff:Object = Stuffer.someStuff;
		
		public function getSomeStuffA():Object
		{
			return Stuffer.someStuff; // Bad!!
		}
		
		public function getSomeStuffB():Object
		{
			return someStuff; // Good!!
		}
	}
}

#2 by jackson on January 30th, 2012

Exactly, but you could also make someStuff static and then reference it like this:

return someStuff;

Not like this:

return GetterClass.someStuff;

#3 by ben w on January 30th, 2012

out of curiosity could you run the following test

var PI_String:String = "PI";
beforeTime = getTimer();
for (i = 0; i < REPS; ++i)
{
	num = Math[PI_String];
}
afterTime = getTimer();
row("predefined Index access static var", (afterTime-beforeTime));

just to see what impact creating the string object a zillion times had on the test

cheers, ben
- #4 by jackson on January 30th, 2012
  
  I actually did that test too, but dropped it when I found that it was quite a bit slower than the string literal version. Perhaps I’ll go more in depth on why in another article.
#5 by Aleksandr Makov on January 30th, 2012

Hey Jackson, thanks for the research. What did you use to inspect the bytecode?
- #6 by jackson on January 30th, 2012
  
  You’re welcome. :)
  
  I used swfdump that comes with the Flex SDK. Adobe has a nice article on how to use it. Basically you just do this:
  
  /path/to/flex/sdk/bin/swfdump -abc StaticTest2.swf > bytecode.txt
  
  Then check out bytecode.txt in a text editor. Scroll down a little and you should see the function names with their bytecode.

#7 by skyboy on January 30th, 2012

I’d like to comment on this bit of code:
getproperty private,StaticTest2,http://adobe.com/AS3/2006/builtin,,flash.text,flash.utils,private,,flash.display,StaticTest2,BaseClass,flash.display:Sprite,flash.display:DisplayObjectContainer,flash.display:InteractiveObject,flash.display:DisplayObject,flash.events:EventDispatcher:null // get the PI property

Those aren’t a list of arguments; the arguments (Object, followed by the String) are on the stack. That’s actually a multiname and all of the names it contains; perhaps a NamespaceSet. All of the opcodes that have an argument on the same line are reading an index into a pool somewhere; such as with callmethod and callstatic – two opcodes for calling functions you will never find in SWFs, but they’re completely implemented. All that’s needed is a compiler or post-compiler that makes use of them: they skip most of the checks, and call directly into the SWF function pool (callstatic; this has limitations on application) or into the method pool of a class (callmethod; same limitations as callstatic).
The opcodes that do this are often faster than their stack-based brothers, but not always, due to optimizations made by the JIT in Flash. For instance, (x * (x * (x * x))) is more likely to be optimized into faster code than x * x * x * x because of what the compiler generates: getlocal, getlocal, getlocal, getlocal, multiply, multiply, multiply vs. getlocal, getlocal, multiply, getlocal, multiply, getlocal, multiply. The JIT can optimize the former into a single superword (superwords are 16 bits where words are 8 bits, in the context of the interpreter) that loads 4 local variables in one go much more easily than the latter.
As useful as the JIT optimizations are, in most cases they don’t seem to be applied at all because the compiler generates dumb assembler, not designed to take advantage of the JIT optimizations. In some cases the compiler generates outright stupid code (like converting uints to float, decrementing then back to uint only when the decrement takes place inside a condition or brackets), but that’s another complaint.

This instruction is using a runtime multiname as well, which invokes a massive overhead (it’s this exact multiname that’s delayed me making my own post-compile optimizer; I’m still struggling with implementing a fully type-strict stack and interpreter just to address these properly).

This specific opcode could get an entire article series devoted to it; I’ll continue in another comment because I’m not sure what the length limit is.

#8 by jackson on January 30th, 2012

I’m not sure what the length limit is, either. :)

Very interesting food for thought. Perhaps I’ll write an article on workarounds for the compiler like your (x * (x * (x * x))) or uint -> Number -> uint for decrement. Any more to share for inclusion?
- #9 by skyboy on January 30th, 2012
  
  I can’t say every optimization the JIT makes and how to take advantage of them (I don’t know terribly many to start with; I haven’t dug deep into the JIT) but here is a list+implementations of the superwords: http://pastebin.com/g9tqypW3
  
  sp is the current working stack; pc is the current byte in the SWF’s ABC code (post-optimization) that’s being executed; I’m not entirely certain what framep is.
  
  The full code of the interpreter can be found here.
  - #10 by Kyle Murray / Krilnon on January 31st, 2012
    
    framep is the frame pointer, which points to an address in the current stack frame. It’s useful because the stack pointer register often changes around during the execution of code in that frame.

#11 by skyboy on January 30th, 2012

This is actual live code from the Flash Player AS3 interpreter source:

#define GET_MULTINAME_PTR(decl, arg)  decl = pool->precomputedMultiname(uint32_t(arg))

            INSTR(getlex) {
                SAVE_EXPC;
                // findpropstrict + getproperty
                // stack in:  -
                // stack out: value
                GET_MULTINAME_PTR(multiname, U30ARG);
                // only non-runtime names are allowed.  but this still includes
                // wildcard and attribute names.
                a1 = env->findproperty(scope, scopeBase, scopeDepth, multiname, true, withBase);
                *(++sp) = toplevel->getproperty(a1, multiname, toplevel->toVTable(a1));
                NEXT;
            }

#if 0
            // Multiname is neither isRuntime or isRtns
            INSTR(getproperty_fast) {
                SAVE_EXPC;
                GET_MULTINAME_PTR(multiname, U30ARG);
                sp[0] = toplevel->getproperty(sp[0], multiname, toplevel->toVTable(sp[0]));
                NEXT;
            }
#endif

            // get a property using a multiname ref
            INSTR(getproperty) {
                SAVE_EXPC;
                GET_MULTINAME_PTR(multiname, U30ARG);
                if (!multiname->isRuntime())
                {
                    *sp = toplevel->getproperty(*sp, multiname, toplevel->toVTable(*sp));
                }
                else if (!multiname->isRtns() && IS_INTEGER(*sp) && atomCanBeUint32(*sp) && AvmCore::isObject(sp[-1]))
                {
                    a2 = *(sp--);   // key
                    *sp = AvmCore::atomToScriptObject(*sp)->getUintProperty(UINT32_VALUE(a2));
                }
                else if(multiname->isRtns() || !AvmCore::isDictionaryLookup(*sp, *(sp-1)))
                {
                    aux_memory->multiname2 = *multiname;
                    sp = initMultiname(env, aux_memory->multiname2, sp);
                    *sp = toplevel->getproperty(*sp, &aux_memory->multiname2, toplevel->toVTable(*sp));
                }
                else
                {
                    a2 = *(sp--);   // key
                    *sp = AvmCore::atomToScriptObject(*sp)->getAtomProperty(a2);
                }
                NEXT;
            }

Yes, that getproperty_fast instruction is in the code, and it is disabled with preprocessor directives (#if 0). I don’t know the specifics on why implementing that superword was ditched, but I believe it has to do with the Dictionary class.

The getlex instruction does the same thing getproperty does, but it’s so much faster because it doesn’t have to deal with Array/Vector/Dictionary or runtime multinames. If that opcode was generated for getting the properties of static classes, we wouldn’t have performance pitfalls for accessing them. I still believe the access is dynamic, just not runtime dynamic – the classes are of the type Class, which has no properties or methods defined. The difference between accessing a static property and instance property appears to be similar to the difference seen with the dot operator vs index operator on another class (perhaps Dictionary?) in a previous article. So while the name is known, it’s still a dynamic look up when compared to the look up for an instance; it’s just not a dynamic runtime look up.
Of course, I can’t say for certain without digging deeper into the multiname pool of an individual SWF along with the associated code in the AS3 interpreter.

Apparat has a class or two in it that appear to exploit that by having non-dynamic calls to the function of a class, potentially by treating the class as an instance. I haven’t looked into what exactly goes on in that class to say, but using Apparat you could devise a test of keeping the Math class in a local variable then calling its methods/retrieving properties in a loop; Following up with a test that uses a single getlex instruction to grab the static properties.

More on callstatic/callmethod: The specific limitation I mentioned for both is that you can only call into the method pool of the running SWF. So while these opcodes may very well result in huge performance gains when calling functions, they can have zero impact when calling functions native to AS3, such as the Math functions. You can see more of how they’re faster in this code:

            INSTR(callstatic) {
                SAVE_EXPC;
                // stack in: receiver, arg1..N
                // stack out: result
                u1 = U30ARG;            // method_id
                i2 = (intptr_t)U30ARG;  // argc
                env->nullcheck(sp[-i2]);
                // ISSUE if arg types were checked in verifier, this coerces again.
                f = env->abcEnv()->getMethod((uint32_t)u1);
                a1 = f->coerceEnter((int32_t)i2, sp-i2);
                *(sp -= i2) = a1;
                NEXT;
            }

            INSTR(callmethod) {
                SAVE_EXPC;
                // stack in: receiver, arg1..N
                // stack out: result
                u1 = U30ARG-1;         // disp_id
                i2 = (intptr_t)U30ARG; // argc
                a2p = sp-i2;           // atomv

                // must be a real class instance for this to be used.  primitives that have
                // methods will only have final bindings and no dispatch table.
                VTable* vtable = toplevel->toVTable(a2p[0]); // includes null check
                AvmAssert(u1 < vtable->traits->getTraitsBindings()->methodCount);
                f = vtable->methods[u1];
                // ISSUE if arg types were checked in verifier, this coerces again.
                a1 = f->coerceEnter((int32_t)i2, a2p);
                *(sp -= i2) = a1;
                NEXT;
            }

            INSTR(callproperty) {
                u1 = WORD_CODE_ONLY(WOP_callproperty) ABC_CODE_ONLY(OP_callproperty);
            callproperty_impl:
                SAVE_EXPC;
                GET_MULTINAME_PTR(multiname, U30ARG);
                i1 = (intptr_t)U30ARG; /* argc */
                a2p = sp - i1; /* atomv */
                sp = a2p;
                if (multiname->isRuntime())
                {
                    aux_memory->multiname2 = *multiname;
                    sp = initMultiname(env, aux_memory->multiname2, sp);
                    multiname = &aux_memory->multiname2;
                }
                a1 = *sp; /* base */
                if (u1 == WORD_CODE_ONLY(WOP_callproplex) ABC_CODE_ONLY(OP_callproplex))
                    a2p[0] = nullObjectAtom;
                *sp = toplevel->callproperty(a1, multiname, (int32_t)i1, a2p, toplevel->toVTable(a1));
                if (u1 == WORD_CODE_ONLY(WOP_callpropvoid) ABC_CODE_ONLY(OP_callpropvoid))
                    sp--;
                NEXT;
            }

            INSTR(callproplex) {
                u1 = WORD_CODE_ONLY(WOP_callproplex) ABC_CODE_ONLY(OP_callproplex);
                goto callproperty_impl;
            }

            INSTR(callpropvoid) {
                u1 = WORD_CODE_ONLY(WOP_callpropvoid) ABC_CODE_ONLY(OP_callpropvoid);
                goto callproperty_impl;
            }

            INSTR(call) {
                SAVE_EXPC;
                i1 = (intptr_t)U30ARG; // argc
                // stack in: function, receiver, arg1, ... argN
                // stack out: result
                a1 = toplevel->op_call(sp[-i1-1]/*function*/, (int32_t)i1, sp-i1);
                *(sp = sp-i1-1) = a1;
                NEXT;
            }

I can’t say why the compiler doesn’t make use of these two extra fully implemented instructions, but my post-compile optimizer will most definitely have the option to test it, if I can manage to get the stack/interpreting completed.

#12 by jackson on January 31st, 2012

Very interesting insights. Thanks for sharing. When your post-compile optimizer is in usable shape, I’d love to test it out for you.

#13 by Rackdoll on January 31st, 2012

First of all…. –> Nice article.

Second.
I was wondering if you tried the private static var / const in your tests ?
IF so can you post those results….. ?

thnx :)
#14 by Rackdoll on January 31st, 2012

or are the results same as ” protected static” ?
Is there even a difference in performance..-> protected , private or public constants ?
- #15 by skyboy on January 31st, 2012
  
  public is marginally faster (less than 1%) at thousands of iterations; the rest of them have less meaningful differences and const/var perform identically.
  
  When it comes to these matters (public/private/protected/internal/custom | const/var), write your code for what you need it to do; performance gains/losses are entirely meaningless because they are incredibly small and amount to nothing of value when the user has other applications running that compete for resources (so, everyone).
  - #16 by Rackdoll on January 31st, 2012
    
    ok. seems clear enough. Thnx for the feedback!
    keep on the greatness!
#17 by Kyle Murray / Krilnon on January 31st, 2012

Using your tests, I don’t get such ridiculously slow results for Math[‘PI’]:

Flash Player 11.2.202.197 (beta 5, 64-bit, ‘release’, ActiveX)
SDK 4.5.1
Core i7-2600K 3.4 GHz

Get class,1
Dot access static var,5
Index access static var,63 // 63 is a lot lower than 1360
Math.public::PI,4 // I added this one
num = val,2
num = this.val,2
num = superVal,2
num = this.superVal,2
num = super.superVal,1
num = staticVal,2
num = StaticTest2.staticVal,

Of course there are a number of differences in our setups, but it seemed like a notable performance difference.
- #18 by NemoStein on February 1st, 2012
  
  “Flash Player 11.2.202.197 (beta 5, 64-bit, ‘release’, ActiveX)”
  
  Maybe FP 11.2 has some improvements that can lead to this result.
  Beta 5 should be the last beta before release, so, in two weeks we can see the “real” results of this.
#19 by Amit Patel on February 14th, 2012

Since you’re testing variable access inside a loop, have you looked at the bytecode for the loop operations as well, to see how much of your running time is from the loop (increment, test, jump) vs the variable access?
- #20 by jackson on February 14th, 2012
  
  This is a good idea for a test, so I tried it out on the same test machine as in the article and I’m getting about 209 ms for an empty loop. So all the accesses except the static access through a class name are a little more differentiated than in the above graph, but the basic idea still stands: static access through a class name is relatively way slower than all other kinds of access.
  - #21 by Amit Patel on February 15th, 2012
    
    Thanks! If I understand right, that means accessing a local var (215-209 = 6) is over twice as fast as accessing an instance this.var (223 – 209 = 14) in the loop test, but they both test the same in the first test. Interesting. (I also don’t have a good sense of the std.dev of these measurements so that might account for the difference)
    - #22 by jackson on February 15th, 2012
      
      Yes, local variables are a good deal quicker than fields, especially when writing to them. For more on this, see my article Local Variable Caching.
#23 by Alex on December 11th, 2012

Hi. Got an off-topic question here. How do you see the bytecode? And if you look at whole .swf bytecode, how do you identify the method which a portion of bytecode relates to? Thanks.
- #24 by jackson on December 11th, 2012
  
  Adobe’s Flex SDK comes with a little command line tool to show you the bytecode. Just run this command:
  
  # Print it all to the screen
  FLEX_SDK/bin/swfdump -abc MyApp.swf
  # Save it all to a file
  FLEX_SDK/bin/swfdump -abc MyApp.swf > Output.txt
  - #25 by Alex on December 14th, 2012
    
    Thank you very much!