JacksonDunstan.com | Top 10 Performance Tips for 2012

Top 10 Performance Tips for 2012

January 2, 2012
Tags: GC, native, performance, stage3d

It’s a new year and it’s time to make some New Year’s resolutions for Flash performance. Today’s article is a collection of what I consider the top 10 tips for improving the performance of your Flash apps. Read on for the list!

Use native code instead of AS3
Generally, it’s better to let Flash Player’s native code do the heavy lifting rather than using AS3. The reason is that the VM that runs AS3 code necessarily has some overhead compared to the native functions you can call through the Flash Player API (everything in the flash.* packages). One exception to this rule is when the API does something you want to avoid, such as allocating memory.
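Here’s a minimal sketch of this tip, assuming a hypothetical helper class named FillExample (the class and function names are made up for illustration). Both functions fill a BitmapData with one color, but the second hands the whole loop to the native BitmapData.fillRect:

```actionscript
package
{
    import flash.display.BitmapData;

    // Hypothetical helper class, named only for this illustration
    public class FillExample
    {
        // AS3 version: one call to setPixel32 per pixel, driven by AS3 loops
        public static function fillInAS3(bmd:BitmapData, color:uint):void
        {
            const w:int = bmd.width;
            const h:int = bmd.height;
            for (var y:int = 0; y < h; ++y)
            {
                for (var x:int = 0; x < w; ++x)
                {
                    bmd.setPixel32(x, y, color);
                }
            }
        }

        // Native version: a single API call, the per-pixel loop runs in native code
        public static function fillNatively(bmd:BitmapData, color:uint):void
        {
            bmd.fillRect(bmd.rect, color);
        }
    }
}
```

How big the gap is depends on the bitmap size and the player version, so time both on your own content before relying on it.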
Eliminate allocations to reduce GC
In addition to the allocations that you expect, such as those you trigger by using the new operator, there are tons of hidden allocations, such as String objects from concatenation and objects the Flash Player creates for you, such as Events. These allocations are slow, and collecting them when you’re done with them is even slower, so try to get rid of them.
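As a rough sketch, assuming a hypothetical AllocationExample class, the first function below allocates a temporary Point on every iteration while the second computes the same lengths with plain Number locals and allocates nothing inside the loop:

```actionscript
package
{
    import flash.geom.Point;

    // Hypothetical helper class, named only for this illustration
    public class AllocationExample
    {
        // Allocates one temporary Point per iteration for the GC to collect later
        public static function sumLengthsAllocating(xs:Vector.<Number>, ys:Vector.<Number>):Number
        {
            var sum:Number = 0;
            const len:int = xs.length;
            for (var i:int = 0; i < len; ++i)
            {
                var p:Point = new Point(xs[i], ys[i]);
                sum += p.length;
            }
            return sum;
        }

        // Same calculation with no allocations inside the loop
        public static function sumLengths(xs:Vector.<Number>, ys:Vector.<Number>):Number
        {
            var sum:Number = 0;
            var x:Number;
            var y:Number;
            const len:int = xs.length;
            for (var i:int = 0; i < len; ++i)
            {
                x = xs[i];
                y = ys[i];
                sum += Math.sqrt(x * x + y * y);
            }
            return sum;
        }
    }
}
```

Both functions assume xs and ys have the same length.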
Reuse objects to reduce GC
When you’re done with objects, Flash Player’s garbage collector will reclaim their memory for reuse later. Unfortunately, this process is very slow and it’s hard to control when it will happen. Instead of making new objects, consider reusing existing ones. One technique that can help with this is the free list.
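The idea can be sketched as follows. This is a simple stack-based variant written for this example, not necessarily how any particular free list article implements it, and the PointFreeList class with its acquire and release functions are hypothetical names:

```actionscript
package
{
    import flash.geom.Point;

    // Hypothetical free list of Point objects
    public class PointFreeList
    {
        // Points that have been released and are ready for reuse
        private var free:Vector.<Point> = new Vector.<Point>();

        // Hand out a recycled Point if one is available, otherwise allocate
        public function acquire(x:Number, y:Number):Point
        {
            if (free.length > 0)
            {
                var p:Point = free.pop();
                p.x = x;
                p.y = y;
                return p;
            }
            return new Point(x, y);
        }

        // Return a Point to the list instead of leaving it for the GC
        public function release(p:Point):void
        {
            free.push(p);
        }
    }
}
```

Only call release when nothing else still holds a reference to the Point, since the next acquire will overwrite its fields.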
Don’t use anything dynamic
That includes dynamic functions (e.g. anonymous functions and functions declared as local variables), instances of dynamic classes such as Object and MovieClip, the [] operator for accessing fields, and untyped (*) variables. All of these are much slower than their static equivalents such as regular methods, non-dynamic classes, the dot operator, and typed variables.
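To make the contrast concrete, here is a small sketch assuming two hypothetical classes, DynamicExample and ValueHolder. The first function uses a dynamic Object, the [] operator, and an untyped variable. The second uses a non-dynamic class, the dot operator, and typed variables:

```actionscript
package
{
    // Hypothetical helper class, named only for this illustration
    public class DynamicExample
    {
        // Dynamic version: Object instance, [] lookup by name, untyped sum
        public static function sumDynamic(count:int):Number
        {
            var obj:Object = {value: 1.5};
            var sum:* = 0;
            for (var i:int = 0; i < count; ++i)
            {
                sum += obj["value"];
            }
            return sum;
        }

        // Static version: non-dynamic class, dot operator, typed sum
        public static function sumStatic(count:int):Number
        {
            var holder:ValueHolder = new ValueHolder();
            var sum:Number = 0;
            for (var i:int = 0; i < count; ++i)
            {
                sum += holder.value;
            }
            return sum;
        }
    }
}

// Hypothetical non-dynamic class, visible only inside this file
class ValueHolder
{
    public var value:Number = 1.5;
}
```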
Offload to the GPU
These days in Flash you have the ability to use video cards’ GPUs in addition to the main CPU. Using a combination of these is the key to high performance with 3D graphics (Stage3D) and high-definition video (StageVideo).
Reduce function calls
Function calls are, very unfortunately, quite slow in AS3. This includes getters and setters, which occur all the time (e.g. Array.length). Try caching the results of functions instead of calling them more than once, especially with getters. In extreme cases, manually inline the body of functions into one larger function.
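A minimal sketch of caching a getter, assuming a hypothetical LengthCacheExample class. The first loop reads values.length on every iteration and the second reads it once:

```actionscript
package
{
    // Hypothetical helper class, named only for this illustration
    public class LengthCacheExample
    {
        // Reads the length getter once per iteration
        public static function sumUncached(values:Array):Number
        {
            var sum:Number = 0;
            for (var i:int = 0; i < values.length; ++i)
            {
                sum += values[i];
            }
            return sum;
        }

        // Reads the length getter once and reuses the cached result
        public static function sumCached(values:Array):Number
        {
            var sum:Number = 0;
            const len:int = values.length;
            for (var i:int = 0; i < len; ++i)
            {
                sum += values[i];
            }
            return sum;
        }
    }
}
```

The cached version is only safe when the loop body doesn’t change the length of the array.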
Use speciality functions and classes rather than general purpose ones
Sprite uses less memory than MovieClip, Shape uses less memory than Sprite, and BitmapData.copyPixels is faster than BitmapData.draw.
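For the last pair, here is a sketch assuming a hypothetical BlitExample class that copies a tile into a destination bitmap both ways. For brevity it allocates a Matrix and a Point on each call, which you would want to reuse in a real loop:

```actionscript
package
{
    import flash.display.BitmapData;
    import flash.geom.Matrix;
    import flash.geom.Point;

    // Hypothetical helper class, named only for this illustration
    public class BlitExample
    {
        // General purpose: draw() takes any IBitmapDrawable plus an optional transform
        public static function blitWithDraw(dest:BitmapData, tile:BitmapData, x:Number, y:Number):void
        {
            var m:Matrix = new Matrix();
            m.translate(x, y);
            dest.draw(tile, m);
        }

        // Specialized: copyPixels() only copies a rectangle of pixels to a point
        public static function blitWithCopyPixels(dest:BitmapData, tile:BitmapData, x:int, y:int):void
        {
            dest.copyPixels(tile, tile.rect, new Point(x, y));
        }
    }
}
```

The two are interchangeable only for opaque tiles. With its default arguments copyPixels overwrites the destination pixels rather than blending, so transparent tiles need its mergeAlpha parameter set to true.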
Use fewer static accesses
Accessing static variables, constants, and functions is slower than accessing their non-static counterparts. Consider using non-static alternatives or caching those static accesses as either non-static variables/constants or local variables/constants.
Prefer local variables to fields
Reading and writing class and object variables (a.k.a. fields, member variables) is much slower than accessing local variables. Cache field accesses as local variables when you use them a lot.
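As a sketch, assuming a hypothetical Accumulator class with a total field, the first function reads and writes the field on every iteration while the second works on a local copy and writes the field back once:

```actionscript
package
{
    // Hypothetical class, named only for this illustration
    public class Accumulator
    {
        // Number defaults to NaN, so this initializer is required
        private var total:Number = 0;

        // Reads and writes the field on every iteration
        public function addAllViaField(values:Vector.<Number>):void
        {
            const len:int = values.length;
            for (var i:int = 0; i < len; ++i)
            {
                total += values[i];
            }
        }

        // Caches the field in a local variable and writes it back once
        public function addAllViaLocal(values:Vector.<Number>):void
        {
            var localTotal:Number = total;
            const len:int = values.length;
            for (var i:int = 0; i < len; ++i)
            {
                localTotal += values[i];
            }
            total = localTotal;
        }
    }
}
```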
Eliminate pointless code
It’s common to see variables explicitly initialized to their default value, which slows down object creation and function execution. Get rid of this do-nothing code as a matter of good habit, and the performance advantages will add up across your entire app.
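For reference, here is a sketch of which initializers are redundant, assuming a hypothetical DefaultsExample class. It relies on AS3’s default values: 0 for int, false for Boolean, null for String, and NaN for Number:

```actionscript
package
{
    // Hypothetical class, named only for this illustration
    public class DefaultsExample
    {
        // Redundant: each initializer repeats the type's default value
        private var countVerbose:int = 0;
        private var enabledVerbose:Boolean = false;
        private var labelVerbose:String = null;

        // Equivalent: these start out with exactly the same values
        private var count:int;
        private var enabled:Boolean;
        private var label:String;

        // Not redundant: Number defaults to NaN, so starting at zero needs this
        private var ratio:Number = 0;
    }
}
```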

Stay tuned for more articles with even more Flash performance tips.

Comments

#1 by AlexG on January 2nd, 2012

Great article! Some more code examples would make the information easier to digest.
Thanks!
#2 by Aleksandr Makov on January 2nd, 2012

Good stuff! Thanks! I’d add: avoid using the while keyword… it evaluates the condition on every cycle, accessing the subjects of the comparison. It’s a DAMN expensive little word…
#3 by Tronster on January 2nd, 2012

Great list of tips; I had somehow missed your 2009 free-list article; a great complement to object pools!
#4 by NemoStein on January 2nd, 2012

Indeed, great article!
The only tip that I couldn’t understand was the 1st one (native instead of AS3).
Could we get some (archived, maybe) article as an example?
- #5 by Bob on January 2nd, 2012
  
  For example, use a flash.filters filter for your images instead of writing your own with get/set bitmap methods.
  - #6 by jackson on January 2nd, 2012
    
    That’s a good example. In general, try to use the Flash API instead of doing the same thing yourself. You may even find that using two or three Flash API functions is faster than a function you could write yourself that did the same thing in a more direct way. The advantage of native code for expensive operations (like image filtering as Bob points out) is rather large so it’s always good to consider what the Flash API offers before you write a custom function yourself.
#7 by NemoStein on January 3rd, 2012

Thx, Bob and Jackson.

By the way, Jackson, you suggested to “Use fewer static accesses”.
For some time now I have been writing code like this:

[Embed(source = "assets/image.png")] private static const imgImage:Class;

I thought that, since it is a constant, the compiler would know that it never changes, and that, since it is a static property, it would be instantiated only once, the first time the class is used.
Am I irreversibly wrong?
- #8 by jackson on January 3rd, 2012
  
  The tip is to access static variables like that as infrequently as possible. For example, instead of this:
  
      // Pretty pointless, but make an array of the first 1000 multiples of PI
      const len:int = 1000;
      var timesPI:Array = new Array(len);
      for (var i:int; i < len; ++i)
      {
          timesPI[i] = Math.PI * i;
      }
  
  Cache the Math.PI as a local variable:
  
      // Pretty pointless, but make an array of the first 1000 multiples of PI
      const len:int = 1000;
      const PI:Number = Math.PI;
      var timesPI:Array = new Array(len);
      for (var i:int; i < len; ++i)
      {
          timesPI[i] = PI * i;
      }
  
  You therefore reduce your static field accesses from 1000 to 1 at the cost of a local variable and 1000 local variable accesses. Since local variable accesses are much cheaper, this results in a big performance savings. The same would apply for your embedded asset if you were accessing the static variable it’s bound to a lot, such as an image tile being repeated in a game.
#9 by snick on January 3rd, 2012

Just complaining about tip #8. Static means that the compiler (so the runtime) knows the exact position of that object (and its size). In the bytecode, it simply substitutes the address every time there’s an invocation.
On the other hand, non-static declarations suffer from the dynamic lookup system (that’s the way AS3, like Java, resolves inheritance and type checking at runtime).

So, I would say that static definitions are more efficient. For completeness, I must admit that defining things as static is not a best practice, and it sometimes leads to inefficient use of memory.

I’d like to digress a bit on point #10. Determining whether a variable is used or not within its scope is a Turing-Complete problem. So modern compilers usually delete unused variables automatically (and also do a lot of optimization of loop indices, inlining, and so on); this usually raises the compilation time but improves efficiency. In a sense, this is what Apparat (by Joa Ebert) does (and what MXML doesn’t).
- #10 by jackson on January 3rd, 2012
  
  Hey snick, these are good issues to bring up as Flash (the compiler and the player) oftentimes behave in unexpected ways, especially regarding performance. I can’t locate a good article to reference for #8, so I made a little test app:
  
      package
      {
          import flash.display.Sprite;
          import flash.events.Event;
          import flash.text.TextField;
          import flash.text.TextFieldAutoSize;
          import flash.utils.getTimer;

          public class StaticTest extends Sprite
          {
              public function StaticTest()
              {
                  addEventListener(Event.ENTER_FRAME, test);
              }

              private function test(ev:Event): void
              {
                  removeEventListener(Event.ENTER_FRAME, test);

                  var beforeTime:int;
                  var afterTime:int;
                  var staticTime:int;
                  var localTime:int;
                  const REPS:int = 1000000000;
                  var i:int;
                  var temp:Number;
                  var localCache:Number;

                  beforeTime = getTimer();
                  for (i = 0; i < REPS; ++i)
                  {
                      temp = Math.PI;
                  }
                  afterTime = getTimer();
                  staticTime = afterTime - beforeTime;

                  beforeTime = getTimer();
                  localCache = Math.PI;
                  for (i = 0; i < REPS; ++i)
                  {
                      temp = localCache;
                  }
                  afterTime = getTimer();
                  localTime = afterTime - beforeTime;

                  var tf:TextField = new TextField();
                  tf.autoSize = TextFieldAutoSize.LEFT;
                  tf.text = "Static: " + staticTime + "\nLocal time: " + localTime;
                  addChild(tf);
              }
          }
      }
  
  Here are my results:
  
      Static: 7635
      Local time: 2106
  
  So whatever’s going on behind the scenes, the local caching (as recommended in the article) is paying off with a ~3.6x speedup. Now let’s look at the bytecode for the bodies of the loops:
  
      // static version:
      L2:
      57 label
      58 getlex Math
      60 getproperty PI
      62 convert_d
      63 setlocal 8
      65 inclocal_i 7

      // local cached version:
      L4:
      110 label
      111 getlocal 9
      113 convert_d
      114 setlocal 8
      116 inclocal_i 7
  
  The difference is that the static version has getlex and getproperty but the local cached version only has getlocal, which is apparently much cheaper. It’s tough to tell what’s going on once this bytecode goes through the JIT, but it’s definitely not being converted to something as cheap as a static memory location access.
  
  As for point #10, I’m referring only to the stock MXMLC/compc/asc compiler from Adobe. Third party tools like Apparat can clean up some of the poorly-generated bytecode, but most people won’t integrate such a tool into their build system. Still, it’s a good and relevant point.
  - #11 by snick on January 3rd, 2012
    
    That’s really curious, isn’t it? Thank you for bringing up a concrete example. As you point out, this is not the “expected in theory” behavior. I asked Tinic to come here and explain to us what’s happening under the hood.
    
    Also, I must rephrase my poor English phrase “compiler (so the runtime)” to “what the compiler produces at compile time for the runtime”.
    And yep, mxmlC instead of mxml ;)
    
    btw, thank you for the article!
    - #12 by skyboy on January 7th, 2012
      
      Flash doesn’t resolve consts or static to anything more than the standard lookup. In optimized code this needn’t be a problem for functions since they can be called by index (not that apparat supports it or MXMLC generates it); for properties, lookups aren’t dynamic in most cases, unless it’s on a generic Object that has no properties predefined. Read times for non-static tend to be only around 10% slower on properties than local variables. Write times are astronomical in comparison, however.
      
      Unused local variables aren’t really much of a problem either, since it takes O(1) time to look up any index (though the first three are slightly faster by virtue of being a single op). There are many more ways to make all of this faster in the VM, but Adobe’s too tied up adding float and float4 (three new ops are coming with them, too. Expect generic addition to get even slower, as I do not see special math ops for them yet).
      - #13 by Chris on February 11th, 2014
        
        The write time seems to be the same these days? Just tested with AIR 4.0.
#14 by focus on January 7th, 2012

Jackson, thanks for this roundup, very helpful!
#15 by dimumurray on January 9th, 2012

Great pointers. Reading this reminded me of Grant Skinner’s Quick As A Flash slide presentation. Great resource.
#16 by Nolan on September 30th, 2013

Thanks for these tips Jackson, they’re a huge help.