How To Use a Profiler To Get Better Performance
The site has had many articles about improving the performance of your app, but never discussed the basic methodology on which all optimizations should be based. Today’s article goes over a scientific approach to optimizing that makes use of a tool known as a profiler and demonstrates, using an AS3 application, just why it’s so important to use such a tool.
A profiler is a tool that gathers statistics about the performance cost of each function in your app and presents it to you in a useful way. Usually, you’ll get a long list of functions with the top function taking the most time to complete and the bottom function taking the least. You see at a glance which functions are worth your time to optimize and which are not. This information is often surprising, even to programmers with many years of experience optimizing for performance. As an experiment, take a look at the following simple AS3 app and see if you can guess the performance problem.
package {
    import flash.display.Sprite;
    import flash.display.StageAlign;
    import flash.display.StageScaleMode;
    import flash.events.Event;
    import flash.text.TextField;
    import flash.text.TextFieldAutoSize;
    import flash.utils.getTimer;

    public class ProfileMe extends Sprite {
        private static const SIZE:int = 5000;
        private var logger:TextField = new TextField();
        private var vec:Vector.<Number> = new Vector.<Number>(SIZE);

        public function ProfileMe() {
            addEventListener(Event.ENTER_FRAME, onEnterFrame);
            stage.align = StageAlign.TOP_LEFT;
            stage.scaleMode = StageScaleMode.NO_SCALE;
            logger.text = "Running test...";
            logger.y = 100;
            logger.autoSize = TextFieldAutoSize.LEFT;
            addChild(logger);
        }

        private function onEnterFrame(ev:Event): void {
            logger.text = "";
            var beforeTime:int;
            var afterTime:int;
            var totalTime:int;
            row("Operation", "Time");

            beforeTime = getTimer();
            buildVector();
            afterTime = getTimer();
            totalTime += afterTime - beforeTime;
            row("buildVector", (afterTime-beforeTime));

            beforeTime = getTimer();
            vec.sort(vecCompare);
            afterTime = getTimer();
            totalTime += afterTime - beforeTime;
            row("sort", (afterTime-beforeTime));

            row("total", totalTime);
        }

        private function buildVector(): void {
            var SIZE:int = ProfileMe.SIZE;
            var vec:Vector.<Number> = this.vec;
            for (var i:int; i < SIZE; ++i) {
                vec[i] = Math.abs(i) * Math.ceil(i) * Math.cos(i) * Math.exp(i)
                    * Math.floor(i) * Math.round(i) * Math.sin(i) * Math.sqrt(i);
            }
        }

        private function vecCompare(a:Number, b:Number): int {
            if (a < b) {
                return -1;
            } else if (a > b) {
                return 1;
            }
            return 0;
        }

        private function row(...cols): void {
            logger.appendText(cols.join(",")+"\n");
        }
    }
}
In step one, the app builds a Vector of Number out of a lot of Math calls. In step two, the app calls Vector.sort to sort the list. I ran this test on the following environment:
- Flex SDK (MXMLC) 4.5.1.21328, compiling in release mode (no debugging or verbose stack traces)
- Release version of Flash Player 11.1.102.63
- 2.4 GHz Intel Core i5
- Mac OS X 10.7.3
And got these results:
Operation | Time (ms) |
---|---|
buildVector | 1 |
sort | 78 |
total | 79 |
In a debug version of Flash Player, which is required to run the profiler, I got:
Operation | Time (ms) |
---|---|
buildVector | 7 |
sort | 620 |
total | 627 |
So clearly the Math calls are faster than the Vector sorting. In this simple app it was easy to add getTimer calls around the only two functions. But what if your app consists of thousands or tens of thousands of lines of code? Clearly, it’s impractical to add so many getTimer calls, even if you limit yourself to what you guess are the expensive portions of your app.
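To make the manual approach a little less repetitive, the getTimer bookkeeping can be wrapped in a small helper. This is only a sketch: the Stopwatch class and its time method are hypothetical names, not part of the app above or of any library.

```actionscript
package {
    import flash.utils.getTimer;

    // Hypothetical helper for illustration; not part of the original app.
    public class Stopwatch {
        // Calls fn once with no arguments and returns the elapsed
        // wall-clock time in milliseconds, as measured by getTimer().
        public static function time(fn:Function):int {
            var before:int = getTimer();
            fn();
            return getTimer() - before;
        }
    }
}
```

Inside onEnterFrame this could read totalTime += Stopwatch.time(buildVector); with the sort wrapped in a closure such as function():void { vec.sort(vecCompare); }. The extra function call adds a little overhead of its own, and every call site still has to be chosen by hand, so the guesswork remains.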
Enter the profiler. There are many available for AS3, usually as part of an IDE like Flash Builder, FlashDevelop, or FDT. Instead, we’ll be using TheMiner (formerly FlashPreloadProfiler) which is built in pure AS3 code rather than as an external tool. To set it up, let’s add a few lines of code to the above app:
DEBUG::profile {
    if (Capabilities.isDebugger) {
        addChild(new TheMiner());
    }
}
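DEBUG::profile is simply a Boolean compile-time constant that lets us turn off the profiler with a compiler setting (with MXMLC, such constants are set with the -define option, e.g. -define=DEBUG::profile,true). Even if it’s enabled, TheMiner requires a debug version of the Flash Player to run, so we don’t try to run it if Capabilities.isDebugger is false. Capabilities lives in the flash.system package, so it needs an import as well.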
Next, we simply download the TheMiner SWC and add it to the application. If you’re compiling with the command line tool MXMLC or COMPC, your new command will look like this:
mxmlc --library-path+=TheMiner_en_v1_3_10.swc ProfileMe.as
Now when we run the app we see a UI for the profiler at the top:
Clicking on the “Performance Profiler” button, we see:
Here we immediately see the source of the problem in the top-listed function:
Function Name | % |
---|---|
ProfileMe/vecCompare | 81.02 |
Vector. | 17.63 |
ProfileMe/buildVector | 0.41 |
Math$/sqrt | 0.14 |
Notice how the sorting functions (the first two) dwarf the building functions (the last two). Together, the sorting functions take over 98% of the total run time! It would be a waste of our time to worry about the building functions, so let’s optimize the sorting ones. To do that, we’ll use skyboy’s fastSort function instead of plain old Vector.sort. It’s a simple one-line change from:
vec.sort(vecCompare);
To:
fastSort(vec, vecCompare);
With this in place, I now get these results in a release player:
Operation | Time (ms) |
---|---|
buildVector | 1 |
sort | 23 |
total | 24 |
And in a debug player:
Operation | Time (ms) |
---|---|
buildVector | 7 |
sort | 48 |
total | 55 |
So in release we’ve optimized the total application from 79 milliseconds to 24, a better than 3x improvement. If we had spent our time optimizing out all of the Math calls with something like a lookup table, we could have saved at most 1 millisecond, which would be only about 1% faster.
In conclusion, a profiler is definitely a tool that you want to use while performance tuning your app. It helps you quickly and easily identify the performance problems and, perhaps even more importantly, the performance problems you don’t have. Don’t waste time optimizing (and often uglifying) your code if you don’t have to. Instead, try out a profiler like TheMiner and speed up your app without taking shots in the dark.
Questions? Comments? Spot a bug or typo? Post a comment!
#1 by jpauclair on March 12th, 2012 ·
Wow… awesome!
#2 by Simon on March 12th, 2012 ·
Should that say “So clearly the Math calls are faster than the Vector sorting.” instead of “So clearly the Math calls are slower than the Vector sorting.”
#3 by jackson on March 12th, 2012 ·
Thanks for spotting that. I’ve updated the article.
#4 by Henke37 on March 12th, 2012 ·
Uhm, the tables point at a “compute” row, this is a little confusing given that the method is called buildVector.
#5 by jackson on March 12th, 2012 ·
That was the time to compute the values of the Vector, but I can see how “buildVector” would be a clearer name so I’ve updated the article. Thanks for the tip.
#6 by Bob on March 12th, 2012 ·
Isn’t Adobe coming out with some super new Profiler soon? Goggles perhaps?
#7 by jackson on March 12th, 2012 ·
Yes, they gave a talk about it at MAX.
#8 by Martin on March 12th, 2012 ·
Thanks for that.
One thing I don’t get:
I just dived into skyboy’s sorting functions to get a deeper understanding of sorting arrays and vectors.
You use it like this:
But the function is expecting other params:
Hm... I don’t get it. You can sort on fields in skyboy’s function, but you cannot pass your own sorting function.
#9 by jackson on March 12th, 2012 ·
You’re right! It looks like it’s getting transformed to a uint of 0, but the Vector is still sorted. Since the sort was trivial in the first place, the net result is the same: a trivially-sorted Vector in much less time. The actual guts of the optimization isn’t so much the point as using a profiler to find and optimize a chunk of your program. Still, it’s misleading in the article so thank you for pointing it out. :)
#10 by Martin on March 13th, 2012 ·
Ah ok. Here we got it:
if (!(rest[0] is Number)) rest[0] = 0;
A bit offtopic, but it was making me crazy.
Thanks.
Martin
#11 by skyboy on March 14th, 2012 ·
I intend to add sort functions back — previous implementation was scrapped due to poor (imo) implementation — but I’ll be investigating methods more thoroughly in an attempt to best Array since Array’s native implementation has no overhead for calling Function objects vs. instance methods (gah! cheats.).
#12 by jackson on March 14th, 2012 ·
Cool, that will certainly come in handy.
#13 by skyboy on March 19th, 2012 ·
I have added them back in now, and so passing in the sort function makes a substantial difference (~5x) vs. passing in Array.NUMERIC; though in the version used by this article, String sorting was invoked.
#14 by jpauclair on March 12th, 2012 ·
Hey again,
Looks like there is some unexpected behaviour…
The buildVector function creates a list that is mostly made of negative infinity and positive infinity.
For some reason, vec.sort() REALLY doesn’t like that.
On my computer, with this “buildVector”, vec.sort() is 10x slower than fastSort.
BUT!
If we replace it with a simple Math.random(), fastSort takes twice the time it did before, and vec.sort becomes 4 times faster than fastSort!!
The other thing is that fastSort is using void pointers everywhere.
So when using native types like int, uint and Number, there are HUGE memory allocations (void to Number conversion)
http://jpauclair.net/2012/02/25/epic-memory-track-down/
So if we take a normal vector with valid values…
The result for me on a 50 000 length vector is more like this:
Vector.sort(Compare) : 100ms + 400 KB allocation
fastSort : 400ms + 5 MB / loop allocation
And Now the nice thing!
I updated the fastSort code to use native int, uint and Number.
Result?
NEW fastSort: 20ms + ZERO allocation
Here is the new Code:
http://jpauclair.net/2012/03/12/fastsort-faster-is-better/
;)
#15 by jackson on March 12th, 2012 ·
That’s a really nice optimization to fastSort and if I did this article over again I would definitely use your version. That said, the actual magnitude of the optimization is a bit peripheral to this article. The main point I was trying to make was the importance of using a profiler to find and track down performance issues in order to spend your optimization time wisely. Of course it’s nice that the optimization can be super effective, but the idea is that even a tiny (2%) optimization to the Vector sorting would be more effective than a super (100%) optimization to buildVector.
#16 by skyboy on March 14th, 2012 ·
I’ve just pushed the related update I had in progress (though another feature; specifying the start index, isn’t correctly implemented everywhere. tomorrow). I’ve consistently seen my method outperform Array::sortOn (by as much as 3x) across various machines with random data*.
http://megaswf.com/serve/2229466
For the sortOn tests:
* With reverse-sorted data, Array::sortOn outperforms fastSort by 50%; for sorted data, fastSort outperforms Array::sortOn by skipping the sorting phase if the presort (for NaN) finds they are all in order (excluding NaN, which is shuffled off to the end of the Array and not counted in the sorted-or-not).
#17 by skyboy on March 15th, 2012 ·
In my testing, I found that accessing an Array from a variable typed as * is 5-20% faster than a variable typed as Array; and a variable typed Object is 10-30% faster than a variable typed Array.
http://megaswf.com/filelinks/2234496
I can’t explain this behavior.
#18 by jackson on March 15th, 2012 ·
Me neither, especially since I’m seeing the opposite on my Core i5 Mac:
Array: 302
*: 354
Object: 369
What environment did you test in?
#19 by skyboy on March 15th, 2012 ·
32 bit x86 XP SP3; FP 11.1.102.55 release standalone on an intel celeron (northwood 0.13 micrometer; 8 KB L1 cache, 128 KB L2 cache)
#20 by AlexG on March 19th, 2012 ·
Which profiler do you think is the best? And specifically, which is better, the Flash Builder profiler or TheMiner?
#21 by jackson on March 19th, 2012 ·
Calling one profiler “the best” is really tough as each has its pros and cons. In the end they all use the flash.sampler package so they’re operating on the same set of data. I used TheMiner in the article because it’s really simple to add to a project and there is a free version for non-commercial use as well as its predecessor (FlashPreloadProfiler) for free commercial use (I think), so it’s directly usable by all my readers. This is in contrast to Flash Builder and FDT, which have fine profilers but charge a fee to use their products after the trial version expires.
I think it’s a good idea to try out all of the profilers (e.g. Flash Builder, FDT, FlashDevelop, TheMiner) and decide which you like best. But keep in mind that Adobe’s next-gen profiler is on the way…
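As a rough illustration of the data that the flash.sampler package exposes, the sketch below tallies CPU samples by the function at the top of each sample's stack and reports each function's share as a percentage. The SampleTally class and its method names are hypothetical, it is far simpler than what TheMiner or the IDE profilers do, and it assumes that stack[0] is the innermost frame. Like any sampler-based tool, it only collects data in a debug player.

```actionscript
package {
    import flash.sampler.DeleteObjectSample;
    import flash.sampler.NewObjectSample;
    import flash.sampler.Sample;
    import flash.sampler.StackFrame;
    import flash.sampler.clearSamples;
    import flash.sampler.getSamples;
    import flash.sampler.pauseSampling;
    import flash.sampler.startSampling;

    // Hypothetical helper for illustration; not part of TheMiner or any IDE profiler.
    public class SampleTally {
        // Function name -> number of CPU samples attributed to it.
        private var counts:Object = {};
        private var total:int = 0;

        // Discards any old samples and starts collecting new ones.
        public function begin():void {
            clearSamples();
            startSampling();
        }

        // Pauses collection and folds the gathered samples into the tally.
        public function end():void {
            pauseSampling();
            for each (var s:Sample in getSamples()) {
                // Skip allocation/deallocation records and samples without a stack.
                if (s is NewObjectSample || s is DeleteObjectSample) continue;
                if (s.stack == null || s.stack.length == 0) continue;
                // Assumption: stack[0] is the innermost (currently executing) frame.
                var frame:StackFrame = s.stack[0] as StackFrame;
                if (frame == null) continue;
                counts[frame.name] = int(counts[frame.name]) + 1;
                ++total;
            }
            clearSamples();
        }

        // Returns one "functionName,percent" line per function seen.
        public function report():String {
            var lines:Array = [];
            for (var name:String in counts) {
                var pct:Number = 100 * counts[name] / total;
                lines.push(name + "," + pct.toFixed(2));
            }
            return lines.join("\n");
        }
    }
}
```

Calling begin() before the code of interest and end() after it, then printing report(), would give a crude version of the percentage table shown in the article.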