My last article on Callback Strategies highlighted some pretty severe performance differences between my Runnable strategy, as3signals by Robert Penner, and Flash’s native Event system. My simple Runnable technique had an artificial advantage though: it was not a proper library but instead a little bit of code built right into the test app. Today I’m introducing TurboSignals as an AS3 library making use of the Runnable technique.
Basics
TurboSignals is a simple library. It includes signal classes for zero to ten parameters (Signal0, Signal1, ..., Signal10) as well as a class for variable arguments (SignalN). These are paired with slot interfaces (Slot0, Slot1, ..., Slot10 and SlotN) that you implement in order to receive the signal's callback. The purpose of an explicit slot type is to avoid calling through a Function variable, which is very slow. My article on Runnables showed this, along with how Runnables avoid that slowdown. Helper classes (FunctionSlot0, FunctionSlot1, ..., FunctionSlot10 and FunctionSlotN) are provided for when you really want to pass an arbitrary function, although they are slower.
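To make the slot idea concrete, here is a minimal sketch of the pattern. It is not the actual TurboSignals source: the names SlotSketch, SketchSlot0, SketchSignal0, and TraceSlot are hypothetical stand-ins, and only the general shape (a typed slot interface that the signal calls directly) comes from the description above.

```actionscript
package
{
    import flash.display.Sprite;

    // Hypothetical document class that exercises the sketch
    public class SlotSketch extends Sprite
    {
        public function SlotSketch()
        {
            var signal:SketchSignal0 = new SketchSignal0();
            signal.addSlot(new TraceSlot("A"));
            signal.addSlot(new TraceSlot("B"));
            signal.dispatch();
        }
    }
}

// Hypothetical stand-in for a slot interface such as Slot0
interface SketchSlot0
{
    function onSignal0(): void;
}

// Hypothetical stand-in for a signal class such as Signal0
class SketchSignal0
{
    private var __slots:Array = [];

    public function addSlot(slot:SketchSlot0): void
    {
        __slots.push(slot);
    }

    public function dispatch(): void
    {
        var slots:Array = __slots;
        var len:int = slots.length;
        for (var i:int = 0; i < len; ++i)
        {
            // The call goes through a typed interface reference,
            // never through a Function variable
            var slot:SketchSlot0 = slots[i];
            slot.onSignal0();
        }
    }
}

// A slot that reports when it is called
class TraceSlot implements SketchSlot0
{
    private var __name:String;

    public function TraceSlot(name:String)
    {
        __name = name;
    }

    public function onSignal0(): void
    {
        trace("slot " + __name + " called");
    }
}
```

The design choice to notice is that dispatch() only ever sees the interface type, which is what lets the compiler and runtime treat each callback as an ordinary method call.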
Advanced Features
One nicety of TurboSignals is that the dispatch operation is "safe": calls to addSlot, removeSlot, and removeAllSlots made during a dispatch do not affect which slots that dispatch calls. One drawback is that the parameters to dispatch are all untyped (*), so there is no compile-time checking of them and type errors can occur at runtime. The same is true of the Event/EventDispatcher system and of as3signals, though as3signals explicitly checks for this problem at runtime to give more informative errors.
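As a small illustration of the "safe" dispatch, the sketch below uses only the classes and methods named in this article (Signal0, Slot0, FunctionSlot0, addSlot, removeAllSlots, dispatch). The class name SafeDispatchSketch and the onOther callback are hypothetical.

```actionscript
package
{
    import com.jacksondunstan.signals.*;
    import flash.display.Sprite;

    // Hypothetical document class; it is itself one of the two slots
    public class SafeDispatchSketch extends Sprite implements Slot0
    {
        private var __signal:Signal0 = new Signal0();

        public function SafeDispatchSketch()
        {
            __signal.addSlot(this);
            __signal.addSlot(new FunctionSlot0(onOther));

            // Per the description above, both slots registered before
            // this dispatch should be called, even though one of them
            // removes every slot while the dispatch is running
            __signal.dispatch();

            // The removal takes effect for later dispatches
            __signal.dispatch();
        }

        // Slot0 callback: changes the slot list mid-dispatch
        public function onSignal0(): void
        {
            trace("slot called, removing all slots");
            __signal.removeAllSlots();
        }

        private function onOther(): void
        {
            trace("other slot called");
        }
    }
}
```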
Usage Example (faster)
This example runs at maximum speed, which is probably not needed for simple button clicks.
    import com.jacksondunstan.signals.*;
    import flash.display.Sprite;
    import flash.events.MouseEvent;

    public class Button extends Sprite
    {
        // Create the signal up front so dispatch() never hits a null reference
        public var clicked:Signal0 = new Signal0();

        public function Button()
        {
            addEventListener(MouseEvent.MOUSE_DOWN, onMouseDown);
        }

        private function onMouseDown(ev:MouseEvent): void
        {
            this.clicked.dispatch();
        }
    }

    public class MainMenu implements Slot0
    {
        public function MainMenu(button:Button)
        {
            button.clicked.addSlot(this);
        }

        // Called directly through the Slot0 interface
        public function onSignal0(): void
        {
            trace("button was clicked");
        }
    }
Usage Example (slower)
This example allows you to make your callback private and name it as you wish, courtesy of the FunctionSlot0 adapter class.
    import com.jacksondunstan.signals.*;
    import flash.display.Sprite;
    import flash.events.MouseEvent;

    public class Button extends Sprite
    {
        // Create the signal up front so dispatch() never hits a null reference
        public var clicked:Signal0 = new Signal0();

        public function Button()
        {
            addEventListener(MouseEvent.MOUSE_DOWN, onMouseDown);
        }

        private function onMouseDown(ev:MouseEvent): void
        {
            this.clicked.dispatch();
        }
    }

    public class MainMenu
    {
        public function MainMenu(button:Button)
        {
            // FunctionSlot0 adapts an arbitrary function to the Slot0 interface
            button.clicked.addSlot(new FunctionSlot0(onButtonClicked));
        }

        private function onButtonClicked(): void
        {
            trace("button was clicked");
        }
    }
Usage Example (complex)
This example runs at maximum speed with multiple buttons.
    import com.jacksondunstan.signals.*;
    import flash.display.Sprite;
    import flash.events.MouseEvent;

    public class Button extends Sprite
    {
        // Create the signal up front so dispatch() never hits a null reference
        public var clicked:Signal1 = new Signal1();

        public function Button()
        {
            addEventListener(MouseEvent.MOUSE_DOWN, onMouseDown);
        }

        private function onMouseDown(ev:MouseEvent): void
        {
            // Pass this button along so listeners can tell the buttons apart
            this.clicked.dispatch(this);
        }
    }

    public class MainMenu implements Slot1
    {
        private var __button1:Button;
        private var __button2:Button;

        public function MainMenu(button1:Button, button2:Button)
        {
            __button1 = button1;
            __button2 = button2;
            button1.clicked.addSlot(this);
            button2.clicked.addSlot(this);
        }

        // Signal parameters are untyped (*), so the slot parameter is too
        public function onSignal1(target:*): void
        {
            if (target == __button1)
            {
                trace("button 1 was clicked");
            }
            else if (target == __button2)
            {
                trace("button 2 was clicked");
            }
        }
    }
Parameter Passing Strategy
As the examples above show, functionality from alternative systems can be emulated in TurboSignals by simply adding more event parameters. The equivalent of Event.target in the Event/EventDispatcher system is obtained by passing a reference to this as a parameter to dispatch. Likewise, the type field can easily be passed. You may also choose to pass a single object, like an Event, that includes both; this may help with speed since multiple arguments slow down the dispatch operation.
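A minimal sketch of the single-object approach follows. It forwards the native MouseEvent through a Signal1 so that listeners can read both the type and the target from one argument. It assumes the Slot1 callback takes a single untyped (*) parameter, and the parameter name arg is hypothetical. Each class would live in its own file.

```actionscript
// Button.as
package
{
    import com.jacksondunstan.signals.*;
    import flash.display.Sprite;
    import flash.events.MouseEvent;

    public class Button extends Sprite
    {
        public var clicked:Signal1 = new Signal1();

        public function Button()
        {
            addEventListener(MouseEvent.MOUSE_DOWN, onMouseDown);
        }

        private function onMouseDown(ev:MouseEvent): void
        {
            // One argument carries both the event type and the target
            clicked.dispatch(ev);
        }
    }
}

// MainMenu.as
package
{
    import com.jacksondunstan.signals.*;
    import flash.events.MouseEvent;

    public class MainMenu implements Slot1
    {
        public function MainMenu(button:Button)
        {
            button.clicked.addSlot(this);
        }

        // Assumed signature: one untyped parameter
        public function onSignal1(arg:*): void
        {
            // The argument is untyped, so check it before use
            var ev:MouseEvent = arg as MouseEvent;
            if (ev != null)
            {
                trace(ev.type + " from " + ev.currentTarget);
            }
        }
    }
}
```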
Performance Data
The TurboSignals distribution includes a suite of performance tests for TurboSignals as well as as3signals and Event/EventDispatcher. Here are the results for the first version:
TurboSignals - 1 Listener (1000000 dispatches)
| Environment | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | N |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2.2 GHz Intel Core 2 Duo, 2GB, Mac OS X 10.6 | 56 | 91 | 98 | 97 | 101 | 90 | 94 | 107 | 104 | 116 | 119 | 1917 |
TurboSignals - 10 Listeners (1000000 dispatches)
| Environment | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | N |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2.2 GHz Intel Core 2 Duo, 2GB, Mac OS X 10.6 | 317 | 600 | 570 | 585 | 567 | 606 | 582 | 562 | 580 | 614 | 628 | 7679 |
TurboSignals - 1 Function Listener (1000000 dispatches)
| Environment | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | N |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2.2 GHz Intel Core 2 Duo, 2GB, Mac OS X 10.6 | 317 | 600 | 570 | 585 | 567 | 606 | 582 | 562 | 580 | 614 | 628 | 7679 |
TurboSignals - 10 Function Listeners (1000000 dispatches)
| Environment | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | N |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2.2 GHz Intel Core 2 Duo, 2GB, Mac OS X 10.6 | 2985 | 3359 | 3449 | 3433 | 3551 | 3564 | 3534 | 3597 | 3624 | 3731 | 3862 | 22184 |
as3signals - 1 Function Listener (1000000 dispatches)
| Environment | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | N |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2.2 GHz Intel Core 2 Duo, 2GB, Mac OS X 10.6 | 954 | 1158 | 1326 | 1418 | 1534 | 1696 | 1818 | 1951 | 2055 | 2467 | 2740 | n/a |
as3signals - 10 Function Listeners (1000000 dispatches)
| Environment | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | N |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2.2 GHz Intel Core 2 Duo, 2GB, Mac OS X 10.6 | 2568 | 2908 | 3394 | 3656 | 3918 | 4249 | 4535 | 4805 | 5354 | 6288 | 6912 | n/a |
Event/EventDispatcher - 1 Function Listener (1000000 dispatches)
| Environment | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | N |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2.2 GHz Intel Core 2 Duo, 2GB, Mac OS X 10.6 | n/a | 4886 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
Event/EventDispatcher - 10 Function Listeners (1000000 dispatches)
| Environment | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | N |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2.2 GHz Intel Core 2 Duo, 2GB, Mac OS X 10.6 | n/a | 33755 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
Performance Graphs




Performance Analysis
The latest version of as3signals (as of today, 2/15/2010) goes a long way toward improving on the performance of the Event/EventDispatcher system, especially on Mac OS X. TurboSignals goes a lot further, though, and nearly matches the speed of its inspiration: the simple list of Runnables. TurboSignals dispatches events about 17 times faster than as3signals when the slot is implemented directly and about 3 times faster when a Function variable is used to allow the callback to be private, named, or anonymous. That said, as3signals is itself 4-13x faster than the Event/EventDispatcher system. So if you are planning on dispatching frequently or to many listeners, you should definitely take a look at TurboSignals.
#1 by Piergiorgio Niero on February 16th, 2010 · | Quote
What about adding/removing listeners at runtime?
Is there any possibility to (un)subscribe to/from a signal at runtime?
I think the interface-implementation-only approach would become very limiting for medium/large projects.
Is there any way to work around that?
#2 by jackson on February 16th, 2010 · | Quote
Adding/subscribing is done by addSlot and removing/unsubscribing is done either by removeSlot or removeAllSlots.
You can avoid implementing your own slot only at the cost of a speed hit. The primary way of doing this is to use the FunctionSlotX classes, where X is 0-10 or N. See the example titled “Usage Example (Slower)” and the corresponding performance data and graphs related to “function listeners”. The idea is that TurboSignals gives you the option to implement the slot interface yourself for maximum speed where you really need it (see the “Why Would I Need Speed?” section of the TurboSignals project page).
#3 by whitered on February 16th, 2010 · | Quote
The next step to improve performance is to use Vector instead of Array. And here is my extremely simple implementation of signals: http://github.com/whitered/Kote/blob/master/src/ru/whitered/kote/Signal.as
#4 by jackson on February 16th, 2010 · | Quote
I originally used Vector instead of Array in my last article: Callback Strategies. I switched to Array to make the library usable in Flash 9 rather than restricting it to Flash 10. The performance difference was negligible anyhow.
I did quite a lot of looking around for other implementations of signals and slots whilst working on my own, but didn’t find yours. Thanks for the link! It is indeed a very simple implementation, but sometimes that’s all you want. One performance-related technique that has helped Robert Penner (and myself, I suppose) is to copy the callbacks list only when it actually needs to be copied. That is, set a flag indicating that you’re dispatching, and copy the list in addCallback or removeCallback only if that flag is true. Robert Penner claims that doubled his performance!
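A minimal sketch of that copy-only-when-needed idea is below. It is neither the as3signals nor the TurboSignals source. The class names CopyOnWriteSketch and CopyOnWriteSignal and the field names are hypothetical, and the sketch deliberately ignores nested (recursive) dispatches.

```actionscript
package
{
    import flash.display.Sprite;

    // Hypothetical document class that exercises the sketch
    public class CopyOnWriteSketch extends Sprite
    {
        private var __signal:CopyOnWriteSignal = new CopyOnWriteSignal();

        public function CopyOnWriteSketch()
        {
            __signal.addCallback(removeOther);
            __signal.addCallback(other);
            __signal.dispatch();
        }

        private function removeOther(): void
        {
            // A dispatch is in progress, so this triggers one copy
            __signal.removeCallback(other);
        }

        private function other(): void
        {
            trace("still called: the dispatch iterates the old list");
        }
    }
}

// Hypothetical signal that copies its list only while dispatching
class CopyOnWriteSignal
{
    private var __callbacks:Array = [];
    private var __dispatching:Boolean = false;
    private var __copied:Boolean = false;

    public function addCallback(callback:Function): void
    {
        copyIfDispatching();
        __callbacks.push(callback);
    }

    public function removeCallback(callback:Function): void
    {
        var index:int = __callbacks.indexOf(callback);
        if (index >= 0)
        {
            copyIfDispatching();
            __callbacks.splice(index, 1);
        }
    }

    public function dispatch(): void
    {
        // The loop reads this local reference; add/remove swap in a
        // copy and leave the array being iterated untouched
        var callbacks:Array = __callbacks;
        var len:int = callbacks.length;
        __dispatching = true;
        for (var i:int = 0; i < len; ++i)
        {
            var callback:Function = callbacks[i];
            callback();
        }
        __dispatching = false;
        __copied = false;
    }

    private function copyIfDispatching(): void
    {
        // Copy at most once per dispatch; outside a dispatch, never copy
        if (__dispatching && !__copied)
        {
            __callbacks = __callbacks.slice();
            __copied = true;
        }
    }
}
```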
#5 by whitered on February 16th, 2010 · | Quote
Wow, thanks for the useful idea. I’ll surely apply it.
#6 by whitered on February 17th, 2010 · | Quote
While trying to implement this optimization in my project, I found a bug that exists in TurboSignals. If we redispatch a signal that is already being dispatched, its __slotsNeedCopying flag is reset, so its slots can be changed while the outer dispatch is still running. See the demo: http://pastie.org/828806
#7 by jackson on February 17th, 2010 · | Quote
Nice find! I will look into this shortly and reply here with any resolution I come up with. If you have a way of resolving it, I do welcome patches.
Thank you for the detailed example!
#8 by whitered on February 17th, 2010 · | Quote
Just clone your slots in the dispatch() method if __slotsNeedCopying is set, as you do in the other methods.
#9 by jackson on February 17th, 2010 · | Quote
That would work, but I’d prefer not to copy the slots unless it’s really necessary or it helps performance. I implemented the fix by adding a __numDispatchesInProgress integer that I use to determine if I can safely set __slotsNeedCopying to false after the dispatch is done. I also added a unit test based on your demo code above. Version 1.0.1 is now up on the project page and Google Code and it should run at virtually the same speed as version 1.0.
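A minimal sketch of that counter idea is below. It is not the actual TurboSignals source: only the names __slotsNeedCopying and __numDispatchesInProgress come from this thread, while NestedDispatchSketch, CountingSignal, and the use of plain Function slots are hypothetical simplifications.

```actionscript
package
{
    import flash.display.Sprite;

    // Hypothetical document class that redispatches from inside a slot
    public class NestedDispatchSketch extends Sprite
    {
        private var __signal:CountingSignal = new CountingSignal();
        private var __redispatched:Boolean = false;

        public function NestedDispatchSketch()
        {
            __signal.addSlot(redispatchOnce);
            __signal.dispatch();
        }

        private function redispatchOnce(): void
        {
            if (!__redispatched)
            {
                __redispatched = true;
                __signal.dispatch();        // nested dispatch
                // The outer dispatch is still running, so the list
                // must be copied before it is changed
                __signal.addSlot(lateSlot);
            }
        }

        private function lateSlot(): void
        {
            trace("added safely; called by later dispatches only");
        }
    }
}

// Hypothetical signal that counts the dispatches in progress
class CountingSignal
{
    private var __slots:Array = [];
    private var __slotsNeedCopying:Boolean = false;
    private var __numDispatchesInProgress:int = 0;

    public function addSlot(slot:Function): void
    {
        if (__slotsNeedCopying)
        {
            __slots = __slots.slice();
            __slotsNeedCopying = false;
        }
        __slots.push(slot);
    }

    public function dispatch(): void
    {
        var slots:Array = __slots;
        var len:int = slots.length;
        __slotsNeedCopying = true;
        __numDispatchesInProgress++;
        for (var i:int = 0; i < len; ++i)
        {
            // If a slot throws, the decrement below is skipped
            var slot:Function = slots[i];
            slot();
        }
        __numDispatchesInProgress--;
        // Only the outermost dispatch may clear the flag
        if (__numDispatchesInProgress == 0)
        {
            __slotsNeedCopying = false;
        }
    }
}
```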
#10 by whitered on February 18th, 2010 · | Quote
This implementation has a hidden danger: the signal becomes broken when a slot throws an error. In that case the signal’s __numDispatchesInProgress counter will always be greater than it should be, so the signal will often clone its slots when it isn’t necessary.
I think redispatches happen rather rarely, so I’ve decided to keep things simple and clone my callbacks in the dispatch method on redispatches.
#11 by jackson on February 18th, 2010 · | Quote
Yes, it would lead to lots of slot list copying later on and it would also lead to the rest of the slots not being called. The former is a performance problem and the latter is a correctness problem. Unfortunately, the only way I know of to fix the correctness problem is to wrap the slot call in a try/catch block. As I’ve shown before, this would introduce a big performance problem itself.
Personally I think that you should expect uncaught exceptions to do bad things to your program. TurboSignals is but a drop in the ocean of code that isn’t handling thrown errors. Luckily with TurboSignals you get off easy: the rest of your slots don’t get called and it performs slower from then on, but at least it never crashes.
I’d welcome anyone to reply here with comments on how TurboSignals should handle uncaught errors thrown by slots.
#12 by whitered on February 18th, 2010 · | Quote
Yes, it is true that you should expect uncaught exceptions to do bad things. But I think that you should not expect bad things from an exception that was caught and handled.
Code like that can delude a developer:
On the other hand, this implementation results in slower dispatching, but it does so transparently and only in very rare situations (recursive dispatches): http://github.com/whitered/Kote/blob/master/src/ru/whitered/kote/Signal.as
That said, I think wrapping the slot call in a try/catch block is the worst idea.
#13 by jackson on February 18th, 2010 · | Quote
I like your logic a lot and I can totally see this point of view. It seems that in this situation, as you point out, we must choose where to optimize. A case like the one you show above could very well happen, and the current penalty in TurboSignals would be slower addSlot, removeSlot, and removeAllSlots calls from then on. This is indeed a steep penalty, and your approach avoids it. But in the process of optimizing that way you have, as you again have pointed out, made recursive dispatching slower.
So the question is this: should we optimize for recursive dispatching (TurboSignals’ current way) or for thrown errors (the way the signal class you linked works)? To me the matter comes down to the legitimacy of the two use cases. In my view, errors are bad but recursive dispatching is a valid feature of TurboSignals. I prefer not to punish the valid uses of TurboSignals for the sake of the invalid uses. You can, of course, do whatever you’d like to in your own signal class. :)
#14 by Robert Penner on February 16th, 2010 · | Quote
This is great! I love your creative thinking and passion for performance testing. The more options, the better.
#15 by Robert Penner on February 16th, 2010 · | Quote
I noticed you’re using AsUnit 4 alpha for your unit tests. I looked through your test code and it looks really good so far (there’s a lot of it). I’d love to hear your thoughts on unit testing and the role it played in TurboSignals development.
#16 by jackson on February 16th, 2010 · | Quote
TurboSignals seemed to be a good place to apply unit testing since its functionality is what I call “pure data”. That is, there is no graphical or audio output that can only be properly judged by the human eye and ear, no interaction requiring precision timing (especially with Flash Player!), and so forth. Further, I didn’t have any (public) application to start using TurboSignals in, so I needed some test application. I started off with a homebrew set of tests like I’ve done in previous articles, but it started to get messy, overly verbose, and difficult to scale after about a dozen tests. All of this pointed toward using some unit testing framework.
I went with AsUnit as it’s clearly one of the most popular choices and the little I’ve seen of it before (mostly from you!) looked pretty good. I couldn’t find a SWC build of it though and didn’t feel like setting up Ruby just to do the build, so I actually took your build of version 4’s alpha out of as3signals. It seems to run fast enough, very consistently, and was easy to set up.
All in all I’d say this was a good place to use unit testing and it’s been quite successful. It certainly isn’t a project built with TDD as I wrote all of the code before the tests, but the validation part of it is very nice. :)
#17 by Joa Ebert on February 16th, 2010 · | Quote
Hey, TurboSignals looks very promising and I like the approach of having classes like Signal0. Scala does this as well, very successfully. It is a shame you cannot apply more syntactic sugar.
Did you try what happens when you are using linked lists and object pooling approaches as well? I think your life will be much easier if your listeners are nodes of a linked list. Traversal should be faster and removing listeners is not a big deal as well. It can work without a copy. Most of the time you are removing a listener in its callback so that can be even done in O(1).
#18 by jackson on February 16th, 2010 · | Quote
I haven’t tried linked lists yet, but I’m willing to give it a go. Earlier on in development I tried making the iterator a private field and then checking the index to remove from against it to see if it was at or before the current callback. In such a case there’s no need to copy the list, which would be a nice optimization as far as speed, garbage creation, and (instantaneous) memory footprint. This turned out to be disastrous to performance though as accessing a field is quite a bit slower than accessing a local variable.
So it seems as though I’d still need to copy the list on addSlot/removeSlot/removeAllSlots if I switched from Array to a linked list. In my implementation from December it looked like both copying and traversing were slower in linked lists. Do you have some suggestions on how to make the linked list implementation fast enough to make it worthwhile here? I’d much appreciate any suggestions you could give.
#19 by matthew on February 16th, 2010 · | Quote
It seems to me that the speed benefits aren’t worth the cost in readability and required code.
For one, this system requires you to check the target in the callback whenever you have multiple dispatchers using the same signal number. That could get out of hand pretty fast, as your example with two buttons shows. Chances are that every listener for more than one signal would need to act as a switch, delegating to other functions based on the target.
It also couples the implementation of the listener and dispatcher in a strange, non-semantic way, requiring your listener to know an arbitrary piece of metadata (the slot number) in addition to the event type (“clicked”). A tiny change in the dispatcher’s implementation (the slot number of a particular event) would require huge, cascading changes across an entire codebase.
If you’re really in need of the performance benefits gained from using the Observer pattern, maybe it would be better to use tools that would facilitate that (templates, snippets, macros, etc.)
#20 by jackson on February 16th, 2010 · | Quote
Check out the example titled “Usage Example (Slower)” using FunctionSlot. That shows how to use TurboSignals like you’d use Event/EventDispatcher or as3signals with an arbitrary Function callback. Then check out the performance data to see how TurboSignals is still faster than Event/EventDispatcher or as3signals, even when using FunctionSlots. If you only used Signal1 with FunctionSlot1 to dispatch Event objects (or similar), you’d still realize a speed gain without any of the drawbacks you mention.
But the important part about TurboSignals is that it gives you a choice to go even further and speed up your callbacks by implementing the slot directly. Yes, you lose some code cleanliness in the process, but sometimes it’s worth it. By providing FunctionSlot classes in TurboSignals in addition to Slot interfaces, I leave that choice up to the individual programmer on his or her own project.
#21 by Alec McEachran on February 16th, 2010 · | Quote
Hi Jackson,
Congratulations, this is really interesting work. While I don’t imagine I’ll use it in many places in projects I work on since the syntax is somewhat confusing for team environments, I am definitely going to give this a good hard look for game loops, and other performance critical code.
Now I understand why you were holding back running as3signals vs Event analysis earlier!
This is a great addition to the as3 canon. Thanks.
#22 by jackson on February 16th, 2010 · | Quote
I’d like to hear your thoughts on how to improve the syntax. Personally, I think the “Usage Example (Slower)” code shows how to use it quite similarly to Event/EventDispatcher and as3signals and the performance data shows that even there you get a nice speedup.
#23 by Winx Alex on March 24th, 2010 · | Quote
From all this we can conclude that callbacks run faster when they are called on an object as object.function(args) rather than through a function pointer or through EventDispatcher.
So you need to make a few changes to Robert Penner’s original code. Take Signal.as and:
- the “add” function should take add(object, "nameOfTheFunction")
- the listeners array would now hold object/"nameOfTheFunction" pairs instead of function pointers
- the “dispatch” function, instead of listener.apply(null, valueObjects), should call object["nameOfTheFunction"](valueObjects) or something like that, maybe object["nameOfTheFunction"].apply(object, valueObjects)…
#24 by jackson on March 24th, 2010 · | Quote
It would be fantastic if this would result in a dramatic speedup similar to the speedup you get in TurboSignals. This would relieve the programmer of the need to create Slot derivatives and allow them to name callback functions however they wish, both downsides of using TurboSignals. However, I believe that there are two reasons why this would result in code that runs at the same speed or possibly even slower than as3signals does right now. Firstly, the dynamic access (object["nameOfTheFunction"]) is quite slow. Secondly, what you get back is indeed just a Function variable, so calling it (.apply(object, valueObjects)) would be as slow as the original (listener.apply(null, valueObjects)).
What makes TurboSignals fast is the strong guarantee that the compiler and JIT are afforded by using a typed object: the Slot class. Removing the typing causes the slowdowns you see in both as3signals and EventDispatcher, although the latter has many more problems than just weak typing.
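For reference, here is a small sketch that puts the three call styles discussed here side by side and times each with getTimer. The class names CallStyleSketch and CountingSlot and the repetition count are hypothetical, and no particular timings are implied.

```actionscript
package
{
    import flash.display.Sprite;
    import flash.utils.getTimer;

    // Hypothetical document class that times three ways to call a method
    public class CallStyleSketch extends Sprite
    {
        public function CallStyleSketch()
        {
            var reps:int = 1000000;
            var slot:CountingSlot = new CountingSlot();
            var untyped:Object = slot;
            var func:Function = slot.onSignal0;
            var noArgs:Array = [];
            var i:int;
            var before:int;

            // 1. Call through a typed reference
            before = getTimer();
            for (i = 0; i < reps; ++i)
            {
                slot.onSignal0();
            }
            trace("typed call: " + (getTimer() - before) + " ms");

            // 2. Look the method up by name on every call
            before = getTimer();
            for (i = 0; i < reps; ++i)
            {
                untyped["onSignal0"]();
            }
            trace("dynamic lookup: " + (getTimer() - before) + " ms");

            // 3. Call through a Function variable with apply
            before = getTimer();
            for (i = 0; i < reps; ++i)
            {
                func.apply(null, noArgs);
            }
            trace("Function.apply: " + (getTimer() - before) + " ms");
        }
    }
}

// Minimal callee so that every call does a tiny amount of work
class CountingSlot
{
    public var count:int = 0;

    public function onSignal0(): void
    {
        count++;
    }
}
```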
#25 by Winx Alex on March 25th, 2010 · | Quote
Could you run a test?
object["nameOfTheFunction"]()
vs
object.nameOfTheFunction()
so we can see how slow “quite slow” is. Everything is a compromise: a little bit slower but cleaner. Except in an atom accelerator :)
#26 by jackson on March 26th, 2010 · | Quote
Sure. Given these simple slots:
I wrote this simple test:
And got these results:
So it’s about 40x slower than a direct call. That’s a bit much of a compromise for TurboSignals. :)