Many programmers are aware of a special case where you can use a bitwise shift for multiplication or division when you’re multiplying or dividing by a power of two. For example, you can replace i * 2 with i << 1 and i / 2 with i >> 1. A lesser-known trick works for modulus. But are these alternatives actually any faster? Today's article puts them to the test!
Here are today's competitors:
// Multiplication i * 8; // normal i << 3; // bitwise [8 = 2^3, so use 3] // Division i / 16; // normal i >> 4; // bitwise [16 = 2^4, so use 4] // Modulus i % 4; // normal i & 3; // bitwise [4 = 1 << 2, apply ((1 << 2) - 1), so use 3] |
Here's a performance test that performs each of these operations a lot of times:
package { import flash.display.*; import flash.utils.*; import flash.text.*; public class FasterDivMod extends Sprite { private var __logger:TextField = new TextField(); private function row(...cols): void { __logger.appendText(cols.join(",")+"\n"); } public function FasterDivMod() { stage.align = StageAlign.TOP_LEFT; stage.scaleMode = StageScaleMode.NO_SCALE; __logger.autoSize = TextFieldAutoSize.LEFT; addChild(__logger); init(); } private function init(): void { var beforeTime:int; var afterTime:int; var i:int; var REPS:int = 100000000; var absInt:int; row("Method", "Time"); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { absInt = i / 4; } afterTime = getTimer(); row("Div: i / 4", (afterTime-beforeTime), absInt); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { absInt = i >> 2; } afterTime = getTimer(); row("Div: i >> 2", (afterTime-beforeTime), absInt); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { absInt = i * 4; } afterTime = getTimer(); row("Mul: i * 4", (afterTime-beforeTime), absInt); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { absInt = i << 2; } afterTime = getTimer(); row("Mul: i << 2", (afterTime-beforeTime), absInt); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { absInt = i % 4; } afterTime = getTimer(); row("Mod: i % 4", (afterTime-beforeTime), absInt); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { absInt = i & 3; // ((1 << 2) - 1) == 3 } afterTime = getTimer(); row("Mod: i & 3", (afterTime-beforeTime), absInt); } } } |
I ran this test app in the following environment:
- Flex SDK (MXMLC) 4.6.0.23201, compiling in release mode (no debugging or verbose stack traces)
- Release version of Flash Player 11.3.300.271
- 2.3 Ghz Intel Core i7
- Mac OS X 10.8.0
And here are the results I got:
| Method | Time |
|---|---|
| Div: i / 4 | 358 |
| Div: i >> 2 | 224 |
| Mul: i * 4 | 206 |
| Mul: i << 2 | 255 |
| Mod: i % 4 | 918 |
| Mod: i & 3 | 254 |

The above results validate the bitwise versions in two out of three tests: division and modulus. In the multiplication case, the normal version actually performs about 20% faster than the bitwise equivalent. On the other hand, division is nearly twice as fast with the bitwise shift and the bitwise modulus (really just an &) is more than three times faster! So if you're got a lot of divides or mods in your performance-critical code, swap them over to the bitwise versions!
Spot a bug? Know why multiplication is faster than a left bitwise shift and want to fill us all in? Post a comment!

#1 by makc on August 20th, 2012 · | Quote
these numbers differ from version to version too much to have any meaning. for example http://wonderfl.net/c/A4Hc in 11.2.202.229 it was 1779/1708 ms (as is evident in the screenshot), now in 11.3.300.271 it’s 1822/4941 ms (unary minus is twice slower than multiplication :)
#2 by jackson on August 20th, 2012 · | Quote
I get 1833 for
a = b * -1and 1822 fora = -b(which I’d expect) for 11.3.300.271 (same environment as in the article). If you have an 11.2 Player handy, would you mind posting your numbers for the test in the article with 11.2 and 11.3?#3 by Nicolas on August 20th, 2012 · | Quote
I think you may have a typo at the top
// Division
i * 16; // normal
looks like a multiplication.
Also how does the division compare with (* .25) ?
#4 by jackson on August 20th, 2012 · | Quote
Thanks for spotting that. I’ve updated the article.
I would guess that multiplying by a floating-point number would be a good deal slower, but I should do a test to make sure. Perhaps there will be a follow-up article about that. :)
#5 by Hasufel on August 23rd, 2012 · | Quote
Jackson hi :),
it’s good that you fiddle with the bitwise gems.
I am tho not sure you do your implementation correctly, as on my humble end substituting
<<to * is faster and not slower – but I do this on uints, not ints, and do not store the value sumwhere as you do there, but use it right away (I’m aware of the famous debate about uint treated as ints or even numbers thingy…). In all my tests, no matter what version of Flash player and platform used,
>>,
<<, and
&(the fastest of the three) perform faster than the regulars operators.
Refer to http://wonderfl.net/c/bVZM, considering the original from Devon_o – http://wonderfl.net/c/3mIv – and finally leading to something like http://wonderfl.net/c/15wK – unachievable display the ‘standard’ way, thx to mostly bitwise ops and vector bitwise cast calls.
One important thing to mention (seeing you blog ‘n blog on optim) : I often use vector call casting – accelerating the calls inside vectors big, big time (in LUT mode for instance). Use the bitwise
to do the cast in uint – having a _lookupTable[i] where i is not uint is slower than doing _lookupTable[uint(i)] and is slower (depends on cases but mostly) than
.
Btw, in your test, you do a *4 – which is a pretty small operation – maybe try *256 or so…
! – also,getTimer() renders… an uint. not an int.
My two cents anyway :) Keep it up, good blog!
#6 by jackson on August 24th, 2012 · | Quote
Good points on
intvs.uint. I didn’t testuintin the article, but there may be some performance differences there.As for
getTimer, the performance impact of a single conversion should be so minor that it won’t affect the test, which includes a loop of one hundred million iterations. Still, point taken. :)