## Bitwise Alternatives to Multiply, Divide, and Modulus: Faster?

Many programmers are aware of a special case where you can use a bitwise shift for multiplication or division when you’re multiplying or dividing by a power of two. For example, you can replace `i * 2`

with `i << 1`

and `i / 2`

with `i >> 1`

. A lesser-known trick works for modulus. But are these alternatives actually any faster? Today's article puts them to the test!

Here are today's competitors:

// Multiplication i * 8; // normal i << 3; // bitwise [8 = 2^3, so use 3] // Division i / 16; // normal i >> 4; // bitwise [16 = 2^4, so use 4] // Modulus i % 4; // normal i & 3; // bitwise [4 = 1 << 2, apply ((1 << 2) - 1), so use 3] |

Here's a performance test that performs each of these operations *a lot* of times:

package { import flash.display.*; import flash.utils.*; import flash.text.*; public class FasterDivMod extends Sprite { private var __logger:TextField = new TextField(); private function row(...cols): void { __logger.appendText(cols.join(",")+"\n"); } public function FasterDivMod() { stage.align = StageAlign.TOP_LEFT; stage.scaleMode = StageScaleMode.NO_SCALE; __logger.autoSize = TextFieldAutoSize.LEFT; addChild(__logger); init(); } private function init(): void { var beforeTime:int; var afterTime:int; var i:int; var REPS:int = 100000000; var absInt:int; row("Method", "Time"); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { absInt = i / 4; } afterTime = getTimer(); row("Div: i / 4", (afterTime-beforeTime), absInt); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { absInt = i >> 2; } afterTime = getTimer(); row("Div: i >> 2", (afterTime-beforeTime), absInt); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { absInt = i * 4; } afterTime = getTimer(); row("Mul: i * 4", (afterTime-beforeTime), absInt); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { absInt = i << 2; } afterTime = getTimer(); row("Mul: i << 2", (afterTime-beforeTime), absInt); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { absInt = i % 4; } afterTime = getTimer(); row("Mod: i % 4", (afterTime-beforeTime), absInt); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { absInt = i & 3; // ((1 << 2) - 1) == 3 } afterTime = getTimer(); row("Mod: i & 3", (afterTime-beforeTime), absInt); } } } |

I ran this test app in the following environment:

- Flex SDK (MXMLC) 4.6.0.23201, compiling in release mode (no debugging or verbose stack traces)
- Release version of Flash Player 11.3.300.271
- 2.3 Ghz Intel Core i7
- Mac OS X 10.8.0

And here are the results I got:

Method | Time |
---|---|

Div: i / 4 | 358 |

Div: i >> 2 | 224 |

Mul: i * 4 | 206 |

Mul: i << 2 | 255 |

Mod: i % 4 | 918 |

Mod: i & 3 | 254 |

The above results validate the bitwise versions in two out of three tests: division and modulus. In the multiplication case, the normal version actually performs about 20% faster than the bitwise equivalent. On the other hand, division is nearly twice as fast with the bitwise shift and the bitwise modulus (really just an `&`

) is more than three times faster! So if you're got a lot of divides or mods in your performance-critical code, swap them over to the bitwise versions!

Spot a bug? Know why multiplication is faster than a left bitwise shift and want to fill us all in? Post a comment!

#1 by makc on August 20th, 2012 · | Quote

these numbers differ from version to version too much to have any meaning. for example http://wonderfl.net/c/A4Hc in 11.2.202.229 it was 1779/1708 ms (as is evident in the screenshot), now in 11.3.300.271 it’s 1822/4941 ms (unary minus is twice slower than multiplication :)

#2 by jackson on August 20th, 2012 · | Quote

I get 1833 for

`a = b * -1`

and 1822 for`a = -b`

(which I’d expect) for 11.3.300.271 (same environment as in the article). If you have an 11.2 Player handy, would you mind posting your numbers for the test in the article with 11.2 and 11.3?#3 by Nicolas on August 20th, 2012 · | Quote

I think you may have a typo at the top

// Division

i * 16; // normal

looks like a multiplication.

Also how does the division compare with (* .25) ?

#4 by jackson on August 20th, 2012 · | Quote

Thanks for spotting that. I’ve updated the article.

I would guess that multiplying by a floating-point number would be a good deal slower, but I should do a test to make sure. Perhaps there will be a follow-up article about that. :)

#5 by Hasufel on August 23rd, 2012 · | Quote

Jackson hi :),

it’s good that you fiddle with the bitwise gems.

I am tho not sure you do your implementation correctly, as on my humble end substituting

`<<`

to * is faster and not slower – but I do this on uints, not ints, and do not store the value sumwhere as you do there, but use it right away (I’m aware of the famous debate about uint treated as ints or even numbers thingy…). In all my tests, no matter what version of Flash player and platform used,

`>>`

,

`<<`

, and

`&`

(the fastest of the three) perform faster than the regulars operators.

Refer to http://wonderfl.net/c/bVZM, considering the original from Devon_o – http://wonderfl.net/c/3mIv – and finally leading to something like http://wonderfl.net/c/15wK – unachievable display the ‘standard’ way, thx to mostly bitwise ops and vector bitwise cast calls.

One important thing to mention (seeing you blog ‘n blog on optim) : I often use vector call casting – accelerating the calls inside vectors big, big time (in LUT mode for instance). Use the bitwise

to do the cast in uint – having a _lookupTable[i] where i is not uint is slower than doing _lookupTable[uint(i)] and is slower (depends on cases but mostly) than

.

Btw, in your test, you do a *4 – which is a pretty small operation – maybe try *256 or so…

! – also,getTimer() renders… an uint. not an int.

My two cents anyway :) Keep it up, good blog!

#6 by jackson on August 24th, 2012 · | Quote

Good points on

`int`

vs.`uint`

. I didn’t test`uint`

in the article, but there may be some performance differences there.As for

`getTimer`

, the performance impact of a single conversion should be so minor that it won’t affect the test, which includes a loop of one hundred million iterations. Still, point taken. :)#7 by benjamin guihaire on August 7th, 2013 · | Quote

the modulo of powerOf2 is great, but we can go further:

there is a technique to perform modulus of (powerOf2Minus1) values see http://graphics.stanford.edu/~seander/bithacks.html#ModulusDivisionParallel

and to faster perform integer modulus for all other value in AS3, I wrote this:

http://guihaire.com/code/?p=457

#8 by

Chuon January 13th, 2014 · | QuoteI am shocked the compiler doesn’t perform these optimizations internally. Any good compiler should reduce *any* multiplication or division by a constant into a shift plus add. The algorithm for division can be found here: http://ridiculousfish.com/blog/posts/labor-of-division-episode-i.html. The algorithm for multiplication is similar.