JacksonDunstan.com | Bitwise Alternatives to Multiply, Divide, and Modulus: Faster?

Bitwise Alternatives to Multiply, Divide, and Modulus: Faster?

August 20, 2012 Tags: bitwise, mod, performance

Many programmers are aware of a special case where you can use a bitwise shift for multiplication or division when you’re multiplying or dividing by a power of two. For example, you can replace i * 2 with i << 1 and i / 2 with i >> 1. A lesser-known trick works for modulus. But are these alternatives actually any faster? Today's article puts them to the test!

Here are today's competitors:

// Multiplication
i * 8; // normal
i << 3; // bitwise [8 = 2^3, so use 3]
 
// Division
i / 16; // normal
i >> 4; // bitwise [16 = 2^4, so use 4]
 
// Modulus
i % 4; // normal
i & 3; // bitwise [4 = 1 << 2, apply ((1 << 2) - 1), so use 3]

Here's a performance test that performs each of these operations a lot of times:

package
{
	import flash.display.*;
	import flash.utils.*;
	import flash.text.*;
 
	public class FasterDivMod extends Sprite
	{
		private var __logger:TextField = new TextField();
		private function row(...cols): void
		{
			__logger.appendText(cols.join(",")+"\n");
		}
 
		public function FasterDivMod()
		{
			stage.align = StageAlign.TOP_LEFT;
			stage.scaleMode = StageScaleMode.NO_SCALE;
 
			__logger.autoSize = TextFieldAutoSize.LEFT;
			addChild(__logger);
 
			init();
		}
 
		private function init(): void
		{
			var beforeTime:int;
			var afterTime:int;
			var i:int;
			var REPS:int = 100000000;
			var absInt:int;
 
			row("Method", "Time");
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				absInt = i / 4;
			}
			afterTime = getTimer();
			row("Div: i / 4", (afterTime-beforeTime), absInt);
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				absInt = i >> 2;
			}
			afterTime = getTimer();
			row("Div: i >> 2", (afterTime-beforeTime), absInt);
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				absInt = i * 4;
			}
			afterTime = getTimer();
			row("Mul: i * 4", (afterTime-beforeTime), absInt);
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				absInt = i << 2;
			}
			afterTime = getTimer();
			row("Mul: i << 2", (afterTime-beforeTime), absInt);
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				absInt = i % 4;
			}
			afterTime = getTimer();
			row("Mod: i % 4", (afterTime-beforeTime), absInt);
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				absInt = i & 3; // ((1 << 2) - 1) == 3
			}
			afterTime = getTimer();
			row("Mod: i & 3", (afterTime-beforeTime), absInt);
		}
	}
}

I ran this test app in the following environment:

Flex SDK (MXMLC) 4.6.0.23201, compiling in release mode (no debugging or verbose stack traces)
Release version of Flash Player 11.3.300.271
2.3 Ghz Intel Core i7
Mac OS X 10.8.0

And here are the results I got:

Method	Time
Div: i / 4	358
Div: i >> 2	224
Mul: i * 4	206
Mul: i << 2	255
Mod: i % 4	918
Mod: i & 3	254

Performance Chart

The above results validate the bitwise versions in two out of three tests: division and modulus. In the multiplication case, the normal version actually performs about 20% faster than the bitwise equivalent. On the other hand, division is nearly twice as fast with the bitwise shift and the bitwise modulus (really just an &) is more than three times faster! So if you're got a lot of divides or mods in your performance-critical code, swap them over to the bitwise versions!

Spot a bug? Know why multiplication is faster than a left bitwise shift and want to fill us all in? Post a comment!

Comments

#1 by makc on August 20th, 2012 · Reply

these numbers differ from version to version too much to have any meaning. for example http://wonderfl.net/c/A4Hc in 11.2.202.229 it was 1779/1708 ms (as is evident in the screenshot), now in 11.3.300.271 it’s 1822/4941 ms (unary minus is twice slower than multiplication :)
- #2 by jackson on August 20th, 2012 · Reply
  
  I get 1833 for a = b * -1 and 1822 for a = -b (which I’d expect) for 11.3.300.271 (same environment as in the article). If you have an 11.2 Player handy, would you mind posting your numbers for the test in the article with 11.2 and 11.3?
#3 by Nicolas on August 20th, 2012 · Reply

I think you may have a typo at the top

// Division
i * 16; // normal

looks like a multiplication.

Also how does the division compare with (* .25) ?
- #4 by jackson on August 20th, 2012 · Reply
  
  Thanks for spotting that. I’ve updated the article.
  
  I would guess that multiplying by a floating-point number would be a good deal slower, but I should do a test to make sure. Perhaps there will be a follow-up article about that. :)
#5 by Hasufel on August 23rd, 2012 · Reply

Jackson hi :),
it’s good that you fiddle with the bitwise gems.
I am tho not sure you do your implementation correctly, as on my humble end substituting

<<

to * is faster and not slower – but I do this on uints, not ints, and do not store the value sumwhere as you do there, but use it right away (I’m aware of the famous debate about uint treated as ints or even numbers thingy…). In all my tests, no matter what version of Flash player and platform used,

>>

,

<<

, and

&

(the fastest of the three) perform faster than the regulars operators.
Refer to http://wonderfl.net/c/bVZM, considering the original from Devon_o – http://wonderfl.net/c/3mIv – and finally leading to something like http://wonderfl.net/c/15wK – unachievable display the ‘standard’ way, thx to mostly bitwise ops and vector bitwise cast calls.

One important thing to mention (seeing you blog ‘n blog on optim) : I often use vector call casting – accelerating the calls inside vectors big, big time (in LUT mode for instance). Use the bitwise

>>0

to do the cast in uint – having a _lookupTable[i] where i is not uint is slower than doing _lookupTable[uint(i)] and is slower (depends on cases but mostly) than

_lookupTable[i>>0]

.

Btw, in your test, you do a *4 – which is a pretty small operation – maybe try *256 or so…

<<9

! – also,getTimer() renders… an uint. not an int.

My two cents anyway :) Keep it up, good blog!
- #6 by jackson on August 24th, 2012 · Reply
  
  Good points on int vs. uint. I didn’t test uint in the article, but there may be some performance differences there.
  
  As for getTimer, the performance impact of a single conversion should be so minor that it won’t affect the test, which includes a loop of one hundred million iterations. Still, point taken. :)
#7 by benjamin guihaire on August 7th, 2013 · Reply

the modulo of powerOf2 is great, but we can go further:

there is a technique to perform modulus of (powerOf2Minus1) values see http://graphics.stanford.edu/~seander/bithacks.html#ModulusDivisionParallel

and to faster perform integer modulus for all other value in AS3, I wrote this:
http://guihaire.com/code/?p=457
#8 by Chu on January 13th, 2014 · Reply

I am shocked the compiler doesn’t perform these optimizations internally. Any good compiler should reduce *any* multiplication or division by a constant into a shift plus add. The algorithm for division can be found here: http://ridiculousfish.com/blog/posts/labor-of-division-episode-i.html. The algorithm for multiplication is similar.
#9 by Fider on April 11th, 2017 · Reply

Seems that bitwise operations are useless, even in embedded world.
I run my test multiple times and advantage of bitwise modulo over classic modulo is not so huge as in your test result.
- #10 by jackson on April 11th, 2017 · Reply
  
  Thanks for reporting your updated test results. I guess a lot has changed in the five years since I wrote the article.
  - #11 by Robert on February 22nd, 2018 · Reply
    
    There’s a considerable difference between modulus and the special case of mod-ding by a power of 2; I believe your bitwise operation only works when the divisor is a power of two. (Isn’t that only 10 numbers out the first 1000 divisors in the whole numbers? (including mod 1)
    - #12 by jackson on February 22nd, 2018 · Reply
      
      Yep, that’s why the article started out by mentioning that these are special-case tricks:
      
      Many programmers are aware of a special case where you can use a bitwise shift for multiplication or division when youâ€™re multiplying or dividing by a power of two.
#13 by Johnny Boy on April 8th, 2018 · Reply

For those who are more curious, this article examines the fastest way to get the remainder between using modulus or an alternative with addition:
http://cc.davelozinski.com/c-sharp/use-the-modulus-operator-or-alternative

Itâ€™s obviously more for speed and micro-optimization junkies, but you’ll see several different faster ways than the straight modulus operation.
#14 by Andrew Martin on April 9th, 2020 · Reply

It is clear that the writer is a data geek.

I enjoy the way he writes and organizes facts.
It is always such a joy to read articles created by actual professionals.
I’m fed up with all that no-name, ghostwritten articles.
That’s why it was so good to look at a persuasive piece.
I visit the author has ground knowledge it the subject as well as some practical experience.
Such sort of information is more precious than copypasted blog articles thoughts.
#15 by Richard Coleman on April 14th, 2020 · Reply

The content has been instead grabbing and interesting enough to receive all
possible nuances to recall. I do enjoy studying the content
and the composing manner of the author, etc..