Faster Math.sin and Math.cos Through Lookup Tables
Trigonometry functions like Math.sin
and Math.cos
are widely used in AS3 games and other apps that make intensive use of graphics, sound, or physics. Today’s article shows you a way to trade a little bit of memory for much faster trigonometry computations, which can seriously speed up any app that makes heavy use of Math.sin
, Math.cos
, and the like. Keep reading for the class that’ll help you do this, a demo app, performance testing, analysis, and more. (UPDATE: optimized thanks to skyboy)
Lookup tables are a common technique in programming. They involve doing expensive computations once, storing them in a table, and accessing them later. In AS3, the idea is that accessing the values stored in a Vector
table is faster than computing the value and therefore you get a speed boost.
For such a widely-used technique, it doesn’t seem to be getting much usage in the AS3 world. I found a test on actionscript.org that uses lookup tables for Math.sin
and shows a performance boost, but the lookup table can only be used with radians that happen to be whole numbers (integers) and is therefore not very useful and an unfair comparison to the robust Math.sin
function. There is also an article on polygonal.de discussing the subject, but it doesn’t provide full source, uses the relatively-slow Array
because it was written before Vector
debuted in Flash Player 10, and doesn’t say specifically if lookup tables are used to achieve the alleged 14x performance boost. It does, however, have a cool demo app that inspired my demo app below.
Due to this lack of existing lookup table work, I decided to make my own lookup table for trig functions: any function like Math.sin
and Math.cos
that takes parameters in the range [0,2pi). Here is the basic usage:
// Make a lookup table for Math.sin with 2 decimal places of precision var sineLUT:TrigLUT = new TrigLUT(2, Math.sin); // Query with arbitrary radians, just like the original Math.sin result = sineLUT.val(-45.123);
If you don’t need to pass an arbitrary radians value, there are even faster functions that work as long as you can guarantee limits:
// Radians >= 0 result = sineLUT.valPositive(45.123); // Radians on (-2pi,2pi) result = sineLUT.valNormalized(-4.123); // Radians on [0,2pi) result = sineLUT.valNormalizedPositive(4.123);
The lookup table and precision variable are public, so you can avoid the slow function call by inlining them. The functions are small, so it won’t bloat up your code or be too messy:
// Inline version of TrigLUT.valNormalizedPositive result = sineLUT.table[int(4.123*sineLUT.pow)];
For the ultimate speed boost when you’re doing lots of lookups in one function, cache the lookup table and precision variable as local variables:
var sineTable:Vector.<Number> = sineLUT.table; var sinePow:Number = sineLUT.pow; result = sineTable[int(4.123*sinePow)];
You can even make a lookup table for arbitrary functions. Here’s one for the square of Math.sin
with 2 decimal places of precision:
var sineSquaredLUT:TrigLUT = new TrigLUT( 2, function(r:Number):Number{r = Math.sin(r); return r*r;} );
Use this lookup table the same exact way you used the Math.sin
lookup table:
sineSquaredLUT.val(-45.123);
Now that you know how to use it, here is the source code for TrigLUT
(Trigonometry LookUp Table):
/* The MIT License Copyright (c) 2011 Jackson Dunstan Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ package { /** * A lookup table for trig values (e.g. sine, cosine) to improve on the * performance of the static functions found in the Math class * @author Jackson Dunstan */ public class TrigLUT { /** 2 * PI, the number of radians in a circle*/ public static const TWO_PI:Number = 2.0 * Math.PI; /** The static TWO_PI cached as a non-static field*/ private const TWO_PI:Number = TrigLUT.TWO_PI; /** Table of trig function values*/ public var table:Vector.<Number>; /** 10^decimals of precision*/ public var pow:Number; /** * Make the look up table * @param numDigits Number of digits places of precision * @param mathFunc Math function to call to generate stored values. * Must be valid on [0,2pi). * @throws Error If mathFunc is null or invalid on [0,2pi) */ public function TrigLUT(numDigits:uint, mathFunc:Function) { var pow:Number = this.pow = Math.pow(10, numDigits); var round:Number = 1.0 / pow; var len:uint = 1 + this.TWO_PI*pow; var table:Vector.<Number> = this.table = new Vector.<Number>(len); var theta:Number = 0; for (var i:uint = 0; i < len; ++i) { table[i] = mathFunc(theta); theta += round; } } /** * Look up the value of the given number of radians * @param radians Radians to look up the value of * @return The value of the given number of radians */ public function val(radians:Number): Number { return radians >= 0 ? this.table[int((radians%this.TWO_PI)*this.pow)] : this.table[int((TWO_PI+radians%this.TWO_PI)*this.pow)]; } /** * Look up the value of the given number of radians * @param radians Radians to look up the value of. Must be positive. * @return The sine of the given number of radians * @throws RangeError If radians is not positive */ public function valPositive(radians:Number): Number { return this.table[int((radians%this.TWO_PI)*this.pow)]; } /** * Look up the value of the given number of radians * @param radians Radians to look up the value of. Must be on (-2pi,2pi). * @return The value of the given number of radians * @throws RangeError If radians is not on (-2pi,2pi) */ public function valNormalized(radians:Number): Number { return radians >= 0 ? this.table[int(radians*this.pow)] : this.table[int((this.TWO_PI+radians)*this.pow)]; } /** * Look up the value of the given number of radians * @param radians Radians to look up the value of. Must be on [0,2pi). * @return The value of the given number of radians * @throws RangeError If radians is not on [0,2pi) */ public function valNormalizedPositive(radians:Number): Number { return this.table[int(radians*this.pow)]; } } }
Since the whole point of this exercise is to get a performance boost, let’s do some performance testing! Here is an app that compares the performance of Math.sin
and Math.cos
to TrigLUT
:
/* The MIT License Copyright (c) 2011 Jackson Dunstan Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ package { import flash.display.*; import flash.utils.*; import flash.text.*; /** * Performance test app for TrigLUT, the trig function lookup table * @author Jackson Dunstan */ public class TrigLUTTest extends Sprite { private var __logger:TextField = new TextField(); private function log(msg:*): void { __logger.appendText(msg + "\n"); } public function TrigLUTTest() { __logger.autoSize = TextFieldAutoSize.LEFT; addChild(__logger); const DIGITS:uint = 2; const STEP:Number = 0.05; var beforeTime:int; var afterTime:int; var lut:TrigLUT = new TrigLUT(DIGITS, Math.sin); var lutTable:Vector.<Number> = lut.table; var lutPow:Number = lut.pow; const REPS:int = 100000; const NUM:int = int(TrigLUT.TWO_PI / STEP); var theta:Number; var i:int; var j:int; var result:Number; const TWO_PI:Number = TrigLUT.TWO_PI; log("Function,Time"); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { result = Math.sin(theta); theta += STEP; } } afterTime = getTimer(); log("Math.sin," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { result = Math.cos(theta); theta += STEP; } } afterTime = getTimer(); log("Math.cos," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { result = lut.val(theta); theta += STEP; } } afterTime = getTimer(); log("TrigLUT.val," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { result = lut.valPositive(theta); theta += STEP; } } afterTime = getTimer(); log("TrigLUT.valPositive," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { result = lut.valNormalized(theta); theta += STEP; } } afterTime = getTimer(); log("TrigLUT.valNormalized," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { result = lut.valNormalizedPositive(theta); theta += STEP; } } afterTime = getTimer(); log("TrigLUT.valNormalizedPositive," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { result = theta >= 0 ? lutTable[int((theta%TWO_PI)*lutPow)] : lutTable[int((TWO_PI+theta%TWO_PI)*lutPow)]; theta += STEP; } } afterTime = getTimer(); log("inline val," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { result = lutTable[int((theta%TWO_PI)*lutPow)]; theta += STEP; } } afterTime = getTimer(); log("inline valPositive," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { result = theta >= 0 ? lutTable[int(theta*lutPow)] : lutTable[int((TWO_PI+theta)*lutPow)]; theta += STEP; } } afterTime = getTimer(); log("inline valNormalized," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { result = lutTable[int(theta*lutPow)]; theta += STEP; } } afterTime = getTimer(); log("inline valNormalizedPositive," + (afterTime-beforeTime)); } } }
Here is the test environment I ran this app on:
- Flex SDK (MXMLC) 4.1.0.16076, compiling in release mode (no debugging or verbose stack traces)
- Release version of Flash Player 10.3.181.14
- 2.4 Ghz Intel Core i5
- Mac OS X 10.6.7
And here are the results I got:
Function | Time |
---|---|
Math.sin | 509 |
Math.cos | 509 |
TrigLUT.val | 464 |
TrigLUT.valPositive | 447 |
TrigLUT.valNormalized | 253 |
TrigLUT.valNormalizedPositive | 214 |
inline val | 336 |
inline valPositive | 329 |
inline valNormalized | 145 |
inline valNormalizedPositive | 133 |
Here are the same results in graph form:
These results clearly show the performance benefits of TrigLUT
:
TrigLUT
is always faster thanMath.sin
andMath.cos
- Reducing the acceptable radian values helps performance a lot, especially guaranteeing positive values
- Inlining lookups results in maximum performance
- In the best case,
TrigLUT
beats theMath
class functions by almost 5x!
With such compelling results, we should make sure that they are still accurate. and not wildly different than the results we would normally get from the Math
class functions. TrigLUT
allows you to set the number of digits of precision you’d like. The more digits you want, the bigger and more accurate the lookup table gets. It’s important to remember though that increasing precision does not slow down TrigLUT
. Usually, I find that two digits of precision (a lookup table with 628 Numbers
= 4.9KB of memory) is enough for most uses, but I leave the precision up to each user of TrigLUT
to determine their own needs. Here is the source code for a small demo app that simply draws a sine or cosine curve using the Math
class functions and the approximation via TrigLUT
over the top. A live demo follows.
/* The MIT License Copyright (c) 2011 Jackson Dunstan Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ package { import flash.display.*; import flash.events.*; import flash.text.*; import flash.geom.*; /** * Demonstration app for TrigLUT, the trig function lookup table * @author Jackson Dunstan */ [SWF(width=640,height=480,backgroundColor=0xEEEAD9)] public class TrigLUTDemo extends Sprite { private static const TEXT_FORMAT:TextFormat = new TextFormat("_sans", 11); private var bmd:BitmapData; private var bmdRect:Rectangle; private var mathFuncName:String; private var digits:uint; private var mathFuncFrame:Sprite; private var digitsFrame:Sprite; public function TrigLUTDemo() { stage.scaleMode = StageScaleMode.NO_SCALE; stage.align = StageAlign.TOP_LEFT; this.bmd = new BitmapData(stage.stageWidth, stage.stageHeight, false, 0xffffffff); addChild(new Bitmap(bmd)); this.bmdRect = new Rectangle(0, 0, this.bmd.width, this.bmd.height); this.mathFuncFrame = addChoices( "Math function: ", onMathFuncButton, 0, ["sin","cos"] ); this.digitsFrame = addChoices( "Digits of precision: ", onPrecisionButton, this.mathFuncFrame.height, ["0","1", "2", "3", "4", "5"] ); this.digits = 2; this.mathFuncName = "sin"; redraw(); var about:TextField = new TextField(); about.defaultTextFormat = TEXT_FORMAT; about.y = this.digitsFrame.y + this.digitsFrame.height; about.htmlText = "TrigLUT Demo by <font color=\"#0071BB\"><a href=\"" + "http://jacksondunstan.com/articles/1190" + "\">JacksonDunstan.com</a></font>\n" + "Black line: trig function\n" + "Red line: lookup table"; about.autoSize = TextFieldAutoSize.LEFT; about.selectable = false; addChild(about); } private function addChoices( promptStr:String, callback:Function, y:Number, choices:Array ): Sprite { var prompt:TextField = new TextField(); prompt.defaultTextFormat = TEXT_FORMAT; prompt.text = promptStr; prompt.autoSize = TextFieldAutoSize.LEFT; prompt.selectable = false; addChild(prompt); var x:Number = prompt.width; const PAD:Number = 3; for each (var choiceName:String in choices) { var tf:TextField = new TextField(); tf.defaultTextFormat = TEXT_FORMAT; tf.name = "label"; tf.text = choiceName; tf.autoSize = TextFieldAutoSize.LEFT; tf.selectable = false; tf.x = tf.y = PAD; var button:Sprite = new Sprite(); button.name = choiceName; button.graphics.beginFill(0xE6E2D1); button.graphics.drawRect(0, 0, tf.width+PAD*2, tf.height+PAD*2); button.graphics.endFill(); button.addChild(tf); button.addEventListener( MouseEvent.CLICK, function(ev:MouseEvent): void { var button:Sprite = ev.currentTarget as Sprite; var label:TextField = button.getChildByName("label") as TextField; callback(label.text); } ); button.x = x; button.y = y; addChild(button); x += button.width + PAD; } prompt.y = y + (button.height - prompt.height) / 2; var frame:Sprite = new Sprite(); frame.graphics.lineStyle(1); frame.graphics.drawRect(0, 0, tf.width+PAD*2, tf.height+PAD*2); addChild(frame); return frame; } private function onMathFuncButton(label:String): void { this.mathFuncName = label; redraw(); } private function onPrecisionButton(label:String): void { this.digits = int(label); redraw(); } private function redraw(): void { var mathFuncButton:Sprite = getChildByName(this.mathFuncName) as Sprite; this.mathFuncFrame.x = mathFuncButton.x; this.mathFuncFrame.y = mathFuncButton.y; var digitsButton:Sprite = getChildByName(String(this.digits)) as Sprite; this.digitsFrame.x = digitsButton.x; this.digitsFrame.y = digitsButton.y; var mathFunc:Function = Math[this.mathFuncName]; var lut:TrigLUT = new TrigLUT(this.digits, mathFunc); var bmd:BitmapData = this.bmd; bmd.lock(); var w:int = bmd.width; var h:int = bmd.height; bmd.fillRect(this.bmdRect, 0xEEEAD9); var halfH:int = h / 2; var stepTheta:Number = TrigLUT.TWO_PI / w; var theta:Number = 0; for (var x:int = 0; x < w; ++x) { for (var y:int = 0; y < h; ++y) { bmd.setPixel( x, halfH + mathFunc(theta)*halfH, 0xff000000 ); bmd.setPixel( x, halfH + lut.valNormalizedPositive(theta)*halfH, 0xffff0000 ); } theta += stepTheta; } bmd.unlock(); } } }
Here is a live version of the demo:
In short, TrigLUT
offers improved performance for trig functions like Math.sin
and Math.cos
by using lookup tables. If anyone has any suggestions for improving TrigLUT
or alternatives to lookup tables entirely, feel free to join in the discussion below in the comments. To get the performance test, demo app, and TrigLUT
class all in one convenient ZIP, you can download TrigLUT here.
#1 by Karl on May 16th, 2011 ·
Nice, a comparison with Joa Eberts FastMath sine and cosine function would also be interesting.
http://code.google.com/p/apparat/source/browse/as3/Apparat.Library.AS3/src/apparat/math/FastMath.as?r=c8b764b3a535184a604e9d2309655de99be1d8b8
#2 by jackson on May 16th, 2011 ·
I considered FastMath while writing the article, but decided to leave it out since it doesn’t use lookup tables. Still, it achieves the same goal by producing faster sine and cosine results and is therefore a possibility for a future article.
#3 by Henke37 on May 16th, 2011 ·
PROTIP: cos is sin with an offset, so you can use the same table for both if you apply the offset for the lookups.
#4 by jackson on May 16th, 2011 ·
Indeed it is, but it makes the lookup slower. I’d rather spend another 4.9KB of memory and have unique tables. Even on mobile, it’s not much RAM.
#5 by Mark on May 16th, 2011 ·
Interesting article (again)! I ones used a dictionary for this, which started empty, and gets filled by lazy instantiation. However after a while it was using a lot of memory, so thats slows the thing down too. I think it is clever to use rounded values to save ram.
Maybe if you want more precision (dimension) you could calculate the average between the prev/next angle-items in the lookup array around the current returned item. I don’t know if that helps anyway or if you loose too much speed on that.
Thanks for this helpful article!
#6 by jackson on May 16th, 2011 ·
Glad to hear you enjoyed the article and that there are still more people looking for a way to improve their trig computation performance in AS3. :)
I considered using some interpolation like in polygonal.de article I linked, but the lookup becomes much slower due to the extra
Vector
access and math. To get better precision, I think it’s worth trading a little more RAM. 3 decimal places is pretty extreme and it’s only 6283Numbers
= 49KB. Pretty cheap (if you ask me) and with zero slowdown.I also considered storing the values in a
Dictionary
, but sinceDictionary
access is 3x slower thanVector
access, I figured the overhead would more than outweigh theNumber
multiply (* pow
) andint()
explicit type conversion.#7 by skyboy on May 16th, 2011 ·
I did a small rewrite to increase performance by removing the static lookup (instance instead) and the conditional branch: the result was 15% faster execution of val:
Using this:
#8 by jackson on May 16th, 2011 ·
Good catch!
I’ve updated the article with a version of
TrigLUT
that caches the static field as a non-static and updated the ZIP download too. The graphs and performance data are not (yet) updated, but I do see a drop on my Windows machine from ~340 milliseconds to ~300 milliseconds (~13%) forTrigLUT.val
and similar drops in the other tests using that value.#9 by Alex Nino (yoambulante) on May 19th, 2011 ·
Hi! this is quite interesting you were dealing with this, I found a different approach for sorting this, basically re-making the SIN and COS function on the fly in the same block of code will run faster than Math.sin and also you have more control on its accuracy, I am using it quite a lot in my programs, I got about 50% extra performance by doing this, cheers! http://www.yoambulante.com/en/blog/math_sincos.php
#10 by Alec McEachran on May 25th, 2011 ·
Thanks for the article Jackson.
To clarify however, polygonal.de’s 14x speed increase doesn’t come from a lookup table at all, but from a quadratic approximation of the sine curve that is more than accurate enough for most speed-critical circumstances. This has been my favoured approximation for several years now, though without haXe it can make for slightly ugly-looking code. I would be interested whether your lookup approach compares favourably with the quadratic approximation technique.
Regards,
Alec
#11 by jackson on May 25th, 2011 ·
You’re in luck, Alec! This week’s article is all about how you can get even better trig performance by inlining code like the aforementioned polygonal.de version instead of using lookup tables. I didn’t quite get a 14x improvement there, but it’s been a while since that article was written. Still, 10x isn’t so bad. :)
#12 by Ben Beaumont on January 2nd, 2013 ·
Hi,
Out of interest I just ran these tests using FP 11.5.31.135 on an iMac OSX 10.8 and the Math.sin method now seems to be an equivalent performance to the inlined methods – Math.sin,126 compared to inline polygonal.de val (high),155
#13 by benjamin guihaire on August 8th, 2013 ·
the modulo of the angle passed to sin or cos can be computed super fast if the sin LUT has a length of power of 2.
Ex: if the LUT has a length of 2048 for sin values from [0;2PI] range, we just need to convert the angle to be in the correct range by doing angle = angleIn * 2048/(2PI) = angleIn * 325.9493
and the modulo 2PI becomes SinLUT[ angle % 2048] , or faster SinLUT[angle & 2047]
and finally, sinval = SinLUT[ (angle*325.9493)&2047];
full profiling and code of the method here:
http://guihaire.com/code/?p=520