Even Faster Trig Through Inlining
Last week’s article showed you a way to improve the performance of trig functions like Math.sin
by almost 4x by using lookup tables. This week’s article will go even further and show you how to increase this speedup to over 10x!
The lookup table approach works by replacing a static function call like Math.sin
with as little as a Vector
index, an int()
type conversion, and a multiply. The result was a whopping 4x improvement over Math.sin
and its ilk. This was, of course, only true in the best case scenario where you had a number on [0,2π) and the performance advantages faded as you added uncertainties so that the number could be negative or greater than ± π.
Today’s article will discuss non-lookup table approaches offered by polygonal.de and yoambulante.com. In the polygonal.de case, there are two quality options—low and high—and no temporary variables are required. In contrast, the yoambulante.com way includes four quality options—low, acceptable, good, and excellent—and uses between 2 and 6 temporary variables, not to mention 2 to 6 constants.
So, let’s pit these various trig functions—Math.sin
, TrigLUT
, polygonal.de’s inline code, and yoambulante.com’s inline code—against each other and see who’s fastest. The polygonal.de code supported an arbitrary number of radians via conditionals, so I simply stripped out this support to generate the equivalent of TrigLUT
‘s valNormalized
, valPositive
, and valNormalizedPositive
functions. The yoambulante.com code worked on positive numbers as well as negative numbers, but lacked support for numbers greater than 2π, which I added in the val
and valPositive
case by using polygonal.de’s simple if
statement. The result is this performance test:
package { import flash.display.*; import flash.utils.*; import flash.text.*; /** * Performance test app for faster ways of doing trig * @author Jackson Dunstan */ public class FastTrigTest extends Sprite { private var __logger:TextField = new TextField(); private function log(msg:*): void { __logger.appendText(msg + "\n"); } public function FastTrigTest() { __logger.autoSize = TextFieldAutoSize.LEFT; addChild(__logger); const DIGITS:uint = 2; const STEP:Number = 0.05; var beforeTime:int; var afterTime:int; var lut:TrigLUT = new TrigLUT(DIGITS, Math.sin); var lutTable:Vector.<Number> = lut.table; var lutPow:Number = lut.pow; const REPS:int = 100000; const NUM:int = int(Math.PI / STEP); var theta:Number; var temp:Number; var i:int; var j:int; var result:Number; const TWO_PI:Number = TrigLUT.TWO_PI; //code by Alex Nino, yoambulante.com //these are our factorial constant values const f3:Number = (1 * 2 * 3); const f5:Number = (f3 * 4 * 5); const f7:Number = (f5 * 6 * 7); const f9:Number = (f7 * 8 * 9); const f11:Number = (f9 * 10 * 11); const f13:Number = (f11 * 12 * 13); //these are the variables where we store the powers, //note they have to be defined each time we execute the function var p3:Number; var p5:Number; var p7:Number; var p9:Number; var p11:Number; var p13:Number; /* for (theta = -10; theta <= 10; ++theta) { log(theta + ": " + Math.sin(theta) + " ... " + yoambulanteExcellent(theta)); } */ log("Function,Time"); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { result = Math.sin(theta); theta += STEP; } } afterTime = getTimer(); log("Math.sin," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { result = lut.val(theta); theta += STEP; } } afterTime = getTimer(); log("TrigLUT.val," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { result = lut.valPositive(theta); theta += STEP; } } afterTime = getTimer(); log("TrigLUT.valPositive," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { result = lut.valNormalized(theta); theta += STEP; } } afterTime = getTimer(); log("TrigLUT.valNormalized," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { result = lut.valNormalizedPositive(theta); theta += STEP; } } afterTime = getTimer(); log("TrigLUT.valNormalizedPositive," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { result = theta >= 0 ? lutTable[int((theta%TWO_PI)*lutPow)] : lutTable[int((TWO_PI+theta%TWO_PI)*lutPow)]; theta += STEP; } } afterTime = getTimer(); log("inline TrigLUT val," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { result = lutTable[int((theta%TWO_PI)*lutPow)]; theta += STEP; } } afterTime = getTimer(); log("inline TrigLUT valPositive," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { result = theta >= 0 ? lutTable[int(theta*lutPow)] : lutTable[int((TWO_PI+theta)*lutPow)]; theta += STEP; } } afterTime = getTimer(); log("inline TrigLUT valNormalized," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { result = lutTable[int(theta*lutPow)]; theta += STEP; } } afterTime = getTimer(); log("inline TrigLUT valNormalizedPositive," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { temp = theta; //always wrap input angle to -PI..PI if (temp < -3.14159265) temp += 6.28318531; else if (temp > 3.14159265) temp -= 6.28318531; //compute sine if (temp < 0) result = 1.27323954 * temp + .405284735 * temp * temp; else result = 1.27323954 * temp - 0.405284735 * temp * temp; theta += STEP; } } afterTime = getTimer(); log("inline polygonal.de val (low)," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { temp = theta; //always wrap input angle to -PI..PI if (temp > 3.14159265) temp -= 6.28318531; //compute sine result = 1.27323954 * temp - 0.405284735 * temp * temp; theta += STEP; } } afterTime = getTimer(); log("inline polygonal.de valPositive (low)," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { //compute sine if (theta < 0) result = 1.27323954 * theta + .405284735 * theta * theta; else result = 1.27323954 * theta - 0.405284735 * theta * theta; theta += STEP; } } afterTime = getTimer(); log("inline polygonal.de valNormalized (low)," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { result = 1.27323954 * theta - 0.405284735 * theta * theta; theta += STEP; } } afterTime = getTimer(); log("inline polygonal.de valNormalizedPositive (low)," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { temp = theta; //always wrap input angle to -PI..PI if (temp < -3.14159265) temp += 6.28318531; else if (temp > 3.14159265) temp -= 6.28318531; //compute resulte if (temp < 0) { result = 1.27323954 * temp + .405284735 * temp * temp; if (result < 0) result = .225 * (result *-result - result) + result; else result = .225 * (result * result - result) + result; } else { result = 1.27323954 * temp - 0.405284735 * temp * temp; if (result < 0) result = .225 * (result *-result - result) + result; else result = .225 * (result * result - result) + result; } theta += STEP; } } afterTime = getTimer(); log("inline polygonal.de val (high)," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { temp = theta; //always wrap input angle to -PI..PI if (temp > 3.14159265) temp -= 6.28318531; //compute resulte result = 1.27323954 * temp - 0.405284735 * temp * temp; if (result < 0) result = .225 * (result *-result - result) + result; else result = .225 * (result * result - result) + result; theta += STEP; } } afterTime = getTimer(); log("inline polygonal.de valPositive (high)," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { temp = theta; //compute resulte if (temp < 0) { result = 1.27323954 * temp + .405284735 * temp * temp; if (result < 0) result = .225 * (result *-result - result) + result; else result = .225 * (result * result - result) + result; } else { result = 1.27323954 * temp - 0.405284735 * temp * temp; if (result < 0) result = .225 * (result *-result - result) + result; else result = .225 * (result * result - result) + result; } theta += STEP; } } afterTime = getTimer(); log("inline polygonal.de valNormalized (high)," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { temp = theta; result = 1.27323954 * temp - 0.405284735 * temp * temp; if (result < 0) result = .225 * (result *-result - result) + result; else result = .225 * (result * result - result) + result; theta += STEP; } } afterTime = getTimer(); log("inline polygonal.de valNormalizedPositive (high)," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { temp = theta; //always wrap input angle to -PI..PI if (temp < -3.14159265) temp += 6.28318531; else if (temp > 3.14159265) temp -= 6.28318531; p3 = temp * temp * temp; p5 = p3 * temp * temp; result = temp - (p3 / f3) + (p5 / f5); theta += STEP; } } afterTime = getTimer(); log("inline yoambulante.com val (low)," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { p3 = theta * theta * theta; p5 = p3 * theta * theta; result = theta - (p3 / f3) + (p5 / f5); theta += STEP; } } afterTime = getTimer(); log("inline yoambulante.com valNormalized (low)," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { temp = theta; //always wrap input angle to -PI..PI if (temp > 3.14159265) temp -= 6.28318531; p3 = temp * temp * temp; p5 = p3 * temp * temp; result = temp - (p3 / f3) + (p5 / f5); theta += STEP; } } afterTime = getTimer(); log("inline yoambulante.com valPositive (low)," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { p3 = theta * theta * theta; p5 = p3 * theta * theta; result = theta - (p3 / f3) + (p5 / f5); theta += STEP; } } afterTime = getTimer(); log("inline yoambulante.com valNormalizedPositive (low)," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { temp = theta; //always wrap input angle to -PI..PI if (temp < -3.14159265) temp += 6.28318531; else if (temp > 3.14159265) temp -= 6.28318531; p3 = temp * temp * temp; p5 = p3 * temp * temp; p7 = p5 * temp * temp; result = temp - (p3 / f3) + (p5 / f5) - (p7 / f7); theta += STEP; } } afterTime = getTimer(); log("inline yoambulante.com val (acceptable)," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { p3 = theta * theta * theta; p5 = p3 * theta * theta; p7 = p5 * theta * theta; result = theta - (p3 / f3) + (p5 / f5) - (p7 / f7); theta += STEP; } } afterTime = getTimer(); log("inline yoambulante.com valNormalized (acceptable)," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { temp = theta; //always wrap input angle to -PI..PI if (temp > 3.14159265) temp -= 6.28318531; p3 = temp * temp * temp; p5 = p3 * temp * temp; p7 = p5 * temp * temp; result = temp - (p3 / f3) + (p5 / f5) - (p7 / f7); theta += STEP; } } afterTime = getTimer(); log("inline yoambulante.com valPositive (acceptable)," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { p3 = theta * theta * theta; p5 = p3 * theta * theta; p7 = p5 * theta * theta; result = theta - (p3 / f3) + (p5 / f5) - (p7 / f7); theta += STEP; } } afterTime = getTimer(); log("inline yoambulante.com valNormalizedPositive (acceptable)," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { temp = theta; //always wrap input angle to -PI..PI if (temp < -3.14159265) temp += 6.28318531; else if (temp > 3.14159265) temp -= 6.28318531; p3 = temp * temp * temp; p5 = p3 * temp * temp; p7 = p5 * temp * temp; p9 = p7 * temp * temp; result = temp - (p3 / f3) + (p5 / f5) - (p7 / f7) + (p9 / f9); theta += STEP; } } afterTime = getTimer(); log("inline yoambulante.com val (good)," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { p3 = theta * theta * theta; p5 = p3 * theta * theta; p7 = p5 * theta * theta; p9 = p7 * theta * theta; result = theta - (p3 / f3) + (p5 / f5) - (p7 / f7) + (p9 / f9); theta += STEP; } } afterTime = getTimer(); log("inline yoambulante.com valNormalized (good)," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { temp = theta; //always wrap input angle to -PI..PI if (temp > 3.14159265) temp -= 6.28318531; p3 = temp * temp * temp; p5 = p3 * temp * temp; p7 = p5 * temp * temp; p9 = p7 * temp * temp; result = temp - (p3 / f3) + (p5 / f5) - (p7 / f7) + (p9 / f9); theta += STEP; } } afterTime = getTimer(); log("inline yoambulante.com valPositive (good)," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { p3 = theta * theta * theta; p5 = p3 * theta * theta; p7 = p5 * theta * theta; p9 = p7 * theta * theta; result = theta - (p3 / f3) + (p5 / f5) - (p7 / f7) + (p9 / f9); theta += STEP; } } afterTime = getTimer(); log("inline yoambulante.com valNormalizedPositive (good)," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { temp = theta; //always wrap input angle to -PI..PI if (temp < -3.14159265) temp += 6.28318531; else if (temp > 3.14159265) temp -= 6.28318531; p3 = temp * temp * temp; p5 = p3 * temp * temp; p7 = p5 * temp * temp; p9 = p7 * temp * temp; p11 = p9 * temp * temp; p13 = p11 * temp * temp; result = temp - (p3 / f3) + (p5 / f5) - (p7 / f7) + (p9 / f9) - (p11 / f11) + (p13 / f13); theta += STEP; } } afterTime = getTimer(); log("inline yoambulante.com val (excellent)," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { p3 = theta * theta * theta; p5 = p3 * theta * theta; p7 = p5 * theta * theta; p9 = p7 * theta * theta; p11 = p9 * theta * theta; p13 = p11 * theta * theta; result = theta - (p3 / f3) + (p5 / f5) - (p7 / f7) + (p9 / f9) - (p11 / f11) + (p13 / f13); theta += STEP; } } afterTime = getTimer(); log("inline yoambulante.com valNormalized (excellent)," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { temp = theta; //always wrap input angle to -PI..PI if (temp > 3.14159265) temp -= 6.28318531; p3 = temp * temp * temp; p5 = p3 * temp * temp; p7 = p5 * temp * temp; p9 = p7 * temp * temp; p11 = p9 * temp * temp; p13 = p11 * temp * temp; result = temp - (p3 / f3) + (p5 / f5) - (p7 / f7) + (p9 / f9) - (p11 / f11) + (p13 / f13); theta += STEP; } } afterTime = getTimer(); log("inline yoambulante.com valPositive (excellent)," + (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { theta = 0; for (j = 0; j < NUM; ++j) { p3 = theta * theta * theta; p5 = p3 * theta * theta; p7 = p5 * theta * theta; p9 = p7 * theta * theta; p11 = p9 * theta * theta; p13 = p11 * theta * theta; result = theta - (p3 / f3) + (p5 / f5) - (p7 / f7) + (p9 / f9) - (p11 / f11) + (p13 / f13); theta += STEP; } } afterTime = getTimer(); log("inline yoambulante.com valNormalizedPositive (excellent)," + (afterTime-beforeTime)); } } }
Here is the test environment I ran the test on:
- Flex SDK (MXMLC) 4.1.0.16076, compiling in release mode (no debugging or verbose stack traces)
- Release version of Flash Player 10.3.181.14
- 2.4 Ghz Intel Core i5
- Mac OS X 10.6.7
And here are the results I got:
Function | Time |
---|---|
Math.sin | 238 |
TrigLUT.val | 217 |
TrigLUT.valPositive | 200 |
TrigLUT.valNormalized | 121 |
TrigLUT.valNormalizedPositive | 106 |
inline TrigLUT val | 165 |
inline TrigLUT valPositive | 163 |
inline TrigLUT valNormalized | 73 |
inline TrigLUT valNormalizedPositive | 68 |
inline polygonal.de val (low) | 26 |
inline polygonal.de valPositive (low) | 25 |
inline polygonal.de valNormalized (low) | 24 |
inline polygonal.de valNormalizedPositive (low) | 22 |
inline polygonal.de val (high) | 32 |
inline polygonal.de valPositive (high) | 28 |
inline polygonal.de valNormalized (high) | 26 |
inline polygonal.de valNormalizedPositive (high) | 24 |
inline yoambulante.com val (low) | 27 |
inline yoambulante.com valNormalized (low) | 22 |
inline yoambulante.com valPositive (low) | 24 |
inline yoambulante.com valNormalizedPositive (low) | 22 |
inline yoambulante.com val (acceptable) | 26 |
inline yoambulante.com valNormalized (acceptable) | 22 |
inline yoambulante.com valPositive (acceptable) | 24 |
inline yoambulante.com valNormalizedPositive (acceptable) | 22 |
inline yoambulante.com val (good) | 27 |
inline yoambulante.com valNormalized (good) | 22 |
inline yoambulante.com valPositive (good) | 24 |
inline yoambulante.com valNormalizedPositive (good) | 23 |
inline yoambulante.com val (excellent) | 27 |
inline yoambulante.com valNormalized (excellent) | 22 |
inline yoambulante.com valPositive (excellent) | 25 |
inline yoambulante.com valNormalizedPositive (excellent) | 22 |
The differences between these various approaches are much more apparent as a graph:
We can easily see that the inline versions are far faster than the lookup table version, even in its inline form, due to their lack of a Vector
access. They each benefit, albeit much more slightly, from the limited-scope versions such as valPositive
and valNormalized
. Between th two inline approaches, the polygonal.de one seems to be superior for these reasons:
- On average, it is 8% faster.
- It is more accurate, especially around ± 2π
- It requires no temporary variables, which is a big advantage when not computing tons of trig values in one huge run
- Its quality is quite good, even in the “low” version
For more on the above quality claims, let’s have a look at a small demo app. I’ve ported over the TrigLUT
demo to support all of the approaches from the performance test. To demonstrate the accuracy downsides of the yoamulante.com version, I’ve adjusted the graphed range from [0,2π) to (-π,&pi). Here’s the demo’s source code followed by a live SWF version:
package { import flash.display.*; import flash.filters.*; import flash.events.*; import flash.text.*; import flash.geom.*; /** * Demonstration app for faster ways of doing trig * @author Jackson Dunstan */ [SWF(width=640,height=480,backgroundColor=0xEEEAD9)] public class FastTrigDemo extends Sprite { private static const TEXT_FORMAT:TextFormat = new TextFormat("_sans", 11); private static const SINE_LUT_LOW:TrigLUT = new TrigLUT(2, Math.sin); private static const SINE_LUT_HIGH:TrigLUT = new TrigLUT(3, Math.sin); private static const APPROX_METHOD_CATEGORIES:Array = [ [ ["TrigLUT"], ["low","lutLow"], ["high","lutHigh"] ], [ ["polygonal.de"], ["low","polygonalDeLow"], ["high","polygonalDeHigh"] ], [ ["yoambulante.com"], ["low","yoambulanteLow"], ["acceptable","yoambulanteAcceptable"], ["good","yoambulanteGood"], ["excellent","yoambulanteExcellent"] ] ] private var bmd:BitmapData; private var bmdRect:Rectangle; private var approxFunc:Function; public function FastTrigDemo() { stage.scaleMode = StageScaleMode.NO_SCALE; stage.align = StageAlign.TOP_LEFT; var x:Number; var y:Number = 0; var prompt:TextField = new TextField(); prompt.y = y; prompt.defaultTextFormat = TEXT_FORMAT; prompt.htmlText = "<b>Approximation Method:</b>"; prompt.autoSize = TextFieldAutoSize.LEFT; prompt.selectable = false; addChild(prompt); const PAD:Number = 3; var buttons:Array = []; y += prompt.height + PAD; for each (var category:Array in APPROX_METHOD_CATEGORIES) { var categoryName:String = category.shift(); var categoryNameTF:TextField = new TextField(); categoryNameTF.mouseEnabled = false; categoryNameTF.defaultTextFormat = TEXT_FORMAT; categoryNameTF.htmlText = "<b>" + categoryName + "</b>"; categoryNameTF.autoSize = TextFieldAutoSize.LEFT; categoryNameTF.selectable = false; addChild(categoryNameTF); x = categoryNameTF.width + PAD; for each (var choice:Array in category) { var choiceName:String = choice[0]; var choiceFuncName:String = choice[1]; var tf:TextField = new TextField(); tf.mouseEnabled = false; tf.defaultTextFormat = TEXT_FORMAT; tf.text = choiceName; tf.autoSize = TextFieldAutoSize.LEFT; tf.selectable = false; tf.x = tf.y = PAD; var button:Sprite = new Sprite(); button.buttonMode = true; button.name = choiceFuncName; button.graphics.beginFill(0xE6E2D1); button.graphics.drawRect(0, 0, tf.width+PAD*2, tf.height+PAD*2); button.graphics.endFill(); button.addChild(tf); button.addEventListener( MouseEvent.CLICK, function(ev:MouseEvent): void { var clickButton:Sprite = ev.currentTarget as Sprite; for each (var otherButton:Sprite in buttons) { otherButton.filters = [new BevelFilter()]; otherButton.buttonMode = true; otherButton.mouseEnabled = true; } clickButton.filters = []; clickButton.buttonMode = false; clickButton.mouseEnabled = false; approxFunc = FastTrigDemo[clickButton.name]; redraw(); } ); button.x = x; button.y = y; buttons.push(button); addChild(button); x += button.width + PAD; } categoryNameTF.y = y + (button.height-categoryNameTF.height) / 2; y += button.height + PAD; } var about:TextField = new TextField(); about.defaultTextFormat = TEXT_FORMAT; about.y = y; about.htmlText = "Black line: Math.sin\n" + "Red line: approximation\n" + "Demo by <font color=\"#0071BB\"><a href=\"" + "http://jacksondunstan.com/articles/1213" + "\">JacksonDunstan.com</a></font>"; about.autoSize = TextFieldAutoSize.LEFT; about.selectable = false; addChild(about); var offset:int = stage.stageHeight - height; for (var i:int; i < numChildren; ++i) { getChildAt(i).y += offset; } this.bmd = new BitmapData(stage.stageWidth, stage.stageHeight, false, 0xffffffff); this.bmdRect = new Rectangle(0, 0, this.bmd.width, this.bmd.height); addChildAt(new Bitmap(bmd), 0); buttons[0].dispatchEvent(new MouseEvent(MouseEvent.CLICK)); } private function redraw(): void { var bmd:BitmapData = this.bmd; bmd.lock(); var w:int = bmd.width; var h:int = bmd.height; bmd.fillRect(this.bmdRect, 0xEEEAD9); var halfH:int = h / 2; var stepTheta:Number = TrigLUT.TWO_PI / w; var theta:Number = -Math.PI; for (var x:int = 0; x < w; ++x) { for (var y:int = 0; y < h; ++y) { bmd.setPixel( x, halfH + Math.sin(theta)*halfH, 0xff000000 ); bmd.setPixel( x, halfH + this.approxFunc(theta)*halfH, 0xffff0000 ); } theta += stepTheta; } bmd.unlock(); } private static function lutLow(x:Number): Number { return SINE_LUT_LOW.valNormalized(x); } private static function lutHigh(x:Number): Number { return SINE_LUT_HIGH.valNormalized(x); } private static function polygonalDeLow(x:Number): Number { //always wrap input angle to -PI..PI if (x < -3.14159265) x += 6.28318531; else if (x > 3.14159265) x -= 6.28318531; //compute sine if (x < 0) return 1.27323954 * x + .405284735 * x * x; else return 1.27323954 * x - 0.405284735 * x * x; } private static function polygonalDeHigh(x:Number): Number { //always wrap input angle to -PI..PI if (x < -3.14159265) x += 6.28318531; else if (x > 3.14159265) x -= 6.28318531; //compute sine if (x < 0) { var sin:Number = 1.27323954 * x + .405284735 * x * x; if (sin < 0) return .225 * (sin *-sin - sin) + sin; else return .225 * (sin * sin - sin) + sin; } else { sin = 1.27323954 * x - 0.405284735 * x * x; if (sin < 0) return .225 * (sin *-sin - sin) + sin; else return .225 * (sin * sin - sin) + sin; } } private static function yoambulanteLow(rad:Number): Number { //code by Alex Nino, yoambulante.com //these are our factorial constant values const f3:Number = (1 * 2 * 3); const f5:Number = (f3 * 4 * 5); //these are the constiables where we store the powers, //note they have to be defined each time we execute the function const p3:Number = rad * rad * rad; const p5:Number = p3 * rad * rad; return rad - (p3 / f3) + (p5 / f5); } private static function yoambulanteAcceptable(rad:Number): Number { //code by Alex Nino, yoambulante.com //these are our factorial constant values const f3:Number = (1 * 2 * 3); const f5:Number = (f3 * 4 * 5); const f7:Number = (f5 * 6 * 7); //these are the constiables where we store the powers, //note they have to be defined each time we execute the function const p3:Number = rad * rad * rad; const p5:Number = p3 * rad * rad; const p7:Number = p5 * rad * rad; return rad - (p3 / f3) + (p5 / f5) - (p7 / f7); } private static function yoambulanteGood(rad:Number): Number { //code by Alex Nino, yoambulante.com //these are our factorial constant values const f3:Number = (1 * 2 * 3); const f5:Number = (f3 * 4 * 5); const f7:Number = (f5 * 6 * 7); const f9:Number = (f7 * 8 * 9); //these are the constiables where we store the powers, //note they have to be defined each time we execute the function const p3:Number = rad * rad * rad; const p5:Number = p3 * rad * rad; const p7:Number = p5 * rad * rad; const p9:Number = p7 * rad * rad; return rad - (p3 / f3) + (p5 / f5) - (p7 / f7) + (p9 / f9); } private static function yoambulanteExcellent(rad:Number): Number { //code by Alex Nino, yoambulante.com //these are our factorial constant values const f3:Number = (1 * 2 * 3); const f5:Number = (f3 * 4 * 5); const f7:Number = (f5 * 6 * 7); const f9:Number = (f7 * 8 * 9); const f11:Number = (f9 * 10 * 11); const f13:Number = (f11 * 12 * 13); //these are the constiables where we store the powers, //note they have to be defined each time we execute the function const p3:Number = rad * rad * rad; const p5:Number = p3 * rad * rad; const p7:Number = p5 * rad * rad; const p9:Number = p7 * rad * rad; const p11:Number = p9 * rad * rad; const p13:Number = p11 * rad * rad; return rad - (p3 / f3) + (p5 / f5) - (p7 / f7) + (p9 / f9) - (p11 / f11) + (p13 / f13); } } }
Here is the live version for the above source code:
In short, the polygonal.de source code is simple, accurate, and extremely fast. I recommend using it either manually (copy/paste as I have) or via the FastMath
class in Apparat. By doing so, you’ll speed up your trig code by over 10x! In the case that you need even greater precision, there is always TrigLUT
if you can live with the 3x slowdown.
#1 by emrah on May 26th, 2011 ·
hi,
i have been fallowing your performance things, they looks cool but i have not understand well what is going on here.
please keep it simple. thanks
#2 by jackson on May 26th, 2011 ·
In a nutshell, the article shows that you’ll get a 10x performance increase by replacing this:
With this:
Most of the rest of it is talking about why this is true and comparing it to other ways to improve the
Math.sin
call.Feel free to post any specific questions. I’m happy to explain further.
#3 by Forrest on September 8th, 2011 ·
very nice!
can you please post the COS version of this inline code? thx :)
p.s. another thing you could investigate is ATAN2 function…it is very slow :(
#4 by jackson on September 8th, 2011 ·
The polygonal.de article has the code you’re looking for.
There’s no inline
atan2
available from this article or its links, but you could always useTrigLUT
withMath.atan2
to make a quick lookup table.#5 by Forrest on September 9th, 2011 ·
can you please post the inline version of Math.cos? :)
thx very much!
#6 by jackson on September 9th, 2011 ·
It’s on polygonal.de, but sure:
#7 by Forrest on September 9th, 2011 ·
thank you :)
#8 by Daniel on June 3rd, 2011 ·
I was looking into using an approximated sine some time ago.
here’s my attempt http://blog.daspoda.com/2010-11-12/sine-calculation
I use
#9 by jackson on June 3rd, 2011 ·
I missed your article while researching for this article. I’ve yet to test the speed difference between the
if
branch and a simple%
operator, but I’d be inclined to think that your version might be superior in at least the case where theif
executes (whenangle
> 2π.Thanks for the link and the tip!
#10 by Carl Lydon on August 18th, 2012 ·
For use in games, maybe it would be even faster to generate an array of results that can be looked up by degree. For example, you can, at runtime run through 0-359 degrees (or 0-90) and get the sin results, put it in an array, then during the game get the sin from a given degree by just grabbing that item from the array. You can also reduce the precision of the results before putting it in the array to avoid long decimal values.
I did something similar to replace the Math.random function and it sped things up considerably. I generated about 1000 random numbers at runtime, reduced their precision, put them in an array, and cycled through them starting from a seed value.
#11 by jackson on August 18th, 2012 ·
I must be misunderstanding your idea. How is this different than the
TrigLUT
lookup table in the article?#12 by Carl Lydon on August 18th, 2012 ·
cosine results, rather.
#13 by Carl Lydon on August 18th, 2012 ·
Oh, you’re right, i didn’t read the TrigLut article, sorry.
After looking at it, I am surprised that this is not the fastest way to do Trig functions, though. Looking something up in an array or vector is slower than doing floating point multiplication?
I think TrigLut is working at a disadvantage because you’re still doing floating point operations as part of the lookup. Instead of messing around with Pi and radians, you can just decide in your game how accurate you need your trig to be and work in degrees, which is what As3 needs for rotating objects anyway.
#14 by Carl Lydon on August 18th, 2012 ·
For example, a typical thing you need to do continuously in a game is calculate rise/run values based on a degree angle. Instead of doing math during the game to convert to radians and do trig I do this at start of game:
private function preCalcRiseRun ():void {
pRiseRunArray = [];
for (var i = 0; i< 361; i++) {
var rr:Array = calcRiseRun(i);
pRRArray.push (rr);
}
}
//
private function calcRiseRun (vAngle:int):Array {
var rads:Number=0.0174 * vAngle;
var dx1:Number=Math.cos(rads);
var dy1:Number=Math.sin(rads);
var dx2:int = Math.round(dx1 * 1000);
var dy2:int = Math.round(dy1 * 1000);
var dx3:Number = dx2/1000;
var dy3:Number = dy2/1000;
return [dx3,dy3];
}
and then during game, if vAngle = angle in degrees:
var rise:number = pRRArray[vAngle][0];
var run:number = pRRArray[vAngle][1];
I guess this stuff should be stored as vectors to be faster? Is this slower than your method I'm wondering? I guess I should do some tests.
#15 by jackson on August 18th, 2012 ·
It sounds like a good idea for a followup article. Thanks for the tip. Let me know if you do the tests and have some interesting results.
#16 by benjamin guihaire on July 29th, 2013 ·
the fastest solution for me is still a LUT , and it can work with any angle,
see with my code : http://guihaire.com/code/?p=520