JacksonDunstan.com | Even Faster Trig Through Inlining

Even Faster Trig Through Inlining

May 23, 2011 Tags: inline, inlining, lookup table, performance, sine, trigonometry

Last week’s article showed you a way to improve the performance of trig functions like Math.sin by almost 4x by using lookup tables. This week’s article will go even further and show you how to increase this speedup to over 10x!

The lookup table approach works by replacing a static function call like Math.sin with as little as a Vector index, an int() type conversion, and a multiply. The result was a whopping 4x improvement over Math.sin and its ilk. This was, of course, only true in the best case scenario where you had a number on [0,2π) and the performance advantages faded as you added uncertainties so that the number could be negative or greater than ± π.

Today’s article will discuss non-lookup table approaches offered by polygonal.de and yoambulante.com. In the polygonal.de case, there are two quality options—low and high—and no temporary variables are required. In contrast, the yoambulante.com way includes four quality options—low, acceptable, good, and excellent—and uses between 2 and 6 temporary variables, not to mention 2 to 6 constants.

So, let’s pit these various trig functions—Math.sin, TrigLUT, polygonal.de’s inline code, and yoambulante.com’s inline code—against each other and see who’s fastest. The polygonal.de code supported an arbitrary number of radians via conditionals, so I simply stripped out this support to generate the equivalent of TrigLUT‘s valNormalized, valPositive, and valNormalizedPositive functions. The yoambulante.com code worked on positive numbers as well as negative numbers, but lacked support for numbers greater than 2π, which I added in the val and valPositive case by using polygonal.de’s simple if statement. The result is this performance test:

package
{
	import flash.display.*;
	import flash.utils.*;
	import flash.text.*;
 
	/**
	*   Performance test app for faster ways of doing trig
	*   @author Jackson Dunstan
	*/
	public class FastTrigTest extends Sprite
	{
		private var __logger:TextField = new TextField();
		private function log(msg:*): void { __logger.appendText(msg + "\n"); }
 
		public function FastTrigTest()
		{
			__logger.autoSize = TextFieldAutoSize.LEFT;
			addChild(__logger);
 
			const DIGITS:uint = 2;
			const STEP:Number = 0.05;
			var beforeTime:int;
			var afterTime:int;
			var lut:TrigLUT = new TrigLUT(DIGITS, Math.sin);
			var lutTable:Vector.<Number> = lut.table;
			var lutPow:Number = lut.pow;
			const REPS:int = 100000;
			const NUM:int = int(Math.PI / STEP);
			var theta:Number;
			var temp:Number;
			var i:int;
			var j:int;
			var result:Number;
			const TWO_PI:Number = TrigLUT.TWO_PI;
 
			//code by Alex Nino, yoambulante.com
			//these are our factorial constant values
			const f3:Number = (1 * 2 * 3);
			const f5:Number = (f3 * 4 * 5);
			const f7:Number = (f5 * 6 * 7);
			const f9:Number = (f7 * 8 * 9);
			const f11:Number = (f9 * 10 * 11);
			const f13:Number = (f11 * 12 * 13);
			//these are the variables where we store the powers,
			//note they have to be defined each time we execute the function
			var p3:Number;
			var p5:Number;
			var p7:Number;
			var p9:Number;
			var p11:Number;
			var p13:Number;
 
			/*
			for (theta = -10; theta <= 10; ++theta)
			{
				log(theta + ": " + Math.sin(theta) + " ... " + yoambulanteExcellent(theta));
			}
			*/
			log("Function,Time");
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					result = Math.sin(theta);
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("Math.sin," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					result = lut.val(theta);
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("TrigLUT.val," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					result = lut.valPositive(theta);
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("TrigLUT.valPositive," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					result = lut.valNormalized(theta);
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("TrigLUT.valNormalized," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					result = lut.valNormalizedPositive(theta);
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("TrigLUT.valNormalizedPositive," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					result = theta >= 0
						? lutTable[int((theta%TWO_PI)*lutPow)]
						: lutTable[int((TWO_PI+theta%TWO_PI)*lutPow)];
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("inline TrigLUT val," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					result = lutTable[int((theta%TWO_PI)*lutPow)];
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("inline TrigLUT valPositive," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					result = theta >= 0
						? lutTable[int(theta*lutPow)]
						: lutTable[int((TWO_PI+theta)*lutPow)];
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("inline TrigLUT valNormalized," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					result = lutTable[int(theta*lutPow)];
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("inline TrigLUT valNormalizedPositive," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					temp = theta;
					//always wrap input angle to -PI..PI
					if (temp < -3.14159265)
						temp += 6.28318531;
					else
					if (temp >  3.14159265)
						temp -= 6.28318531;
 
					//compute sine
					if (temp < 0)
						result = 1.27323954 * temp + .405284735 * temp * temp;
					else
						result = 1.27323954 * temp - 0.405284735 * temp * temp;
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("inline polygonal.de val (low)," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					temp = theta;
					//always wrap input angle to -PI..PI
					if (temp >  3.14159265)
						temp -= 6.28318531;
 
					//compute sine
					result = 1.27323954 * temp - 0.405284735 * temp * temp;
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("inline polygonal.de valPositive (low)," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					//compute sine
					if (theta < 0)
						result = 1.27323954 * theta + .405284735 * theta * theta;
					else
						result = 1.27323954 * theta - 0.405284735 * theta * theta;
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("inline polygonal.de valNormalized (low)," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					result = 1.27323954 * theta - 0.405284735 * theta * theta;
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("inline polygonal.de valNormalizedPositive (low)," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					temp = theta;
					//always wrap input angle to -PI..PI
					if (temp < -3.14159265)
						temp += 6.28318531;
					else
					if (temp >  3.14159265)
						temp -= 6.28318531;
 
					//compute resulte
					if (temp < 0)
					{
						result = 1.27323954 * temp + .405284735 * temp * temp;
 
						if (result < 0)
							result = .225 * (result *-result - result) + result;
						else
							result = .225 * (result * result - result) + result;
					}
					else
					{
						result = 1.27323954 * temp - 0.405284735 * temp * temp;
 
						if (result < 0)
							result = .225 * (result *-result - result) + result;
						else
							result = .225 * (result * result - result) + result;
					}
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("inline polygonal.de val (high)," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					temp = theta;
					//always wrap input angle to -PI..PI
					if (temp >  3.14159265)
						temp -= 6.28318531;
 
					//compute resulte
					result = 1.27323954 * temp - 0.405284735 * temp * temp;
 
					if (result < 0)
						result = .225 * (result *-result - result) + result;
					else
						result = .225 * (result * result - result) + result;
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("inline polygonal.de valPositive (high)," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					temp = theta;
					//compute resulte
					if (temp < 0)
					{
						result = 1.27323954 * temp + .405284735 * temp * temp;
 
						if (result < 0)
							result = .225 * (result *-result - result) + result;
						else
							result = .225 * (result * result - result) + result;
					}
					else
					{
						result = 1.27323954 * temp - 0.405284735 * temp * temp;
 
						if (result < 0)
							result = .225 * (result *-result - result) + result;
						else
							result = .225 * (result * result - result) + result;
					}
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("inline polygonal.de valNormalized (high)," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					temp = theta;
					result = 1.27323954 * temp - 0.405284735 * temp * temp;
 
					if (result < 0)
						result = .225 * (result *-result - result) + result;
					else
						result = .225 * (result * result - result) + result;
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("inline polygonal.de valNormalizedPositive (high)," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					temp = theta;
					//always wrap input angle to -PI..PI
					if (temp < -3.14159265)
						temp += 6.28318531;
					else
					if (temp >  3.14159265)
						temp -= 6.28318531;
					p3 = temp * temp * temp;
					p5 = p3 * temp * temp;
					result = temp - (p3 / f3) + (p5 / f5);
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("inline yoambulante.com val (low)," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					p3 = theta * theta * theta;
					p5 = p3 * theta * theta;
					result = theta - (p3 / f3) + (p5 / f5);
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("inline yoambulante.com valNormalized (low)," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					temp = theta;
					//always wrap input angle to -PI..PI
					if (temp >  3.14159265)
						temp -= 6.28318531;
					p3 = temp * temp * temp;
					p5 = p3 * temp * temp;
					result = temp - (p3 / f3) + (p5 / f5);
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("inline yoambulante.com valPositive (low)," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					p3 = theta * theta * theta;
					p5 = p3 * theta * theta;
					result = theta - (p3 / f3) + (p5 / f5);
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("inline yoambulante.com valNormalizedPositive (low)," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					temp = theta;
					//always wrap input angle to -PI..PI
					if (temp < -3.14159265)
						temp += 6.28318531;
					else
					if (temp >  3.14159265)
						temp -= 6.28318531;
					p3 = temp * temp * temp;
					p5 = p3 * temp * temp;
					p7 = p5 * temp * temp;
					result = temp - (p3 / f3) + (p5 / f5) - (p7 / f7);
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("inline yoambulante.com val (acceptable)," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					p3 = theta * theta * theta;
					p5 = p3 * theta * theta;
					p7 = p5 * theta * theta;
					result = theta - (p3 / f3) + (p5 / f5) - (p7 / f7);
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("inline yoambulante.com valNormalized (acceptable)," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					temp = theta;
					//always wrap input angle to -PI..PI
					if (temp >  3.14159265)
						temp -= 6.28318531;
					p3 = temp * temp * temp;
					p5 = p3 * temp * temp;
					p7 = p5 * temp * temp;
					result = temp - (p3 / f3) + (p5 / f5) - (p7 / f7);
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("inline yoambulante.com valPositive (acceptable)," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					p3 = theta * theta * theta;
					p5 = p3 * theta * theta;
					p7 = p5 * theta * theta;
					result = theta - (p3 / f3) + (p5 / f5) - (p7 / f7);
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("inline yoambulante.com valNormalizedPositive (acceptable)," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					temp = theta;
					//always wrap input angle to -PI..PI
					if (temp < -3.14159265)
						temp += 6.28318531;
					else
					if (temp >  3.14159265)
						temp -= 6.28318531;
					p3 = temp * temp * temp;
					p5 = p3 * temp * temp;
					p7 = p5 * temp * temp;
					p9 = p7 * temp * temp;
					result = temp - (p3 / f3) + (p5 / f5) - (p7 / f7) + (p9 / f9);
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("inline yoambulante.com val (good)," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					p3 = theta * theta * theta;
					p5 = p3 * theta * theta;
					p7 = p5 * theta * theta;
					p9 = p7 * theta * theta;
					result = theta - (p3 / f3) + (p5 / f5) - (p7 / f7) + (p9 / f9);
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("inline yoambulante.com valNormalized (good)," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					temp = theta;
					//always wrap input angle to -PI..PI
					if (temp >  3.14159265)
						temp -= 6.28318531;
					p3 = temp * temp * temp;
					p5 = p3 * temp * temp;
					p7 = p5 * temp * temp;
					p9 = p7 * temp * temp;
					result = temp - (p3 / f3) + (p5 / f5) - (p7 / f7) + (p9 / f9);
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("inline yoambulante.com valPositive (good)," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					p3 = theta * theta * theta;
					p5 = p3 * theta * theta;
					p7 = p5 * theta * theta;
					p9 = p7 * theta * theta;
					result = theta - (p3 / f3) + (p5 / f5) - (p7 / f7) + (p9 / f9);
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("inline yoambulante.com valNormalizedPositive (good)," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					temp = theta;
					//always wrap input angle to -PI..PI
					if (temp < -3.14159265)
						temp += 6.28318531;
					else
					if (temp >  3.14159265)
						temp -= 6.28318531;
					p3 = temp * temp * temp;
					p5 = p3 * temp * temp;
					p7 = p5 * temp * temp;
					p9 = p7 * temp * temp;
					p11 = p9 * temp * temp;
					p13 = p11 * temp * temp;
					result = temp - (p3 / f3) + (p5 / f5) - (p7 / f7) + (p9 / f9) - (p11 / f11) + (p13 / f13);
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("inline yoambulante.com val (excellent)," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					p3 = theta * theta * theta;
					p5 = p3 * theta * theta;
					p7 = p5 * theta * theta;
					p9 = p7 * theta * theta;
					p11 = p9 * theta * theta;
					p13 = p11 * theta * theta;
					result = theta - (p3 / f3) + (p5 / f5) - (p7 / f7) + (p9 / f9) - (p11 / f11) + (p13 / f13);
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("inline yoambulante.com valNormalized (excellent)," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					temp = theta;
					//always wrap input angle to -PI..PI
					if (temp >  3.14159265)
						temp -= 6.28318531;
					p3 = temp * temp * temp;
					p5 = p3 * temp * temp;
					p7 = p5 * temp * temp;
					p9 = p7 * temp * temp;
					p11 = p9 * temp * temp;
					p13 = p11 * temp * temp;
					result = temp - (p3 / f3) + (p5 / f5) - (p7 / f7) + (p9 / f9) - (p11 / f11) + (p13 / f13);
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("inline yoambulante.com valPositive (excellent)," + (afterTime-beforeTime));
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
				theta = 0;
				for (j = 0; j < NUM; ++j)
				{
					p3 = theta * theta * theta;
					p5 = p3 * theta * theta;
					p7 = p5 * theta * theta;
					p9 = p7 * theta * theta;
					p11 = p9 * theta * theta;
					p13 = p11 * theta * theta;
					result = theta - (p3 / f3) + (p5 / f5) - (p7 / f7) + (p9 / f9) - (p11 / f11) + (p13 / f13);
					theta += STEP;
				}
			}
			afterTime = getTimer();
			log("inline yoambulante.com valNormalizedPositive (excellent)," + (afterTime-beforeTime));
		}
	}
}

Here is the test environment I ran the test on:

Flex SDK (MXMLC) 4.1.0.16076, compiling in release mode (no debugging or verbose stack traces)
Release version of Flash Player 10.3.181.14
2.4 Ghz Intel Core i5
Mac OS X 10.6.7

And here are the results I got:

Function	Time
Math.sin	238
TrigLUT.val	217
TrigLUT.valPositive	200
TrigLUT.valNormalized	121
TrigLUT.valNormalizedPositive	106
inline TrigLUT val	165
inline TrigLUT valPositive	163
inline TrigLUT valNormalized	73
inline TrigLUT valNormalizedPositive	68
inline polygonal.de val (low)	26
inline polygonal.de valPositive (low)	25
inline polygonal.de valNormalized (low)	24
inline polygonal.de valNormalizedPositive (low)	22
inline polygonal.de val (high)	32
inline polygonal.de valPositive (high)	28
inline polygonal.de valNormalized (high)	26
inline polygonal.de valNormalizedPositive (high)	24
inline yoambulante.com val (low)	27
inline yoambulante.com valNormalized (low)	22
inline yoambulante.com valPositive (low)	24
inline yoambulante.com valNormalizedPositive (low)	22
inline yoambulante.com val (acceptable)	26
inline yoambulante.com valNormalized (acceptable)	22
inline yoambulante.com valPositive (acceptable)	24
inline yoambulante.com valNormalizedPositive (acceptable)	22
inline yoambulante.com val (good)	27
inline yoambulante.com valNormalized (good)	22
inline yoambulante.com valPositive (good)	24
inline yoambulante.com valNormalizedPositive (good)	23
inline yoambulante.com val (excellent)	27
inline yoambulante.com valNormalized (excellent)	22
inline yoambulante.com valPositive (excellent)	25
inline yoambulante.com valNormalizedPositive (excellent)	22

The differences between these various approaches are much more apparent as a graph:

Trig Performance Chart

We can easily see that the inline versions are far faster than the lookup table version, even in its inline form, due to their lack of a Vector access. They each benefit, albeit much more slightly, from the limited-scope versions such as valPositive and valNormalized. Between th two inline approaches, the polygonal.de one seems to be superior for these reasons:

On average, it is 8% faster.
It is more accurate, especially around ± 2π
It requires no temporary variables, which is a big advantage when not computing tons of trig values in one huge run
Its quality is quite good, even in the “low” version

For more on the above quality claims, let’s have a look at a small demo app. I’ve ported over the TrigLUT demo to support all of the approaches from the performance test. To demonstrate the accuracy downsides of the yoamulante.com version, I’ve adjusted the graphed range from [0,2π) to (-π,&pi). Here’s the demo’s source code followed by a live SWF version:

package
{
	import flash.display.*;
	import flash.filters.*;
	import flash.events.*;
	import flash.text.*;
	import flash.geom.*;
 
	/**
	*   Demonstration app for faster ways of doing trig
	*   @author Jackson Dunstan
	*/
	[SWF(width=640,height=480,backgroundColor=0xEEEAD9)]
	public class FastTrigDemo extends Sprite
	{
		private static const TEXT_FORMAT:TextFormat = new TextFormat("_sans", 11);
 
		private static const SINE_LUT_LOW:TrigLUT = new TrigLUT(2, Math.sin);
		private static const SINE_LUT_HIGH:TrigLUT = new TrigLUT(3, Math.sin);
 
		private static const APPROX_METHOD_CATEGORIES:Array = [
			[
				["TrigLUT"],
				["low","lutLow"],
				["high","lutHigh"]
			],
			[
				["polygonal.de"],
				["low","polygonalDeLow"],
				["high","polygonalDeHigh"]
			],
			[
				["yoambulante.com"],
				["low","yoambulanteLow"],
				["acceptable","yoambulanteAcceptable"],
				["good","yoambulanteGood"],
				["excellent","yoambulanteExcellent"]
			]
		]
 
		private var bmd:BitmapData;
		private var bmdRect:Rectangle;
		private var approxFunc:Function;
 
		public function FastTrigDemo()
		{
			stage.scaleMode = StageScaleMode.NO_SCALE;
			stage.align = StageAlign.TOP_LEFT;
 
			var x:Number;
			var y:Number = 0;
 
			var prompt:TextField = new TextField();
			prompt.y = y;
			prompt.defaultTextFormat = TEXT_FORMAT;
			prompt.htmlText = "<b>Approximation Method:</b>";
			prompt.autoSize = TextFieldAutoSize.LEFT;
			prompt.selectable = false;
			addChild(prompt);
 
			const PAD:Number = 3;
 
			var buttons:Array = [];
			y += prompt.height + PAD;
			for each (var category:Array in APPROX_METHOD_CATEGORIES)
			{
				var categoryName:String = category.shift();
				var categoryNameTF:TextField = new TextField();
				categoryNameTF.mouseEnabled = false;
				categoryNameTF.defaultTextFormat = TEXT_FORMAT;
				categoryNameTF.htmlText = "<b>" + categoryName + "</b>";
				categoryNameTF.autoSize = TextFieldAutoSize.LEFT;
				categoryNameTF.selectable = false;
				addChild(categoryNameTF);
 
				x = categoryNameTF.width + PAD;
				for each (var choice:Array in category)
				{
					var choiceName:String = choice[0];
					var choiceFuncName:String = choice[1];
 
					var tf:TextField = new TextField();
					tf.mouseEnabled = false;
					tf.defaultTextFormat = TEXT_FORMAT;
					tf.text = choiceName;
					tf.autoSize = TextFieldAutoSize.LEFT;
					tf.selectable = false;
					tf.x = tf.y = PAD;
 
					var button:Sprite = new Sprite();
					button.buttonMode = true;
					button.name = choiceFuncName;
					button.graphics.beginFill(0xE6E2D1);
					button.graphics.drawRect(0, 0, tf.width+PAD*2, tf.height+PAD*2);
					button.graphics.endFill();
					button.addChild(tf);
					button.addEventListener(
						MouseEvent.CLICK,
						function(ev:MouseEvent): void
						{
							var clickButton:Sprite = ev.currentTarget as Sprite;
							for each (var otherButton:Sprite in buttons)
							{
								otherButton.filters = [new BevelFilter()];
								otherButton.buttonMode = true;
								otherButton.mouseEnabled = true;
							}
							clickButton.filters = [];
							clickButton.buttonMode = false;
							clickButton.mouseEnabled = false;
 
							approxFunc = FastTrigDemo[clickButton.name];
							redraw();
						}
					);
					button.x = x;
					button.y = y;
 
					buttons.push(button);
					addChild(button);
					x += button.width + PAD;
				}
 
				categoryNameTF.y = y + (button.height-categoryNameTF.height) / 2;
				y += button.height + PAD;
			}
 
			var about:TextField = new TextField();
			about.defaultTextFormat = TEXT_FORMAT;
			about.y = y;
			about.htmlText = "Black line: Math.sin\n"
				+ "Red line: approximation\n"
				+ "Demo by <font color=\"#0071BB\"><a href=\""
				+ "http://jacksondunstan.com/articles/1213"
				+ "\">JacksonDunstan.com</a></font>";
			about.autoSize = TextFieldAutoSize.LEFT;
			about.selectable = false;
			addChild(about);
 
			var offset:int = stage.stageHeight - height;
			for (var i:int; i < numChildren; ++i)
			{
				getChildAt(i).y += offset;
			}
 
			this.bmd = new BitmapData(stage.stageWidth, stage.stageHeight, false, 0xffffffff);
			this.bmdRect = new Rectangle(0, 0, this.bmd.width, this.bmd.height);
			addChildAt(new Bitmap(bmd), 0);
			buttons[0].dispatchEvent(new MouseEvent(MouseEvent.CLICK));
		}
 
		private function redraw(): void
		{
			var bmd:BitmapData = this.bmd;
			bmd.lock();
			var w:int = bmd.width;
			var h:int = bmd.height;
			bmd.fillRect(this.bmdRect, 0xEEEAD9);
			var halfH:int = h / 2;
			var stepTheta:Number = TrigLUT.TWO_PI / w;
			var theta:Number = -Math.PI;
			for (var x:int = 0; x < w; ++x)
			{
				for (var y:int = 0; y < h; ++y)
				{
					bmd.setPixel(
						x,
						halfH + Math.sin(theta)*halfH,
						0xff000000
					);
					bmd.setPixel(
						x,
						halfH + this.approxFunc(theta)*halfH,
						0xffff0000
					);
				}
				theta += stepTheta;
			}
			bmd.unlock();
		}
 
		private static function lutLow(x:Number): Number
		{
			return SINE_LUT_LOW.valNormalized(x);
		}
 
		private static function lutHigh(x:Number): Number
		{
			return SINE_LUT_HIGH.valNormalized(x);
		}
 
		private static function polygonalDeLow(x:Number): Number
		{
			//always wrap input angle to -PI..PI
			if (x < -3.14159265)
				x += 6.28318531;
			else
			if (x >  3.14159265)
				x -= 6.28318531;
 
			//compute sine
			if (x < 0)
				return 1.27323954 * x + .405284735 * x * x;
			else
				return 1.27323954 * x - 0.405284735 * x * x;
		}
 
		private static function polygonalDeHigh(x:Number): Number
		{
			//always wrap input angle to -PI..PI
			if (x < -3.14159265)
				x += 6.28318531;
			else
			if (x >  3.14159265)
				x -= 6.28318531;
 
			//compute sine
			if (x < 0)
			{
				var sin:Number = 1.27323954 * x + .405284735 * x * x;
 
				if (sin < 0)
					return .225 * (sin *-sin - sin) + sin;
				else
					return .225 * (sin * sin - sin) + sin;
			}
			else
			{
				sin = 1.27323954 * x - 0.405284735 * x * x;
 
				if (sin < 0)
					return .225 * (sin *-sin - sin) + sin;
				else
					return .225 * (sin * sin - sin) + sin;
			}
		}
 
		private static function yoambulanteLow(rad:Number): Number
		{
			//code by Alex Nino, yoambulante.com
			//these are our factorial constant values
			const f3:Number = (1 * 2 * 3);
			const f5:Number = (f3 * 4 * 5);
			//these are the constiables where we store the powers,
			//note they have to be defined each time we execute the function
			const p3:Number = rad * rad * rad;
			const p5:Number = p3 * rad * rad;
			return rad - (p3 / f3) + (p5 / f5);
		}
 
		private static function yoambulanteAcceptable(rad:Number): Number
		{
			//code by Alex Nino, yoambulante.com
			//these are our factorial constant values
			const f3:Number = (1 * 2 * 3);
			const f5:Number = (f3 * 4 * 5);
			const f7:Number = (f5 * 6 * 7);
			//these are the constiables where we store the powers,
			//note they have to be defined each time we execute the function
			const p3:Number = rad * rad * rad;
			const p5:Number = p3 * rad * rad;
			const p7:Number = p5 * rad * rad;
			return rad - (p3 / f3) + (p5 / f5) - (p7 / f7);
		}
 
		private static function yoambulanteGood(rad:Number): Number
		{
			//code by Alex Nino, yoambulante.com
			//these are our factorial constant values
			const f3:Number = (1 * 2 * 3);
			const f5:Number = (f3 * 4 * 5);
			const f7:Number = (f5 * 6 * 7);
			const f9:Number = (f7 * 8 * 9);
			//these are the constiables where we store the powers,
			//note they have to be defined each time we execute the function
			const p3:Number = rad * rad * rad;
			const p5:Number = p3 * rad * rad;
			const p7:Number = p5 * rad * rad;
			const p9:Number = p7 * rad * rad;
			return rad - (p3 / f3) + (p5 / f5) - (p7 / f7) + (p9 / f9);
		}
 
		private static function yoambulanteExcellent(rad:Number): Number
		{
			//code by Alex Nino, yoambulante.com
			//these are our factorial constant values
			const f3:Number = (1 * 2 * 3);
			const f5:Number = (f3 * 4 * 5);
			const f7:Number = (f5 * 6 * 7);
			const f9:Number = (f7 * 8 * 9);
			const f11:Number = (f9 * 10 * 11);
			const f13:Number = (f11 * 12 * 13);
			//these are the constiables where we store the powers,
			//note they have to be defined each time we execute the function
			const p3:Number = rad * rad * rad;
			const p5:Number = p3 * rad * rad;
			const p7:Number = p5 * rad * rad;
			const p9:Number = p7 * rad * rad;
			const p11:Number = p9 * rad * rad;
			const p13:Number = p11 * rad * rad;
			return rad - (p3 / f3) + (p5 / f5) - (p7 / f7) + (p9 / f9) - (p11 / f11) + (p13 / f13);
		}
	}
}

Here is the live version for the above source code:

Demo

In short, the polygonal.de source code is simple, accurate, and extremely fast. I recommend using it either manually (copy/paste as I have) or via the FastMath class in Apparat. By doing so, you’ll speed up your trig code by over 10x! In the case that you need even greater precision, there is always TrigLUT if you can live with the 3x slowdown.

Comments

#1 by emrah on May 26th, 2011 · Reply

hi,
i have been fallowing your performance things, they looks cool but i have not understand well what is going on here.
please keep it simple. thanks

#2 by jackson on May 26th, 2011 · Reply

In a nutshell, the article shows that you’ll get a 10x performance increase by replacing this:

result = Math.sin(theta);

With this:

//always wrap input angle to -PI..PI
if (theta < -3.14159265)
	theta += 6.28318531;
else
if (theta >  3.14159265)
	theta -= 6.28318531;
 
//compute resulte
if (theta < 0)
{
	result = 1.27323954 * theta + .405284735 * theta * theta;
 
	if (result < 0)
		result = .225 * (result *-result - result) + result;
	else
		result = .225 * (result * result - result) + result;
}
else
{
	result = 1.27323954 * theta - 0.405284735 * theta * theta;
 
	if (result < 0)
		result = .225 * (result *-result - result) + result;
	else
		result = .225 * (result * result - result) + result;
}

Most of the rest of it is talking about why this is true and comparing it to other ways to improve the Math.sin call.

Feel free to post any specific questions. I’m happy to explain further.

#3 by Forrest on September 8th, 2011 · Reply

very nice!
can you please post the COS version of this inline code? thx :)

p.s. another thing you could investigate is ATAN2 function…it is very slow :(
- #4 by jackson on September 8th, 2011 · Reply
  
  The polygonal.de article has the code you’re looking for.
  
  There’s no inline atan2 available from this article or its links, but you could always use TrigLUT with Math.atan2 to make a quick lookup table.
  - #5 by Forrest on September 9th, 2011 · Reply
    
    can you please post the inline version of Math.cos? :)
    
    thx very much!
    - #6 by jackson on September 9th, 2011 · Reply
      
      It’s on polygonal.de, but sure:
      
      //compute cosine: sin(x + PI/2) = cos(x) x += 1.57079632; if (x > 3.14159265) x -= 6.28318531; if (x < 0) { cos = 1.27323954 * x + 0.405284735 * x * x; if (cos < 0) cos = .225 * (cos *-cos - cos) + cos; else cos = .225 * (cos * cos - cos) + cos; } else { cos = 1.27323954 * x - 0.405284735 * x * x; if (cos < 0) cos = .225 * (cos *-cos - cos) + cos; else cos = .225 * (cos * cos - cos) + cos; }
      - #7 by Forrest on September 9th, 2011 · Reply
        
        thank you :)

#8 by Daniel on June 3rd, 2011 · Reply

I was looking into using an approximated sine some time ago.

here’s my attempt http://blog.daspoda.com/2010-11-12/sine-calculation

I use

$angle %= 3.14159265358979323846

instead of

if (theta > 3.14159265) theta -= 6.28318531;

as the calculations are wrong with angles beyond 2PI. My solution was at about 60% of Math.sin, looking at your math, I can see why it would be faster, just haven't gotten the math behind it.
- #9 by jackson on June 3rd, 2011 · Reply
  
  I missed your article while researching for this article. I’ve yet to test the speed difference between the if branch and a simple % operator, but I’d be inclined to think that your version might be superior in at least the case where the if executes (when angle > 2π.
  
  Thanks for the link and the tip!
#10 by Carl Lydon on August 18th, 2012 · Reply

For use in games, maybe it would be even faster to generate an array of results that can be looked up by degree. For example, you can, at runtime run through 0-359 degrees (or 0-90) and get the sin results, put it in an array, then during the game get the sin from a given degree by just grabbing that item from the array. You can also reduce the precision of the results before putting it in the array to avoid long decimal values.

I did something similar to replace the Math.random function and it sped things up considerably. I generated about 1000 random numbers at runtime, reduced their precision, put them in an array, and cycled through them starting from a seed value.
- #11 by jackson on August 18th, 2012 · Reply
  
  I must be misunderstanding your idea. How is this different than the TrigLUT lookup table in the article?
- #12 by Carl Lydon on August 18th, 2012 · Reply
  
  cosine results, rather.
#13 by Carl Lydon on August 18th, 2012 · Reply

Oh, you’re right, i didn’t read the TrigLut article, sorry.

After looking at it, I am surprised that this is not the fastest way to do Trig functions, though. Looking something up in an array or vector is slower than doing floating point multiplication?

I think TrigLut is working at a disadvantage because you’re still doing floating point operations as part of the lookup. Instead of messing around with Pi and radians, you can just decide in your game how accurate you need your trig to be and work in degrees, which is what As3 needs for rotating objects anyway.
#14 by Carl Lydon on August 18th, 2012 · Reply

For example, a typical thing you need to do continuously in a game is calculate rise/run values based on a degree angle. Instead of doing math during the game to convert to radians and do trig I do this at start of game:

private function preCalcRiseRun ():void {
pRiseRunArray = [];
for (var i = 0; i< 361; i++) {
var rr:Array = calcRiseRun(i);
pRRArray.push (rr);
}
}
//
private function calcRiseRun (vAngle:int):Array {
var rads:Number=0.0174 * vAngle;
var dx1:Number=Math.cos(rads);
var dy1:Number=Math.sin(rads);
var dx2:int = Math.round(dx1 * 1000);
var dy2:int = Math.round(dy1 * 1000);
var dx3:Number = dx2/1000;
var dy3:Number = dy2/1000;
return [dx3,dy3];
}

and then during game, if vAngle = angle in degrees:

var rise:number = pRRArray[vAngle][0];
var run:number = pRRArray[vAngle][1];

I guess this stuff should be stored as vectors to be faster? Is this slower than your method I'm wondering? I guess I should do some tests.
- #15 by jackson on August 18th, 2012 · Reply
  
  It sounds like a good idea for a followup article. Thanks for the tip. Let me know if you do the tests and have some interesting results.
#16 by benjamin guihaire on July 29th, 2013 · Reply

the fastest solution for me is still a LUT , and it can work with any angle,
see with my code : http://guihaire.com/code/?p=520