Now that we know how to use textures with an alpha channel in rendering Stage3D scenes, let’s see if we can cut the performance cost so we can use them more often. Today’s article will show some tricks to optimize your rendering loop.

The following test app builds on the one from last time, with these modifications:

  • New option to switch between “original” and “fast” sorting
  • Rendering is now in two stages. First, view frustum culling is done to form a Vector of visible objects. Second, the visible objects are drawn.
  • Opaque texture and sorting options removed for simplicity’s sake
  • enableErrorChecking no longer set on the Context3D

The “fast” sorting option is at the heart of this article’s optimization. You’ll recall that using alpha textures necessitates a back-to-front sort of the 3D objects in the scene. There are two ways that the “fast” sorting option speeds this up:

  1. Use Skyboy’s fastSort rather than Vector.sort to sort the 3D objects on a cached “distance from camera” field of the cube
  2. Sort only the 3D objects that pass the view frustum culling step. Don’t bother sorting objects that will never be drawn.

Both of these are important optimizations, but the second is the major algorithmic change. Here’s the difference between the “original” and “fast” sorts: (pseudo-code)

///////////
// Original
///////////
 
// Sort all cubes
allCubes.sort(backToFront);
 
// Draw all cubes that are in the view frustum
for each (cube in allCubes)
{
    if (cube.isInViewFrustum())
    {
        draw(cube);
    }
}
 
///////
// Fast
///////
 
// Make a list of all cubes that are in the view frustum
visibleCubes = [];
for each (cube in allCubes)
{
    if (cube.isInViewFrustum())
    {
        visibleCubes.push(cube);
    }
}
 
// Sort just those cubes
visibleCubes.sort(backToFront);
 
// Draw the sorted, visible cubes
for each (cube in visibleCubes)
{
    draw(cube);
}

There are two main “wins” here. First, sorting fewer 3D objects is clearly going to be faster. Second, good general-purpose sorting algorithms take on the order of N * log2(N) steps, where N is the number of objects to sort. Because that grows faster than N itself, each 3D object added to the sort costs more than one extra step, so culling before sorting matters more and more as the number of 3D objects increases.
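
To put rough numbers on the second point, here is a small sketch. It is written in TypeScript rather than ActionScript purely so it can be run outside Flash Player. It treats N * log2(N) as an estimate of sorting steps, not an exact count for any particular algorithm, and the 4096 visible cubes are a made-up figure for illustration; only 32768 comes from the test app.

```typescript
// Rough step estimate for an N * log2(N) sort.
// This is an order-of-magnitude model, not a measurement.
function estimatedSortSteps(n: number): number {
    // Zero or one object needs no sorting work at all
    if (n < 2) {
        return 0;
    }
    return Math.round(n * (Math.log(n) / Math.LN2));
}

const allCubes = 32768;    // NUM_CUBES_TOTAL in the test app
const visibleCubes = 4096; // hypothetical: one eighth pass the frustum check

const stepsAll = estimatedSortSteps(allCubes);         // 32768 * 15 = 491520
const stepsVisible = estimatedSortSteps(visibleCubes); // 4096 * 12 = 49152

console.log(stepsAll, stepsVisible, stepsAll / stepsVisible);
```

Under these assumptions, sorting one eighth of the cubes takes about one tenth of the steps, so the saving is better than proportional.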

Now let’s take a look at the test app:

package
{
	import skyboy.utils.fastSort;
	import com.adobe.utils.*;
 
	import flash.display.*;
	import flash.display3D.*;
	import flash.display3D.textures.*;
	import flash.events.*;
	import flash.geom.*;
	import flash.text.*;
	import flash.utils.*;
 
	/**
	*   Test of faster ways of drawing alpha textures with Stage3D
	*   @author Jackson Dunstan, http://JacksonDunstan.com
	*/
	public class FasterAlphaTextures extends Sprite 
	{
		/** UI Padding */
		private static const PAD:Number = 5;
 
		/** Number of cubes per dimension (X, Y, Z) */
		private static const NUM_CUBES:int = 32;
 
		/** Number of total cubes */
		private static const NUM_CUBES_TOTAL:int = NUM_CUBES*NUM_CUBES*NUM_CUBES;
 
		/** Positions of all cubes' vertices */
		private static const POSITIONS:Vector.<Number> = new <Number>[
			// back face - bottom tri
			-0.5, -0.5, -0.5,
			-0.5, 0.5, -0.5,
			0.5, -0.5, -0.5,
			// back face - top tri
			-0.5, 0.5, -0.5,
			0.5, 0.5, -0.5,
			0.5, -0.5, -0.5,
 
			// front face - bottom tri
			-0.5, -0.5, 0.5,
			-0.5, 0.5, 0.5,
			0.5, -0.5, 0.5,
			// front face - top tri
			-0.5, 0.5, 0.5,
			0.5, 0.5, 0.5,
			0.5, -0.5, 0.5,
 
			// left face - bottom tri
			-0.5, -0.5, -0.5,
			-0.5, 0.5, -0.5,
			-0.5, -0.5, 0.5,
			// left face - top tri
			-0.5, 0.5, -0.5,
			-0.5, 0.5, 0.5,
			-0.5, -0.5, 0.5,
 
			// right face - bottom tri
			0.5, -0.5, -0.5,
			0.5, 0.5, -0.5,
			0.5, -0.5, 0.5,
			// right face - top tri
			0.5, 0.5, -0.5,
			0.5, 0.5, 0.5,
			0.5, -0.5, 0.5,
 
			// bottom face - bottom tri
			-0.5, -0.5, 0.5,
			-0.5, -0.5, -0.5,
			0.5, -0.5, 0.5,
			// bottom face - top tri
			-0.5, -0.5, -0.5,
			0.5, -0.5, -0.5,
			0.5, -0.5, 0.5,
 
			// top face - bottom tri
			-0.5, 0.5, 0.5,
			-0.5, 0.5, -0.5,
			0.5, 0.5, 0.5,
			// top face - top tri
			-0.5, 0.5, -0.5,
			0.5, 0.5, -0.5,
			0.5, 0.5, 0.5
		];
 
		/** Texture coordinates of all cubes' vertices */
		private static const TEX_COORDS:Vector.<Number> = new <Number>[
			// back face - bottom tri
			1, 1,
			1, 0,
			0, 1,
			// back face - top tri
			1, 0,
			0, 0,
			0, 1,
 
			// front face - bottom tri
			0, 1,
			0, 0,
			1, 1,
			// front face - top tri
			0, 0,
			1, 0,
			1, 1,
 
			// left face - bottom tri
			0, 1,
			0, 0,
			1, 1,
			// left face - top tri
			0, 0,
			1, 0,
			1, 1,
 
			// right face - bottom tri
			1, 1,
			1, 0,
			0, 1,
			// right face - top tri
			1, 0,
			0, 0,
			0, 1,
 
			// bottom face - bottom tri
			0, 0,
			0, 1,
			1, 0,
			// bottom face - top tri
			0, 1,
			1, 1,
			1, 0,
 
			// top face - bottom tri
			0, 1,
			0, 0,
			1, 1,
			// top face - top tri
			0, 0,
			1, 0,
			1, 1
		];
 
		/** Triangles of all cubes */
		private static const TRIS:Vector.<uint> = new <uint>[
			2, 1, 0,    // back face - bottom tri
			5, 4, 3,    // back face - top tri
			6, 7, 8,    // front face - bottom tri
			9, 10, 11,  // front face - top tri
			12, 13, 14, // left face - bottom tri
			15, 16, 17, // left face - top tri
			20, 19, 18, // right face - bottom tri
			23, 22, 21, // right face - top tri
			26, 25, 24, // bottom face - bottom tri
			29, 28, 27, // bottom face - top tri
			30, 31, 32, // top face - bottom tri
			33, 34, 35  // top face - top tri
		];
 
		[Embed(source="flash_logo_alpha.png")]
		private static const TEXTURE:Class;
 
		private static const TEMP_DRAW_MATRIX:Matrix3D = new Matrix3D();
 
		private var context3D:Context3D;
		private var vertexBuffer:VertexBuffer3D;
		private var vertexBuffer2:VertexBuffer3D;
		private var indexBuffer:IndexBuffer3D; 
		private var program:Program3D;
		private var texture:Texture;
		private var camera:Camera3D;
		private var cubes:Vector.<Cube> = new Vector.<Cube>();
 
		private var fps:TextField = new TextField();
		private var lastFPSUpdateTime:uint;
		private var lastFrameTime:uint;
		private var frameCount:uint;
		private var driver:TextField = new TextField();
		private var draws:TextField = new TextField();
 
		private var tempCameraPosX:Number;
		private var tempCameraPosY:Number;
		private var tempCameraPosZ:Number;
 
		private var fastSorting:Boolean;
		private var visibleCubes:Vector.<Cube> = new <Cube>[];
 
		public function FasterAlphaTextures()
		{
			stage.align = StageAlign.TOP_LEFT;
			stage.scaleMode = StageScaleMode.NO_SCALE;
			stage.frameRate = 60;
 
			var stage3D:Stage3D = stage.stage3Ds[0];
			stage3D.addEventListener(Event.CONTEXT3D_CREATE, onContextCreated);
			stage3D.requestContext3D(Context3DRenderMode.AUTO);
		}
 
		protected function onContextCreated(ev:Event): void
		{
			// Setup context
			var stage3D:Stage3D = stage.stage3Ds[0];
			stage3D.removeEventListener(Event.CONTEXT3D_CREATE, onContextCreated);
			context3D = stage3D.context3D;            
			context3D.configureBackBuffer(
				stage.stageWidth,
				stage.stageHeight,
				0,
				true
			);
 
			// Setup camera
			camera = new Camera3D(
				0.1, // near
				100, // far
				stage.stageWidth / stage.stageHeight, // aspect ratio
				40*(Math.PI/180), // vFOV
				-6, -8, 6, // position
				0, 0, 0, // target
				0, 1, 0 // up dir
			);
 
			// Setup cubes
			for (var i:int = 0; i < NUM_CUBES; ++i)
			{
				for (var j:int = 0; j < NUM_CUBES; ++j)
				{
					for (var k:int = 0; k < NUM_CUBES; ++k)
					{
						cubes.push(new Cube(i*2, j*2, -k*2));
					}
				}
			}
 
			// Setup UI
			fps.background = true;
			fps.backgroundColor = 0xffffffff;
			fps.autoSize = TextFieldAutoSize.LEFT;
			fps.text = "Getting FPS...";
			addChild(fps);
 
			driver.background = true;
			driver.backgroundColor = 0xffffffff;
			driver.text = "Driver: " + context3D.driverInfo;
			driver.autoSize = TextFieldAutoSize.LEFT;
			driver.y = fps.height;
			addChild(driver);
 
			draws.background = true;
			draws.backgroundColor = 0xffffffff;
			draws.text = "Getting draws...";
			draws.autoSize = TextFieldAutoSize.LEFT;
			draws.y = driver.y + driver.height;
			addChild(draws);
 
			var buttonsTopY:Number = makeButtons(
				"Move Forward", "Move Backward", null,
				"Move Left", "Move Right", null,
				"Move Up", "Move Down", null,
				"Yaw Left", "Yaw Right", null,
				"Pitch Up", "Pitch Down", null,
				"Roll Left", "Roll Right"
			);
 
			var fastSortingCB:Sprite = makeCheckBox(
				"Fast Sorting?:",
				fastSorting,
				onFastSortingChecked
			);
			fastSortingCB.x = PAD;
			fastSortingCB.y = buttonsTopY - fastSortingCB.height - PAD;
			addChild(fastSortingCB);
 
			var assembler:AGALMiniAssembler = new AGALMiniAssembler();
 
			// Vertex shader
			var vertSource:String = "m44 op, va0, vc0\nmov v0, va1\n";
			assembler.assemble(Context3DProgramType.VERTEX, vertSource);
			var vertexShaderAGAL:ByteArray = assembler.agalcode;
 
			// Fragment shader
			var fragSource:String = "tex oc, v0, fs0 <2d,linear,mipnone>";
			assembler.assemble(Context3DProgramType.FRAGMENT, fragSource);
			var fragmentShaderAGAL:ByteArray = assembler.agalcode;
 
			// Shader program
			program = context3D.createProgram();
			program.upload(vertexShaderAGAL, fragmentShaderAGAL);
 
			// Setup buffers
			vertexBuffer = context3D.createVertexBuffer(36, 3);
			vertexBuffer.uploadFromVector(POSITIONS, 0, 36);
			vertexBuffer2 = context3D.createVertexBuffer(36, 2);
			vertexBuffer2.uploadFromVector(TEX_COORDS, 0, 36);
			indexBuffer = context3D.createIndexBuffer(36);
			indexBuffer.uploadFromVector(TRIS, 0, 36);
 
			// Setup textures
			var bmd:BitmapData = (new TEXTURE() as Bitmap).bitmapData;
			texture = context3D.createTexture(
				bmd.width,
				bmd.height,
				Context3DTextureFormat.BGRA,
				true
			);
			texture.uploadFromBitmapData(bmd);
 
			// Start the simulation
			addEventListener(Event.ENTER_FRAME, onEnterFrame);
		}
 
		private function makeButtons(...labels): Number
		{
			var curX:Number = PAD;
			var curY:Number = stage.stageHeight - PAD;
			for each (var label:String in labels)
			{
				if (label == null)
				{
					curX = PAD;
					curY -= button.height + PAD;
					continue;
				}
 
				var tf:TextField = new TextField();
				tf.mouseEnabled = false;
				tf.selectable = false;
				tf.defaultTextFormat = new TextFormat("_sans");
				tf.autoSize = TextFieldAutoSize.LEFT;
				tf.text = label;
				tf.name = "lbl";
 
				var button:Sprite = new Sprite();
				button.buttonMode = true;
				button.graphics.beginFill(0xF5F5F5);
				button.graphics.drawRect(0, 0, tf.width+PAD, tf.height+PAD);
				button.graphics.endFill();
				button.graphics.lineStyle(1);
				button.graphics.drawRect(0, 0, tf.width+PAD, tf.height+PAD);
				button.addChild(tf);
				button.addEventListener(MouseEvent.CLICK, onButton);
				if (curX + button.width > stage.stageWidth - PAD)
				{
					curX = PAD;
					curY -= button.height + PAD;
				}
				button.x = curX;
				button.y = curY - button.height;
				addChild(button);
 
				curX += button.width + PAD;
			}
 
			return curY - button.height;
		}
 
		public static function makeCheckBox(
            label:String,
            checked:Boolean,
            callback:Function,
            labelFormat:TextFormat=null): Sprite
        {
            var sprite:Sprite = new Sprite();
 
            var tf:TextField = new TextField();
            tf.autoSize = TextFieldAutoSize.LEFT;
            tf.text = label;
            tf.background = true;
            tf.backgroundColor = 0xffffff;
            tf.selectable = false;
            tf.mouseEnabled = false;
            tf.setTextFormat(labelFormat || new TextFormat("_sans"));
            sprite.addChild(tf);
 
            var size:Number = tf.height;
 
            var background:Shape = new Shape();
            background.graphics.beginFill(0xffffff);
            background.graphics.drawRect(0, 0, size, size);
            background.x = tf.width + PAD;
            sprite.addChild(background);
 
            var border:Shape = new Shape();
            border.graphics.lineStyle(1, 0x000000);
            border.graphics.drawRect(0, 0, size, size);
            border.x = background.x;
            sprite.addChild(border);
 
            var check:Shape = new Shape();
            check.graphics.lineStyle(1, 0x000000);
            check.graphics.moveTo(0, 0);
            check.graphics.lineTo(size, size);
            check.graphics.moveTo(size, 0);
            check.graphics.lineTo(0, size);
            check.x = background.x;
            check.visible = checked;
            sprite.addChild(check);
 
            sprite.addEventListener(
                MouseEvent.CLICK,
                function(ev:MouseEvent): void
                {
                    checked = !checked;
                    check.visible = checked;
                    callback(checked);
                }
            );
 
            return sprite;
        }
 
		private function onButton(ev:MouseEvent): void
		{
			var mode:String = ev.target.getChildByName("lbl").text;
			switch (mode)
			{
				case "Move Forward":
					camera.moveForward(1);
					break;
				case "Move Backward":
					camera.moveBackward(1);
					break;
				case "Move Left":
					camera.moveLeft(1);
					break;
				case "Move Right":
					camera.moveRight(1);
					break;
				case "Move Up":
					camera.moveUp(1);
					break;
				case "Move Down":
					camera.moveDown(1);
					break;
				case "Yaw Left":
					camera.yaw(-10);
					break;
				case "Yaw Right":
					camera.yaw(10);
					break;
				case "Pitch Up":
					camera.pitch(-10);
					break;
				case "Pitch Down":
					camera.pitch(10);
					break;
				case "Roll Left":
					camera.roll(10);
					break;
				case "Roll Right":
					camera.roll(-10);
					break;
			}
		}
 
		private function onFastSortingChecked(checked:Boolean): void
		{
			fastSorting = checked;
		}
 
		private function sortByCameraDistance(a:Cube, b:Cube): int
		{
			var deltaX:Number = a.posX - tempCameraPosX;
			var deltaY:Number = a.posY - tempCameraPosY;
			var deltaZ:Number = a.posZ - tempCameraPosZ;
			var aDist:Number = deltaX*deltaX + deltaY*deltaY + deltaZ*deltaZ;
 
			deltaX = b.posX - tempCameraPosX;
			deltaY = b.posY - tempCameraPosY;
			deltaZ = b.posZ - tempCameraPosZ;
			var bDist:Number = deltaX*deltaX + deltaY*deltaY + deltaZ*deltaZ;
 
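			// Return an explicit -1/0/1: the raw Number difference would be
			// truncated to an int, treating differences under 1 as equal
			return bDist > aDist ? 1 : (bDist < aDist ? -1 : 0);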
		}
 
		private function sortFast(): void
		{
			// Cache camera position
			tempCameraPosX = camera.positionX;
			tempCameraPosY = camera.positionY;
			tempCameraPosZ = camera.positionZ;
 
			// Only add cubes that pass frustum culling to visible list
			var numVisibleCubes:int;
			visibleCubes.length = 0;
			for each (var cube:Cube in cubes)
			{
				if (camera.isSphereInFrustum(cube.sphere))
				{
					visibleCubes[numVisibleCubes++] = cube;
 
					// Compute distance of cube to camera
					var deltaX:Number = cube.posX - tempCameraPosX;
					var deltaY:Number = cube.posY - tempCameraPosY;
					var deltaZ:Number = cube.posZ - tempCameraPosZ;
					cube.camDist = deltaX*deltaX + deltaY*deltaY + deltaZ*deltaZ;
				}
			}
 
			// Sort all visible cubes
			fastSort(visibleCubes, "camDist", Array.NUMERIC);
		}
 
		private function sortOriginal(): void
		{
			// Sort all cubes
			tempCameraPosX = camera.positionX;
			tempCameraPosY = camera.positionY;
			tempCameraPosZ = camera.positionZ;
			cubes.sort(sortByCameraDistance);
 
			// Only add cubes that pass frustum culling to visible list
			var numVisibleCubes:int;
			visibleCubes.length = 0;
			for each (var cube:Cube in cubes)
			{
				if (camera.isSphereInFrustum(cube.sphere))
				{
					visibleCubes[numVisibleCubes++] = cube;
				}
			}
		}
 
		private function onEnterFrame(ev:Event): void
		{
			// Set up rendering
			context3D.setProgram(program);
			context3D.setVertexBufferAt(0, vertexBuffer, 0, Context3DVertexBufferFormat.FLOAT_3);
			context3D.setVertexBufferAt(1, vertexBuffer2, 0, Context3DVertexBufferFormat.FLOAT_2);
			context3D.setTextureAt(0, texture);
			context3D.clear(0.5, 0.5, 0.5);
			context3D.setBlendFactors(
				Context3DBlendFactor.SOURCE_ALPHA,
				Context3DBlendFactor.ONE_MINUS_SOURCE_ALPHA
			);
 
			// Cull and sort
			var beforeCullingTime:int = getTimer();
			if (fastSorting)
			{
				sortFast();
			}
			else
			{
				sortOriginal();
			}
			var afterCullingTime:int = getTimer();
 
			// Draw visible cubes
			var worldToClip:Matrix3D = camera.worldToClipMatrix;
			var drawMatrix:Matrix3D = TEMP_DRAW_MATRIX;
			var numDraws:int;
			for each (var cube:Cube in visibleCubes)
			{
				cube.mat.copyToMatrix3D(drawMatrix);
				drawMatrix.prepend(worldToClip);
				context3D.setProgramConstantsFromMatrix(
					Context3DProgramType.VERTEX,
					0,
					drawMatrix,
					false
				);
				context3D.drawTriangles(indexBuffer, 0, 12);
				numDraws++;
			}
			context3D.present();
 
			// Update stat displays
			draws.text = "Draws: " + numDraws + " / " + NUM_CUBES_TOTAL
				+ " (" + (100*(numDraws/NUM_CUBES_TOTAL)).toFixed(1) + "%)\n"
				+ "Culling Time: " + (afterCullingTime-beforeCullingTime);
			frameCount++;
			var now:int = getTimer();
			var elapsed:int = now - lastFPSUpdateTime;
			if (elapsed > 1000)
			{
				var framerateValue:Number = 1000 / (elapsed / frameCount);
				fps.text = "FPS: " + framerateValue.toFixed(1);
				lastFPSUpdateTime = now;
				frameCount = 0;
			}
			lastFrameTime = now;
		}
	}
}
import flash.geom.*;
class Cube
{
	private static var NEXT_ID:int = 0;
 
	public var id:int = NEXT_ID++;
 
	public var posX:Number;
	public var posY:Number;
	public var posZ:Number;
	public var mat:Matrix3D;
	public var sphere:Vector3D;
	public var camDist:Number;
 
	public function Cube(x:Number, y:Number, z:Number)
	{
		posX = x;
		posY = y;
		posZ = z;
 
		mat = new Matrix3D(
			new <Number>[
				1, 0, 0, x,
				0, 1, 0, y,
				0, 0, 1, z,
				0, 0, 0, 1
			]
		);
		sphere = new Vector3D(x, y, z, 2);
	}
}

Launch the test app

I ran this test app in the following environment:

  • Flex SDK (MXMLC) 4.6.0.23201, compiling in release mode (no debugging or verbose stack traces)
  • Release version of Flash Player 11.2.202.235
  • 2.4 GHz Intel Core i5
  • Mac OS X 10.7.4
  • NVIDIA GeForce GT 330M 256 MB

And here are the results I got:

Visible Cubes   Original Sort Time (ms)   Fast Sort Time (ms)
32768           53                        30
0               40                        10

[Figure: Alpha Sort Times Graph]

These two tests show the two optimizations in full effect. When all of the cubes are visible (first test), both approaches end up sorting all the cubes since they all pass the view frustum check. Therefore the only optimization being applied is the switch from Vector.sort (which calls a compare function) to Skyboy’s fastSort (which sorts on a cached “distance from camera” field). This alone cuts the time from 53 ms to 30 ms, making sorting nearly twice as fast.
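
The difference between the two styles can be sketched outside Flash as follows. This is TypeScript, not ActionScript, and it is not Skyboy’s fastSort: TypeScript’s Array.sort always takes a compare function, so the sketch only illustrates the caching half of the idea. The Item type, the sample positions, and the distanceCalls counter are all hypothetical names made up for the example.

```typescript
// Hypothetical stand-ins for the article's Cube and camera position
interface Point3 { x: number; y: number; z: number; }
interface Item extends Point3 { name: string; camDist: number; }

// Counts distance computations so the two approaches can be compared
let distanceCalls = 0;

function squaredDistance(a: Point3, b: Point3): number {
    distanceCalls++;
    const dx = a.x - b.x;
    const dy = a.y - b.y;
    const dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// "Original" style: the compare function recomputes both distances every call
function sortWithCompare(items: Item[], camera: Point3): Item[] {
    return items.slice().sort(
        (a, b) => squaredDistance(b, camera) - squaredDistance(a, camera)
    );
}

// "Fast" style: compute each distance once, then sort on the cached field
function sortWithCachedKey(items: Item[], camera: Point3): Item[] {
    for (const item of items) {
        item.camDist = squaredDistance(item, camera);
    }
    return items.slice().sort((a, b) => b.camDist - a.camDist);
}

const camera: Point3 = { x: -6, y: -8, z: 6 };
const items: Item[] = [
    { name: "near", x: 0, y: 0, z: 0, camDist: 0 },
    { name: "mid", x: 4, y: 4, z: -4, camDist: 0 },
    { name: "farthest", x: 10, y: 10, z: -10, camDist: 0 },
    { name: "far", x: 20, y: 0, z: 0, camDist: 0 }
];

distanceCalls = 0;
const byCompare = sortWithCompare(items, camera);
const compareCalls = distanceCalls;

distanceCalls = 0;
const byCachedKey = sortWithCachedKey(items, camera);
const cachedCalls = distanceCalls;

// Back-to-front order: farthest, far, mid, near
console.log(byCachedKey.map(item => item.name).join(", "));
console.log(compareCalls, cachedCalls);
```

Both functions produce the same back-to-front order, but the cached version computes exactly one distance per object, while the compare-function version computes two per comparison, and the number of comparisons grows with the size of the list.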

The second case is where I’ve pointed the camera away from the cubes so that none of them pass the view frustum check. In this case, zero cubes are being sorted in the “fast” method and all 32768 are being sorted in the “original” method. This results in a 3x speedup over the “fast” approach with all of the cubes visible (30 ms down to 10 ms) and a 4x speedup over the “original” method in the same situation (40 ms down to 10 ms).

The above optimizations are just a couple of ways of improving performance when alpha textures are used in a 3D scene. If you have more techniques to suggest or have spotted a bug, post a comment and let me know!