Flash Player 11's new Stage3D API lets us use hardware acceleration for 3D graphics, and that brings a whole new set of performance questions to consider. Today's article discusses the performance of uploading data from system memory (RAM) to video memory (VRAM), such as when you upload textures, vertex buffers, and index buffers. Is it faster to upload to one type of resource than another? Is it faster to upload from a Vector, a ByteArray, or a BitmapData? Is there a significant speedup when using software rendering, where VRAM is the same as RAM? Find out the answers to all of these questions below.

The performance test below measures upload speeds in both hardware and software mode for all of these combinations:

  • Texture from…
    • BitmapData (with and without alpha)
    • ByteArray
  • VertexBuffer3D from…
    • Vector
    • ByteArray
  • IndexBuffer3D from…
    • Vector
    • ByteArray

Check it out:

package
{
	import flash.display3D.*;
	import flash.display3D.textures.*;
	import flash.external.*;
	import flash.display.*;
	import flash.sampler.*;
	import flash.system.*;
	import flash.events.*;
	import flash.utils.*;
	import flash.text.*;
	import flash.geom.*;
 
	import com.adobe.utils.*;
 
	public class Stage3DUploadTester extends Sprite
	{
		private var __stage3D:Stage3D;
		private var __logger:TextField = new TextField();
		private var __context:Context3D;
		private var __driverInfo:String;
		private var __texture:Texture;
		private var __bmdNoAlpha:BitmapData;
		private var __bmdAlpha:BitmapData;
		private var __texBytes:ByteArray;
		private var __vertexBuffer:VertexBuffer3D;
		private var __vbVector:Vector.<Number>;
		private var __vbBytes:ByteArray;
		private var __indexBuffer:IndexBuffer3D;
		private var __ibVector:Vector.<uint>;
		private var __ibBytes:ByteArray;
 
		public function Stage3DUploadTester()
		{
			__stage3D = stage.stage3Ds[0];
 
			__logger.autoSize = TextFieldAutoSize.LEFT;
			addChild(__logger);
 
			// Allocate texture data
			__bmdNoAlpha = new BitmapData(2048, 2048, false, 0xffffffff);
			__bmdAlpha = new BitmapData(2048, 2048, true, 0xffffffff);
			__texBytes = new ByteArray();
			var size:int = __texBytes.length = 2048*2048*4;
			for (var i:int = 0; i < size; ++i)
			{
				// Each ByteArray element holds a single byte
				__texBytes[i] = 0xff;
			}
 
			// Allocate vertex buffer data
			size = 65535*64;
			__vbVector = new Vector.<Number>(size);
			for (i = 0; i < size; ++i)
			{
				__vbVector[i] = 1.0;
			}
			__vbBytes = new ByteArray();
			__vbBytes.length = size*4;
			for (i = 0; i < size; ++i)
			{
				__vbBytes.writeFloat(1.0);
			}
			__vbBytes.position = 0;
 
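			// Allocate index buffer data
			size = 524287;
			__ibVector = new Vector.<uint>(size);
			for (i = 0; i < size; ++i)
			{
				__ibVector[i] = 1;
			}
			// Note: 4 bytes are allocated per index to match the byte counts
			// passed to row() in runTests(). Stage3D indices are 16 bits each,
			// so the upload itself should read only 2 bytes per index.
			__ibBytes = new ByteArray();
			__ibBytes.length = size*4;
			for (i = 0; i < size; ++i)
			{
				__ibBytes.writeFloat(1.0);
			}
			__ibBytes.position = 0;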
 
			setupContext(Context3DRenderMode.AUTO);
		}
 
		private function setupContext(renderMode:String): void
		{
			__stage3D.addEventListener(Event.CONTEXT3D_CREATE, onContextCreated);
			__stage3D.requestContext3D(renderMode);
		}
 
		private function onContextCreated(ev:Event): void
		{
			__stage3D.removeEventListener(Event.CONTEXT3D_CREATE, onContextCreated);
 
			var first:Boolean = __logger.text.length == 0;
			if (first)
			{
				// getTimer() returns milliseconds, so the rate is bytes per millisecond
				__logger.appendText("Driver,Test,Time (ms),Bytes/ms\n");
			}
 
			const width:int = stage.stageWidth;
			const height:int = stage.stageHeight;
 
			__context = __stage3D.context3D;
			__context.configureBackBuffer(width, height, 0, true);
			__driverInfo = __context.driverInfo;
			__texture = __context.createTexture(
				2048,
				2048,
				Context3DTextureFormat.BGRA,
				false
			);
			__vertexBuffer = __context.createVertexBuffer(65535, 64);
			__indexBuffer = __context.createIndexBuffer(524287);
 
			runTests();
 
			if (first)
			{
				__context.dispose();
				setupContext(Context3DRenderMode.SOFTWARE);
			}
		}
 
		private function runTests(): void
		{
			var beforeTime:int;
			var afterTime:int;
			var time:int;
 
			beforeTime = getTimer();
			__texture.uploadFromBitmapData(__bmdNoAlpha);
			afterTime = getTimer();
			time = afterTime - beforeTime;
			row("Texture from BitmapData w/o alpha", time, 2048*2048*4);
 
			beforeTime = getTimer();
			__texture.uploadFromBitmapData(__bmdAlpha);
			afterTime = getTimer();
			time = afterTime - beforeTime;
			row("Texture from BitmapData w/ alpha", time, 2048*2048*4);
 
			beforeTime = getTimer();
			__texture.uploadFromByteArray(__texBytes, 0);
			afterTime = getTimer();
			time = afterTime - beforeTime;
			row("Texture from ByteArray", time, 2048*2048*4);
 
			beforeTime = getTimer();
			__vertexBuffer.uploadFromVector(__vbVector, 0, 65535);
			afterTime = getTimer();
			time = afterTime - beforeTime;
			row("VertexBuffer from Vector", time, 65535*64*4);
 
			beforeTime = getTimer();
			__vertexBuffer.uploadFromByteArray(__vbBytes, 0, 0, 65535);
			afterTime = getTimer();
			time = afterTime - beforeTime;
			row("VertexBuffer from ByteArray", time, 65535*64*4);
 
			beforeTime = getTimer();
			__indexBuffer.uploadFromVector(__ibVector, 0, 524287);
			afterTime = getTimer();
			time = afterTime - beforeTime;
			row("IndexBuffer from Vector", time, 524287*4);
 
			beforeTime = getTimer();
			__indexBuffer.uploadFromByteArray(__ibBytes, 0, 0, 524287);
			afterTime = getTimer();
			time = afterTime - beforeTime;
			row("IndexBuffer from ByteArray", time, 524287*4);
		}
 
		private function row(name:String, time:int, bytes:int): void
		{
			__logger.appendText(
				__driverInfo + ","
				+ name + ","
				+ time + ","
				+ (bytes/time).toFixed(2) + "\n"
			);
		}
	}
}

Try out the test

I ran this performance test with the following environment:

  • Flex SDK (MXMLC) 4.5.1.21328, compiling in release mode (no debugging or verbose stack traces)
  • Release version of Flash Player 11.0.1.152
  • 2.4 GHz Intel Core i5
  • Mac OS X 10.7.2

And got these results:

Driver                      Test                               Time (ms)  Bytes/ms
OpenGL (Direct blitting)    Texture from BitmapData w/o alpha  22         762600.73
OpenGL (Direct blitting)    Texture from BitmapData w/ alpha   18         932067.56
OpenGL (Direct blitting)    Texture from ByteArray             18         932067.56
OpenGL (Direct blitting)    VertexBuffer from Vector           42         399451.43
OpenGL (Direct blitting)    VertexBuffer from ByteArray        5          3355392.00
OpenGL (Direct blitting)    IndexBuffer from Vector            3          699049.33
OpenGL (Direct blitting)    IndexBuffer from ByteArray         1          2097148.00
Software (Direct blitting)  Texture from BitmapData w/o alpha  12         1398101.33
Software (Direct blitting)  Texture from BitmapData w/ alpha   5          3355443.20
Software (Direct blitting)  Texture from ByteArray             5          3355443.20
Software (Direct blitting)  VertexBuffer from Vector           15         1118464.00
Software (Direct blitting)  VertexBuffer from ByteArray        5          3355392.00
Software (Direct blitting)  IndexBuffer from Vector            3          699049.33
Software (Direct blitting)  IndexBuffer from ByteArray         2          1048574.00

Upload speeds graph (hardware)

Upload speeds graph (software)

There is a clear order of speed across the tests, in both hardware and software mode and for every type of GPU resource:

  1. ByteArray (fastest)
  2. Vector
  3. BitmapData (slowest)

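Only the magnitude of the advantage changes from test to test. For textures, a BitmapData with alpha matches the ByteArray, while a BitmapData without alpha is slower. In particular, if you can manage to upload a vertex or index buffer from a ByteArray instead of a Vector, you're in for a big performance win: in these results, anywhere from 1.5x (index buffers in software) to about 8x (vertex buffers in hardware).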
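To make the ByteArray route concrete, here is a minimal sketch of packing Vector data into ByteArrays suitable for uploadFromByteArray(). The BufferPacker class and its method names are hypothetical and not part of the test above. It assumes, as I understand the Stage3D documentation, that the byte data should be little-endian, with 32-bit floats for vertex components and 16-bit integers for indices.

```actionscript
package
{
	import flash.utils.ByteArray;
	import flash.utils.Endian;

	// Hypothetical helper for illustration; not part of the test above
	public class BufferPacker
	{
		// Packs vertex components as little-endian 32-bit floats
		public static function packFloats(values:Vector.<Number>): ByteArray
		{
			var bytes:ByteArray = new ByteArray();
			bytes.endian = Endian.LITTLE_ENDIAN; // assumption: Stage3D reads little-endian data
			for (var i:int = 0; i < values.length; ++i)
			{
				bytes.writeFloat(values[i]);
			}
			bytes.position = 0;
			return bytes;
		}

		// Packs indices as little-endian 16-bit integers
		public static function packIndices(values:Vector.<uint>): ByteArray
		{
			var bytes:ByteArray = new ByteArray();
			bytes.endian = Endian.LITTLE_ENDIAN;
			for (var i:int = 0; i < values.length; ++i)
			{
				bytes.writeShort(values[i]); // writes the low 16 bits
			}
			bytes.position = 0;
			return bytes;
		}
	}
}
```

The results would then be passed to VertexBuffer3D.uploadFromByteArray(bytes, 0, 0, numVertices) and IndexBuffer3D.uploadFromByteArray(bytes, 0, 0, numIndices). Packing has its own cost, so this presumably pays off only when the data is packed once and uploaded many times, or is produced as bytes in the first place.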

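Uploading texture data is much faster in software than in hardware: about 3.6x for a ByteArray or a BitmapData with alpha (5 ms versus 18 ms) and about 1.8x for a BitmapData without alpha (12 ms versus 22 ms). As for vertex and index buffers, it's more of a mixed bag. Software is faster when uploading vertex buffers from a Vector, hardware is faster when uploading index buffers from a ByteArray, and the rest are a tie. When uploading from a ByteArray, vertex buffers are curiously quicker per byte than index buffers, and the difference is more dramatic with software rendering (about 3x faster) than hardware rendering (about 60% faster). From a Vector in hardware mode the relationship reverses: index buffers upload at a higher rate than vertex buffers. Keep in mind that the index buffer timings are only 1 to 3 ms, close to the 1 ms resolution of getTimer(), so those comparisons are rough.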
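Several of the timings above are only 1 to 5 ms, and getTimer() reports whole milliseconds, so a single measurement can be off by a large fraction. One way to tighten the numbers is to repeat each upload and average. The UploadTimer class below is a hypothetical sketch of that idea, not something the test above does.

```actionscript
package
{
	import flash.utils.getTimer;

	// Hypothetical helper for illustration; not part of the test above
	public class UploadTimer
	{
		// Calls upload() the given number of times and returns the
		// average duration of one call in milliseconds
		public static function averageMillis(upload:Function, iterations:int): Number
		{
			var before:int = getTimer();
			for (var i:int = 0; i < iterations; ++i)
			{
				upload();
			}
			var after:int = getTimer();
			return (after - before) / iterations;
		}
	}
}
```

Inside runTests() it could be called as UploadTimer.averageMillis(function(): void { __indexBuffer.uploadFromByteArray(__ibBytes, 0, 0, 524287); }, 100). Repeated uploads of the same data may behave differently from a single cold upload, depending on the driver, so treat the average as a complement to the one-shot numbers rather than a replacement.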

More than in any of my previous performance articles, it's important to keep in mind that the results posted above are valid only for the test environment that produced them. These numbers may change on Windows, which uses DirectX instead of OpenGL, or on any of the many mobile handsets that use OpenGL ES.

Spot a bug? Have a suggestion? Different results on your environment? Post a comment!