At long last, Flash Player 11 has been released, and it brings a raft of exciting new features. Perhaps most exciting is the new Stage3D class (and related libraries), which enables GPU-accelerated graphics rendering. Today’s article is the first to cover this new API, and it looks at one of its features: reading the rendered scene back into a BitmapData that you can put on the regular Stage. Surely this will be a popular operation for merging 3D and 2D, so let’s see how fast it is!

If hardware acceleration is being used, the pixels will need to be sent back from video card memory (VRAM) into main system memory (RAM), which can be a very expensive operation. If the software renderer is being used instead of hardware acceleration, the pixels will already be in RAM so the transfer will be—theoretically—a much quicker memory copy operation.

To test this theory, I wrote a little performance app. It draws absolutely nothing with the Stage3D API and only displays a little UI for controlling the app. This way, we can isolate the performance of Context3D.drawToBitmapData, which is responsible for reading the Stage3D’s pixels into a BitmapData.

package
{
	import flash.display3D.*;
	import flash.external.*;
	import flash.display.*;
	import flash.sampler.*;
	import flash.system.*;
	import flash.events.*;
	import flash.utils.*;
	import flash.text.*;
	import flash.geom.*;
 
	[SWF(width=640,height=480,backgroundColor=0xEEEAD9)]
	public class Stage3DReadback extends Sprite
	{
		private static const PAD:Number = 3;
		private static const TEXT_FORMAT:TextFormat = new TextFormat("_sans", 11);
 
		private var __stage3D:Stage3D;
		private var __tf:TextField = new TextField();
		private var __context:Context3D;
		private var __bmdAlpha:BitmapData;
		private var __bmdNoAlpha:BitmapData;
		private var __mode:String;
		private var __enterFrameHandler:Function;
		private var __driverInfo:String;
 
		public function Stage3DReadback()
		{
			stage.align = StageAlign.TOP_LEFT;
			stage.scaleMode = StageScaleMode.NO_SCALE;
			__stage3D = stage.stage3Ds[0];
 
			makeButton("Toggle Hardware", onToggleHardware);
			makeButton("No Readback", onNoReadback);
			makeButton("Readback (no alpha)", onReadbackNoAlpha);
			makeButton("Readback (alpha)", onReadbackAlpha);
 
			var about:TextField = new TextField();
			about.autoSize = TextFieldAutoSize.LEFT;
			about.defaultTextFormat = TEXT_FORMAT;
			about.htmlText = '<font color="#0071BB">'
				+ '<a href="http://JacksonDunstan.com/articles/1446">'
				+ 'JacksonDunstan.com'
				+ '</a></font>\n'
				+ 'October 2011';
			about.x = stage.stageWidth - PAD - about.width;
			about.y = PAD;
			addChild(about);
 
			var logger:TextField = __tf;
			logger.autoSize = TextFieldAutoSize.LEFT;
			logger.y = this.height;
			addChild(logger);
 
			__mode = "No Readback";
			__enterFrameHandler = onEnterFrameNoReadback;
			setupContext(Context3DRenderMode.AUTO);
		}
 
		private function setupContext(renderMode:String): void
		{
			__tf.text = "Setting up context with render mode: " + renderMode;
			__stage3D.addEventListener(Event.CONTEXT3D_CREATE, onContextCreated);
			__stage3D.requestContext3D(renderMode);
		}
 
		private function onContextCreated(ev:Event): void
		{
			__stage3D.removeEventListener(Event.CONTEXT3D_CREATE, onContextCreated);
 
			const width:int = stage.stageWidth;
			const height:int = stage.stageHeight;
 
			__context = __stage3D.context3D;
			__context.configureBackBuffer(width, height, 0, true);
			__driverInfo = __context.driverInfo;
 
			// First time only
			if (!__bmdNoAlpha)
			{
				__bmdNoAlpha = new BitmapData(width, height, false);
				__bmdAlpha = new BitmapData(width, height, true);
			}
 
			setMode(__mode, __enterFrameHandler);
		}
 
		private function removeAllEnterFrameHandlers(): void
		{
			removeEventListener(Event.ENTER_FRAME, onEnterFrameNoReadback);
			removeEventListener(Event.ENTER_FRAME, onEnterFrameReadbackNoAlpha);
			removeEventListener(Event.ENTER_FRAME, onEnterFrameReadbackAlpha);
		}
 
		private function setMode(name:String, enterFrameHandler:Function): void
		{
			removeAllEnterFrameHandlers();
 
			__mode = name;
			__enterFrameHandler = enterFrameHandler;
			addEventListener(Event.ENTER_FRAME, enterFrameHandler);
		}
 
		private function onToggleHardware(ev:MouseEvent): void
		{
			removeAllEnterFrameHandlers();
			__context.dispose();
			__tf.text = "Toggling hardware...";
			setupContext(
				__driverInfo.toLowerCase().indexOf("software") >= 0
					? Context3DRenderMode.AUTO
					: Context3DRenderMode.SOFTWARE
			);
		}
 
		private function onNoReadback(ev:MouseEvent): void
		{
			setMode("No Readback", onEnterFrameNoReadback);
		}
 
		private function onReadbackNoAlpha(ev:MouseEvent): void
		{
			setMode("Readback (no alpha)", onEnterFrameReadbackNoAlpha);
		}
 
		private function onReadbackAlpha(ev:MouseEvent): void
		{
			setMode("Readback (alpha)", onEnterFrameReadbackAlpha);
		}
 
		private function reportTime(name:String, time:int): void
		{
			__tf.text = __driverInfo + " - " + name + ": " + time + " ms";
		}
 
		private function onEnterFrameNoReadback(ev:Event): void
		{
			var beginTime:int = getTimer();
			__context.clear(0xEE/255, 0xEA/255, 0xD9/255, 1.0);
			__context.present();
			var endTime:int = getTimer();
			var drawTime:int = endTime - beginTime;
 
			reportTime("No readback", drawTime);
		}
 
		private function onEnterFrameReadbackNoAlpha(ev:Event): void
		{
			var beginTime:int = getTimer();
			__context.clear(0xEE/255, 0xEA/255, 0xD9/255, 1.0);
			// Read back after clear() and before present(): this copies the back buffer
			__context.drawToBitmapData(__bmdNoAlpha);
			__context.present();
			var endTime:int = getTimer();
			var drawTime:int = endTime - beginTime;
 
			reportTime("Readback (no alpha)", drawTime);
		}
 
		private function onEnterFrameReadbackAlpha(ev:Event): void
		{
			var beginTime:int = getTimer();
			__context.clear(0xEE/255, 0xEA/255, 0xD9/255, 1.0);
			__context.drawToBitmapData(__bmdAlpha);
			__context.present();
			var endTime:int = getTimer();
			var drawTime:int = endTime - beginTime;
 
			reportTime("Readback (alpha)", drawTime);
		}
 
		private function makeButton(label:String, callback:Function): void
		{
			var tf:TextField = new TextField();			
			tf.defaultTextFormat = TEXT_FORMAT;
			tf.name = "label";
			tf.text = label;
			tf.autoSize = TextFieldAutoSize.LEFT;
			tf.selectable = false;
			tf.x = tf.y = PAD;
 
			var button:Sprite = new Sprite();
			button.name = label;
			button.graphics.beginFill(0xE6E2D1);
			button.graphics.drawRect(0, 0, tf.width+PAD*2, tf.height+PAD*2);
			button.graphics.endFill();
			button.graphics.lineStyle(1, 0x000000);
			button.graphics.drawRect(0, 0, tf.width+PAD*2, tf.height+PAD*2);
			button.addChild(tf);
			button.addEventListener(MouseEvent.CLICK, callback);
 
			button.x = PAD + this.width;
			button.y = PAD;
			addChild(button);
		}
	}
}
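The test app only times the readback and never shows the pixels it reads. As a minimal sketch of the merging step described in the introduction, the read-back BitmapData can be wrapped in a Bitmap and added to the regular display list. The class name ReadbackToDisplayList and its field names are hypothetical, and the scene drawing is left as a placeholder comment.

```actionscript
package
{
	import flash.display.Bitmap;
	import flash.display.BitmapData;
	import flash.display.Sprite;
	import flash.display.Stage3D;
	import flash.display3D.Context3D;
	import flash.display3D.Context3DRenderMode;
	import flash.events.Event;

	// Hypothetical example class; not part of the test app above
	public class ReadbackToDisplayList extends Sprite
	{
		private var context:Context3D;
		private var pixels:BitmapData;

		public function ReadbackToDisplayList()
		{
			var stage3D:Stage3D = stage.stage3Ds[0];
			stage3D.addEventListener(Event.CONTEXT3D_CREATE, onContextCreated);
			stage3D.requestContext3D(Context3DRenderMode.AUTO);
		}

		private function onContextCreated(ev:Event): void
		{
			context = Stage3D(ev.target).context3D;
			context.configureBackBuffer(stage.stageWidth, stage.stageHeight, 0, true);

			// First time only: create the destination and show it as a regular 2D object
			if (!pixels)
			{
				pixels = new BitmapData(stage.stageWidth, stage.stageHeight, false);
				addChild(new Bitmap(pixels));
				addEventListener(Event.ENTER_FRAME, onEnterFrame);
			}
		}

		private function onEnterFrame(ev:Event): void
		{
			context.clear(0, 0, 0, 1);
			// ... draw the 3D scene here ...
			context.drawToBitmapData(pixels); // copy the back buffer before present()
			context.present();
		}
	}
}
```

Because Stage3D content is composited behind the regular display list, the Bitmap here covers the live 3D output. From that point on it can be filtered, masked, or layered like any other DisplayObject.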

I ran this performance test with the following environment:

  • Flex SDK (MXMLC) 4.5.1.21328, compiling in release mode (no debugging or verbose stack traces)
  • Release version of Flash Player 11.0.1.152
  • 2.4 GHz Intel Core i5
  • Mac OS X 10.7.1
  • NVIDIA GeForce GT 330M 256 MB

And got these results:

Hardware (time per frame, in milliseconds)

Resolution    No Readback    Readback (no alpha)    Readback (alpha)
640×480       0              3                      3
800×600       0              4                      4
1024×768      0              6                      6
1280×720      0              8                      8
1920×1080     0              15                     15

Software (time per frame, in milliseconds)

Resolution    No Readback    Readback (no alpha)    Readback (alpha)
640×480       1              2                      2
800×600       1              4                      4
1024×768      3              6                      6
1280×720      3              7                      7
1920×1080     7              15                     15

[Chart: Stage3D Readback Performance (hardware)]

[Chart: Stage3D Readback Performance (software)]

Software rendering is clearly slower overall, even with a blank scene. Unfortunately, it seems no faster at reading the scene back into the BitmapData than the hardware-accelerated version. This would have been one of software rendering’s only performance advantages over hardware-accelerated rendering, but it seems as though this optimization is not (yet) in place.

Nonetheless, this test points out an important fact: reading the scene’s pixels back into a BitmapData is very expensive and possibly not feasible in real time with large scenes. For example, a game targeting a smooth 30 frames per second has only about 33 milliseconds per frame to do its work. If reading the 3D scene back into RAM takes 15 of those milliseconds, the rest of the game (e.g. physics, sound, 2D rendering, networking) has roughly 18 milliseconds left and must be quite fast to fit. It’s also a good idea to keep in mind systems older than my test machine, which is a relatively new MacBook Pro. Still, if adding 3D content to a 2D scene on the regular Stage is very important, it seems achievable so long as you limit the resolution of the 3D scene.
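One way to limit the resolution, sketched here under assumptions rather than measured: configure a smaller back buffer, read it back into a BitmapData of the same size, and scale the resulting Bitmap up to cover the full area. On the test machine above, a 640×480 readback took 3 ms against 15 ms at 1920×1080, so fewer pixels should mean a cheaper readback, at the cost of a softer image. The helper class LowResReadback, its configure function, and the scale parameter are all hypothetical names, and scale is assumed to be greater than 0 and at most 1.

```actionscript
package
{
	import flash.display.Bitmap;
	import flash.display.BitmapData;
	import flash.display.PixelSnapping;
	import flash.display3D.Context3D;

	// Hypothetical helper; not part of the test app
	public class LowResReadback
	{
		// Configures a reduced-size back buffer and returns a Bitmap,
		// stretched to the full size, whose bitmapData is the readback target.
		public static function configure(
			context:Context3D,
			fullWidth:int,
			fullHeight:int,
			scale:Number
		): Bitmap
		{
			var w:int = int(fullWidth * scale);
			var h:int = int(fullHeight * scale);

			// The back buffer and the destination BitmapData use the same size
			context.configureBackBuffer(w, h, 0, true);
			var pixels:BitmapData = new BitmapData(w, h, false);

			// Smoothing softens the upscaled pixels
			var view:Bitmap = new Bitmap(pixels, PixelSnapping.AUTO, true);
			view.scaleX = fullWidth / w;
			view.scaleY = fullHeight / h;
			return view;
		}
	}
}
```

A caller would add the returned Bitmap to the display list once, then call context.drawToBitmapData(view.bitmapData) each frame between clear() and present(). With a scale of 0.5 only a quarter of the pixels are read back.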

Spot a bug? Have a suggestion? Different results on a different OS or video card? Post a comment!