Flash Player 11’s Stage3D gives you only a few opcodes for conditional logic, and none of them is as flexible as the good old if in AS3. The most unusual of them is the KIL opcode, which discards the pixel being drawn. How does it work? How does it perform? Today we’ll find out!

Recall from Stage3D Pipeline In A Nutshell the steps required to draw the pixels of a 3D scene:

  1. Indexed triangles are uploaded to the GPU
  2. A shader program (vertex shader + fragment/pixel shader) is uploaded to the GPU
  3. The GPU is told to render the indexed triangles from step #1 with the shader program from step #2
  4. The vertex shader from step #2 is called for every vertex of the triangles from step #1
  5. The triangles are clipped against the view volume (usually a frustum)
  6. The X, Y, and Z components of the output vertices from step #4 are divided by the W component
  7. The triangles are rasterized. For each pixel, the fragment shader from step #2 is called.
  8. The fragment shader either discards/kills the fragment/pixel with a KIL opcode or draws it by moving a color to the oc register
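
To make step 8 concrete, here is a rough CPU-side sketch of what the fragment stage decides. It is only an illustration: real fragment shaders run on the GPU, and the class and method names (KilSketch, shadeFragment) are made up for this example. The one fact it relies on is that KIL discards the fragment when its scalar operand is less than zero.

```actionscript
package
{
	/**
	 * CPU-side illustration of step 8 only. The names here are
	 * hypothetical and are not part of any Stage3D API.
	 */
	public class KilSketch
	{
		/**
		 * Mimics "kil value" followed by "mov oc, color": returns null
		 * when the fragment is discarded, otherwise the color that
		 * would be written to the oc register.
		 */
		public static function shadeFragment(
			value:Number,
			color:Vector.<Number>
		): Vector.<Number>
		{
			// KIL discards the fragment when its operand is negative
			if (value < 0)
			{
				return null;
			}

			// Otherwise the fragment survives and its color is output
			return color;
		}
	}
}
```

Calling KilSketch.shadeFragment(-1, new <Number>[0, 1, 0, 1]) returns null, while passing 0 or any positive value returns the color unchanged.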

Notice how the KIL opcode only comes into play at the very last step of the process. This means that pixels you discard with KIL have to go through a lot of processing only to be ultimately discarded and not drawn. To see what effect this has on performance, I’ve written a little test app:

package
{
	import com.adobe.utils.*;
	import flash.display3D.textures.*;
	import flash.display3D.*;
	import flash.display.*;
	import flash.filters.*;
	import flash.events.*;
	import flash.text.*;
	import flash.utils.*;
 
	public class TestKIL extends Sprite 
	{
		private static const VERT_DATA:Vector.<Number> = new <Number>[
			-1, -1, 0,
			1, -1, 0,
			1, 1, 0,
			-1, 1, 0,
		];
		private static const TRIS:Vector.<uint> = new <uint>[
			0, 1, 2,
			0, 2, 3
		];
		private static const FRAG_CONST:Vector.<Number> = new <Number>[
			0, 1, 0, 1, // color
			-1, -1, -1, -1 // value to KIL on
		];
 
		private var context3D:Context3D;
		private var vertexBuffer:VertexBuffer3D;
		private var indexBuffer:IndexBuffer3D; 
		private var program:Program3D;
		private var texture:Texture;
 
		private var fps:TextField = new TextField();
		private var lastFPSUpdateTime:uint;
		private var lastFrameTime:uint;
		private var frameCount:uint;
		private var driver:TextField = new TextField();
		private var extraRendersText:TextField = new TextField();
 
		private static const MODE_NOKIL:String = "No KIL";
		private static const MODE_KILEARLY:String = "KIL Early";
		private static const MODE_KILLATE:String = "KIL Late";
		private var mode:String = MODE_NOKIL;
 
		private var extraRenders:int;
 
		public function TestKIL()
		{
			stage.align = StageAlign.TOP_LEFT;
			stage.scaleMode = StageScaleMode.NO_SCALE;
			stage.frameRate = 60;
			setupContext(Context3DRenderMode.AUTO);
		}
 
		private function setupContext(renderMode:String): void
		{
			driver.text = "Setting up context with render mode: " + renderMode;
			var stage3D:Stage3D = stage.stage3Ds[0];
			stage3D.addEventListener(Event.CONTEXT3D_CREATE, onContextCreated);
			stage3D.requestContext3D(renderMode);
		}
 
		protected function onContextCreated(ev:Event): void
		{
			var firstTime:Boolean = context3D == null;
 
			// Setup context
			var stage3D:Stage3D = stage.stage3Ds[0];
			stage3D.removeEventListener(Event.CONTEXT3D_CREATE, onContextCreated);
			context3D = stage3D.context3D;			
			context3D.configureBackBuffer(
				stage.stageWidth,
				stage.stageHeight,
				0,
				true
			);
 
			// Setup UI
			driver.text = "Driver: " + context3D.driverInfo;
			if (firstTime)
			{
				makeButtons(
					MODE_NOKIL, MODE_KILEARLY, MODE_KILLATE,
					"Toggle Hardware", "Extra Renders +", "Extra Renders -"
				);
 
				fps.autoSize = TextFieldAutoSize.LEFT;
				fps.text = "Getting FPS...";
				addChild(fps);
 
				driver.autoSize = TextFieldAutoSize.LEFT;
				driver.y = fps.height;
				addChild(driver);
 
				extraRendersText.autoSize = TextFieldAutoSize.LEFT;
				extraRendersText.y = driver.y + driver.height;
				addChild(extraRendersText);
 
				setExtraRenders(extraRenders);
			}
 
			setMode(mode);
 
			// Setup buffers
			if (vertexBuffer)
			{
				vertexBuffer.dispose();
				indexBuffer.dispose();
			}
			vertexBuffer = context3D.createVertexBuffer(4, 3);
			vertexBuffer.uploadFromVector(VERT_DATA, 0, 4);
			indexBuffer = context3D.createIndexBuffer(6);
			indexBuffer.uploadFromVector(TRIS, 0, 6);
			texture = context3D.createTexture(
				2048,
				2048,
				Context3DTextureFormat.BGRA,
				true
			);
 
			// Begin rendering every frame
			if (firstTime)
			{
				addEventListener(Event.ENTER_FRAME, onEnterFrame);
			}
			else
			{
				frameCount = 0;
				lastFPSUpdateTime = lastFrameTime = getTimer();
			}
		}
 
		private function makeProgram(): void
		{
			var assembler:AGALMiniAssembler = new AGALMiniAssembler();
 
			// Vertex shader
			var vertSource:String = "mov op, va0\nmov v0, vc0\n";
			assembler.assemble(Context3DProgramType.VERTEX, vertSource);
			var vertexShaderAGAL:ByteArray = assembler.agalcode;
 
			// Fragment shader
			var fragSource:String = "";
			switch (mode)
			{
				case MODE_NOKIL:
					fragSource = "mov oc, fc0\n";
					break;
				case MODE_KILEARLY:
					fragSource += "mov ft0, fc1\n";
					fragSource += "kil ft0.x\n";
					fragSource += "mov oc, fc0\n";
					break;
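				case MODE_KILLATE:
					// Do a lot of throwaway work first, then KIL at the very end
					fragSource = "mov ft0, fc0\n";
					for (var i:int = 0; i < 150; ++i)
					{
						fragSource += "add ft0, ft0, ft0\n";
					}
					// fc1.x is -1, so KIL discards every fragment
					fragSource += "mov ft0, fc1\n";
					fragSource += "kil ft0.x\n";
					fragSource += "mov oc, fc0\n";
					break;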
			}
			assembler.assemble(Context3DProgramType.FRAGMENT, fragSource);
			var fragmentShaderAGAL:ByteArray = assembler.agalcode;
 
			// Shader program
			if (program)
			{
				program.dispose();
			}
			program = context3D.createProgram();
			program.upload(vertexShaderAGAL, fragmentShaderAGAL);
		}
 
		private function makeButtons(...labels): void
		{
			const PAD:Number = 5;
 
			var curX:Number = PAD;
			var curY:Number = stage.stageHeight - PAD;
			for each (var label:String in labels)
			{
				var tf:TextField = new TextField();
				tf.mouseEnabled = false;
				tf.selectable = false;
				tf.defaultTextFormat = new TextFormat("_sans", 16, 0x0071BB);
				tf.autoSize = TextFieldAutoSize.LEFT;
				tf.text = label;
				tf.name = "lbl";
 
				var button:Sprite = new Sprite();
				button.buttonMode = true;
				button.graphics.beginFill(0xF5F5F5);
				button.graphics.drawRect(0, 0, tf.width+PAD, tf.height+PAD);
				button.graphics.endFill();
				button.graphics.lineStyle(1);
				button.graphics.drawRect(0, 0, tf.width+PAD, tf.height+PAD);
				button.addChild(tf);
				button.addEventListener(MouseEvent.CLICK, onButton);
				if (curX + button.width > stage.stageWidth - PAD)
				{
					curX = PAD;
					curY -= button.height + PAD;
				}
				button.x = curX;
				button.y = curY - button.height;
				addChild(button);
 
				curX += button.width + PAD;
			}
		}
 
		private function onButton(ev:MouseEvent): void
		{
			var mode:String = TextField(Sprite(ev.target).getChildByName("lbl")).text;
			switch (mode)
			{
				case "Toggle Hardware":
					var oldRenderMode:String = context3D.driverInfo;
					context3D.dispose();
					driver.text = "Toggling hardware...";
					setupContext(
						oldRenderMode.toLowerCase().indexOf("software") >= 0
							? Context3DRenderMode.AUTO
							: Context3DRenderMode.SOFTWARE
					);
					break;
				case "Extra Renders +":
					setExtraRenders(extraRenders+1);
					break;
				case "Extra Renders -":
					setExtraRenders(extraRenders-1);
					break;
				case MODE_NOKIL:
				case MODE_KILEARLY:
				case MODE_KILLATE:
					setMode(mode);
					break;
			}
		}
 
		private function setMode(mode:String): void
		{
			this.mode = mode;
 
			for (var i:int; i < numChildren; ++i)
			{
				var child:DisplayObject = getChildAt(i);
				if (child is Sprite)
				{
					var spr:Sprite = child as Sprite;
					var lbl:TextField = spr.getChildByName("lbl") as TextField;
					if (lbl.text == mode)
					{
						spr.filters = [new GlowFilter(0x261C13)];
					}
					else
					{
						spr.filters = [];
					}
				}
			}
 
			makeProgram();
		}
 
		private function setExtraRenders(extra:int): void
		{
			// Clamp so the "Extra Renders -" button can't go below zero
			extraRenders = extra < 0 ? 0 : extra;
			extraRendersText.text = "Extra Renders: " + extraRenders;
		}
 
		private function onEnterFrame(ev:Event): void
		{
			if (!context3D)
			{
				return;
			}
 
			// Render scene
			context3D.setProgram(program);
			context3D.setVertexBufferAt(
				0,
				vertexBuffer,
				0,
				Context3DVertexBufferFormat.FLOAT_3
			);
			context3D.setProgramConstantsFromVector(
				Context3DProgramType.FRAGMENT,
				0,
				FRAG_CONST
			);
 
			if (extraRenders)
			{
				context3D.setRenderToTexture(texture);
				context3D.clear(0.5, 0.5, 0.5);
				for (var i:int; i < extraRenders; ++i)
				{
					context3D.drawTriangles(indexBuffer, 0, 2);
				}
			}
			context3D.setRenderToBackBuffer();
			context3D.clear(1, 0, 0);
			context3D.drawTriangles(indexBuffer, 0, 2);
			context3D.present();
 
			// Update frame rate display
			frameCount++;
			var now:int = getTimer();
			var elapsed:int = now - lastFPSUpdateTime;
			if (elapsed > 1000)
			{
				var framerateValue:Number = 1000 / (elapsed / frameCount);
				fps.text = "FPS: " + framerateValue.toFixed(4);
				lastFPSUpdateTime = now;
				frameCount = 0;
			}
			lastFrameTime = now;
		}
	}
}

Launch the test app

When I run the test app on my mid-2010 MacBook Pro, the frame rate is nearly the same regardless of the mode chosen and the number of extra draws. This matches the pipeline description above: in my testing environment, discarding a fragment/pixel costs roughly as much as actually drawing it. In other testing environments, such as mobile devices, the KIL opcode may be even more expensive (cf. PowerVR’s notes on their popular mobile GPUs). In software rendering, running a lot of fragment shader instructions before ultimately killing the pixel (i.e. the “KIL Late” button) results in a 2-10x slowdown. Either way, drawing no pixels takes just as long as drawing tons of pixels, and it is clearly wasteful to do so many calculations with literally zero results to show for them.

All that said, the KIL opcode does have uses in some scenarios. If you use it sparingly and make sure to test it on all of your target GPUs, you can achieve all sorts of effects ranging from the everyday alpha testing to paraboloid cameras for real-time point light shadowing.
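
As an example of the everyday alpha testing mentioned above, here is a minimal sketch of a fragment shader that kills any pixel whose texture alpha falls below a threshold. It makes several assumptions that are not part of the test app: the vertex shader passes UV coordinates in v0, a texture is bound to sampler fs0, and fc0.x holds the threshold (say, 0.5). The class name AlphaTestSketch is made up for this example.

```actionscript
package
{
	import com.adobe.utils.AGALMiniAssembler;
	import flash.display3D.Context3DProgramType;
	import flash.utils.ByteArray;

	/** Hypothetical helper; not part of the test app above. */
	public class AlphaTestSketch
	{
		public static function assembleFragmentShader(): ByteArray
		{
			var source:String =
				// Sample the texture at the interpolated UV (assumed in v0)
				"tex ft0, v0, fs0 <2d,linear,nomip>\n" +
				// ft1.x = alpha - threshold (threshold assumed in fc0.x)
				"sub ft1.x, ft0.w, fc0.x\n" +
				// Discard the pixel if alpha is below the threshold
				"kil ft1.x\n" +
				// Otherwise output the sampled color
				"mov oc, ft0\n";

			var assembler:AGALMiniAssembler = new AGALMiniAssembler();
			assembler.assemble(Context3DProgramType.FRAGMENT, source);
			return assembler.agalcode;
		}
	}
}
```

The resulting ByteArray would be passed to Program3D.upload along with a vertex shader that writes UVs to v0, just as makeProgram does in the test app. As with any use of KIL, it's worth measuring on each target GPU.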

Spot a bug? Have a question? Post a comment!