Stage3D KIL Command
Flash Player 11’s Stage3D gives you only a few opcodes for conditional logic, and none of them is as flexible as the good old if in AS3. The most unusual of them is the KIL opcode, which discards the pixel being drawn. How does it work? How does it perform? Today we’ll find out!
Recall from Stage3D Pipeline In A Nutshell the required steps of how the pixels of a 3D scene are drawn:
1. Indexed triangles are uploaded to the GPU
2. A shader program (vertex shader + fragment/pixel shader) is uploaded to the GPU
3. The GPU is told to render the indexed triangles from step #1 with the shader program from step #2
4. The vertex shader from step #2 is called for every vertex of the triangles from step #1
5. The triangles are clipped against the view volume (usually a frustum)
6. The X, Y, and Z components of the output vertices from step #4 are divided by the W component
7. The triangles are rasterized. For each pixel, the fragment shader from step #2 is called.
8. The fragment shader either discards/kills the fragment/pixel with a KIL opcode or draws it by moving a color to the oc register
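To make the two outcomes of that last step concrete, here is a minimal sketch of a shader pair in which the choice depends on the pixel. KIL takes a single scalar source and discards the fragment when that value is less than zero. The class name HalfScreenKIL and its makeProgram function are hypothetical and not part of the test app; the sketch assumes XYZ positions are bound to va0 and a color has been uploaded to fc0.

```actionscript
package
{
    import com.adobe.utils.AGALMiniAssembler;
    import flash.display3D.Context3D;
    import flash.display3D.Context3DProgramType;
    import flash.display3D.Program3D;
    import flash.utils.ByteArray;

    // Hypothetical helper for illustration only
    public class HalfScreenKIL
    {
        public static function makeProgram(context3D:Context3D): Program3D
        {
            var assembler:AGALMiniAssembler = new AGALMiniAssembler();

            // Vertex shader: output the position and pass a copy along
            var vertSource:String =
                "mov op, va0\n" + // position (assumed bound to va0)
                "mov v0, va0\n";  // interpolated copy for the fragment shader
            assembler.assemble(Context3DProgramType.VERTEX, vertSource);
            var vertAGAL:ByteArray = assembler.agalcode;

            // Fragment shader: discard wherever the interpolated X is negative
            var fragSource:String =
                "mov ft0, v0\n" + // copy the varying into a temporary
                "kil ft0.x\n" +   // discard this fragment if ft0.x < 0
                "mov oc, fc0\n";  // otherwise draw the color assumed to be in fc0
            assembler.assemble(Context3DProgramType.FRAGMENT, fragSource);
            var fragAGAL:ByteArray = assembler.agalcode;

            var program:Program3D = context3D.createProgram();
            program.upload(vertAGAL, fragAGAL);
            return program;
        }
    }
}
```

Drawn over a full-screen quad, a program like this should leave the left half of the screen at the clear color, because every fragment there still runs the shader and is then discarded.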
Notice how the KIL opcode only comes into play at the very last step of the process. This means that pixels you discard with KIL have to go through a lot of processing only to be ultimately discarded and not drawn. To see what effect this has on performance, I’ve written a little test app:
package
{
    import com.adobe.utils.*;
    import flash.display3D.textures.*;
    import flash.display3D.*;
    import flash.display.*;
    import flash.filters.*;
    import flash.events.*;
    import flash.text.*;
    import flash.utils.*;

    public class TestKIL extends Sprite
    {
        private static const VERT_DATA:Vector.<Number> = new <Number>[
            -1, -1, 0,
            1, -1, 0,
            1, 1, 0,
            -1, 1, 0,
        ];
        private static const TRIS:Vector.<uint> = new <uint>[
            0, 1, 2,
            0, 2, 3
        ];
        private static const FRAG_CONST:Vector.<Number> = new <Number>[
            0, 1, 0, 1, // color
            -1, -1, -1, -1 // value to KIL on
        ];

        private var context3D:Context3D;
        private var vertexBuffer:VertexBuffer3D;
        private var indexBuffer:IndexBuffer3D;
        private var program:Program3D;
        private var texture:Texture;

        private var fps:TextField = new TextField();
        private var lastFPSUpdateTime:uint;
        private var lastFrameTime:uint;
        private var frameCount:uint;
        private var driver:TextField = new TextField();
        private var extraRendersText:TextField = new TextField();

        private static const MODE_NOKIL:String = "No KIL";
        private static const MODE_KILEARLY:String = "KIL Early";
        private static const MODE_KILLATE:String = "KIL Late";

        private var mode:String = MODE_NOKIL;
        private var extraRenders:int;

        public function TestKIL()
        {
            stage.align = StageAlign.TOP_LEFT;
            stage.scaleMode = StageScaleMode.NO_SCALE;
            stage.frameRate = 60;

            setupContext(Context3DRenderMode.AUTO);
        }

        private function setupContext(renderMode:String): void
        {
            driver.text = "Setting up context with render mode: " + renderMode;
            var stage3D:Stage3D = stage.stage3Ds[0];
            stage3D.addEventListener(Event.CONTEXT3D_CREATE, onContextCreated);
            stage3D.requestContext3D(renderMode);
        }

        protected function onContextCreated(ev:Event): void
        {
            var firstTime:Boolean = context3D == null;

            // Setup context
            var stage3D:Stage3D = stage.stage3Ds[0];
            stage3D.removeEventListener(Event.CONTEXT3D_CREATE, onContextCreated);
            context3D = stage3D.context3D;
            context3D.configureBackBuffer(
                stage.stageWidth,
                stage.stageHeight,
                0,
                true
            );

            // Setup UI
            driver.text = "Driver: " + context3D.driverInfo;
            if (firstTime)
            {
                makeButtons(
                    MODE_NOKIL, MODE_KILEARLY, MODE_KILLATE,
                    "Toggle Hardware",
                    "Extra Renders +", "Extra Renders -"
                );

                fps.autoSize = TextFieldAutoSize.LEFT;
                fps.text = "Getting FPS...";
                addChild(fps);

                driver.autoSize = TextFieldAutoSize.LEFT;
                driver.y = fps.height;
                addChild(driver);

                extraRendersText.autoSize = TextFieldAutoSize.LEFT;
                extraRendersText.y = driver.y + driver.height;
                addChild(extraRendersText);
                setExtraRenders(extraRenders);
            }

            setMode(mode);

            // Setup buffers
            if (vertexBuffer)
            {
                vertexBuffer.dispose();
                indexBuffer.dispose();
            }
            vertexBuffer = context3D.createVertexBuffer(4, 3);
            vertexBuffer.uploadFromVector(VERT_DATA, 0, 4);
            indexBuffer = context3D.createIndexBuffer(6);
            indexBuffer.uploadFromVector(TRIS, 0, 6);

            texture = context3D.createTexture(
                2048,
                2048,
                Context3DTextureFormat.BGRA,
                true
            );

            // Begin rendering every frame
            if (firstTime)
            {
                addEventListener(Event.ENTER_FRAME, onEnterFrame);
            }
            else
            {
                frameCount = 0;
                lastFPSUpdateTime = lastFrameTime = getTimer();
            }
        }

        private function makeProgram(): void
        {
            var assembler:AGALMiniAssembler = new AGALMiniAssembler();

            // Vertex shader
            var vertSource:String = "mov op, va0\nmov v0, vc0\n";
            assembler.assemble(Context3DProgramType.VERTEX, vertSource);
            var vertexShaderAGAL:ByteArray = assembler.agalcode;

            // Fragment shader
            var fragSource:String = "";
            switch (mode)
            {
                case MODE_NOKIL:
                    fragSource = "mov oc, fc0\n";
                    break;
                case MODE_KILEARLY:
                    fragSource += "mov ft0, fc1\n";
                    fragSource += "kil ft0.x\n";
                    fragSource += "mov oc, fc0\n";
                    break;
                case MODE_KILLATE:
                    fragSource = "mov ft0, fc0\n";
                    for (var i:int; i < 150; ++i)
                    {
                        fragSource += "add ft0, ft0, ft0\n";
                    }
                    fragSource += "mov ft0, fc1\n";
                    fragSource += "kil ft0.x\n";
                    fragSource += "mov oc, fc0\n";
                    break;
            }
            assembler.assemble(Context3DProgramType.FRAGMENT, fragSource);
            var fragmentShaderAGAL:ByteArray = assembler.agalcode;

            // Shader program
            if (program)
            {
                program.dispose();
            }
            program = context3D.createProgram();
            program.upload(vertexShaderAGAL, fragmentShaderAGAL);
        }

        private function makeButtons(...labels): void
        {
            const PAD:Number = 5;

            var curX:Number = PAD;
            var curY:Number = stage.stageHeight - PAD;
            for each (var label:String in labels)
            {
                var tf:TextField = new TextField();
                tf.mouseEnabled = false;
                tf.selectable = false;
                tf.defaultTextFormat = new TextFormat("_sans", 16, 0x0071BB);
                tf.autoSize = TextFieldAutoSize.LEFT;
                tf.text = label;
                tf.name = "lbl";

                var button:Sprite = new Sprite();
                button.buttonMode = true;
                button.graphics.beginFill(0xF5F5F5);
                button.graphics.drawRect(0, 0, tf.width+PAD, tf.height+PAD);
                button.graphics.endFill();
                button.graphics.lineStyle(1);
                button.graphics.drawRect(0, 0, tf.width+PAD, tf.height+PAD);
                button.addChild(tf);
                button.addEventListener(MouseEvent.CLICK, onButton);
                if (curX + button.width > stage.stageWidth - PAD)
                {
                    curX = PAD;
                    curY -= button.height + PAD;
                }
                button.x = curX;
                button.y = curY - button.height;
                addChild(button);

                curX += button.width + PAD;
            }
        }

        private function onButton(ev:MouseEvent): void
        {
            var mode:String = TextField(Sprite(ev.target).getChildByName("lbl")).text;
            switch (mode)
            {
                case "Toggle Hardware":
                    var oldRenderMode:String = context3D.driverInfo;
                    context3D.dispose();
                    driver.text = "Toggling hardware...";
                    setupContext(
                        oldRenderMode.toLowerCase().indexOf("software") >= 0
                            ? Context3DRenderMode.AUTO
                            : Context3DRenderMode.SOFTWARE
                    );
                    break;
                case "Extra Renders +":
                    setExtraRenders(extraRenders+1);
                    break;
                case "Extra Renders -":
                    setExtraRenders(extraRenders-1);
                    break;
                case MODE_NOKIL:
                case MODE_KILEARLY:
                case MODE_KILLATE:
                    setMode(mode);
                    break;
            }
        }

        private function setMode(mode:String): void
        {
            this.mode = mode;

            for (var i:int; i < numChildren; ++i)
            {
                var child:DisplayObject = getChildAt(i);
                if (child is Sprite)
                {
                    var spr:Sprite = child as Sprite;
                    var lbl:TextField = spr.getChildByName("lbl") as TextField;
                    if (lbl.text == mode)
                    {
                        spr.filters = [new GlowFilter(0x261C13)];
                    }
                    else
                    {
                        spr.filters = [];
                    }
                }
            }

            makeProgram();
        }

        private function setExtraRenders(extra:int): void
        {
            extraRenders = extra;
            extraRendersText.text = "Extra Renders: " + extra;
        }

        private function onEnterFrame(ev:Event): void
        {
            if (!context3D)
            {
                return;
            }

            // Render scene
            context3D.setProgram(program);
            context3D.setVertexBufferAt(
                0,
                vertexBuffer,
                0,
                Context3DVertexBufferFormat.FLOAT_3
            );
            context3D.setProgramConstantsFromVector(
                Context3DProgramType.FRAGMENT,
                0,
                FRAG_CONST
            );

            if (extraRenders)
            {
                context3D.setRenderToTexture(texture);
                context3D.clear(0.5, 0.5, 0.5);
                for (var i:int; i < extraRenders; ++i)
                {
                    context3D.drawTriangles(indexBuffer, 0, 2);
                }
            }

            context3D.setRenderToBackBuffer();
            context3D.clear(1, 0, 0);
            context3D.drawTriangles(indexBuffer, 0, 2);
            context3D.present();

            // Update frame rate display
            frameCount++;
            var now:int = getTimer();
            var elapsed:int = now - lastFPSUpdateTime;
            if (elapsed > 1000)
            {
                var framerateValue:Number = 1000 / (elapsed / frameCount);
                fps.text = "FPS: " + framerateValue.toFixed(4);
                lastFPSUpdateTime = now;
                frameCount = 0;
            }
            lastFrameTime = now;
        }
    }
}
When I run the test app on my mid-2010 MacBook Pro, the results look very similar regardless of the mode chosen and the number of extra draws. This matches the above description of the 3D pipeline and shows that, in my testing environment, discarding a fragment/pixel costs roughly as much as actually drawing it. In other testing environments, such as mobile devices, the KIL opcode may be even more expensive (cf. PowerVR’s notes on their popular mobile GPUs). With software rendering, executing a lot of fragment shader instructions before ultimately killing the pixel (i.e. the “KIL Late” button) results in a 2-10x slowdown. In any case, the results show that drawing no pixels takes just as long as drawing tons of pixels. It is clearly wasteful to do so many calculations with literally zero results to show for it.
All that said, the KIL opcode does have uses in some scenarios. If you use it sparingly and make sure to test it on all of your target GPUs, you can achieve all sorts of effects ranging from the everyday alpha testing to paraboloid cameras for real-time point light shadowing.
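As an illustration of the alpha testing case, here is a minimal sketch of a fragment shader that discards texels whose alpha falls below a threshold. It is not taken from the test app above: the class name AlphaTestShader is hypothetical, and the sketch assumes texture coordinates arrive in v0, the texture is bound to sampler fs0, and the threshold is stored in fc0.x.

```actionscript
package
{
    // Hypothetical holder for the shader source, for illustration only
    public class AlphaTestShader
    {
        public static function makeFragmentSource(): String
        {
            var source:String =
                // Sample the texture (assumed in fs0) at the UV assumed in v0
                "tex ft0, v0, fs0 <2d,linear,nomip>\n" +
                // ft1.x = alpha - threshold (threshold assumed in fc0.x)
                "sub ft1.x, ft0.w, fc0.x\n" +
                // Discard the fragment if alpha is below the threshold
                "kil ft1.x\n" +
                // Otherwise output the sampled color
                "mov oc, ft0\n";
            return source;
        }
    }
}
```

With a shader like this, the threshold could be uploaded with something like context3D.setProgramConstantsFromVector(Context3DProgramType.FRAGMENT, 0, new <Number>[0.5, 0, 0, 0]) and the texture bound with context3D.setTextureAt(0, texture). Since every fragment still pays for the texture sample before it can be killed, it is worth measuring on each target GPU, as noted above.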
Spot a bug? Have a question? Post a comment!
#1 by Volgogradetzzz on June 18th, 2012 ·
Hi. Thanks for the post. I made some experiments and found that division by the W component goes after the fragment shader, i.e. the fragment shader receives NON-perspective corrected values from the vertex shader. So I think that the 5th step in your description of the pipeline is in the wrong place.
#2 by jackson on June 18th, 2012 ·
You’re right that the perspective divide (by W) comes after clipping, so I’ve reversed steps #5 and #6. You’re also right that the values you get in the fragment shader are perspective correct. Thanks for pointing this out; I’ve updated the article.
#3 by ben w on June 18th, 2012 ·
Hey Jackson, when I run the demo with 50 extra renders here’s what I get:
no kill : 30 – 34 fps
kill early : 60 fps solid
kill late: 59-60 fps
on the other hand in software the kill just seems to slow things down, but that’s pretty much a x2 speed up on my machine in hardware mode
#4 by jackson on June 18th, 2012 ·
Hey Ben. Thanks for sharing your results. As mentioned in the article (though probably not strongly enough), the results you get will vary wildly by testing environment. Your specific GPU (or CPU in the case of software rendering), complexity of your shader, and other factors will weigh heavily on the results you see. The software case (which will happen on “real world” machines) should be enough to alert you to the potential performance loss that KIL can cause. I wish I had some mobile tests to augment…
#5 by Tyler on June 18th, 2012 ·
I may be misunderstanding the results. For me, in hardware mode, KIL early and KIL late increase the frame rate significantly. With 40 extra renders for example, I’m seeing about 20fps with no KIL and just under 40fps with either KIL Early or KIL Late (very little variation between these in hardware mode). In software mode, I think I see the same thing you do – KIL Late makes things worse.
#6 by jackson on June 18th, 2012 ·
Hey Tyler, thanks for posting your results. The results you see will be incredibly environment-specific. I tried to clear this up in a comment above.
#7 by phillip on December 12th, 2012 ·
hi and thanks for this. i can display an atf of my intro graphic before starting up starling but how do i dispose/clear the atf gx after. I’ve tried context3D.clear and dispose etc but it seems to sit on top of the starling layer. thanks in advance, phillip
#8 by jackson on December 12th, 2012 ·
Context3D.clear will clear the screen, but if you draw the screen again (i.e. Context3D.drawTriangles then Context3D.present) then it won’t be cleared anymore. You basically want to Context3D.clear and then don’t draw the triangles with your ATF texture again.
You might also want to consider drawing the ATF texture with Starling since it’s built for that sort of thing. Here’s a wiki article about how to use an ATF texture on a Starling Image.
#9 by phillip on December 13th, 2012 ·
thanks for the quick reply, I’m trying to use ATF for my load image before starling starts up (I’m having real problems getting ipad1 to keep afloat with large background images as pngs) and with your help got the image loaded as ATF, but using clear will clear the load image and then I’ll get black or red (ex: context3D.clear(1, 0, 0)) instead. Is there something else I need to call to redraw the area? I can hear the game running underneath but the ATF is on top. if I try to context3D.dispose I get problems because starling uses it as well. any thoughts? thanks again
#10 by jackson on December 13th, 2012 ·
Perhaps you have some kind of conflict with your ATF-using code and Starling. The best option is probably to just use Starling to show the ATF. It shouldn’t take long to start up, so there shouldn’t be much of a delay before your load image shows.