String-Sortable Integers
Strings and integers sort differently. Unfortunately, this became a problem for me during some recent experiments with Starling. It could be a problem for you too in a variety of situations. Today we’ll look at a workaround I’ve developed to solve this problem, which isn’t nearly as straightforward as you might think.
To illustrate the problem some more, consider sorting 0-100 as strings:
var a:Array = ["0", "1", "2", "99", "100"]; a.sort(); trace(a); // output: 0,1,100,2,99
Everything’s correct except that 100 is put in with 1 because it starts with the same character. To solve this we’d normally set the sorting mode:
var a:Array = ["0", "1", "2", "99", "100"]; a.sort(Array.NUMERIC); trace(a); // output: 0,1,2,99,100
But what if we don’t have control over the sorting? What if the sorting is being done by a code library or engine we’re using? What if it’s being done by a server we have no control over? Don’t think this can happen? Check out Starling’s TextureAtlas.getTextures: (source)
/** Returns all texture names that start with a certain string, sorted alphabetically. */ public function getNames(prefix:String="", result:Vector.<String>=null):Vector.<String> { if (result == null) result = new <String>[]; for (var name:String in mTextureRegions) if (name.indexOf(prefix) == 0) result.push(name); result.sort(Array.CASEINSENSITIVE); return result; } /** Returns all textures that start with a certain string, sorted alphabetically * (especially useful for "MovieClip"). */ public function getTextures(prefix:String="", result:Vector.<Texture>=null):Vector.<Texture> { if (result == null) result = new <Texture>[]; for each (var name:String in getNames(prefix, sNames)) result.push(getTexture(name)); sNames.length = 0; return result; }
Would you modify the parameter to Array.sort
? Would you remember to re-modify this every time you got a new version of Starling? This is just one example. In many cases you may not be able to simply modify the source code.
So, how do you go about creating a sortable string for your integer? One approach is to try to use built-in functionality like the radix
parameter of int.toString
to convert to base-26 for letters:
function convert(i:int): String { return i.toString(26); } var a:Array = [convert(0), convert(1), convert(2), convert(99), convert(100)]; trace(a); // output: 0,1,2,3l,3m
Well that didn’t work at all. Unless I’ve missed something, we’ll need to do it ourselves. The steps of the algorithm I used are as follows: (note: I use “digits” because I lack a good name for base-26)
- Determine the number of digits in the output string (e.g. 2)
- Determine the value of the the most-significant digit (e.g. 26*26=676 for 2 digits)
- Loop from the most-significant digit to the least-significant digit. At each iteration, determine the biggest digit and subtract its value from the input integer.
Here’s the AS3 code to do this:
/** * Convert an integer to a string that can be sorted with Array.CASEINSENSITIVE * @param val Integer to convert * @param maxValue Maximum value of integers that will be sorted * @return The sortable string * @author Jackson Dunstan, JacksonDunstan.com */ public static function intToSortableStr(val:uint, maxValue:uint): String { // Get the number of digits in the string and the value of the most-significant digit var digitValue:uint = 1; var digits:uint; while (maxValue /= 26) { digits++; digitValue *= 26; } digitValue /= 26; // Build the string from most-signficant to least-signifcant digit var ret:String = ""; for (var i:uint; i < digits; ++i) { var digit:int = val / digitValue; if (digit == 0) ret += "A"; else if (digit == 1) ret += "B"; else if (digit == 2) ret += "C"; else if (digit == 3) ret += "D"; else if (digit == 4) ret += "E"; else if (digit == 5) ret += "F"; else if (digit == 6) ret += "G"; else if (digit == 7) ret += "H"; else if (digit == 8) ret += "I"; else if (digit == 9) ret += "J"; else if (digit == 10) ret += "K"; else if (digit == 11) ret += "L"; else if (digit == 12) ret += "M"; else if (digit == 13) ret += "N"; else if (digit == 14) ret += "O"; else if (digit == 15) ret += "P"; else if (digit == 16) ret += "Q"; else if (digit == 17) ret += "R"; else if (digit == 18) ret += "S"; else if (digit == 19) ret += "T"; else if (digit == 20) ret += "U"; else if (digit == 21) ret += "V"; else if (digit == 22) ret += "W"; else if (digit == 23) ret += "X"; else if (digit == 24) ret += "Y"; else ret += "Z"; val -= digit * digitValue; digitValue /= 26; } return ret; }
And here’s a little test app to test it out:
package { import flash.display.Sprite; import flash.display.StageAlign; import flash.display.StageScaleMode; import flash.text.TextField; import flash.text.TextFieldAutoSize; public class IntToSortableStr extends Sprite { private var tf:TextField = new TextField(); private function row(...cols): void { tf.appendText(cols.join(",")+"\n"); } public function IntToSortableStr() { stage.align = StageAlign.TOP_LEFT; stage.scaleMode = StageScaleMode.NO_SCALE; tf.width = stage.stageWidth; tf.height = stage.stageHeight; addChild(tf); row("Original", "Sorted", "Match"); const NUM_VALS:int = 26*26+1; var original:Vector.<String> = new Vector.<String>(NUM_VALS); for (var i:int = 0; i < NUM_VALS; ++i) { original[i] = intToSortableStr(i, NUM_VALS); } var sorted:Vector.<String> = original.slice().sort(Array.CASEINSENSITIVE); for (i = 0; i < NUM_VALS; ++i) { row(original[i], sorted[i], (original[i]==sorted[i]?"true":"false")); } } /** * Convert an integer to a string that can be sorted with Array.CASEINSENSITIVE * @param val Integer to convert * @param maxValue Maximum value of integers that will be sorted * @return The sortable string * @author Jackson Dunstan, JacksonDunstan.com */ public static function intToSortableStr(val:uint, maxValue:uint): String { // Get the number of digits in the string and the value of the most-significant digit var digitValue:uint = 1; var digits:uint; while (maxValue /= 26) { digits++; digitValue *= 26; } digitValue /= 26; // Build the string from most-signficant to least-signifcant digit var ret:String = ""; for (var i:uint; i < digits; ++i) { var digit:int = val / digitValue; if (digit == 0) ret += "A"; else if (digit == 1) ret += "B"; else if (digit == 2) ret += "C"; else if (digit == 3) ret += "D"; else if (digit == 4) ret += "E"; else if (digit == 5) ret += "F"; else if (digit == 6) ret += "G"; else if (digit == 7) ret += "H"; else if (digit == 8) ret += "I"; else if (digit == 9) ret += "J"; else if (digit == 10) ret += "K"; else if (digit == 11) ret += "L"; else if (digit == 12) ret += "M"; else if (digit == 13) ret += "N"; else if (digit == 14) ret += "O"; else if (digit == 15) ret += "P"; else if (digit == 16) ret += "Q"; else if (digit == 17) ret += "R"; else if (digit == 18) ret += "S"; else if (digit == 19) ret += "T"; else if (digit == 20) ret += "U"; else if (digit == 21) ret += "V"; else if (digit == 22) ret += "W"; else if (digit == 23) ret += "X"; else if (digit == 24) ret += "Y"; else ret += "Z"; val -= digit * digitValue; digitValue /= 26; } return ret; } } }
There’s a lot of output from this test app so I’ll just show the first part:
Original,Sorted,Match AAA,AAA,true AAB,AAB,true AAC,AAC,true AAD,AAD,true AAE,AAE,true AAF,AAF,true AAG,AAG,true AAH,AAH,true AAI,AAI,true AAJ,AAJ,true AAK,AAK,true AAL,AAL,true AAM,AAM,true AAN,AAN,true AAO,AAO,true AAP,AAP,true AAQ,AAQ,true AAR,AAR,true AAS,AAS,true AAT,AAT,true AAU,AAU,true AAV,AAV,true AAW,AAW,true AAX,AAX,true AAY,AAY,true AAZ,AAZ,true ABA,ABA,true ABB,ABB,true
Rest assured that all of the results match. Feel free to use this little helper function the next time your integers are being sorted as strings.
Spot a bug? Have a suggestion or a question? Post a comment!
#1 by Jonathan on January 14th, 2013 ·
Hi, maybe you can simplify your test of digit value in intToSortableStr by:
ret += String.fromCharCode(65 + digit);
#2 by jackson on January 14th, 2013 ·
This would definitely be a cleaner approach, but it takes about 30% longer than the manual approach in the article:
Like most things, it’s a trade-off.
#3 by henke37 on January 14th, 2013 ·
Or you could just zero pad the numbers and call it a day.
#4 by jackson on January 14th, 2013 ·
Do you know of an easy way to do that? Otherwise it’s converting to a base-10 string and cuts out the conversion to base-26:
That’ll result in longer strings but should execute faster and be more readable than the base-26 version. Another trade-off. :)
#5 by skyboy on January 31st, 2013 ·
An alternative to using String.fromCharCode would be to use a ByteArray (reset length to 0 at end, cache it): for 2+ character strings you get a benefit over concatenation, compounded with not needing the if/else ladder.
Or, to maintain compatibility with other languages, rearrange the if/else ladder into an if/else tree: from O(n) to O(log n);
It’s less pretty, but tail cases are no longer more than 10 jumps and the first few cases take slightly longer: over all possibilities it’s faster, but a lot of cases for “0” will take longer.
#6 by jackson on January 31st, 2013 ·
Both of these are great tips! The terms “if-else ladder” and “if-else tree” are new to me and intriguing. I’ll have to do some tests around these…