A little too much time on my hands yesterday let me finish something I started a while back - I’d been meaning to add Ruby 1.9 support to Langusta, but couldn’t think of an elegant way to do it. The problem was supporting both 1.8- and 1.9-style strings throughout the app. Previously, I used UCS-2 encoded strings everywhere and had to add an abstraction layer just to iterate over their characters - it worked, but it didn’t feel elegant. So instead I decided to use the lowest common denominator between 1.8 and 1.9 and represent text as arrays of integers, one Unicode codepoint per element.
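To make the idea concrete, here is a minimal sketch of converting between UTF-8 strings and codepoint arrays. It assumes the input is valid UTF-8; the helper names `to_codepoints` and `from_codepoints` are made up for illustration and are not Langusta's actual API. It relies only on `String#unpack` and `Array#pack` with the `U*` directive, which behave the same way on 1.8 and 1.9.

```ruby
# Hypothetical helpers for illustration; not taken from Langusta itself.

# Decode a UTF-8 string into an array of integer codepoints.
def to_codepoints(utf8_string)
  utf8_string.unpack('U*')
end

# Encode an array of integer codepoints back into a UTF-8 string.
def from_codepoints(codepoints)
  codepoints.pack('U*')
end

# "héllo", spelled with raw UTF-8 bytes so the literal reads the same
# on 1.8 and 1.9 regardless of the source file encoding.
sample = "h\xC3\xA9llo"

codepoints = to_codepoints(sample)   # [104, 233, 108, 108, 111]
restored   = from_codepoints(codepoints)
```

Once the text is an array, nothing downstream has to know which string implementation it came from.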
It took me a while to realize that what I’ve devised is not a bad way to represent characters at all. Fixnums are immediate values in the interpreter, so no separate object gets allocated per character - sure, we’re still wasting a lot (each array slot takes 32 bits per character, twice as much on 64-bit), but operations on codepoints are the core of what this library does, and this representation makes them trivial to implement.
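As a rough illustration of why codepoint operations become trivial, here are two small examples on integer arrays: a range check and n-gram extraction. Both helpers (`latin1?` and `ngrams`) are hypothetical and chosen only as typical codepoint operations; they are not a description of how Langusta is implemented.

```ruby
# Hypothetical helpers for illustration; names are not taken from Langusta.

# True for codepoints in Basic Latin or Latin-1 Supplement (U+0000..U+00FF).
# With integers this is a plain numeric comparison.
def latin1?(codepoint)
  codepoint >= 0 && codepoint <= 0xFF
end

# Every run of n consecutive codepoints, returned as an array of arrays.
# Returns an empty array when the input is shorter than n.
def ngrams(codepoints, n)
  result = []
  0.upto(codepoints.length - n) { |i| result << codepoints[i, n] }
  result
end

codepoints = [104, 233, 108, 108, 111]   # "héllo"

bigrams    = ngrams(codepoints, 2)
# [[104, 233], [233, 108], [108, 108], [108, 111]]

all_latin1 = codepoints.all? { |cp| latin1?(cp) }   # true
```

No encoding-aware string slicing is involved: slicing, comparing and classifying characters are all ordinary array and integer operations.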
I think I’ll try using NArray next time to see how much I can gain in terms of memory consumption.