Just Another Blog

Are you thinking what I'm thinking?

Saturday, May 15, 2004

Firebird is... hmm... semi-semantic...

I've been trying out string processing in the Firebird database, looking at two things in particular: Unicode support and character semantics. Firebird handles Unicode very well, but as for character semantics, hmm... it doesn't really have them. Let me explain:

Firebird has only one character set for Unicode: "UNICODE_FSS". It is a variable-length encoding, ranging from 1 byte per character (e.g. ASCII) to 3 bytes (e.g. Asian characters). Assuming your database's default character set is UNICODE_FSS, when you declare "char (3)" Firebird allocates 9 bytes of space. So if you enter English characters, you can enter up to 9 of them! Kinda weird... But at least any 3-character text will fit without getting truncated. So that's the good thing...
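To make the arithmetic concrete, here is a rough sketch in JavaScript. It assumes UNICODE_FSS produces the same byte counts as UTF-8 for these sample strings (which is what TextEncoder emits); the helper name byteLength and the constants are made up for illustration and are not anything Firebird exposes.

```javascript
// Assumption: UNICODE_FSS yields the same byte counts as UTF-8 for these samples.
const encoder = new TextEncoder(); // TextEncoder always encodes to UTF-8

// Hypothetical helper: number of bytes a string occupies once encoded.
function byteLength(text) {
  return encoder.encode(text).length;
}

const MAX_BYTES_PER_CHAR = 3; // widest character, per the description above
const declaredChars = 3;      // the "3" in char(3)
const reservedBytes = declaredChars * MAX_BYTES_PER_CHAR;

// Compare each sample's encoded size with the space reserved for char(3).
for (const sample of ["abc", "一二三", "abcdefghi"]) {
  const bytes = byteLength(sample);
  console.log(sample, bytes, bytes <= reservedBytes ? "fits" : "too long");
}
```

All three samples fit in the reserved space, including the nine-letter English one, which is exactly the "kinda weird" part.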

So, what are character semantics? From my understanding, it is about measuring strings in characters, not bytes. E.g. in JavaScript, try window.alert("abc".length); and window.alert("一二三".length);. Both give you the number 3. If you are using Mozilla, you can do that simply by opening the JavaScript console, entering the code and clicking [evaluate].
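If you would rather run that outside the browser, a small sketch along these lines should work. One caveat worth hedging: strictly speaking, .length counts UTF-16 code units, which happens to match the character count for the samples above. The helper countCodePoints is a made-up name for illustration, and the last sample is a character outside the Basic Multilingual Plane that I added to show the difference.

```javascript
// Hypothetical helper: count Unicode code points instead of UTF-16 code units.
function countCodePoints(text) {
  return Array.from(text).length; // string iteration walks code points
}

const samples = ["abc", "一二三", "\u{20000}"]; // last one lies outside the BMP

for (const text of samples) {
  // .length reports UTF-16 code units; countCodePoints reports code points.
  console.log(text, text.length, countCodePoints(text));
}
```

For "abc" and "一二三" both numbers are 3. For the last sample, .length reports 2 while the code point count is 1, so "length in characters" depends a little on what you call a character.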

In Oracle, you can write something like char(3 char), which allocates space for 3 characters, no matter what your charset is. For more, see the article Globalize with Character Semantics.

Another cool thing is that Firebird won't waste much space when your fixed-length columns are not fully filled, e.g. when you enter "abc" into a char(10) column. The padding spaces after the string are compressed! So in terms of storage, char is actually no different from varchar in Firebird. (Of course, it is good practice to use varchar when your strings are not fixed in length, as other DBMSs may not do such compression.)
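To get a feel for why trailing padding is cheap to compress, here is a toy run-length encoder. This is only an illustration of the general idea: it is not Firebird's actual on-disk format, and the function name runLengthEncode is invented for this sketch.

```javascript
// Toy run-length encoder: collapses repeated characters into {ch, count} runs.
// Illustration only; not a description of how Firebird stores records.
function runLengthEncode(text) {
  const runs = [];
  for (const ch of text) {
    const last = runs[runs.length - 1];
    if (last && last.ch === ch) {
      last.count += 1;            // extend the current run
    } else {
      runs.push({ ch, count: 1 }); // start a new run
    }
  }
  return runs;
}

// "abc" stored in a char(10) column is padded with 7 trailing spaces.
const padded = "abc".padEnd(10, " ");
const runs = runLengthEncode(padded);
console.log(padded.length, runs.length, runs[runs.length - 1]);
```

The ten padded characters collapse into four runs, and all seven padding spaces end up in a single run, so the padding costs about the same no matter how wide the column is.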


