Why are Unicode characters outside the BMP called astral?

By: | Post date: 2017-06-05 | Comments: 2 Comments
Posted in categories: Other Languages, Writing Systems

Thank you for the A2A, Jelle Zijlstra, and why do I suspect that you’ve read my page Astral Planes?

There are 17 × 65,536 = 1,114,112 code points in Unicode. Each block of 65,536 code points is called a plane. The first plane, the BMP, is where most of the characters you will ever encounter live. Only two other planes see much use (or indeed are likely to), and they contain obsolete, archaic scripts, or characters in scripts that won’t get used much at all and that most people will rarely encounter.
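To make that arithmetic concrete, here is a minimal Python sketch. The names `PLANE_SIZE`, `PLANE_COUNT` and `plane_of` are mine, purely for illustration; the only assumption is that a character’s plane is its code point divided by 65,536, rounded down.

```python
PLANE_SIZE = 0x10000   # 65,536 code points per plane
PLANE_COUNT = 17       # planes 0 through 16


def plane_of(char: str) -> int:
    """Return the plane number (0-16) of a single character."""
    return ord(char) // PLANE_SIZE


# Total number of code points across all planes.
total_code_points = PLANE_COUNT * PLANE_SIZE

print(total_code_points)        # 1114112
print(plane_of("A"))            # 0: Latin capital A is in the BMP
print(plane_of("\U00010400"))   # 1: a Deseret letter
print(plane_of("\U00020000"))   # 2: a supplementary CJK ideograph
```

Anything for which `plane_of` returns a value above 0 is an astral character in the sense of this post.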

Or, per Plane (Unicode) – Wikipedia

In the Unicode standard, a plane is a continuous group of 65,536 (= 2^16) code points. There are 17 planes, identified by the numbers 0 to 16 decimal, which corresponds with the possible values 00–10 hexadecimal of the first two positions in six position format (hhhhhh). Plane 0 is the Basic Multilingual Plane (BMP), which contains most commonly-used characters. The higher planes 1 through 16 are called “supplementary planes”, or humorously “astral planes”.

Thank you Wikipedia.

[citation needed]

Actually, you know what? I’ll cite me. Astral Planes

So as of Unicode 3.0.1 (August 2000), Unicode is organised into 16 [in fact 17, counting the BMP] planes, each of 64K; this gives over a million codepoints, which should be enough for all needs, past present and future. The Basic Multilingual Plane (BMP), or Plane 0, is the first 64K, which is what was in use until 2000, and where just about everything useful will still reside. The other planes are termed Supplementary.

The supplementary planes are an innovation in how characters are internally represented—programmers have to assume a character can have a million possible values, not just 64K, which means they often have to change their existing code. Furthermore, they are not drastically common in use: most ‘real’ scripts (though not all) are ensconced in the BMP. […]

The informal name for the supplementary planes of Unicode is “astral planes”, since (especially in the late ’90s) their use seemed to be as remote as the theosophical “great beyond”. There has been objection to this jocular usage (see “string vs. char” and subsequent discussion on Unicode list); and as Planes 1 and 2 spread in use there will be less occasion to feel that the planes really are ‘astral’. But the jocular reference is harmless, and it serves as a reminder that we’re not quite there yet.

“Astral plane” is a joke on the theosophical astral plane: they’re “planes” of characters, but they were inaccessible and immaterial. You’d never get to them, your software would never get to them, and you’d never need to get to them; they were abstruse and obscure. The joke was coined on the Unicode mailing list.

The term is still in use; e.g. https://youtrack.jetbrains.com/i… . And the term still means something: legacy products still fail to support the supplementary planes (such as… oh, the Quora text editor).

There’s a simple reason why those planes aren’t particularly astral any more. In amongst the Deseret and Nabataean and Egyptian hieroglyphs, there is one set of characters in the supplementary planes that sees a *lot* of usage now, and that users have come to expect all their platforms to support. Those characters weren’t in Unicode when I wrote my page in 2003, but they’re there in the Astral Planes now.

Those characters are, of course, Emoji.
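For the programmers in the audience, here is a rough Python sketch of why emoji trip up legacy software that assumes one character fits in 16 bits. In UTF-16, anything outside the BMP has to be split into a pair of surrogate code units. The function name `utf16_code_units` is my own, for illustration only; the arithmetic is the standard UTF-16 surrogate calculation.

```python
from typing import List


def utf16_code_units(char: str) -> List[int]:
    """Return the UTF-16 code units for one character, computed by hand."""
    cp = ord(char)
    if cp < 0x10000:
        return [cp]                      # BMP: a single 16-bit unit
    offset = cp - 0x10000                # 20 bits left to encode
    high = 0xD800 + (offset >> 10)       # top 10 bits: high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits: low surrogate
    return [high, low]


grinning_face = "\U0001F600"             # an emoji in Plane 1

print([hex(u) for u in utf16_code_units("A")])            # ['0x41']
print([hex(u) for u in utf16_code_units(grinning_face)])  # ['0xd83d', '0xde00']
```

Software that counts, truncates or indexes text by 16-bit unit will see that one emoji as two “characters”, and can cut it in half.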

2 Comments

  • John Cowan says:

    I think I invented the term, but I’m perfectly willing to be shown otherwise.

    • opoudjis says:

      That is certainly plausible. Missed you. And I’m sorry I have not tagged all my Quora answers, but I did not want to spend another week going through 2000 answers manually.
