Unicode defines a set of character code points in the range U+0000 to U+10FFFF, organized into 17 planes of 64K (65,536) code points each.
Plane  Abbr.  Name
0      BMP    Basic Multilingual Plane
1      SMP    Supplementary Multilingual Plane
2      SIP    Supplementary Ideographic Plane
14     SSP    Supplementary Special-purpose Plane
15            Private Use Plane
16            Private Use Plane
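
Since each plane holds 64K code points, the plane number is simply the code point divided by 0x10000. A minimal sketch in Python, where plane_of is a hypothetical helper name chosen for illustration:

```python
def plane_of(code_point: int) -> int:
    """Return the Unicode plane (0-16) that contains code_point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point: %#x" % code_point)
    # Each plane spans 0x10000 code points, so the plane number
    # is whatever sits above the low 16 bits.
    return code_point >> 16


if __name__ == "__main__":
    for cp in (0x0041, 0x1F600, 0x20000, 0xE0001, 0x10FFFF):
        print("U+%04X is in plane %d" % (cp, plane_of(cp)))
```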
There are three encoding forms: UTF-8, UTF-16, and UTF-32. Each is able to represent any sequence of Unicode characters. They differ in their relative efficiency and convenience.
Unit   Encoding  Units per character
byte   UTF-8     1-4
short  UTF-16    1-2
int    UTF-32    1
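
The unit counts can be checked with Python's built-in codecs. This is only a sketch: code_units is a hypothetical helper, and the little-endian codec variants are used so that no byte order mark is counted.

```python
def code_units(ch: str) -> dict:
    """Count the code units needed to encode one character in each form."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return {
        "UTF-8": len(ch.encode("utf-8")),            # units are bytes
        "UTF-16": len(ch.encode("utf-16-le")) // 2,  # units are 16-bit shorts
        "UTF-32": len(ch.encode("utf-32-le")) // 4,  # units are 32-bit ints
    }


if __name__ == "__main__":
    # U+0041 and U+20AC are in the BMP; U+1F600 is in plane 1.
    for ch in ("A", "\u20ac", "\U0001F600"):
        print("U+%04X" % ord(ch), code_units(ch))
```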
When converting UTF-8 to UTF-16, a character outside the BMP (plane 1 or higher) must be converted from a 4-byte sequence to a surrogate pair of two shorts. When converting UTF-16 to UTF-8, a surrogate pair must be converted back into a single 4-byte sequence.
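
The conversion in both directions goes through the code point. The sketch below handles only characters outside the BMP; all function names are hypothetical, and it assumes the input is already known to be a well-formed 4-byte sequence or surrogate pair apart from the range checks shown.

```python
def to_surrogates(cp: int) -> tuple:
    """Split a code point in U+10000..U+10FFFF into (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    v = cp - 0x10000                      # 20 bits remain
    return 0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)


def from_surrogates(high: int, low: int) -> int:
    """Combine a (high, low) surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def encode_utf8_4(cp: int) -> bytes:
    """Encode a code point in U+10000..U+10FFFF as 4 UTF-8 bytes."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    return bytes([
        0xF0 | (cp >> 18),                # 11110xxx
        0x80 | ((cp >> 12) & 0x3F),       # 10xxxxxx
        0x80 | ((cp >> 6) & 0x3F),        # 10xxxxxx
        0x80 | (cp & 0x3F),               # 10xxxxxx
    ])


def decode_utf8_4(b: bytes) -> int:
    """Decode a 4-byte UTF-8 sequence into its code point."""
    if len(b) != 4 or (b[0] & 0xF8) != 0xF0:
        raise ValueError("not a 4-byte UTF-8 sequence")
    if any((x & 0xC0) != 0x80 for x in b[1:]):
        raise ValueError("bad continuation byte")
    return ((b[0] & 0x07) << 18) | ((b[1] & 0x3F) << 12) \
        | ((b[2] & 0x3F) << 6) | (b[3] & 0x3F)


def utf8_to_utf16_pair(b: bytes) -> tuple:
    """4 bytes -> 2 shorts."""
    return to_surrogates(decode_utf8_4(b))


def utf16_pair_to_utf8(high: int, low: int) -> bytes:
    """2 shorts -> 4 bytes."""
    return encode_utf8_4(from_surrogates(high, low))


if __name__ == "__main__":
    high, low = utf8_to_utf16_pair(b"\xf0\x9f\x98\x80")   # U+1F600
    print("%04X %04X" % (high, low))                      # D83D DE00
    print(utf16_pair_to_utf8(high, low).hex())            # f09f9880
```

Encoding each surrogate separately as its own 3-byte sequence would not produce valid UTF-8, which is why the pair has to be recombined into a code point first.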