In Ruby 1.9, double-quoted strings can include arbitrary Unicode characters with \u escapes. In its simplest form, \u is followed by exactly four hexadecimal digits (letters can be upper- or lowercase), which represent a Unicode codepoint between 0000 and FFFF. For example:

"\u00D7" # => "×": leading zeros cannot be dropped
"\u20ac" # => "€": lowercase letters are okay
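To check what a four-digit escape actually produces, you can look at the resulting character's codepoint and byte size. The sketch below assumes Ruby 1.9 or later and uses only core String methods (ord, length, bytesize). The eval call is used only so that the three-digit escape, which is a parse-time error, can be rescued and reported instead of stopping the script.

```ruby
times = "\u00D7"   # MULTIPLICATION SIGN, codepoint 0xD7
euro  = "\u20ac"   # EURO SIGN, written with lowercase hex digits

puts times.ord.to_s(16)    # prints d7
puts euro == "\u20AC"      # prints true: hex digit case does not matter
puts euro.length           # prints 1 (one character...)
puts euro.bytesize         # prints 3 (...stored as three UTF-8 bytes)

# Dropping a leading zero leaves only three digits, which is a syntax error.
# eval is used here only so the error can be rescued and reported.
short_escape_rejected =
  begin
    eval('"\u00D"')
    false
  rescue SyntaxError
    true
  end
puts short_escape_rejected # prints true
```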

A second form of the \u escape is followed by an open curly brace, one to six hexadecimal digits, and a close curly brace. The digits between the braces can represent any Unicode codepoint between 0 and 10FFFF, and leading zeros can be dropped in this form:

"\u{A5}" # => "¥": same as "\u00A5"
"\u{3C0}" # Greek lowercase pi: same as "\u03C0"
"\u{10ffff}" # The largest Unicode codepoint
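The following sketch, assuming Ruby 1.9 or later, checks the equivalences above and shows why the brace form matters for codepoints above FFFF: without braces, exactly four digits are consumed, so any further hex digits become ordinary characters.

```ruby
yen = "\u{A5}"       # leading zeros dropped
pi  = "\u{3C0}"      # Greek small letter pi
top = "\u{10ffff}"   # largest Unicode codepoint

puts yen == "\u00A5"        # prints true
puts pi == "\u03C0"         # prints true
puts top.ord.to_s(16)       # prints 10ffff
puts top.bytesize           # prints 4: codepoints above FFFF take four UTF-8 bytes

# Without braces only four digits are read: "\u1F600" is U+1F60 followed by "0".
puts "\u1F600".length       # prints 2
puts "\u{1F600}".length     # prints 1
```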

Finally, the \u{} form of this escape allows multiple codepoints to be embedded within a single escape. Simply place multiple runs of one to six hexadecimal digits, separated by a single space or tab character, within the curly braces. Spaces are not allowed after the opening curly brace or before the closing brace:

money = "\u{20AC A3 A5}" # => "€£¥"
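One way to confirm that each run of hex digits inside the braces becomes exactly one character is to list the string's codepoints. This is a small sketch assuming Ruby 1.9 or later; String#codepoints is a core method.

```ruby
money = "\u{20AC A3 A5}"

# Each space-separated run of hex digits becomes one character.
puts money.length                                      # prints 3
puts money.codepoints.map { |c| c.to_s(16) }.inspect   # prints ["20ac", "a3", "a5"]
puts money == "\u20AC\u00A3\u00A5"                     # prints true
```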

Note that spaces within the curly braces do not encode spaces in the string itself. You can, however, encode the ASCII space character with Unicode codepoint 20:

money = "\u{20AC 20 A3 20 A5}" # => "€ £ ¥"
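The contrast between separator spaces and encoded spaces can be checked directly. The sketch below assumes Ruby 1.9 or later; the variable names are illustrative only.

```ruby
spaced   = "\u{20AC 20 A3 20 A5}"   # 20 is the ASCII space codepoint
unspaced = "\u{20AC A3 A5}"

puts spaced.length              # prints 5
puts spaced.split(" ").length   # prints 3
puts unspaced.include?(" ")     # prints false: separators inside {} add nothing
```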

Strings that use the \u escape are encoded using the Unicode UTF-8 encoding. (See Section 3.2.6 for more on the encoding of strings.)
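You can inspect a string's encoding and raw bytes to see this. Note that since Ruby 2.0 the default source encoding is already UTF-8, so in a file without a magic comment plain literals are UTF-8 as well; the rule above matters most when a file declares a different source encoding. This is a minimal sketch using core String and Encoding methods.

```ruby
s = "\u00E9"   # LATIN SMALL LETTER E WITH ACUTE

puts s.encoding                                       # prints UTF-8
puts s.bytes.map { |b| format("%02X", b) }.join(" ")  # prints C3 A9
puts s.valid_encoding?                                # prints true
```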

\u escapes are usually, but not always, legal in strings. If the source file uses an encoding other than UTF-8, and a string contains multibyte characters in that encoding (literal characters, not characters created with escapes), then it is not legal to use \u in that string: one string cannot hold characters in two different encodings. You can always use \u if the source encoding (see Section 2.4.1) is UTF-8. And you can always use \u in a string that contains only ASCII characters.

\u escapes may appear in double-quoted strings and also in the other forms of quoted text described shortly, such as regular expressions, character literals, %- and %Q-delimited strings, %W-delimited arrays, here documents, and backquote-delimited command strings. Java programmers should note that Ruby's \u escape can appear only in quoted text, not in program identifiers.
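The sketch below tries the escape in several of these forms of quoted text, assuming Ruby 1.9 or later (where a character literal yields a one-character string). Backquote-delimited command strings are left out so the example does not run a shell command; the variable names are illustrative only.

```ruby
# Regular expression literal: "\u20AC5" is the euro sign followed by "5"
index = (/\u20AC/ =~ "price: \u20AC5")

# Character literal
char = ?\u20AC

# %Q- and %-delimited strings
q1 = %Q(\u00A3)
q2 = %(\u00A5)

# %W-delimited array: each word has its escapes processed
words = %W(\u20AC \u00A3)

# Here document with an unquoted delimiter (escapes are processed)
doc = <<EOS
\u20AC
EOS

puts index                        # prints 7
puts char == "\u20AC"             # prints true
puts [q1, q2].map(&:ord).inspect  # prints [163, 165]
puts words.map(&:ord).inspect     # prints [8364, 163]
puts doc == "\u20AC\n"            # prints true
```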
