Perl Unicode Cookbook: Unicode Named Characters

℞ 8: Unicode named characters

Use the \N{charname} notation to get the character by that name for use in interpolated literals (double-quoted strings and regexes). As of v5.16, there is an implicit

 use charnames qw(:full :short);

But prior to v5.16, you must be explicit about which set of charnames you want. The :full names are the official Unicode character names, aliases, and sequences, which all share a namespace.

 use charnames qw(:full :short latin greek);

 "\N{MATHEMATICAL ITALIC SMALL N}"      # :full
 "\N{GREEK CAPITAL LETTER SIGMA}"       # :full

Anything else is a Perl-specific convenience abbreviation. Specify one or more scripts by name if you want short names that are script-specific.

 "\N{Greek:Sigma}"                      # :short
 "\N{ae}"                               #  latin
 "\N{epsilon}"                          #  greek

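As a quick check, the following sketch prints the code point that each of the spellings above resolves to. It assumes a UTF-8 terminal; the @rows table and the @mismatches list are illustrative names, and the expected code points are filled in by hand for comparison.

```perl
use strict;
use warnings;
use charnames qw(:full :short latin greek);

binmode(STDOUT, ':encoding(UTF-8)');

# Each row: label, character obtained via \N{...}, code point we expect.
my @rows = (
    [ ':full  GREEK CAPITAL LETTER SIGMA', "\N{GREEK CAPITAL LETTER SIGMA}", 0x03A3 ],
    [ ':short Greek:Sigma',                "\N{Greek:Sigma}",                0x03A3 ],
    [ 'latin  ae',                         "\N{ae}",                         0x00E6 ],
    [ 'greek  epsilon',                    "\N{epsilon}",                    0x03B5 ],
);

# Collect any label whose character is not the one we expected.
my @mismatches;
for my $row (@rows) {
    my ($label, $char, $expected) = @$row;
    printf "%-36s U+%04X  %s\n", $label, ord($char), $char;
    push @mismatches, $label if ord($char) != $expected;
}
```

If the short names resolve as described, @mismatches stays empty.
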
The v5.16 release also supports a :loose import for loose matching of character names, which works just like loose matching of property names: that is, it disregards case, whitespace, and underscores:

 "\N{euro sign}"                        # :loose (from v5.16)

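A minimal sketch of loose matching, assuming Perl v5.16 or later: three spellings of the same name should all resolve to U+20AC. The variable names here are illustrative only.

```perl
use v5.16;                  # :loose needs v5.16 or later
use charnames qw(:loose);

my @spellings = (
    "\N{EURO SIGN}",        # exact official name
    "\N{euro sign}",        # case is disregarded
    "\N{Euro_Sign}",        # underscores are disregarded
);

# Format each resulting character as a U+XXXX code point.
my @code_points = map { sprintf 'U+%04X', ord($_) } @spellings;
say "@code_points";
```
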
(You do not have to use the charnames pragma to interpolate Unicode characters by number into literals with the \N{...} sequence.)
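
For instance, this sketch loads no charnames pragma at all and still interpolates the euro sign by number, both in a string and in a regex. The variable names are illustrative.

```perl
use strict;
use warnings;
# Note: no "use charnames" anywhere in this script.

my $by_number = "\N{U+20AC}";    # code point given in hex after "U+"
my $by_hex    = "\x{20AC}";      # the same character via \x{...}

# \N{U+...} also works inside a regex.
my $in_regex = ($by_number =~ /\A\N{U+20AC}\z/) ? 1 : 0;

printf "U+%04X\n", ord($by_number);
```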

Previous: ℞ 7: Get Character Number by Name

Series Index: The Standard Preamble

Next: ℞ 9: Unicode Named Character Sequences
