Perl Unicode Cookbook: Custom Character Properties
℞ 26: Custom character properties
Match Unicode Properties in Regex explained that every Unicode character has one or more properties, specified by the Unicode Consortium. You may extend these rules to define your own properties so that Perl can use them.
A custom property is a function given a name beginning with In or Is which returns a string conforming to a special format. The “User-Defined Character Properties” section of perldoc perlunicode describes this format in more detail.
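As a minimal sketch of the simplest form of that format: each line of the returned string is either a single hexadecimal code point or two hexadecimal code points separated by a tab, and each line ends with a newline. The property name Is_Hexish and the sample characters are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Is_Hexish is a hypothetical property name chosen for this sketch.
# Code points are written in hexadecimal, without a leading "0x" or "U+".
sub Is_Hexish {
    return "41\t46\n"    # range: A through F
         . "5F\n"        # single code point: underscore
         . "61\t66\n";   # range: a through f
}

# Try the property against a few sample characters.
for my $char (qw(F f _ G)) {
    my $verdict = $char =~ /\p{Is_Hexish}/ ? "matches" : "does not match";
    print "$char $verdict\n";
}
```

The negated form \P{Is_Hexish} should match every character outside the listed code points.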
To define at compile-time your own custom character properties for use in regexes:
# using private-use characters
sub In_Tengwar { "E000\tE07F\n" }
if (/\p{In_Tengwar}/) { ... }
# blending existing properties
sub Is_GraecoRoman_Title {<<'END_OF_SET'}
+utf8::IsLatin
+utf8::IsGreek
&utf8::IsTitle
END_OF_SET
if (/\p{Is_GraecoRoman_Title}/) { ... }
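To see how the blended property behaves, here is a small self-contained sketch that repeats the Is_GraecoRoman_Title definition and tests it against a handful of code points. The sample code points are assumptions picked for illustration, not part of the original recipe. The lines are applied in order: the two + lines build the union of Latin and Greek, and the & line then keeps only the titlecase letters from that union.

```perl
use strict;
use warnings;

# Union of Latin and Greek, intersected with titlecase letters.
sub Is_GraecoRoman_Title { return <<'END_OF_SET' }
+utf8::IsLatin
+utf8::IsGreek
&utf8::IsTitle
END_OF_SET

# Sample code points chosen for this sketch.
my @samples = (
    0x01C5,   # LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON (titlecase)
    0x1F88,   # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI (titlecase)
    0x0041,   # LATIN CAPITAL LETTER A (uppercase, not titlecase)
    0x0410,   # CYRILLIC CAPITAL LETTER A (neither Latin nor Greek)
);

# Print code points in hex rather than the characters themselves,
# so no output encoding layer is needed.
for my $cp (@samples) {
    my $verdict = chr($cp) =~ /\p{Is_GraecoRoman_Title}/
        ? "matches"
        : "does not match";
    printf "U+%04X %s\n", $cp, $verdict;
}
```

Only the first two samples are both titlecase and in the Latin or Greek script, so only those should be reported as matching.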
Previous: ℞ 25: Match Unicode Properties in Regex
Series Index: The Standard Preamble