Perl Unicode Cookbook: Unicode Locale Collation
℞ 37: Unicode locale collation
As you’ve already seen, Unicode-aware sorting respects Unicode character properties. You can’t sort by codepoint and expect to get accurate results, not even if you stick with pure ASCII.
The world is a complicated place. Some locales have their own special sorting rules.
The module Unicode::Collate::Locale provides a sort() method which supports locale-specific rules:
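use Unicode::Collate::Locale;
# The double underscore is intentional: language "de", empty territory, variant "phonebook".
my $col = Unicode::Collate::Locale->new(locale => "de__phonebook");
# @old_list holds the decoded (character) strings you want to sort.
my @list = $col->sort(@old_list);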
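To see what the locale actually changes, it can help to sort the same list twice. The following is a minimal sketch, not part of the original recipe: the word list and variable names are made up for illustration. It compares the default Unicode::Collate ordering with the German phonebook tailoring, where an umlaut such as "Ä" sorts as if it were written "Ae".

```perl
use strict;
use warnings;
use utf8;                               # this source file contains literal umlauts
use Unicode::Collate;
use Unicode::Collate::Locale;

binmode STDOUT, ':encoding(UTF-8)';     # encode output so the umlauts print correctly

# Hypothetical sample data, chosen so the two orderings differ.
my @names = ("Afrika", "Ärger", "Adler");

my $default   = Unicode::Collate->new;
my $phonebook = Unicode::Collate::Locale->new(locale => "de__phonebook");

my @by_default   = $default->sort(@names);
my @by_phonebook = $phonebook->sort(@names);

print "default:   @by_default\n";       # Adler Afrika Ärger
print "phonebook: @by_phonebook\n";     # Adler Ärger Afrika
```

Under the default rules "Ä" is compared as an "A" with an accent, so "Ärger" lands after "Afrika". Under the phonebook rules it is compared as "Ae", which puts it between "Adler" and "Afrika".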
This module has shipped with the Perl 5 core distribution since Perl 5.14 (it first appeared in the 5.13.4 development release). If you’re using an older version of Perl, install the Unicode::Collate distribution from CPAN to take advantage of it.
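If you are unsure whether your installation has the module, you can ask Perl directly. The sketch below is an illustration rather than part of the original recipe. It assumes Module::CoreList is installed (it ships with recent Perls) and uses it to report which release first bundled Unicode::Collate::Locale, then tries to load the module.

```perl
use strict;
use warnings;
use Module::CoreList;

# Which Perl release first bundled the module, according to Module::CoreList?
my $first = Module::CoreList->first_release('Unicode::Collate::Locale');
print "first bundled with perl $first\n" if defined $first;

# Can this installation load it? eval traps the failure if it is missing.
my $available = eval { require Unicode::Collate::Locale; 1 };

print $available
    ? "Unicode::Collate::Locale " . Unicode::Collate::Locale->VERSION . " is installed\n"
    : "not installed; install the Unicode::Collate distribution from CPAN\n";
```

The version reported by first_release is a numeric Perl version such as 5.010001, so a development release shows up with an odd middle component.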
The ucsort program mentioned in Perl Unicode recipe 35 accepts a --locale parameter.
Previous: ℞ 36: Case- and Accent-insensitive Sorting
Series Index: The Standard Preamble