Perl Unicode Cookbook: Unicode Locale Collation

℞ 37: Unicode locale collation

As you’ve already seen, Unicode-aware sorting respects Unicode character properties. You can’t sort by codepoint and expect to get accurate results, not even if you stick with pure ASCII.

The world is a complicated place. Some locales have their own special sorting rules.

The module Unicode::Collate::Locale provides a sort() method which supports locale-specific rules:

 use Unicode::Collate::Locale;

 my $col  = Unicode::Collate::Locale->new(locale => "de__phonebook");
 my @list = $col->sort(@old_list);
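To see what the locale changes, here is a minimal sketch that sorts the same names twice: once with the default Unicode Collation Algorithm order from Unicode::Collate, and once with the German phonebook tailoring. The name list and variable names are made up for illustration. The difference comes from the phonebook rules treating ü like "ue", while the default order treats it as a "u" with an accent.

```perl
use strict;
use warnings;
use utf8;                               # this source file contains literal UTF-8
use Unicode::Collate;
use Unicode::Collate::Locale;

binmode(STDOUT, ":encoding(UTF-8)");    # encode output as UTF-8

# Hypothetical sample data, chosen so the two orders differ.
my @names = qw(Mutter Muffler Müller);

# Default collation: no locale-specific tailoring.
my $default   = Unicode::Collate->new;
# German phonebook tailoring, as in the recipe above.
my $phonebook = Unicode::Collate::Locale->new(locale => "de__phonebook");

my @by_default   = $default->sort(@names);
my @by_phonebook = $phonebook->sort(@names);

print "default:   @by_default\n";       # Muffler Müller Mutter
print "phonebook: @by_phonebook\n";     # Müller Muffler Mutter
```

In the default order, "Müller" compares like "Muller" and lands after "Muffler". In phonebook order it compares like "Mueller" and moves to the front.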

This module is part of the Perl 5 core distribution as of Perl 5.12. If you’re using an older version of Perl, install the Unicode::Collate distribution to take advantage of it.

The ucsort program mentioned in Perl Unicode recipe 35 accepts a --locale parameter.

