Perl Unicode Cookbook: Unicode Text in Stubborn Libraries
℞ 42: Unicode text in DBM hashes, the tedious way
While Perl 5 has long been very careful about handling Unicode correctly inside the world of Perl itself, every time you leave the Perl internals, you cross a boundary at which something may need to handle decoding and encoding. This happens when performing IO across a network or to files, when speaking to a database, or even when using XS to use a shared library from Perl.
For example, consider the core module DB_File, which allows you to use Berkeley DB files from Perl—persistent storage for key/value pairs.
Using a regular Perl string as a key or value for a DBM hash will trigger a wide character exception if any codepoints won’t fit into a byte. Here’s how to manually manage the translation: use DB_File; use Encode qw(encode decode); tie %dbhash, “DB_File”, “pathname”;
# STORE
# assume $uni_key and $uni_value are abstract Unicode strings
my $enc_key = encode("UTF-8", $uni_key, 1);
my $enc_value = encode("UTF-8", $uni_value, 1);
$dbhash{$enc_key} = $enc_value;
# FETCH
# assume $uni_key holds a normal Perl string (abstract Unicode)
my $enc_key = encode("UTF-8", $uni_key, 1);
my $enc_value = $dbhash{$enc_key};
my $uni_value = decode("UTF-8", $enc_key, 1);
By performing this manual encoding and decoding yourself, you know that your storage file will have a consistent representation of your data. The correct encoding depends on the type of data you store and the capabilities of the external code, of course.
Previous: ℞ 41: Unicode Linebreaking
Series Index: The Standard Preamble
Tags
Feedback
Something wrong with this article? Help us out by opening an issue or pull request on GitHub