Perl Unicode Cookbook: Explicit encode/decode
℞ 12: Explicit encode/decode
While the standard Perl Unicode preamble makes Perl’s filehandles use UTF-8 encoding by default, filehandles aren’t the only sources and sinks of data. On rare occasions, such as a database read, you may be given encoded text you need to decode.
The core Encode module offers two functions to handle these conversions. (Remember that decode() means to convert octets from a known encoding into Perl’s internal Unicode form and encode() means to convert from Perl’s internal form into a known encoding.)
use Encode qw(encode decode);
# given $bytes, containing octets in a known encoding;
# the third argument (CHECK) of 1 is Encode::FB_CROAK: die on malformed data
my $chars = decode("shiftjis", $bytes, 1);
# given $chars, a string of characters in Perl's internal format
my $bytes = encode("MIME-Header-ISO_2022_JP", $chars, 1);
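The following is a minimal round-trip sketch of the same two calls, using UTF-8 and a made-up sample string rather than the encodings above. It spells the CHECK value out as Encode::FB_CROAK and adds Encode::LEAVE_SRC because, when CHECK is set without that flag, Encode may overwrite the source string in place. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Octets for "café" in UTF-8: the é is the two bytes 0xC3 0xA9.
my $bytes = "caf\xC3\xA9";

# CHECK flags: die on malformed data, and leave the source string untouched.
my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();

# Decode at the input edge: octets in, characters out.
my $chars = decode("UTF-8", $bytes, $check);

# length() counts octets before decoding and characters after.
my $octet_count = length $bytes;
my $char_count  = length $chars;

# Encode at the output edge: characters in, octets out.
my $out = encode("UTF-8", $chars, $check);

# Malformed input raises an exception instead of being silently repaired.
my $bad = "caf\xE9";    # Latin-1 octets, not valid UTF-8
my $ok  = eval { decode("UTF-8", $bad, $check); 1 };

printf "%d octets, %d characters, bad input %s\n",
    $octet_count, $char_count, $ok ? "accepted" : "rejected";
```

Wrapping the call in eval lets the application decide how to handle bad data at the boundary, rather than letting substitution characters leak into the rest of the program.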
For streams all in the same encoding, don’t use encode/decode; instead set the file encoding when you open the file, or immediately afterward with binmode, as described later in this series. Remember the canonical rule of Unicode: always encode/decode at the edges of your application.
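As a rough sketch of that alternative, the example below sets the encoding once per filehandle, first in the open mode and then with binmode, so no explicit encode/decode calls are needed. The scratch file from File::Temp and the sample text are assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A scratch file for the demonstration; removed when the program exits.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close($tmp_fh) or die "Cannot close $path: $!";

# Set the encoding when opening the file...
open(my $out, ">:encoding(UTF-8)", $path) or die "Cannot write $path: $!";
print {$out} "caf\x{E9}\n";    # characters in, UTF-8 octets on disk
close($out) or die "Cannot close $path: $!";

# ...or immediately after opening it, with binmode.
open(my $in, "<", $path) or die "Cannot read $path: $!";
binmode($in, ":encoding(UTF-8)");
my $line = <$in>;              # UTF-8 octets on disk, characters out
close($in) or die "Cannot close $path: $!";
chomp $line;

my $char_count = length $line;
```

Either form keeps the conversion at the edge of the program: everything between the read and the write deals only in characters.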
Previous: ℞ 11: Names of CJK Codepoints
Series Index: The Standard Preamble