This Week on p5p 2000/12/03
- Notes
- Tests
- Charnames
- Regular Expression Bug
- xsubpp
- Perlipc Examples Buggy
- PerlIO news
- Dodgy Function Names
- Lvalue Subs
- Various
Notes
You can subscribe to an e-mail version of this summary by sending an empty message to p5p-digest-subscribe@plover.com.
Please send corrections and additions to simon@cozens.net.
Tests
I opted not to mention this last week, but Casey Tweten pointed out that it was quite important for module authors: regression tests. You write regression tests, right? Of course, you do. The problem is that there are umpteen gazillion ways to write regression tests, which makes it horrible to debug them and find out what’s really happening when a test fails. There’s a core module called Test that gives a neat framework for writing tests. There was some noise on perl5-porters to the effect that people wanted the core regression tests to use Test, but the counter-argument was that the core tests should be kept as free from outside interference as possible - the Test module may contain some constructs that the core tests are trying to test. It’s no good loading a module to help with your tests if one of your tests is whether you can load a module or not! Nevertheless, it would be nice if some of the more advanced tests were converted to the Test interface. (That was a hint, by the way, for anyone who fancies doing that.)
The real outcome of this was a patch by Casey to convert the standard module template generated by h2xs to use the Test module, and to encourage (i.e. force) module authors to use it. So, module authors, if you don’t know about Test.pm, you will soon.
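For anyone who has not met it yet, a test script built on Test can be as small as the sketch below. The add function is a hypothetical stand-in for whatever your module provides; plan and ok are the parts that come from the Test module.

```perl
use strict;
use warnings;
use Test;

# Declare up front how many checks this script will run.
BEGIN { plan tests => 2 }

# Hypothetical function under test; a real script would load your module instead.
sub add { return $_[0] + $_[1] }

ok(1);               # one argument: passes if it is true
ok(add(2, 3), 5);    # two arguments: passes if the first equals the expected second
```

Run under a harness, each ok call reports one numbered result, so a failure points at a specific check rather than at the script as a whole.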
Charnames
This patch from Ilya took me a while to get my head around, but now I have and I think it’s beautiful. When doing Unicode testing and entering Unicode data without a Unicode editor, we have to resort to things like
$x = "\x{395}\x{3CD}\x{3B1}\x{3B3}\x{3B3}\x{3B5}\x{3BB}\x{3CA}\x{3B1}";
or
$x = v917.973.945.947.947.949.955.970.945;
or even
$x = "\N{GREEK CAPITAL LETTER EPSILON}\N{GREEK SMALL LETTER UPSILON}...";
This is a nightmare.
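To see that these are three spellings of the same data, here is a small sketch that builds only the first three characters each way and compares the results. The variable names are mine, and the character names are the standard Unicode names for U+0395, U+03CD and U+03B1.

```perl
use strict;
use warnings;
use charnames ':full';   # enables \N{...} with full Unicode character names

# Hexadecimal code point escapes.
my $hex = "\x{395}\x{3CD}\x{3B1}";

# A v-string: the same code points written in decimal.
my $vstring = v917.973.945;

# Named characters.
my $named = "\N{GREEK CAPITAL LETTER EPSILON}"
          . "\N{GREEK SMALL LETTER UPSILON WITH TONOS}"
          . "\N{GREEK SMALL LETTER ALPHA}";

print "all three spellings agree\n"
    if $hex eq $vstring and $hex eq $named;
```

All three are exact, and none of them is something you would want to type for a whole paragraph of test data.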
Ilya’s solution allows you to enter Unicode texts in foreign languages as Latin transliterations. He gives a module that provides Russian transliterations, so with Ilya’s module you can now do:
use charnames qw(cyrillic);
$x = "\N{Il'ya Zakharevich}";
and Perl will do the right thing. The suggestion is to have a few transliteration modules in the core for testing and to have less-commonly used ones on CPAN.
However, in many non-Latin languages, transliteration to the Latin alphabet is vague at best, and there are usually several different methods of doing so; worse, the mappings are sometimes nonreversible and/or non-one-to-one. Ilya’s module for Russian is neat, but doesn’t cover everything.
Regular Expression Bug
Jarkko has been turning up all sorts of wonders with his experiments in UTF8 regular-expression land. This time, he has found that
use utf8; @a=("b" =~ /(.)/)
will cause a segmentation fault, which is horrid. Worse, this only seems to fail on 64-bit platforms, regardless of the setting of use64bitint, which suggested some hidden assumption. Eventually, it was traced to a careless read in sv_utf8_downgrade; Jarkko says:
Why the different platforms behave so differently (core dump vs. no core dump) on this bug is a bit of a mystery, but if I had to guess I would mumble something like ‘alignment.’
This is why being the Configure pumpkin is such a demanding job.
Another core dump came from
use utf8; "," =~ /([^,]*,)*/
and another from
use utf8;
$x = $^R = 67;
"foot" =~ /foo(?{ $^R + 12 })((?{ +$x = 12; $^R + 17 })[xy])?/;
which was traced to a failure to save and restore the parentheses count. Again, the symptoms were confusingly different on different machines.
xsubpp
Ilya produced a patch for xsubpp that allows the OUT and IN_OUT keywords; this is in addition to the old IN_OUTLIST and OUTLIST keywords.
These are somewhat confusing, but here’s my understanding of what they do: A parameter in a C function marked OUTLIST will have its value at the end of the function added to the list of return values to Perl. A parameter labeled IN_OUT will be read from a Perl variable at the beginning of the C function, and the value of the C variable at the end of the C function will be put back into the Perl variable. In effect, IN_OUT gives you a pointer to write through, which is “tied” to a Perl variable. IN_OUTLIST does the same, but instead of writing the value back to the Perl variable, it goes onto the list of return values.
An OUT value is set to the return value of the C function - I think. Decide for yourself.
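If it helps, here is a pure-Perl imitation of what the calling code would see for three of these keywords, following the descriptions above. This is not XS and not what xsubpp generates; the sub names are hypothetical and exist only to show where the final value ends up.

```perl
use strict;
use warnings;

# IN_OUT: the caller's variable is read on entry and overwritten on exit.
# Elements of @_ alias the caller's variables, so assigning to $_[0] writes back.
sub double_in_out {
    $_[0] = $_[0] * 2;
    return;
}

# IN_OUTLIST: the argument is read on entry, but the final value is
# returned in the list instead of being written back.
sub double_in_outlist {
    my ($n) = @_;
    return $n * 2;
}

# OUTLIST: the final values are simply appended to the list of return values.
sub divmod_outlist {
    my ($num, $den) = @_;
    return (int($num / $den), $num % $den);
}

my $v = 21;
double_in_out($v);                          # $v itself is changed

my $w = 21;
my ($doubled) = double_in_outlist($w);      # $w is left alone

my ($quot, $rem) = divmod_outlist(17, 5);   # both results come back in the list
```

The distinction that matters to the caller is whether the result comes back through the argument or through the return list.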
Perlipc Examples Buggy
Nicholas Clark gave what I shall call an “impassioned appeal” about the state of the perlipc documentation; some of the examples didn’t even compile, much less do what they claimed to do. This also turned up a problem with Net::hostent, which was particularly embarrassing since Net::hostent didn’t have a regression test. Nicholas wiped up the worst of the perlipc bugs, and provided a basic regression test, which Robert Spier expanded. As Jarkko pointed out, writing a portable test for it is tricky, but any test is better than none …
(Hey, maybe someone would like to try writing a program that automatically extracts example code from the documentation and makes sure it compiles?)
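As a starting point for anyone tempted, here is a rough sketch of such a checker. It assumes that example code lives in indented (verbatim) POD paragraphs and that running perl -c on each block is an acceptable check; extract_verbatim and compiles are names I have made up, and nothing like this exists in the core.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Return the verbatim (indented) paragraphs found in a POD string,
# merging consecutive verbatim paragraphs into one block.
sub extract_verbatim {
    my ($pod) = @_;
    my (@blocks, $in_pod, $prev_verbatim);
    for my $para (split /\n(?:[ \t]*\n)+/, $pod) {
        if ($para =~ /^=(\w+)/) {          # a POD command paragraph
            $in_pod        = $1 ne 'cut';  # =cut leaves POD, anything else enters it
            $prev_verbatim = 0;
            next;
        }
        next unless $in_pod;
        if ($para =~ /^[ \t]/) {           # indented, so verbatim
            if ($prev_verbatim) { $blocks[-1] .= "\n\n$para" }
            else                { push @blocks, $para }
            $prev_verbatim = 1;
        }
        else {
            $prev_verbatim = 0;
        }
    }
    return @blocks;
}

# True if the code passes a syntax check. perl -c does not run the main
# program, but BEGIN blocks and use statements are still executed.
sub compiles {
    my ($code) = @_;
    my ($fh, $file) = tempfile(SUFFIX => '.pl', UNLINK => 1);
    print {$fh} $code;
    close $fh or die "close $file: $!";
    my $output = `"$^X" -c "$file" 2>&1`;
    return $? == 0;
}
```

Many documentation examples are deliberate fragments, so a real tool would need some way to mark blocks that are not meant to stand alone.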
PerlIO news
Using my magic crystal ball, I found that this week saw 500 patches to the Perl repository. Naturally, the bulk of them - a massive 400 - were the development main line, 32 were Sarathy integrating bunches of patches into 5.6.1-to-be, but the remaining 67 were Nick beavering away on the PerlIO branch. This should remind you that most of the PerlIO improvements happen without much advertisement, and it’s easy to be unaware of exactly how much work is going on there.
Here’s what Nick says about how PerlIO is going:
-Duseperlio now works as a replacement for stdio on UNIX platforms. As of last weekend, it was also working in “same functions as before” mode on Win32 in Win32’s “simple” configuration. There has been some progress, but not success, in getting OS/2 in line. (Nothing on VMS yet.)
This week’s target is the PERL_IMPLICIT_SYS scheme on Win32 that is needed for fork() emulation. Once that is built, the plan is to replace low-level pseudo-unix read() on Win32 with our own version.
The other area of work is to turn on use of PerlIO to allow files to be read/written as utf8 under programmer control.
Once that works, then we hook PerlIO to Encode - and we are “done” ;-) (This is actually a bit messy right now as PerlIO is deep under the core, and Encode.pm is an external XS module.)
Since Nick is going to be allowing layers to be accessible under programmer control, we need to know what layers ought to do, and this was Nick’s question: “So would anyone care to remind me what the Unicode issues were that we want to solve?”
Briefly, we want to be able to read UTF8-encoded text into UTF8-encoded SVs, and to write it back out the same way. Other uses of layers would be the CRLF translation magic used on DOS-derived systems and a replacement for the source filter mechanism.
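None of this could be run against the perl of that week, since the interface was still being worked out. For readers on a later release (5.8 onward), the sketch below shows roughly the shape that layers under programmer control eventually took; the helper names and the temporary files are my own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write characters out as UTF-8 and read them back as characters.
sub roundtrip_utf8 {
    my ($text) = @_;
    my ($tmp, $file) = tempfile(UNLINK => 1);
    close $tmp;

    open(my $out, '>:encoding(UTF-8)', $file) or die "open $file: $!";
    print {$out} $text;
    close $out or die "close $file: $!";

    open(my $in, '<:encoding(UTF-8)', $file) or die "open $file: $!";
    local $/;                      # slurp the whole file
    my $back = <$in>;
    close $in;
    return $back;
}

# Write through the :crlf layer, then read the raw bytes back to see
# what actually landed on disk.
sub bytes_written_via_crlf {
    my ($text) = @_;
    my ($tmp, $file) = tempfile(UNLINK => 1);
    close $tmp;

    open(my $out, '>:crlf', $file) or die "open $file: $!";
    print {$out} $text;
    close $out or die "close $file: $!";

    open(my $in, '<:raw', $file) or die "open $file: $!";
    local $/;
    my $bytes = <$in>;
    close $in;
    return $bytes;
}
```

The layer is named in the open call, so the encoding or line-ending policy is chosen per filehandle rather than per program.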
Dodgy Function Names
This causes a syntax error:
sub f {}
$x-f($y);
This is because Perl assumes that -f is a file-test operator, and wonders what it’s doing next to a variable with no binary operator in the middle. Some people, including Jarkko, thought that was silly; if I define a sub f, Perl should know that I’m trying to call that subroutine.
This naturally applies not just for the file tests, but any other operators that look like functions, such as y and s. Several solutions were proposed, such as forcing Perl to use the subroutine, or outlawing subroutines with “reserved” names. In the end, Jarkko produced a patch for the file tests that spits out a warning in the above case - I think the y, m, and other cases are still on the loose. The whole thread (36 messages) is worth reading, if only so you can get an idea of what nefarious things Perl porters get up to when syntax goes bad.
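In the meantime, two spellings sidestep the file-test reading, because neither puts the letter straight after the minus sign. This is a small sketch with a made-up body for f so that there is something to check:

```perl
use strict;
use warnings;

# Stand-in body, purely for illustration.
sub f { return $_[0] * 2 }

my ($x, $y) = (10, 3);

# A space after the minus: it can only be read as subtraction.
my $spaced = $x - f($y);

# An explicit & sigil: the next token is unmistakably a subroutine call.
my $sigiled = $x-&f($y);

print "$spaced $sigiled\n";
```

Neither is pretty, which is rather the point of the thread.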
Lvalue Subs
Casey asked for more useful lvalue subroutines. At the moment, you can say things like:
package Person;
sub new { bless { name => $name }, shift }
sub name : lvalue { $_[0]->{name} }
package main;
my $p = Person->new;
$p->name = "casey";
print $p->name . "\n";
Note that $p->name on the left-hand side of that assignment is actually a method call returning an lvalue. Cool, huh?
Casey mentioned that he’d really like some way of getting at the right-hand side of the assignment as well, in order to do things like implementing substr in pure Perl. Rick Delaney suggested that you could return a tied lvalue, but Casey replied that that was slow; the alternative was yet another global. Piers appealed for a faster tie system, which is fine, but someone has to design it, code it and make it better than the current one while doing all the same things.
Yitzchak pointed out that there’s more to lvalues than just the assignment context, and having a way to get at the rvalue would probably break in nonassignment cases.
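For the curious, Rick’s tied-lvalue idea can be sketched along the following lines. The UpperCased class and the upper-casing rule are invented for illustration; the point is only that STORE gets to see the right-hand side of the assignment.

```perl
use strict;
use warnings;

# Hypothetical tie class: wraps a reference to the real storage and
# upper-cases whatever is assigned through it.
package UpperCased;
sub TIESCALAR { my ($class, $ref) = @_; return bless { ref => $ref }, $class }
sub FETCH     { my ($self) = @_; return ${ $self->{ref} } }
sub STORE     { my ($self, $value) = @_; ${ $self->{ref} } = uc $value }

package Person;
sub new { my $class = shift; return bless { name => '' }, $class }

# The lvalue sub hands back a tied scalar, so assigning to the call
# ends up in UpperCased::STORE with the assigned value.
sub name : lvalue {
    my $self = shift;
    tie my $proxy, 'UpperCased', \$self->{name};
    $proxy;
}

package main;
my $p = Person->new;
$p->name = "casey";
print $p->name, "\n";
```

Every access now pays for a tie and a method call, which is the slowness Casey was complaining about.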
Various
Yes, floating-point numbers are imprecise. We know. However, thanks to Nicholas Clark, there are now far fewer of them.
Until next week, I remain, your humble and obedient servant,