Fork yeah!
Recently at work I had to speed up a Perl script that processed files. Perl can spawn multiple processes with the fork function, but things can go awry unless you manage the subprocesses correctly. I added forking to the script and improved its throughput nearly 10x, but it took me a few attempts to get it right. In this article I’m going to show you how to use fork safely and avoid some common mistakes.
N.B. Windows users: the fork system call is unavailable on Windows, where Perl emulates it, so these examples may not work as described.
A simple example
#!/usr/bin/perl
my $pid = fork;
# now two processes are executing
if ($pid == 0) {
    sleep 1;
    exit;
}
waitpid $pid, 0;
This script creates a child process with fork, which returns the process id of the child to the parent process, and 0 to the (newly created) child process. At this point two processes are executing the remainder of the code: the parent and the child. The clause if ($pid == 0) will only be true for the child, causing it to execute the if block. The if block simply sleeps for 1 second, and the exit function causes the child process to terminate. Meanwhile the parent has skipped over the if block and calls waitpid, which will not return until the child exits.
N.B. I can replace the sleep calls with any arbitrary processing I want the subprocesses to do, but sleep is a good stand-in, as it makes analyzing the program easier.
This is such a simple example; what could go wrong with it? Well, for one thing, the fork call may fail if the machine doesn’t have enough spare memory. So we need to check for that condition:
#!/usr/bin/perl
my $pid = fork;
die "failed to fork: $!" unless defined $pid;
# now two processes are executing
if ($pid == 0) {
    sleep 1;
    exit;
}
waitpid $pid, 0;
I’ve inserted a conditional die statement, which throws an exception if fork fails. Without it, a failed fork would return undef, which compares equal to 0, and the parent would wrongly run the child’s branch. But is there a deeper problem here? What if, instead of sleeping for one second, the child called a function which returned immediately? We might have a race between the parent and the child: if the child exits before the parent calls waitpid, what could happen?
It wouldn’t be unreasonable to think that the operating system might reuse the child’s process id for a different program, and our parent process would suddenly be waiting for an arbitrary process to exit. Not what we had intended at all!
Fortunately this is not a risk: when a child process exits, the operating system is not allowed to reclaim its resources until the parent calls wait (or waitpid) on it, which “reaps” the child. Second, waitpid only works on child processes of the calling process: if I pass the pid of a completely separate process, waitpid returns immediately with -1.
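You can check that second point with a few lines of Perl. A minimal sketch, assuming a normal Unix-like environment where pid 1 belongs to init and is certainly not our child:
#!/usr/bin/perl
# waitpid on a process that isn't our child returns -1 straight away
# (pid 1 is init, which is never a child of this script)
my $kid = waitpid 1, 0;
print "waitpid returned $kid\n";    # prints -1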
Multiple workers
As far as concurrency goes, the simple example isn’t very good. It only spawns one subprocess, and we’re unable to scale it with additional processes without rewriting the code. Here’s my new version:
#!/usr/bin/perl
my $max_workers = shift || 1;

for (1..$max_workers) {
    my $pid = fork;
    die "failed to fork: $!" unless defined $pid;
    next if $pid;
    sleep 1;
    exit;
}

my $kid;
do {
    $kid = waitpid -1, 0;
} while ($kid > 0);
This script reads an argument for the number of workers, or defaults to 1. It then forks $max_workers child processes. Notice how next if $pid causes the parent to jump to the next loop iteration, where it forks another worker, over and over until it exits the loop. Meanwhile the child processes sleep for 1 second and exit.
So whilst the child processes are sleeping, the parent process has to wait for them. Unfortunately now we have more than one child $pid to monitor, so which value should I pass to waitpid? Luckily waitpid has a shortcut for this: I can pass -1 as the process id, and it will block until any child process exits, returning the pid of the exiting child. So I wrap this in a do..while loop, which calls waitpid over and over until it returns -1 or zero, both of which indicate there are no more children to reap.
This code is better than the simple example as it can scale to an arbitrary number of worker subprocesses. But it contains (at least) two issues.
Imagine we run this script with 5 workers; it’s possible that a fork call fails partway through as the machine runs out of memory. The parent would then call die, printing the error and exiting, but that would leave several child processes still running with no parent process. These orphaned processes are re-parented to process id 1 (init), which calls wait on them when they exit, cleaning them up.
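Here’s a minimal sketch of that re-parenting: the child outlives its parent and then reports its new parent pid, which is usually 1, although systems with a subreaper (as on some Linux setups) may show a different pid:
#!/usr/bin/perl
# a child that outlives its parent is re-parented
my $pid = fork;
die "failed to fork: $!" unless defined $pid;
if ($pid == 0) {
    sleep 2;    # outlive the parent
    print "my parent is now ", getppid(), "\n";    # usually 1 (init)
    exit;
}
# the parent exits immediately, orphaning the child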
The second issue is related to using waitpid -1, 0 to catch any exiting child process. Imagine this script is run by a wrapper program which captures its output and streams it to another process. The wrapper forks a child to stream the script’s output, then execs the script in the parent process, effectively injecting a child process into the script. That will cause my script to hang permanently, as the injected child won’t exit until the script finishes.
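To make that concrete, here’s a sketch of such a hypothetical wrapper (the script path is a placeholder). Because exec replaces the program but leaves its children in place, the streaming child becomes a child of my script, and my waitpid -1, 0 loop will block on it:
#!/usr/bin/perl
# a sketch of a hypothetical wrapper program
pipe(my $reader, my $writer) or die "failed to pipe: $!";

my $pid = fork;
die "failed to fork: $!" unless defined $pid;
if ($pid == 0) {
    # child: stream the script's output, line by line
    close $writer;
    print while <$reader>;
    exit;
}

# parent: point STDOUT at the pipe, then replace itself with the script;
# exec keeps existing children, so the streamer above becomes the script's child
close $reader;
open STDOUT, '>&', $writer or die "failed to redirect STDOUT: $!";
exec '/path/to/the/script' or die "failed to exec: $!";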
Multiple workers, redux
#!/usr/bin/perl
use strict;
use warnings;

$SIG{INT} = $SIG{TERM} = sub { exit };

my $max_workers = shift || 1;
my $parent_pid = "$$";
my @children;

for (1..$max_workers) {
    my $pid = fork;
    if (!defined $pid) {
        warn "failed to fork: $!";
        kill 'TERM', @children;
        exit;
    }
    elsif ($pid) {
        push @children, $pid;
        next;
    }
    sleep 1;
    exit;
}

wait_children();

sub wait_children {
    while (scalar @children) {
        my $pid = $children[0];
        my $kid = waitpid $pid, 0;
        warn "Reaped $pid ($kid)\n";
        shift @children;
    }
}

END {
    if ($parent_pid == $$) {
        wait_children();
    }
}
This is an improved version of my multiple workers script. I’ve added signal handlers for INT (pressing Ctrl-C on the keyboard) and TERM that cause Perl to exit cleanly. If fork fails, the parent sends a TERM to all child processes and then exits itself. I figure that if fork fails, the machine is probably out of memory and the OOM Killer can’t be far away, so it’s better to shut down in an orderly fashion than have processes meet an untimely end from the Grim (process) Reaper.
The sub wait_children performs a blocking waitpid call on each of the pids forked by the parent. This avoids the issue of waiting for child processes not created by the script itself. Note that it doesn’t remove an element from @children until the reap succeeds. That ordering avoids a subtle bug: if the parent shifted @children first, started a blocking waitpid call, and then received an INT/TERM signal, wait_children would return immediately and be called again in the END block, but one of the pids would now be missing from @children, leaving that child to become a zombie process.
The END block runs in every process as it exits. If the exiting process is the parent, it calls wait_children again to clean up any remaining subprocesses. In a Real World™ script, with workers that do more than sleep, this is also a good place to add any additional cleanup the child process needs, such as deleting any temporary files it created.
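For instance, here’s a minimal sketch of per-child cleanup, where each worker creates a temporary file (the file and its use are purely illustrative):
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $parent_pid = $$;
my $tmpfile;    # set only in the child

my $pid = fork;
die "failed to fork: $!" unless defined $pid;
if ($pid == 0) {
    (undef, $tmpfile) = tempfile(UNLINK => 0);
    # ... the worker writes its results to $tmpfile ...
    exit;
}
waitpid $pid, 0;

END {
    if ($parent_pid == $$) {
        # parent: reap children, collect results, etc
    }
    elsif (defined $tmpfile) {
        unlink $tmpfile;    # child: remove its own temp file
    }
}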
Wrap up
Perl makes it easy to write concurrent code, and easy to make mistakes. If you’re not worried about fork failing, I recommend using Parallel::ForkManager, which has a nice interface, tracks the pids it creates for you, and provides a data-sharing mechanism for subprocesses.
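Here’s a minimal sketch of Parallel::ForkManager processing a batch of files with up to 5 workers (the glob pattern is just for illustration):
#!/usr/bin/perl
use strict;
use warnings;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(5);    # at most 5 concurrent workers

for my $file (glob '*.txt') {
    $pm->start and next;    # parent: spawn a child and move on
    # child: process one file
    print "processing $file in process $$\n";
    $pm->finish;            # child exits; the manager reaps it
}
$pm->wait_all_children;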
If you’re writing concurrent Perl and struggling, run your code with:
$ strace -f -e process,signal /path/to/your/program
so you can see precisely when child processes are exiting and what signals are being sent.
David Farrell
David is a professional programmer who regularly tweets and blogs about code and the art of programming.