Perl Code Cleanup

Created Mar 23, 2011 · Modified Mar 2, 2012

A couple of small items got me thinking enough this week that I decided they would make a good blog post. Disclaimer: I am a terrible hack of a Perl programmer. I write horrible sysadmin scripts in Perl. At the same time, I love the language and am always striving to learn more. One thing I love about my code review fetish is that it gives me a lot of opportunities to learn from others.

Loop Constructs

First up: what’s the best way to write a loop that could execute an unknown number of times? I think that if you know a loop will execute 5 times, you should do this:

for ($i=1; $i<6; $i++)
{
  # stuff gets REAL
}

Pretty simple, right? $i starts at 1, gets incremented each time, and the loop exits after the fifth iteration.

But what if you want to increment $i on each iteration without using it as the exit condition? Say, for example, you are going to do something in a loop forever, until some external event kicks you out. I was code-reviewing a coworker’s code the other day, and he used this:

for ($i=0; 1; $i++)
{
  # some stuff goes down
}

That just looks weird as hell to me. To be fair, that might just be because I first learned about for loops in C and don’t recall ever constructing a loop like that.

Of course, what he’s really doing is using a for loop to increment $i forever. The test condition (1) is always true, so the loop just continues ad infinitum. This works fine because the loop actually runs under an alarm timer, so after 30 seconds it stops no matter what.
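For anyone who hasn’t used this pattern, here is a minimal sketch of an endless counting loop that gets cut off by alarm() plus a die in the SIGALRM handler. It assumes a 1-second timeout instead of 30 so it finishes quickly; the variable names and the "timeout" message are illustrative, since the coworker’s actual script isn’t shown here.

```perl
use strict;
use warnings;

my $timeout = 1;    # hypothetical; the script described above uses 30 seconds
my $i;              # declared outside the loop so the final count survives it

my $finished = eval {
    # When the alarm fires, die out of the eval, and with it out of the loop.
    local $SIG{ALRM} = sub { die "timeout\n" };
    alarm $timeout;
    for ($i = 0; 1; $i++) {
        # some stuff goes down
    }
    1;    # never reached: the loop only ends via the alarm
};
alarm 0;    # make sure no alarm is left pending

# Re-throw anything other than our own timeout.
die $@ unless $finished || $@ eq "timeout\n";
print "counted to $i before the alarm fired\n";
```

Dying inside the handler, rather than just setting a flag, is what lets the alarm break out of a loop whose condition never changes.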

To me, a much more natural way to express this is via a while loop:

while($i++)
{
  # stuff just got REALER
}

I like this sort of loop because it is more compact and easier to read. It’s very clear that every iteration increments $i by 1.

I think the key advantage here comes down to readability. I just always expect a for loop to have a terminating condition. One that doesn’t ever terminate just seems odd, and it’s easy to miss when you are scanning the code.
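One caveat worth checking before relying on while($i++): Perl’s post-increment returns the value $i had before the increment, so if $i starts at 0 (or is undefined) the very first test is false and the body never runs at all. The minimal sketch below shows that next to a while (1) loop that increments as an ordinary statement; the cutoff after three iterations is a hypothetical stand-in for whatever external event would really end the loop.

```perl
use strict;
use warnings;

# Post-increment as the condition: the test sees the OLD value of $i.
my $i = 0;
my $post_runs = 0;
while ($i++) {
    $post_runs++;
    last;    # guard so the sketch cannot spin forever if $i started non-zero
}
print "while (\$i++) ran $post_runs time(s); \$i is now $i\n";

# while (1) says "forever" outright; the counter update is just a statement.
my $j = 0;
my $runs = 0;
while (1) {
    $j++;
    $runs++;
    last if $runs == 3;    # hypothetical stand-in for the external event
}
print "while (1) ran $runs time(s); \$j is now $j\n";
```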

Anyway, maybe I’m totally wrong. I would love to hear what others think of this.

Non-capturing Groupings

Here’s a problem that always confuses me in Perl: how do you group without capturing in regexes? Consider a file like this:

add yinst package apache-2.2
add yinst pkg mod_perl-1.2
10 add yinst package nagios-1.7

Packages are listed one per line, with either the pkg or package keyword. The two are equivalent. Also, there can be an optional numeric priority at the beginning of each line. So, here’s my first try at a regex to extract just the package names:

($package) = (/^\d*\W*add yinst (pkg|package) (\S+)/);

That doesn’t work because grouping and capturing in Perl regexes both use parentheses. The result of the above regex is that, for each line of the file, $package is assigned the pkg or package keyword, not the actual package name that I wanted.

In the past I would often do something like this:

($dummy,$package) = (/^\d*\W*add yinst (pkg|package) (\S+)/);

and then just throw away the first capture in the otherwise unused $dummy variable. That works fine, but it sure is ugly.

Finally I got wise and asked a coworker about this. He told me about non-capturing groups. This is actually explained in the perlre man page, but it’s easy to miss. You can just put ?: right after the opening parenthesis of a group to tell the Perl regex engine not to capture what that group matches. Here’s the correct way to do things:

($package) = (/^\d*\W*add yinst (?:pkg|package) (\S+)/);

With the ?: in place you don’t need the dummy variable, resulting in cleaner code. Problem solved!
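To see both versions side by side, here is a minimal sketch that runs the first-try regex and the non-capturing version over the three sample lines. The lines are held in an array rather than read from a file, purely to keep the example self-contained.

```perl
use strict;
use warnings;

# The three sample lines from the file above.
my @lines = (
    "add yinst package apache-2.2",
    "add yinst pkg mod_perl-1.2",
    "10 add yinst package nagios-1.7",
);

my (@first_try, @fixed);
for (@lines) {
    # Capturing group: the first capture is the keyword, not the name.
    my ($wrong) = /^\d*\W*add yinst (pkg|package) (\S+)/;
    push @first_try, $wrong;

    # Non-capturing group: the only capture left is the package name.
    my ($package) = /^\d*\W*add yinst (?:pkg|package) (\S+)/;
    push @fixed, $package;
}

print "first try: @first_try\n";    # first try: package pkg package
print "fixed:     @fixed\n";        # fixed:     apache-2.2 mod_perl-1.2 nagios-1.7
```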

Wrapup

I didn’t realize it when I started this post, but the common theme with both of these Perl features is code readability. Like many other sysadmin Perl hackers, I’ve developed a set of not-so-great habits over the years, like the superfluous use of dummy variables. Now that I do a lot of Perl code review, I find that I’m much more conscious of overall code readability. As you probably already know, the challenge in Perl is not writing the code; it’s writing readable code. I think absolutely the best way to improve your own code readability is to spend time analyzing how others write their code. I guarantee it will make you start questioning your own assumptions and habits.

Comments

justarobert March 24, 2011

On loop constructs: there is some chance that while($i++) will terminate when you don't expect. $i might start out as a negative integer, or something in your code might set $i to 0. I prefer while(1) { ... $i++}. The while(1) makes it clear that we do not expect the while condition to terminate the loop. Also, for your for loops, foreach my $i (1..5) is somewhat more Perl-idiomatic.

Have you read Damian Conway's Perl Best Practices? I don't agree with all of his recommendations, but it is great food for thought.

@justarobert

kgoess March 25, 2011

If you want to think about readability and regular expressions, think about the /x flag, which lets you break up the regex into readable chunks as well as add comments. So you'd have something like this: http://www.pastie.org/1713930 (I'm assuming IntenseDebate won't preserve formatting).
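A minimal sketch of the /x suggestion applied to the regex from the post. This is an illustration of the idea, not the contents of the linked paste, and it assumes the same three sample lines shown earlier.

```perl
use strict;
use warnings;

my @lines = (
    "add yinst package apache-2.2",
    "add yinst pkg mod_perl-1.2",
    "10 add yinst package nagios-1.7",
);

my @names;
for (@lines) {
    # Under /x, whitespace in the pattern is ignored and # starts a comment,
    # so literal spaces are written as [ ] to keep the match exact.
    my ($package) = m{
        ^ \d* \W*               # optional numeric priority
        add [ ] yinst [ ]       # fixed command words
        (?: pkg | package )     # either keyword, not captured
        [ ]
        (\S+)                   # the package name, the only capture
    }x;
    push @names, $package;
}
print "@names\n";    # apache-2.2 mod_perl-1.2 nagios-1.7
```

Whitespace inside a bracketed character class is still significant under plain /x, which is why [ ] works as a literal space here.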

philote March 25, 2011

I prefer the following for loops that execute a set number of times:
for my $i (1..5) {...} or even for (1..5) { }

As for your regex, if the package name is always last and doesn't contain spaces, you could use split, something like the following:
my $package = (split(/\s+/, $line))[-1];
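A minimal sketch of the split approach over the same three sample lines. Unlike the regex, it does not check for the add yinst prefix at all; it simply trusts that the name is the last whitespace-separated field.

```perl
use strict;
use warnings;

my @lines = (
    "add yinst package apache-2.2",
    "add yinst pkg mod_perl-1.2",
    "10 add yinst package nagios-1.7",
);

my @names;
for my $line (@lines) {
    # Split on runs of whitespace (\s+, not the literal letter s) and
    # keep the last field, assumed to be the package name.
    my $package = (split(/\s+/, $line))[-1];
    push @names, $package;
}
print "@names\n";    # apache-2.2 mod_perl-1.2 nagios-1.7
```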

Keith Thompson May 26, 2011

For your first example, I find this:
for ($i=0; $i<5; $i++)
much clearer. And I'd use "my" so $i is local to the loop:
for (my $i = 0; $i < 5; $i ++)

Or you can write:
foreach my $i (1..5)
or
foreach my $i (0..4)

("for" and "foreach" are synonymous, but I prefer to use "for" for the C-style 3-clause loop and "foreach" when iterating over a list.)

I find your
while($i++)
misleading. It's supposed to be an infinite loop, but you're using $i++ as a condition. I'd write:
while (1) {
$i++;
...
}

Say what you mean.

(Hope my formatting doesn't get messed up when I submit this.)
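A quick sketch confirming that the C-style loop with "my" and the foreach-over-a-range form suggested above visit the same values; the arrays exist only to collect and compare the counters.

```perl
use strict;
use warnings;

# C-style three-clause loop, with "my" so $i is scoped to the loop.
my @c_style;
for (my $i = 0; $i < 5; $i++) {
    push @c_style, $i;
}

# foreach over a range: the same five values, without the manual bookkeeping.
my @ranged;
foreach my $i (0..4) {
    push @ranged, $i;
}

print "@c_style\n";    # 0 1 2 3 4
print "@ranged\n";     # 0 1 2 3 4
```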

Keith Thompson May 26, 2011

Ok, it kept the line breaks but lost the indentation. There might be a way to add markup to avoid that, but I'm not going to learn the syntax for every web form in the world. Not a big deal, I think it's clear enough without the indentation.

Keith Thompson May 26, 2011

And if I'd been paying more attention, I would have realized that justarobert covered most of what I said.
