PerlMusings
Perl Code Cleanup
A couple of small items got me thinking enough this week that I decided they would make a good blog post. Disclaimer: I am a terrible hack of a perl programmer. I write horrible sysadmin scripts in perl. At the same time I love the language and am always striving to learn more. One thing I love about my code review fetish is it gives me a lot of opportunities to learn from others.
Loop Constructs
First up: what's the best way to write a loop that could execute an unknown number of times? I think that if you know a loop will execute 5 times, you should do this:
for ($i=1; $i<6; $i++) {
    # stuff gets REAL
}
Pretty simple, right? $i starts at 1, gets incremented each time, and the loop exits after the fifth iteration.
But what if you want to increment $i on each iteration, but not use it as the exit condition? Say, for example, you are going to do something in a loop forever, until some external event kicks you out. I was reviewing a coworker's code the other day, and he used this:
for ($i=0; 1; $i++) {
    # some stuff goes down
}
That just looks weird as hell to me. To be fair, that might just be because I first learned about for loops in C and don't recall ever constructing any loops like that.
Of course, what he's really doing is using a for loop to increment $i forever. The test condition (1) will always be true, so the loop will just continue on ad infinitum. This works fine because this loop actually gets called inside an alarm timer, so after 30 seconds it stops no matter what.
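The post doesn't show how the alarm is wired up, so here is only a minimal sketch of one common way to do it, assuming the usual eval/die pattern around alarm(). The helper name count_until_timeout is hypothetical, and the timeout is 1 second rather than 30 so the demo finishes quickly.

```perl
use strict;
use warnings;

# Hypothetical helper: run the "infinite" for loop until alarm() fires,
# then report how many iterations completed.
sub count_until_timeout {
    my ($seconds) = @_;
    my $i = 0;
    eval {
        # The handler dies, which unwinds out of the loop and the eval.
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        for ($i = 0; 1; $i++) {
            # some stuff goes down
        }
    };
    alarm 0;                               # cancel any pending alarm
    die $@ if $@ && $@ ne "timeout\n";     # re-throw unexpected errors
    return $i;
}

my $iterations = count_until_timeout(1);
print "loop ran $iterations times before the alarm fired\n";
```

The loop itself never exits on its own; the only way out is the die in the signal handler.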
To me, a much more natural way to express this is via a while loop:
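while (++$i) {
    # stuff just got REALER
}
I like this sort of loop because it is more compact and easier to read. It's very clear that every iteration increments $i by 1. Note that it has to be the pre-increment ++$i: with $i++ the test sees the old value of $i, which is 0 (false) on the first pass when $i starts at 0 or undefined, so the body would never run. It also means $i is 1 rather than 0 during the first iteration.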
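One thing worth checking with this idiom is the starting value of $i, because the loop test uses the value of the increment expression. The sketch below compares the post-increment and pre-increment forms with $i starting at 0. The sub names are made up for illustration, and the cap of 3 runs exists only so the demo terminates.

```perl
use strict;
use warnings;

# Post-increment: the test sees the OLD value of $i (0), which is false.
sub runs_with_post_increment {
    my $i    = 0;
    my $runs = 0;
    while ($i++) {
        last if ++$runs >= 3;   # cap so the demo terminates
    }
    return $runs;
}

# Pre-increment: the test sees the NEW value of $i (1, 2, ...), always true.
sub runs_with_pre_increment {
    my $i    = 0;
    my $runs = 0;
    while (++$i) {
        last if ++$runs >= 3;   # cap so the demo terminates
    }
    return $runs;
}

print "post-increment: ", runs_with_post_increment(), " runs\n";
print "pre-increment:  ", runs_with_pre_increment(), " runs\n";
```

With $i starting at 0, the post-increment loop body runs zero times, while the pre-increment loop keeps going until the cap stops it.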
I think the key advantage here comes down to readability. I just always expect a for loop to have a terminating condition. One that doesn't ever terminate just seems odd, and it's easy to miss when you are scanning the code.
Anyway maybe I'm totally wrong. I would love to hear what others think of this.
Non-capturing Groupings
Here's a problem that always confuses me in perl: how do you group without capturing in regexes? Consider a file like this:
add yinst package apache-2.2
add yinst pkg mod_perl-1.2
10 add yinst package nagios-1.7
Packages are listed one per line, with either the pkg or package keyword. The two are equivalent. Also, there can be an optional numeric priority at the beginning of each line. So, here's my first try at a regex to extract just the package names:
($package) = (/^\d*\W*add yinst (pkg|package) (\S+)/);
That doesn't work because grouping and match extraction in perl regexes both use parentheses. The result of the above regex is that for each line of the file $package is assigned the pkg or package keyword, not the actual package name that I wanted.
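To see the problem concretely, here is a small sketch that runs the first-try regex against one line in the format described above. The variable names $line and @captures are just for illustration.

```perl
use strict;
use warnings;

# One sample line in the file format described above.
my $line = "add yinst pkg mod_perl-1.2";

# Both sets of parentheses capture; the list assignment keeps only the first.
my ($package) = ($line =~ /^\d*\W*add yinst (pkg|package) (\S+)/);
print "first capture: $package\n";

# Collecting every capture shows where the package name actually ended up.
my @captures = ($line =~ /^\d*\W*add yinst (pkg|package) (\S+)/);
print "all captures: @captures\n";
```

The package name is there, but it is the second capture, so a one-element list assignment never sees it.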
In the past I would often do something like this:
($dummy,$package) = (/^\d*\W*add yinst (pkg|package) (\S+)/);
and then just throw away the first match in the unused $dummy variable. That works fine but it sure is ugly.
Finally I got wise and asked a coworker about this. He told me about non-capturing groupings. This is actually explained in the perlre man page but it's easy to miss. You can just put a ?: at the beginning of a grouping to tell the perl regex engine to not extract the match from the grouping. Here's the correct way to do things:
($package) = (/^\d*\W*add yinst (?:pkg|package) (\S+)/);
With ?: in place you don't have to use the dummy variable, resulting in cleaner code. Problem solved!
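Here is a short sketch that applies the non-capturing version to the three sample lines from the file above. The helper name package_name is hypothetical, and returning undef for lines that don't match is my own assumption about what you'd want.

```perl
use strict;
use warnings;

# Hypothetical helper: return the package name from one line,
# or undef if the line does not match.
sub package_name {
    my ($line) = @_;
    # (?:...) groups the alternation without capturing it,
    # so (\S+) is the first and only capture.
    my ($package) = ($line =~ /^\d*\W*add yinst (?:pkg|package) (\S+)/);
    return $package;
}

my @lines = (
    "add yinst package apache-2.2",
    "add yinst pkg mod_perl-1.2",
    "10 add yinst package nagios-1.7",
);

for my $line (@lines) {
    my $package = package_name($line);
    print "$package\n" if defined $package;
}
```

Because the alternation no longer captures, the single-element list assignment picks up the package name directly.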
Wrapup
I didn't realize it when I started this post, but the common theme with both these perl features is code readability. Like many other sysadmin perl hackers, I've developed a set of not so great habits over the years, like the superfluous use of dummy variables. Now that I do a lot of perl code review, I find that I'm much more conscious of overall code readability. As you probably already know, the challenge in perl is not writing the code - the challenge is writing readable code. I think absolutely the best way to improve your own code readability is to spend time analyzing how others write their code. I guarantee it will make you start questioning your own assumptions and habits.