Custom scripting gives users a safe-du
Originally posted on Linux.com, December 7, 2005.
As a system administrator, you can interact with users in two ways: force them to follow the rules, or encourage them with tools and guidelines. I prefer the second approach, as I think people generally want to do the right thing. Also, if people don't follow the rules at your company, that is a management problem, not a computer problem. Therefore, I prefer to concentrate my attention on helpful tools and scripts, which is exactly what I did recently to solve a typical system administrator problem.
My company has a Linux cluster with a terabyte of attached storage. Over time we noticed the head node was becoming increasingly overloaded. Inspection of the system showed that users were starting dozens of copies of the du utility to determine disk space usage. This was a natural thing for them to do, because they needed to know how much disk space was available: a lack of disk space would cause their software builds and tests to fail. The problem was that a du of the entire shared filesystem takes five to seven hours. Thus, when the filesystem was nearly full (as it of course usually was), the number of du processes would increase almost exponentially.
To address this problem, we first set up automated nightly disk space reports, so that users could check the status without running du. This still did not solve the problem, as the amount of used space could fluctuate dramatically over the course of 24 hours. Users still wanted and needed to run their own du processes throughout the workday.
While adding more disk space would have solved the problem, we are using a large disk array that is already filled to maximum capacity. In general, users tend to fill up all available disk space anyway, no matter how much you give them.
We then developed a policy: users could run du on any directory they owned. In addition, user du processes would be allowed to run for a maximum of one hour of wall time. Users in the wheel group would be exempt from these restrictions.
I was given the task of developing a tool to implement this policy. Some sort of wrapper around the existing du seemed like an obvious choice: the script could validate the input, abort if an invalid path was given, and terminate the du process if it ran too long.
I wrote a basic bash script in perhaps an hour. Then I thought about how to run it, and that is where I ran into trouble. I had planned to make the script set user ID (setuid) or set group ID (setgid) root, so that when run by any user it would actually run as root or in the root group. Then I could change the permissions on the real du so that only root could run it. The result would be that normal users could only access the real du through the wrapper script.
Of course that would make a pretty boring article, and in reality it didn't turn out to be that simple: you actually can't create setuid or setgid shell scripts on Linux. You can set the bit (chmod g+s script) that tells the system to execute the script setgid, but the system ignores this on shell scripts. This behavior varies across different Unix and Unix-like platforms, but generally setuid/setgid shell scripts are frowned on due to security risks.
Thus my script was useless, except as an exercise to develop the logic necessary for the process. I just had to find a script language that allowed setgid scripts. Next I turned to the sysadmin's best friend, Perl. A little Web research showed that Perl fully supports setgid scripts and does so consistently across platforms. The key is the automatic use of taint mode, which forces programmers to deal with script inputs in a secure manner.
After a few more hours of programming I had a basic script, which I will now attempt to explain (and perhaps justify). For reference, here is the original script and a version with line numbers if you want to follow along.
First, everything up to line 32 sets up the initial environment. Notice lines 21 and 22: taint mode forces us to sanitize the environment. Note also that it doesn't matter if taint mode is enabled with the -T switch (on line 1) or not. The Perl interpreter automatically detects that a script is setuid or setgid and forces taint mode on.
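For readers who have not used taint mode before, the kind of environment cleanup it demands looks roughly like the following. This is a minimal sketch based on the usual idiom from the Perl security documentation, not a copy of lines 21 and 22 of the original script; the PATH value is an assumption and should be adjusted for your system.

```perl
use strict;
use warnings;

# Intended to run under taint mode (perl -T, or automatically when the
# script is setuid/setgid). Taint mode refuses to launch external
# commands while PATH and a few shell-related variables still hold
# values inherited from the caller.
$ENV{PATH} = '/bin:/usr/bin';               # assumed safe search path
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};   # variables a shell might act on

print "PATH is now $ENV{PATH}\n";
```

Setting PATH to a fixed literal is what matters here: a value assigned from a constant in the script is not tainted, whereas the inherited one is.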
Line 27 points to the system-supplied du. This brings up a security issue: the astute reader will notice that the du executable needs no special privileges to run. There's nothing to prevent a user from copying the real du to another location (like /tmp), changing the permissions and running it from there. The user could also copy a working du executable from another Linux system. That's okay -- as I said before, this script is designed to assist users, not completely lock them up. If you really want to restrict users then you need to look into something like SELinux.
Lines 30 through 35 save the command line arguments and strip all the options off the command line (thus leaving only file or directory arguments). I utilize a side effect of getopt to do this: if you call it with only a colon as a parameter, it removes all the options from ARGV (an option is any command-line argument that starts with a dash). This only works with getopt from the Getopt::Std package, not with getopts from the same package or with Getopt::Long. I could do the argument parsing myself, but why not rely on the standard facility?
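The getopt side effect can be seen in isolation with a few lines of Perl. This is a sketch only: the sample command line and the variable names (@saved_args, %ignored, @paths) are made up for illustration and are not taken from the original script.

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical command line standing in for what a user might type.
@ARGV = qw(-s -h /home/alice /tmp/build);

my @saved_args = @ARGV;    # full argument list, options included
my %ignored;               # getopt records the switches here; they are discarded
getopt(':', \%ignored);    # consumes the leading dash arguments from @ARGV
my @paths = @ARGV;         # what remains: file and directory arguments

print "all arguments: @saved_args\n";
print "paths only:    @paths\n";
```

One thing to be aware of: getopt stops at the first argument that does not start with a dash, so options placed after a path stay in the list.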
Next I untaint the script input in lines 37 to 42. The Perl taint mechanism forces you to examine every script argument and strip it using regular expressions. This is where you do things like check that the arguments contain only alphanumeric characters and try to ensure that nothing funny (or malicious) slips through.
This script doesn't do any of that checking. Remember that I (mostly) trust the users of this script. Thus, all I do is run a regular expression that matches the entirety of each argument. This is the minimum necessary to satisfy Perl's taint mode. Any problem the Perl interpreter detects while in taint mode will just cause the script to abort.
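The "match everything" untainting described above can be sketched as follows. The function name untaint_all is hypothetical, and the pattern deliberately accepts any string, so this satisfies taint mode without validating anything.

```perl
use strict;
use warnings;

# Return untainted copies of the given strings. Perl treats text
# captured by a regular expression group as untainted, so capturing
# the whole argument launders it without checking its content.
sub untaint_all {
    my @clean;
    for my $arg (@_) {
        if ($arg =~ /\A(.*)\z/s) {
            push @clean, $1;
        }
    }
    return @clean;
}

my @paths = untaint_all(@ARGV);
print "untainted ", scalar(@paths), " argument(s)\n";
```

A stricter script would replace the catch-all pattern with one that lists the characters it is prepared to accept and rejects everything else.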
This script also assumes that sysadmins know what they are doing. The check performed in lines 44 through 55 determines two things: whether the user ID is 0 (i.e. the script is being run by root), and whether the user is a member of group wheel.
If either of these checks succeeds, the user is granted full access to the real du program. That's why I exec $RealDu on line 53 -- there's nothing more for the script to do, so there's no point in continuing -- just jump to the real du.
I'm not entirely happy with the way I determine if a user is a member of group wheel (doing a getpwuid lookup and a getgrnam lookup), but I was unable to determine a cleaner way to do this. I suspect a little thought would produce something more elegant. On the upside, this check should be fairly portable to other Unix platforms.
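A check along these lines, using getpwuid and getgrnam, might look like the sketch below. It is an illustration rather than the original code: the function name is_privileged is hypothetical, and the comparison against the user's primary group is an addition that the article does not mention.

```perl
use strict;
use warnings;

# True if the real user is root or belongs to the named group.
sub is_privileged {
    my ($group) = @_;
    return 1 if $< == 0;                      # real uid 0 is root

    my ($login, undef, undef, $primary_gid) = getpwuid($<);
    return 0 unless defined $login;           # uid has no passwd entry

    my ($name, undef, $gid, $members) = getgrnam($group);
    return 0 unless defined $name;            # no such group on this system

    return 1 if $primary_gid == $gid;         # group is the user's primary group

    # $members is a space-separated list of login names.
    my $listed = grep { $_ eq $login } split ' ', ($members // '');
    return $listed ? 1 : 0;
}

print is_privileged('wheel') ? "full access\n" : "restricted access\n";
```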
One of my design goals was to duplicate the behavior of normal du as much as possible, so users don't notice that they are actually running this script (unless they do something wrong, of course). That is the reason for lines 60 to 67. The default behavior of du is to operate on the current directory if the user fails to supply any directory argument, so running du with no arguments is the same as running du . (du on the current directory). Without the check that's performed on these lines, my script would just silently exit. By pushing the current directory onto the directory list, the script emulates the default behavior of du in this case and ensures any user used to that behavior doesn't get tripped up.
Now we begin to get to the real meat of the script. Lines 69 through 74 perform the permission check. If any path specified on the command line is not owned by the user, the script aborts with an error. Again, the simple way to implement this would be to assume that there is only one argument to du. This is how du is run probably 99% of the time. However, you are allowed to supply multiple path arguments to du, in which case it checks disk usage for each path. The script therefore iterates over the list of command-line arguments instead of assuming there is just one path specified. The check itself uses one of Perl's file tests, -O, which returns true if the file is owned by the real user, rather than the -o test, which checks against the effective user the script is running under. Since this is a setgid script, the check should apply to the real user who invoked it.
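The ownership loop can be sketched in a few lines. The function name not_owned and the messages are hypothetical; the original script aborts on failure, whereas this sketch only prints a message so that it is safe to try anywhere.

```perl
use strict;
use warnings;

# Return the paths from the argument list that the real user does not
# own. -O tests ownership by the real uid; -o would use the effective
# uid. A path that does not exist also fails the test.
sub not_owned {
    return grep { !-O $_ } @_;
}

my @paths = @ARGV ? @ARGV : ('.');    # same default as du: current directory
my @bad   = not_owned(@paths);
print @bad ? "refusing, not owned by you: @bad\n"
           : "ownership check passed\n";
```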
Remember I said there were two restrictions in my version of du: directory ownership and runtime limit. Lines 76 through 89 implement the runtime limit and then actually invoke the real du. This invocation took some thought. I first tried to set an alarm signal handler at the top of the script and then set the alarm timeout before invoking $RealDu. That kind of worked, but it resulted in both the child (the real du) and the parent (this script) dying at the same time. I wanted this script to continue after the child du failed. Again, in reality this was a small point, as the most the script would ever do afterward was print the error message about killing $RealDu because it ran too long.
A bit of Googling and head-scratching produced the answer: you have to wrap the whole thing in an eval and set the signal handler as shown. Then it's a simple matter to start my timer and fire off the real du command. If the real du exceeds $TimeOut seconds, the alarm signal fires and the kill command kills all the children of my script (just the real du, in this case). I ignore SIGHUP so my script doesn't catch it.
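The general eval-plus-alarm pattern looks something like the sketch below. It is not the original code: it forks the child itself and kills that one process ID with SIGTERM, instead of signalling all the children and ignoring SIGHUP as described above, and the function name run_with_timeout is hypothetical.

```perl
use strict;
use warnings;
use POSIX ();

# Run @cmd, giving up after $timeout seconds of wall time.
# Returns the child's wait status, or undef if it had to be killed.
sub run_with_timeout {
    my ($timeout, @cmd) = @_;

    my $pid = fork();
    die "fork failed: $!\n" unless defined $pid;
    if ($pid == 0) {
        exec { $cmd[0] } @cmd;        # only returns if exec fails
        POSIX::_exit(127);
    }

    my $status;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $timeout;               # start the timer
        waitpid($pid, 0);             # interrupted if the alarm fires
        $status = $?;
        alarm 0;                      # child finished in time: cancel the timer
        1;
    };
    alarm 0;
    return $status if $ok;

    die $@ unless $@ eq "timeout\n";  # re-throw anything unexpected
    kill 'TERM', $pid;                # stop the runaway child
    waitpid($pid, 0);                 # reap it so no zombie is left behind
    return;
}

my $result = run_with_timeout(1, 'sleep', '5');
print defined $result ? "finished with status $result\n"
                      : "killed after timeout\n";
```

Because the die happens inside the eval, only the eval block is abandoned when the alarm fires, and the parent carries on to clean up and report the problem.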
And that is the script in its entirety. To install it, move the real du to another location, such as /usr/bin/unsafe-du, and remove execute permission for other users on it (chmod o-x /usr/bin/unsafe-du). Then install this script, owned by root, as /usr/bin/du with the setgid flag set.
The final permissions should look something like this:
-rwxr-sr-x 1 root root 2424 Sep 23 16:10 /usr/bin/du
-rwxr-xr-- 1 root root 25884 Mar 14 2001 /usr/bin/unsafe-du
Note that you may need to install the perl-suidperl package to enable Perl setuid scripts on some distros. This could all be done with setuid, but setgid is a lesser permission, so it seems appropriate to use it instead. A compromise of the script would leave the attacker running in the root group, not running as uid 0.
As I said, the goal with this script was to assist users, not lock them out, while protecting system performance. People can still run du, as long as it is on their own directories or files. Meanwhile, the time limit helps limit the number of long-running du processes on the system. The limit of one hour is somewhat arbitrary: we just picked a number that seemed like it would work for most legitimate cases. It seems like that is enough, because I haven't heard any complaints from users.
Users, if they are so inclined, can circumvent these protections. In practice, this hasn't been a problem. I installed this script on the system several months ago. No users have complained, or even commented on it. At the same time, the number of random du processes being run on the system has dropped dramatically. Thus this script is a complete success, and a great example of how those everyday sysadmin tasks get solved -- one Perl script at a time.