Say No to mkdir -p

I spend most of my time these days working on large software releases incorporating thousands of separate packages. Thus I have had an opportunity to become an expert on the use and abuse of install scripts. This past week I was bitten by a particular problem which I've seen before, and which I thought was worthy of a blog post. That problem is the use of mkdir -p in install scripts.

mkdir -p seems like a great idea in package install scripts at first glance. Packages usually have to create directories as part of their install process, and using the -p switch ensures that all intermediary directories are created too. "Problem solved!" says the naive package creator.

Sadly this is not the case, because use of mkdir -p eventually leads to failures that are very difficult to detect and fix. Consider package A which creates and owns /dir1. Package B creates and owns /dir1/dir2, and uses mkdir -p to create that directory tree.

As long as package B has an explicit dependency on package A, this will work ok - the dependency will cause the install script for package A to always be executed first, so package B will only create dir2, not dir1. That is what the original package B creator intended.

However, if B does not have an explicit dependency on A, things get dicey. Due to other dependency orderings, A may always be installed before B, and you will never see a problem. At some later date, you decide A is no longer needed and you remove it from your release. Since B lacks an explicit dependency on A, this will not be flagged as a problem by your packaging system. Package B will probably continue to work just fine in any case, since it is using mkdir -p to create its directories.

The problem is you forgot about poor little package C. It's another package with a flaw - no explicit dependency on package A. It's been writing a file in dir1 based on package A owning that directory and setting permissions. That is an easy trap to fall into if package A has been around a long time - the creator of package C might not even realize that he should explicitly depend on A.

Now you have just broken package C and again your packaging system can't detect the failure. The daemon in C tries to write to dir1 at runtime and fails, because it no longer has permission - with package A gone, the mkdir -p in package B has silently created dir1 itself, with default ownership and permissions rather than the ones package A used to set. The result of this is baffling production failures, if you are always depending on your packaging system to do the right thing. These failures are also dependent on whether a host was installed fresh from bare metal, or incrementally upgraded. If the host was incrementally upgraded, the removal of package A didn't change ownership of the existing /dir1. An incrementally upgraded host will continue to operate just fine. You won't see problems until you do a bare metal install of a host.
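The difference between an upgraded host and a bare-metal host can be reproduced in a scratch directory. This is only a minimal sketch, not a real package script: install_a, install_b and the 1777 mode are hypothetical stand-ins for what packages A and B might do, and stat -c assumes GNU coreutils.

```shell
#!/bin/sh
# Sketch: how mkdir -p silently changes dir1 once package A is gone.
# install_a, install_b and the 1777 mode are hypothetical stand-ins.
umask 022

install_a() {            # package A owns dir1 and makes it world-writable
  mkdir "$1/dir1" && chmod 1777 "$1/dir1"
}

install_b() {            # package B uses mkdir -p, so it never notices A is missing
  mkdir -p "$1/dir1/dir2"
}

upgraded=$(mktemp -d)    # host where A was installed before B
fresh=$(mktemp -d)       # bare-metal host built after A was removed

install_a "$upgraded" && install_b "$upgraded"
install_b "$fresh"

# Record the permissions dir1 ended up with on each "host".
upgraded_mode=$(stat -c %a "$upgraded/dir1")
fresh_mode=$(stat -c %a "$fresh/dir1")
echo "upgraded host: dir1 mode $upgraded_mode"
echo "fresh host:    dir1 mode $fresh_mode"

rm -rf "$upgraded" "$fresh"
```

With a umask of 022, the upgraded host should keep the 1777 mode that package A set, while the fresh host ends up with a plain 755 directory that an unprivileged daemon cannot write to. Both installs of package B exit successfully, which is exactly why nothing flags the difference.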

The solution to this is simple - don't allow the use of mkdir -p in install scripts. Instead, scripts must always check for their base directory (/dir1) and die if it doesn't exist. This way even if packages are missing dependency information, the removal of package A will immediately cause package installs of B to fail. Obvious failures like that are easy to correct. Package C shouldn't silently depend on package A either, but that error condition is harder to detect. My recommendation is that as much as possible packages only write in directories they own. Obviously that is sometimes impossible, for example in the case of shared logdirs. In general though, this is bad:

mkdir -p /dir1/dir2/dir3

and this is good:

if [ -w /dir1 ]; then
  for i in dir2 dir2/dir3; do
    mkdir "/dir1/$i" || exit 1
  done
else
  echo "expected /dir1 not found / not writable!" >&2
  exit 1
fi

There's actually another way this problem is even nastier, and that's the case of packages executing scripts in different install phases. Typical packaging systems don't just allow one script per package, they allow several, in different stages of the install. Often you need some small amount of work done at install time (copying files, creating directories). After all packages have been installed, there is a post phase where another set of package scripts run to do the bulk of install work after the system is installed and presumably all directories have been created by packages. If package A creates dir1 in the install phase, package B can safely use mkdir -p to create dir2 in the post phase (because all install phase scripts will have already finished, by design). If however package A is modified at some point to instead run its script in post phase, the lack of dependency ordering between the scripts can suddenly result in A running after B for the first time, and the ownership or permissions of dir1 suddenly being swapped.
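To make the ordering problem concrete, here is a small sketch that runs the same two post-phase scripts in both possible orders. The functions post_a, post_b and dir1_mode_after are hypothetical, as is the assumption that package A creates dir1 with mkdir -p -m 1777. The sketch relies on the fact that mkdir -p leaves an already existing directory untouched, and on GNU stat -c.

```shell
#!/bin/sh
# Sketch: the same two post-phase scripts, run in both possible orders.
# post_a, post_b and dir1_mode_after are hypothetical stand-ins.
umask 022

post_a() { mkdir -p -m 1777 "$1/dir1"; }     # A wants dir1 world-writable
post_b() { mkdir -p "$1/dir1/dir2"; }        # B only cares about dir2

# Run the given scripts in order against a scratch root,
# then print the mode dir1 ended up with.
dir1_mode_after() {
  root=$(mktemp -d)
  for script in "$@"; do "$script" "$root"; done
  stat -c %a "$root/dir1"
  rm -rf "$root"
}

a_then_b=$(dir1_mode_after post_a post_b)
b_then_a=$(dir1_mode_after post_b post_a)
echo "A before B: dir1 mode $a_then_b"
echo "B before A: dir1 mode $b_then_a"
```

When A runs first, dir1 should get the 1777 mode A asked for. When B runs first, its mkdir -p creates dir1 with default permissions, and A's later mkdir -p sees an existing directory and leaves it alone. Neither script reports an error in either order.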

Don't even get me started about how the use of recursive chown (chown -R) can cause similar but even more difficult to troubleshoot failures. A package install ordering swap suddenly results in a different package owning a directory, with similarly baffling failures.

The moral of this packaging story is in two parts:

First, never allow package install scripts which indiscriminately perform recursive operations. mkdir -p and chown -R are the worst offenders, but there are many others out there. Install scripts must always check for existence of parent directories and abort if those parents aren't found. In fact ideally the packaging system itself could detect and warn about this, although I have no idea if any do.
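Until a packaging system does this for you, a crude textual check over your install scripts gets part of the way there. The sketch below is an assumption on my part, not a feature of any packaging tool: find_recursive_ops is a hypothetical helper, it expects all install scripts to live under one directory, and it relies on grep -r, which GNU and BSD grep provide but POSIX does not require.

```shell
#!/bin/sh
# Sketch of a pre-release lint: print every install-script line that uses
# mkdir -p / --parents or chown -R / --recursive.
# find_recursive_ops is a hypothetical helper, not part of any packaging system.
find_recursive_ops() {
  # -r: recurse into the scripts directory
  # -n: show line numbers
  # -E: extended regular expressions
  # Each pattern allows other arguments before the flag (mkdir -m 755 -p)
  # and combined short options (mkdir -pv), but stops at ; | and & so a
  # later command on the same line is not blamed on mkdir or chown.
  grep -rnE \
    -e 'mkdir[[:space:]]+([^;|&]*[[:space:]])?(-[[:alnum:]]*p|--parents)' \
    -e 'chown[[:space:]]+([^;|&]*[[:space:]])?(-[[:alnum:]]*R|--recursive)' \
    "$1"
}

# Example use in a release check (the path is a placeholder):
#   if find_recursive_ops ./install-scripts; then
#     echo "recursive operations found in install scripts" >&2
#     exit 1
#   fi
```

This is pattern matching, not parsing, so it will also flag comments that mention mkdir -p and will miss commands assembled from variables. It is meant as a tripwire for review, not as proof that a release is clean.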

Second, always test bare-metal installs of your software releases. It's the only way to detect whether a change in package ordering has changed how your system operates. You won't find all the problems, but you will catch a lot more than if you only ever install software incrementally.


