Thoughts about the Unix configuration file nightmare

Emile van Bergen

Summary: A proposal to extend the Unix filesystem to help configuration file formats converge.

Introduction

There's a long history of discussions about whether Unix should have a common configuration database for services and applications in order to make it more user-friendly. At any rate, it seems that writing graphical configuration management tools would be much easier if applications used a uniform representation of configuration data.

As an example of such a discussion, see e.g. Matthew Arnison's article, "How to fix the Unix configuration nightmare". It has a few interesting ideas, and I agree wholeheartedly with the notion that the "master copy" of the configuration data should reside in the configuration file used by the managed application or service itself. That avoids getting out of sync when someone hand-edits configuration files generated by tools that keep their own databases.

However, even if a configuration management system could read and write all configuration files losslessly, using application-specific plugins as suggested in the article, that does not solve the long-term issue: ultimately we would like applications to gradually move to using a unified configuration database natively.

Also, the interface between the plugin that understands the configuration file and the actual configuration editor, whether interactive or not, will be as difficult to agree upon as a common format or library for accessing configuration data.

The problem

In my opinion it's not realistic to think that all developers of Unix software will accept a unified database format. It's been said that herding developers is like herding cats, and people will keep bickering about LDAP and XML and ASCII attribute = value files and brittle Windows registry-like filesystems-in-filesystems, about which library to use to access them, and so on.

Put bluntly, it's just not going to happen in our lifetime, and we'll be stuck writing those management tool plugins for new formats forever, unless there's an API and a format so familiar and compelling that every developer will want to switch to it immediately.

A solution?

For that I see just one candidate: let the API be Unix, and the database be the filesystem.

Of course, in a sense we already do that, but we descend into application-specific data much too soon. It would be great if we could push back the boundary of opaque data from complete configuration files to individual configuration items, and even then only to the items that don't use standard data types. Considering that nearly all configuration files can be modelled as nested trees of attribute/value pairs, why not?

Why on earth would we want inefficient and complex XML files that everyone would have to adopt, plus some library that implements MyFancyUnixRegistryGetKeyHandle(), MyFancyUnixRegistryAddKey(), MyFancyUnixRegistryEnumerateKeys(), MyFancyUnixRegistrySetValue(), MyFancyUnixRegistryPollChanges() and so forth (yuck!), when we have open(), creat(), read(), write(), readdir(), unlink(), select() and close()?

The Unix model has served us well, from block devices to mice, soundcards and hierarchical storage databases (the filesystem); why would it suddenly stop being useful when we need a configuration registry? The problem is not the model; the problem is that we don't take the model far enough and extend it to individual configuration items.

Implementing it

'A few' things are needed on the kernel side if we want the filesystem to accommodate individual configuration items, but those modifications are probably better aligned with the Unix philosophy than 'just add another syscall', which seems to be the trend in Linux these days.

Of course, what we need at least is a filesystem optimized for lots of very small files. ReiserFS gets us a long way in that, but I think we need to go even further if we are to tackle backwards compatibility during the all but indefinite transition period as well.

If we had an efficient channel to talk to user space filesystems, we could easily write filesystems that use an existing configuration file format as their 'on-disk' layout, and distribute them together with the applications that use those configuration files, just like the 'plugins' of the configuration management application envisioned by Matthew Arnison.

If those filesystems could be used without loopback block devices (i.e. if the distinction between block devices and real files can be removed when it comes to having a filesystem access it), then we've both solved the problem of allowing certain configuration files to have certain formats optimized for them, and the backwards compatibility problem, allowing applications a smooth transition to open(), read(), write(), close().
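
The core of such a filesystem is a translation between the application's own file format and a set of named items. The sketch below assumes a simple 'attribute = value' format with '#' comment lines, a stand-in for whatever format a real application uses, and shows the lookup a user space filesystem might perform when a program reads the virtual file <config-file>/<attribute>. The kernel channel itself is left out, and lookup_item is a hypothetical name.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy the bytes in [s, e) to out without surrounding whitespace.
   cap must be at least 1; longer values are truncated. */
static void copy_trimmed(const char *s, const char *e, char *out, size_t cap)
{
    while (s < e && isspace((unsigned char)*s))
        s++;
    while (e > s && isspace((unsigned char)e[-1]))
        e--;
    size_t n = (size_t)(e - s);
    if (n >= cap)
        n = cap - 1;
    memcpy(out, s, n);
    out[n] = '\0';
}

/* Find `name` in a buffer of "attribute = value" lines (hypothetical
   format) and copy its value to out. Lines starting with '#' and lines
   without '=' are ignored. Returns 0 if found, -1 otherwise. */
int lookup_item(const char *text, const char *name, char *out, size_t cap)
{
    const char *line = text;
    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        if (end == NULL)
            end = line + strlen(line);
        const char *eq = memchr(line, '=', (size_t)(end - line));
        if (eq != NULL) {
            char key[128];
            copy_trimmed(line, eq, key, sizeof key);
            if (key[0] != '#' && strcmp(key, name) == 0) {
                copy_trimmed(eq + 1, end, out, cap);
                return 0;
            }
        }
        line = (*end != '\0') ? end + 1 : end;
    }
    return -1;
}
```

A complete translator would also have to list the attributes for readdir() and, on write(), rewrite just the affected line while leaving comments and ordering intact, so that the configuration file itself stays the master copy.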

Given that advantage, I think that as an application developer you'd be more interested in developing the filesystem drivers for your application's configuration files (hopefully reusing your own code) than writing a module for a random configuration management system like Webmin that you don't use yourself and hardly like.

This may all sound a bit revolutionary, but keep in mind that if these things could be implemented on Linux, we would also realize one of the most important dreams of the Hurd (user space filesystems, without special privileges), without having to give up the good parts of Linux (maturity, drivers, developers, momentum).

Looking closer

Of course, there are a few issues. We will probably need some form of mandatory locking at the open() level, between opening the file itself and opening the files in the filesystem contained in it.
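
What Unix offers today is advisory record locking with fcntl(), which only helps if every program that touches the file cooperates; the proposal asks for something the kernel enforces. As a sketch of the cooperative version only, an editor could take a whole-file write lock before rewriting a configuration file (lock_whole_file is a hypothetical helper name):

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Place (type F_WRLCK or F_RDLCK) or remove (F_UNLCK) an ADVISORY lock
   covering the whole file. Non-blocking: returns -1 with errno set to
   EACCES or EAGAIN if another process holds a conflicting lock.
   F_WRLCK requires fd to be open for writing. */
int lock_whole_file(int fd, short type)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "up to end of file, however large" */
    return fcntl(fd, F_SETLK, &fl);
}
```

Because the lock is advisory, a plain shell redirection into the same file still bypasses it; closing that gap is what mandatory locking at the open() level would add.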

Also, if we want enough flexibility for configuration data, we may want to get rid of the distinction between files and directories. Working out sensible semantics for that also makes it easier to mount the filesystem stored in a file on demand over the file itself. Hans Reiser seems to have similar ideas for Reiser4.

Perhaps a field in the inode could tell the kernel which filesystem, such as /lib/fs/stdrcfile or /usr/lib/apache/conf-fs/linux, to run on demand when the file is referred to as a directory in a path. The Hurd has a similar mechanism, called passive translators.

And beyond that?

One of the other things that would benefit the Unix filesystem tremendously is getting rid of the requirement that directories may only have one parent. That would make hard links to directories possible, and combined with user space filesystems, it would open up powerful ways to make files accessible using multiple paths.

Of course, it should still be possible to traverse the whole tree from a particular (root) node without infinite loops. That can be solved by going from a tree to a directed acyclic graph: divide all (sub)directories in a directory into two classes, those leading further from the root and those leading back towards it (or pointing to a sibling), and mark them accordingly in the inode. The links leading further from the root then form the acyclic graph, and the find utility would only descend into those.
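
The traversal rule can be modelled with a small in-memory graph. In this sketch every link carries a forward/back flag (where that flag is actually stored is left open here), and the walker follows forward links only, so a back link to the root cannot trap it in a loop. The structures and the names add_link and count_forward are hypothetical.

```c
#include <stddef.h>

#define MAX_LINKS 8

struct node;

/* One directory entry: a link to another node, marked as leading
   further from the root (forward = 1) or back towards it or to a
   sibling (forward = 0). */
struct link {
    struct node *target;
    int forward;
};

struct node {
    const char *name;
    struct link links[MAX_LINKS];
    size_t nlinks;
};

/* Add a link from parent to child; returns 0, or -1 if parent is full. */
int add_link(struct node *parent, struct node *child, int forward)
{
    if (parent->nlinks == MAX_LINKS)
        return -1;
    parent->links[parent->nlinks].target = child;
    parent->links[parent->nlinks].forward = forward;
    parent->nlinks++;
    return 0;
}

/* Count the nodes reached from n along forward links only, the way the
   proposed find would descend. Back links are never followed, so they
   cannot cause infinite recursion. */
size_t count_forward(const struct node *n)
{
    size_t count = 1;
    for (size_t i = 0; i < n->nlinks; i++)
        if (n->links[i].forward)
            count += count_forward(n->links[i].target);
    return count;
}
```

Since the forward links form a DAG rather than a tree, a directory with two forward parents is visited once along each path; a real find would have to decide whether to report it twice or remember which inodes it has already seen.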

Allowing user space programs to manage the filesystem namespace and the operations on open files also paves the way for userspace drivers, at least for devices that don't need insanely short interrupt latencies. Unix can already accommodate them; look at your X server: it's one big device driver that provides a stream interface to your graphics card (and a few other things).

The only thing still missing then is a way to open something like /dev/irq/1, or even /proc/bus/pci/00/0a.0 for a slightly higher-level interface, and use the file descriptor in select() to wait for an interrupt.
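
Seen from the driver, waiting for an interrupt would then be ordinary descriptor multiplexing. The sketch below assumes a descriptor for a device file like the hypothetical /dev/irq/1 that becomes readable when the interrupt fires; no such file exists in Linux today, so a pipe or any other readable descriptor can stand in for it when trying the code. wait_for_event is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait up to timeout_ms milliseconds for fd to become readable.
   Assumption: fd refers to a hypothetical interrupt device such as
   /dev/irq/1; fd must be below FD_SETSIZE.
   Returns 1 if readable, 0 on timeout, -1 on error (see errno). */
int wait_for_event(int fd, long timeout_ms)
{
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    int r = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (r < 0)
        return -1;
    return FD_ISSET(fd, &rfds) ? 1 : 0;
}
```

Because it is plain select(), a driver could add other descriptors, such as a control socket, to the same set and service them from a single loop.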

If we have that as well, we can really start turning Linux inside out, slowly, exploring microkernel concepts from a working foundation. I think it's possible; I wouldn't even be surprised if Debian 6.4 delivers userspace drivers and userspace filesystems using Linux instead of the Hurd.

Emile van Bergen, 2002/11/07

Generated on Sun Feb 23 17:20:55 2014 by decorate.pl / menuize.pl