Thoughts about the Unix configuration file nightmare
Summary: A proposal to extend the Unix file system to help configuration file
formats converge.
Introduction
There's a long history of discussions about whether Unix should have a common
configuration database for services and applications, if we want to make it
more user friendly. At any rate, it seems that writing graphical configuration
management tools would be much easier if applications used a uniform
representation of configuration data.
For an example of such a discussion, see Matthew Arnison's article,
How to fix the Unix
configuration nightmare. It has a few interesting ideas, and I agree
wholeheartedly with the notion that the "master copy" of the configuration data
should reside in the configuration file that's used by the managed application
or service itself, in order to avoid getting out of sync when hand-editing
configuration files generated by tools that keep their own databases.
However, even a configuration management system that can read and write all
configuration files losslessly, using application-specific plugins as suggested
in the article, does not solve the long-term issue: ultimately we would like
applications to gradually move to using a unified configuration database
natively.
Also, the interface between the plugin that understands the configuration file
and the actual configuration editor, whether interactive or not, will be just
as difficult to agree upon as a common format or library that applications use
to access their configuration data.
The problem
In my opinion it's not realistic to think that all developers of Unix
software will accept a unified database format. It's been said that herding
developers is like herding cats, and people will keep bickering about LDAP and
XML and ASCII attribute = value files and brittle Windows
registry-like filesystems-in-filesystems, about which library to use to access
them, and so on.
Put bluntly, it's just not going to happen in our lifetime, and we'll be stuck
writing those management tool plugins for new formats forever, unless there's
an API and a format so familiar and compelling that every developer will want
to switch to it immediately.
A solution?
For that I see just one candidate: the API is Unix, and the database is the
filesystem.
Of course, in a sense we already do that, but we descend into
application-specific data much too soon. It would be great if we could push back
the boundary of opaque data from complete configuration files to individual
configuration items, and then only the ones that don't use standard data types.
Considering that 99.9% of all configuration files can be modelled as nested
trees of attribute/value pairs, why not?
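To make the idea concrete, here is a minimal sketch of a nested tree of
attribute/value pairs stored as directories and very small files, and read back
again. The settings (listen, port, address, user) and the directory name httpd
are invented for illustration and do not belong to any real application; the
sketch also assumes that every value can be represented as a short string.

```python
import tempfile
from pathlib import Path


def store_tree(root: Path, tree: dict) -> None:
    """Write a nested dict as directories (for dicts) and small files (for values)."""
    root.mkdir(parents=True, exist_ok=True)
    for key, value in tree.items():
        node = root / key
        if isinstance(value, dict):
            store_tree(node, value)
        else:
            node.write_text(str(value))


def load_tree(root: Path) -> dict:
    """Read the directory tree back into a nested dict of strings."""
    tree = {}
    for node in sorted(root.iterdir()):
        tree[node.name] = load_tree(node) if node.is_dir() else node.read_text()
    return tree


# Hypothetical settings, made up for this example only.
config = {"listen": {"port": "8080", "address": "0.0.0.0"}, "user": "www"}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "httpd"
    store_tree(root, config)
    restored = load_tree(root)
    # A single item is just a file: no parser needed to read one value.
    port = (root / "listen" / "port").read_text()
```

In this layout a shell user could read the same item with cat and change it with
echo, which is the point of pushing the opaque boundary down to individual items.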
Why on earth would we want inefficient and complex XML files that would have
to be adopted by everyone, and some library that implements
MyFancyUnixRegistryGetKeyHandle(), MyFancyUnixRegistryAddKey(),
MyFancyUnixRegistryEnumerateKeys(), MyFancyUnixRegistrySetValue(),
MyFancyUnixRegistryPollChanges() and so forth (yuck!), when we have open(),
creat(), read(), write(), readdir(), unlink(), select(), close()?
The Unix model has served us well, from block devices to mice, soundcards, and
hierarchical storage databases (the filesystem); why would it suddenly not be
useful when we need a configuration registry? The problem is not the model;
the problem is that we don't take the model far enough by extending it to the
individual configuration items.
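As a rough illustration of how the registry-style verbs map onto the calls we
already have, the sketch below sets, gets, enumerates and deletes configuration
items using only Python's thin wrappers around open(), read(), write(),
readdir(), unlink() and close(). The helper names set_value and get_value and
the keys hostname and timeout are hypothetical, and the code assumes that a
value fits in a single small read.

```python
import os
import tempfile


def set_value(path: str, value: bytes) -> None:
    """'SetValue' is open() + write() + close()."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, value)
    finally:
        os.close(fd)


def get_value(path: str) -> bytes:
    """'GetValue' is open() + read() + close(); assumes values under 4 KiB."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, 4096)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as registry:
    set_value(os.path.join(registry, "hostname"), b"example")
    set_value(os.path.join(registry, "timeout"), b"30")

    # 'EnumerateKeys' is readdir(), exposed in Python as os.listdir().
    keys = sorted(os.listdir(registry))
    hostname = get_value(os.path.join(registry, "hostname"))

    # 'DeleteKey' is unlink().
    os.unlink(os.path.join(registry, "timeout"))
    remaining = os.listdir(registry)
```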
Implementing it
'A few' things are needed on the kernel side if we want the filesystem to
accommodate individual configuration items, but those modifications are probably
better aligned with Unix philosophy than 'just add another syscall', which
seems to be the trend in Linux these days.
Of course, what we need at the very least is a filesystem optimized for lots of
very small files. ReiserFS gets us a long way there, but I think we need to go
even further if we are to tackle backwards compatibility during the all but
indefinite transition period as well.
If we could have an efficient channel to talk to user space filesystems, we
could easily write filesystems that use an existing configuration file format
as their 'on-disk' filesystem layout, and distribute them together with the
applications that use the configuration files, just like the 'plugins' of the
configuration management application as envisioned by Matthew Arnison.
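What such a driver would do can be sketched without any kernel support at all.
The class below presents an INI-style file as a two-level tree of the form
/section/key and writes changes back to the same file, so the file stays the
master copy. Everything here is an assumption made for illustration: the class
name IniFileView, its listdir/read/write methods and the file app.ini are
hypothetical, and this is plain library code, not a real user space filesystem.

```python
import configparser
import os
import tempfile


class IniFileView:
    """Present an INI file as a tree of /<section>/<key> items (illustrative only)."""

    def __init__(self, filename: str) -> None:
        self.filename = filename

    def _load(self) -> configparser.ConfigParser:
        parser = configparser.ConfigParser()
        parser.read(self.filename)
        return parser

    @staticmethod
    def _split(path: str) -> list:
        return [part for part in path.split("/") if part]

    def listdir(self, path: str = "/") -> list:
        """List sections at the root, or keys inside one section."""
        parser = self._load()
        parts = self._split(path)
        if not parts:
            return parser.sections()
        return list(parser[parts[0]])

    def read(self, path: str) -> str:
        section, key = self._split(path)
        return self._load()[section][key]

    def write(self, path: str, value: str) -> None:
        """Change one item and rewrite the underlying configuration file."""
        section, key = self._split(path)
        parser = self._load()
        if not parser.has_section(section):
            parser.add_section(section)
        parser[section][key] = value
        with open(self.filename, "w") as handle:
            parser.write(handle)


with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "app.ini")  # hypothetical application config
    with open(filename, "w") as handle:
        handle.write("[server]\nport = 8080\n")

    view = IniFileView(filename)
    sections = view.listdir("/")
    keys = view.listdir("/server")
    port_before = view.read("/server/port")
    view.write("/server/port", "9090")
    with open(filename) as handle:
        raw_after = handle.read()
```

Note that configparser drops comments and normalizes layout when it rewrites the
file, so this sketch is not lossless; a real driver shipped with an application
would want to reuse the application's own parser to preserve them.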
If those filesystems could be used without loopback block devices (i.e. if the
distinction between block devices and real files can be removed when it comes
to having a filesystem access it), then we've solved both the problem of
allowing certain configuration files to have certain formats optimized for
them, and the backwards compatibility problem, allowing applications a smooth
transition to open(), read(), write(), close().
Given that advantage, I think that as an application developer you'd be more
interested in developing the filesystem drivers for your application's
configuration files (hopefully reusing your own code) than writing a module for
a random configuration management system like Webmin that you don't use
yourself and hardly like.
This may all sound a bit revolutionary, but keep in mind that
if these things could be implemented on Linux, we would also realize one of
the most important dreams of the Hurd (user space filesystems, without
special privileges), without having to give up the good parts of Linux
(maturity, drivers, developers, momentum).
Looking closer
Of course, there are a few issues. We would probably need some form of
mandatory locking at the open() level between opening the file itself and
opening the files in the filesystem contained in it.
Also, if we want enough flexibility for configuration data, we may want to get
rid of the distinction between files and directories. Working out sensible
semantics for that would also make it easier for the filesystem stored in the
file to be mounted on demand over the file itself. Hans Reiser seems to have
similar ideas for Reiser4.
Perhaps it's possible to use a field in the inode to tell the kernel which
/lib/fs/stdrcfile or /usr/lib/apache/conf-fs/linux filesystem to run on
demand when the file is referred to as a directory in a path. The Hurd has a
similar mechanism, called passive translators.
And beyond that?
One of the other things that would benefit the Unix filesystem tremendously is
getting rid of the requirement that directories may only have one parent. That
would make hard links to directories possible, and combined with user space
filesystems, it would open up powerful ways to make files accessible using
multiple paths.
Of course, it should still be possible to traverse the whole tree from a
particular (root) node without infinite loops. That can be solved though by
going from a tree to a directed acyclic graph: just divide all (sub)directories
in a directory into two classes, either going further from the root or back
towards it (or pointing to a sibling), and mark them accordingly in the inode.
The find utility would only descend into subdirectories going further
from the root.
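A small model shows why the marking is enough. In the sketch below every
directory entry carries a flag saying whether it leads away from the root, and
the traversal only follows entries with that flag set. The Directory class, the
link method and the directory names are hypothetical; the sketch assumes that
the entries marked as leading away from the root really do form an acyclic
graph, which is what the kernel would have to guarantee.

```python
class Directory:
    """A directory whose entries are marked as leading away from the root or not."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.entries = []  # list of (child Directory, forward flag)

    def link(self, child: "Directory", forward: bool) -> None:
        self.entries.append((child, forward))


def find(directory: Directory, prefix: str = ""):
    """Yield every path reachable through forward links only."""
    path = directory.name if not prefix else f"{prefix}/{directory.name}"
    yield path
    for child, forward in directory.entries:
        if forward:  # never follow links back towards the root or to siblings
            yield from find(child, path)


root, a, b, shared = (Directory(n) for n in ("root", "a", "b", "shared"))
root.link(a, forward=True)
root.link(b, forward=True)
a.link(shared, forward=True)    # 'shared' has two parents: a and b
b.link(shared, forward=True)
a.link(b, forward=False)        # sibling link, not followed
shared.link(root, forward=False)  # back link; following it would loop forever

paths = list(find(root))
```

The directory with two parents shows up under both paths, and the traversal
still terminates because the back link and the sibling link are skipped.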
Allowing user space programs to manage the filesystem namespace and the
operations on open files also paves the way for creating userspace drivers, at
least for devices that don't need insanely short interrupt latencies. Unix
can already accommodate them: look at your X server; it's one big device
driver that provides a stream interface to your graphics card (and a few other
things).
The only thing then still missing is a way to open something like
/dev/irq/1, or even /proc/bus/pci/00/0a.0 for a little higher level interface,
and use the file descriptor in select() to wait for an interrupt.
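The waiting side of such a driver would look just like any other select() loop.
Since a file like /dev/irq/1 does not exist, the sketch below uses a pipe as a
stand-in: a timer thread writes one byte to simulate the interrupt, and the
main thread blocks in select() on the read end. The fake_interrupt function and
the one-byte payload are assumptions for illustration only, and the code relies
on select() working with pipes, which holds on Unix systems.

```python
import os
import select
import threading

# Stand-in for opening the hypothetical interrupt file: a plain pipe.
read_fd, write_fd = os.pipe()


def fake_interrupt() -> None:
    """Simulate the device raising an interrupt by making the fd readable."""
    os.write(write_fd, b"\x01")


timer = threading.Timer(0.1, fake_interrupt)
timer.start()

# Block until the descriptor becomes readable, with a 5 second safety timeout.
readable, _, _ = select.select([read_fd], [], [], 5.0)
event = os.read(read_fd, 1) if readable else None

timer.join()
os.close(read_fd)
os.close(write_fd)
```

A real driver would put the device's other file descriptors in the same
select() call, so interrupts and client requests are handled in one loop.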
If we have that as well, then we can really start slowly turning Linux
inside out, exploring microkernel concepts from a working foundation. I think
it's possible; I wouldn't even be surprised if Debian 6.4 delivers userspace
drivers and userspace filesystems using Linux instead of the Hurd.