This was posted as a comment on Groklaw, the excellent legal, anti-FUD and FOSS research site, in a discussion about Linux kernel design. Slightly off-topic, I know, but I couldn't resist responding to the parent comment there.

Problems and solutions in Unix GUI design

There is something that increasingly bothers me about the current GUI efforts on Linux. We don't learn from our competitors' mistakes, and we use design principles that are not only flawed, but completely foreign to Unix.

The main usability problems stem from not recognising our own abilities. Developers are too wary of deviating from established Windows practice, afraid of alienating users who may want to switch to Linux. So interaction design is copied wherever possible, often without much reflection. However, there is plenty of good, solid knowledge available about user interface design (see e.g. Raskin), which Windows violates time and again, to the detriment of users. Modal task bar buttons spring to mind; menus whose layout changes automatically based on usage frequency are another example, as are separate dialogs for editing menus and toolbars instead of letting users manipulate them directly, and the use of large numbers of different pictograms instead of words and a few abstract icons. All of these are thoughtlessly copied.

The philosophy seems to be that familiar is better than merely better. This may be true, but only up to a point. More importantly, nothing is as frustrating as something that works almost, but not quite, like something you're familiar with: similar enough to trigger automatic behaviour, yet different enough that this behaviour produces unexpected results. In those cases it's actually easier to learn something that is clearly different.

There are also some technical design problems that the current Unix GUIs have in common with the others:

The toolkits are all-pervasive frameworks rather than humble, serving libraries. Your application is kindly allowed a corner of the great and mighty framework. Either you design your application around a particular framework from the get-go, or you create a client/server model. Enhancing an existing tool with a more closely coupled, rich GUI is not a viable option.
GUI toolkits tend to offer programmatic interfaces rather than data (stream) interfaces. The problems with those are: 1. they are not language neutral; 2. programmatic interfaces are harder to transport across networks; and, most importantly, 3. you need to work at the source code level to string things together in custom ways. Unix has always been great for the smooth curve it offers in what you can achieve as you progress from end user, to administrator, to script writer, to systems programmer. With most toolkits it's all or nothing: either you are an end user who lives with the features as they are, or you are a programmer versed in all the details the toolkit and the application require.
The GUI toolkit dictates a completely event-driven application design, shredding your program to bits with endless callbacks, one for every action the user may take, or requiring you to derive your own objects from the toolkit's GUI objects and follow their relationships.
The toolkit abstracts away OS features rather than using them. The most prominent example: your main loop with its multiplexing select() call is taken away from you, and in return you get at best a half-baked set of waitevent/getevent calls and some wrapper functions to include a few classic Unix events in your event mask. Sometimes you don't even get your own main loop at all, just the callbacks.
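For readers who haven't written one, here is a minimal C sketch of the select()-based multiplexing that toolkits take away from you. The helper name wait_readable is made up for illustration; the point is only that the caller, not a framework, owns the loop and decides what to do with each ready descriptor.

```c
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Hypothetical helper: block until at least one of the n descriptors in fds
 * is readable, or until timeout_ms milliseconds have passed (timeout_ms < 0
 * waits indefinitely). On return, ready[i] is 1 if fds[i] is readable and 0
 * otherwise. Returns the number of readable descriptors, 0 on timeout or when
 * interrupted by a signal, and -1 on any other error.
 * All descriptors must be below FD_SETSIZE, as select() requires.
 */
int wait_readable(const int *fds, int n, int timeout_ms, int *ready)
{
    fd_set rfds;
    int maxfd = -1;

    FD_ZERO(&rfds);
    for (int i = 0; i < n; i++) {
        ready[i] = 0;
        FD_SET(fds[i], &rfds);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    struct timeval tv;
    struct timeval *tvp = NULL;          /* NULL means: no timeout */
    if (timeout_ms >= 0) {
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        tvp = &tv;
    }

    int count = select(maxfd + 1, &rfds, NULL, NULL, tvp);
    if (count < 0)
        return errno == EINTR ? 0 : -1;  /* a signal is not an error: just loop again */
    if (count == 0)
        return 0;                        /* timed out: ready[] is already all zeros */

    for (int i = 0; i < n; i++)
        ready[i] = FD_ISSET(fds[i], &rfds) ? 1 : 0;
    return count;
}
```

A program built this way keeps its loop explicit: it calls wait_readable() repeatedly and, for each ready descriptor, reads and handles whatever arrived, be it user input, a network request or a child process's output. For descriptors at or above FD_SETSIZE, poll() is the usual substitute for select().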

Lastly, there is the problem of internet-enabling applications. Currently you can put the network between interface and application either at the pixel level, so to speak, with X, Citrix and VNC as the most prominent examples. Because every interaction needs a round trip, this places latency demands on the network that the internet cannot meet when you are accessing an application on another continent, even if it literally worked at the speed of light. There's nothing worse than delayed feedback to key presses or button clicks.
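To put a rough number on the speed-of-light argument: assume, purely for illustration, an application about 15,000 km away. Light travels at roughly 300,000 km/s in vacuum and roughly 200,000 km/s in optical fibre, so the minimum round trip t for a distance d at signal speed v works out as follows.

$$
t_{\mathrm{rtt}} = \frac{2d}{v}, \qquad
\frac{2 \times 15\,000\ \mathrm{km}}{300\,000\ \mathrm{km/s}} = 100\ \mathrm{ms}, \qquad
\frac{2 \times 15\,000\ \mathrm{km}}{200\,000\ \mathrm{km/s}} = 150\ \mathrm{ms}
$$

That is a floor for the feedback to every single keystroke or click, before any routing detours, queuing or processing at either end are added.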

Or you can go completely the other way with HTTP and HTML, but those started out as a service for distributing documents and describing their logical layout, not as something you can build rich, interactive UIs with. I can't imagine a general-purpose word processor user interface being delivered over the web, even using the snazziest CSS, XHTML and JavaScript tricks. At the very least, the result would be orders of magnitude more complex than a word processor with a local GUI.

There is an alternative approach, though, which I've contemplated for a while and have finally found the time to start implementing.

The graphical terminal, with commands that work at the widget level.

Some aspects of the concept are not new. Tektronix (TEK) terminals could plot high-level graphics from commands they received through their standard serial stream interface. X, too, essentially has a stream interface, but at a lower level.

The standard response you get when you complain about X being so low-level and not containing any widgets itself is that this is a blessing, and that we would still be stuck with those rectangular, black-and-white Athena widgets had X specified a widget set.

This is a red herring, if you ask me. The real solution is to create a proper path for terminal-side evolution and extensibility, rather than punting and shifting the problem entirely to the host application.

If the host application can simply state its requirements (I need font service XYZ, version 14 or higher; I need OpenGL command set A, version 3 or higher; I need audio sample output, version 2 or higher; I need standard widgets 3, version 5 or higher); if there are strict rules, so that an extension that breaks compatibility must get a new identifier and old and new versions can be used side by side; and if there is no way for host applications to interrogate the terminal's make and version, then you can put a well-defined network protocol between user interface elements and application without slowing incremental innovation to a standstill.
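The negotiation rules above can be sketched from the terminal's side in a few lines of C. Everything here is an assumption for illustration: the "require <name> <min-version>" line format, the extension names and the function check_requirement are hypothetical, not part of any existing protocol. The sketch encodes the two rules: a breaking change gets a new name (so "std-widgets-3" and "std-widgets-4" can coexist), and the host only ever gets a yes/no answer, never the terminal's make or exact version.

```c
#include <stdio.h>
#include <string.h>

/* One extension the terminal implements. Under the compatibility rule,
   a breaking change gets a new name, so within one name a higher
   version is always backward compatible with a lower one. */
struct extension {
    const char *name;
    unsigned version;
};

/* Check one hypothetical requirement line of the form
   "require <name> <min-version>", e.g. "require std-widgets-3 5".
   Returns 1 if satisfied, 0 if not, -1 if the line is malformed.
   Only a yes/no answer is produced, so the host cannot learn which
   terminal implementation or exact version it is talking to. */
int check_requirement(const char *line, const struct extension *exts, size_t n)
{
    char verb[16], name[64];
    unsigned min_version;

    if (sscanf(line, "%15s %63s %u", verb, name, &min_version) != 3 ||
        strcmp(verb, "require") != 0)
        return -1;

    for (size_t i = 0; i < n; i++)
        if (strcmp(exts[i].name, name) == 0)
            return exts[i].version >= min_version;
    return 0;  /* unknown extension: not supported */
}
```

A terminal would answer each such line with something equally terse (an "ok" or "unsupported", say, again hypothetical), leaving the host to fall back to a simpler feature set or give up.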

If, in addition, the terminal connects to the application (instead of the other way around, as with X), distributing applications over the internet will actually become viable.
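With X, the application connects to the X server on the user's display. A minimal C sketch of the inverted arrangement, assuming plain TCP: the application listens, the terminal dials in, and the accepted socket is the entire GUI connection, with one descriptor serving as both the read and the write side. The function names listen_for_terminals and accept_terminal are hypothetical, and a real deployment would add authentication (or simply run over ssh) before exposing this to the internet.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a listening TCP socket on the given port
   (0 lets the kernel pick a free one) and store the port actually bound
   in *bound_port. Returns the listening descriptor, or -1 on error. */
int listen_for_terminals(uint16_t port, uint16_t *bound_port)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0)
        return -1;
    int on = 1;
    setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(s, (struct sockaddr *)&addr, sizeof addr) < 0 || listen(s, 8) < 0) {
        close(s);
        return -1;
    }

    socklen_t len = sizeof addr;
    if (getsockname(s, (struct sockaddr *)&addr, &len) < 0) {
        close(s);
        return -1;
    }
    *bound_port = ntohs(addr.sin_port);
    return s;
}

/* Wait for a terminal to dial in. The returned descriptor is the whole
   GUI: the application writes widget commands to it and reads events
   from it, and can add it to its own select() loop. */
int accept_terminal(int listener)
{
    return accept(listener, NULL, NULL);
}
```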

I have spent some time designing this thing and have written a good amount of design notes and, lately, code. Hopefully I'll also find some time to put it on the web somewhere. In any case, you'll hear more about it. I think it's the "right" solution, for the following reasons.

Imagine being able to write a simple shell script that can put up dialogs, ask simple questions, and so on, using nothing more than echo and read. Instead of the usual ASCII art after questions like "do you want to continue? [Y/n]", you now see the native widgets of the OS the terminal runs on: Aqua buttons on Mac OS X, Windows buttons on Windows, GTK+ buttons in GNOME, Qt buttons in KDE.
You get your main loop back. This means you can go event driven all the way, or be completely sequential in your UI design, or anything in the middle. You choose, depending on what's best for your particular application, and depending on whether you're enhancing an existing application or writing a new one.
Communication between application and GUI happens through one abstract, bidirectional stream. A TCP socket, a pair of pipes, a tty: it doesn't matter. In the application, the whole GUI is simply two open file descriptors that you include in your select() loop. You wait on one for the things you are interested in, such as buttons pressed, menu entries picked or dialogs closed, by reading the strings you associated with the widgets you set up, and you write commands to the other. The flexibility of that cannot be overstated. Telnet, ssh, pipes: any current transport used to log in to a remote system can suddenly carry complete user interfaces, without requiring new transport protocols, new firewall policies, or anything else. It simply runs on top of the stream services that are already there.
In principle, you can offload simple interaction completely to the terminal by allowing widgets, when activated, to write arbitrary strings back to the terminal instead of to the host application. A scroll bar with up and down arrows and a long document next to it could, in theory, interact without requiring any round trips to the host application. This does wonders for creating responsive user interfaces for applications that actually run on an overloaded server at the other end of the world!
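To make the echo-and-read idea concrete, here is a C sketch of the application side of such a conversation. The post does not specify a command syntax, so the line-based vocabulary below (dialog, button, show, and the events clicked and closed) and the function name ask_yes_no are hypothetical. What it is meant to show is that putting up a dialog is just text written to one descriptor and getting the answer is just a line read from another, so the same code works over a pipe pair, a tty or a socket.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write the whole buffer, retrying after short writes. Returns 0 or -1. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t w = write(fd, buf, len);
        if (w < 0)
            return -1;
        buf += w;
        len -= (size_t)w;
    }
    return 0;
}

/* Read one '\n'-terminated line into buf (without the newline), one byte
   at a time for simplicity; a real program sharing the descriptor with its
   select() loop would buffer instead. Over-long lines are truncated.
   Returns the line length, or -1 on EOF or error before a full line. */
static ssize_t read_line(int fd, char *buf, size_t cap)
{
    size_t len = 0;
    for (;;) {
        char c;
        ssize_t r = read(fd, &c, 1);
        if (r <= 0)
            return -1;
        if (c == '\n')
            break;
        if (len + 1 < cap)
            buf[len++] = c;
    }
    buf[len] = '\0';
    return (ssize_t)len;
}

/* Ask a yes/no question through a hypothetical widget-level terminal.
   gui_in carries events from the terminal, gui_out carries commands to it.
   The question must not contain '"' or '\n' in this simple sketch.
   Returns 1 for yes, 0 for no or a closed dialog, -1 on error or EOF. */
int ask_yes_no(int gui_in, int gui_out, const char *question)
{
    char cmd[512];
    int n = snprintf(cmd, sizeof cmd,
                     "dialog q1 \"%s\"\n"
                     "button q1 yes \"Yes\"\n"
                     "button q1 no \"No\"\n"
                     "show q1\n",
                     question);
    if (n < 0 || (size_t)n >= sizeof cmd)
        return -1;
    if (write_all(gui_out, cmd, (size_t)n) < 0)
        return -1;

    char line[128];
    while (read_line(gui_in, line, sizeof line) >= 0) {
        if (strcmp(line, "clicked q1 yes") == 0)
            return 1;
        if (strcmp(line, "clicked q1 no") == 0 || strcmp(line, "closed q1") == 0)
            return 0;
        /* Any other event is ignored in this sketch. */
    }
    return -1;
}
```

A shell script could do the equivalent with echo and read. In a larger program, gui_in would simply be one more descriptor in the select() loop, and the event lines would be dispatched from there instead of being read synchronously.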

In short, this allows every developer who can write the simplest Unix programs to build directly interactive applications over the web, taking full advantage of the elegance that Unix brought: abstract, multiplexed stream I/O.

Incidentally, I think the Unix concept (select, seek, read, write, ioctl) should be used not just to let applications talk to GUIs, but also for the kernel to talk to device drivers, and even filesystems. After all, it's just proven digital electronics design applied to software: synchronisation (select), addressing (seek/mmap), in-band data (read/write/mmap) and control data (ioctl) tie together almost all components in a digital system. The HURD uses similar ideas: user-space servers populate the filesystem, and the filesystem is the rendezvous point for all services.

The big guys at Bell Labs also seemed to think that the core Unix concepts should be used more, not less, as Unix evolves; see Plan 9.

Let's be aware of our great foundation, our Unix heritage, and not reinvent the OS badly, or a bad OS, in our GUI toolkits. Let's not create the same ad-hoc, dynamic-library-based mess that everyone and their dog has created so far.

A good UTF-8 based graphical terminal application will allow graphical Unix applications to become real Unix applications again. And hopefully, it will also allow more Unix applications to gain good GUIs.

Cheers,

Emile.

Emile van Bergen, 2005/03/31