This was posted as a comment on Groklaw, the excellent legal / anti-FUD / FOSS research site, in a discussion about Linux kernel design. Slightly off-topic, I know, but I couldn't resist responding to the parent comment there.
Problems and solutions in Unix GUI design
There is something that increasingly bothers me about the current GUI efforts on Linux. We don't learn from our competitors' mistakes, and we use design principles that are not only flawed, but completely foreign to Unix. The main usability problems stem from not recognising our own strengths.
Developers are fearful of straying from established Windows practice, afraid of alienating users who might want to switch to Linux. So interaction design is copied wherever possible, often without much reflection. Yet there is a lot of good, solid knowledge available about user interface design (see e.g. Raskin), which Windows violates time and again, to the detriment of its users: modal task bar buttons spring to mind; menu layouts that automatically rearrange themselves based on use frequency are another example, as are a separate place for editing menus and toolbars rather than letting them be manipulated directly, and large numbers of different pictograms instead of words and a few abstract icons. All of these are thoughtlessly copied.
"Familiar is better than better" seems to be the philosophy. That may be true, but only up to a point. More importantly: there is nothing as frustrating as something that works almost, but not quite, like something you are familiar with: similar enough to trigger automated behaviour, yet different enough that the behaviour produces unexpected results. In those cases it is actually easier to learn something that is plainly different.
There are also some
technical design problems that the current Unix GUIs have in common
with the
others:
First, the toolkits are all-pervasive frameworks rather than humbly serving libraries. Your application is kindly allowed a corner in the great and mighty framework. Either you design your application around a particular framework from the start, or you create a client-server model; enhancing an existing tool with a more closely coupled, rich GUI is not a viable option.
Second, GUI toolkits tend to offer programmatic interfaces rather than data (stream) interfaces. The problems with those are that 1. they are not language neutral, 2. programmatic interfaces are harder to transport across networks, and, most importantly, 3. you need to work at the source code level to string things together in custom ways. Unix has always been great for the smooth curve it offers in what you can achieve as you go from end user, to administrator, to script writer, to systems programmer. With most toolkits it's all or nothing: either you are an end user who lives with the features as they are, or you are a programmer versed in all the details the toolkit and application require.
Third, the GUI toolkit dictates a completely event-driven application design, shredding the program to bits with endless callbacks, one for every action the user may take, or requiring you to derive your own objects from GUI objects, following the toolkit's relationships.
Fourth, the toolkit abstracts away OS features rather than using them. The most prominent example: your main loop with its multiplexing select() call is taken away from you, and in return you get at best a half-baked set of waitevent/getevent calls and some wrapper functions for including a few classic Unix events in your event mask. Sometimes you don't get a main loop of your own at all, just the callbacks.
Lastly, there is the problem of internet-enabling applications. Currently you can put the network between interface and application at the pixel level, so to speak, with X, Citrix and VNC as the most prominent examples. Because every interaction needs a round trip, this places huge latency demands on the network, more than the internet can deliver when you are accessing an application on another continent, even if it literally worked at the speed of light: a round trip to the other side of the planet covers some 40,000 km, which costs over 130 ms at light speed, already well past the point where feedback stops feeling immediate. There's nothing worse than delayed feedback to key presses or button clicks.
Or you can go completely the other way with HTTP + HTML, but that started as a document distribution and logical document layout service, not something to create rich, interactive UIs with. I can't imagine a general-purpose word processor interface distributed over the web using even the snazziest of CSS, XHTML and JavaScript tricks; at the least, the result would be orders of magnitude more complex than a word processor with a local GUI.
There is an alternative approach, though, one I've contemplated for a while and have finally found the time to start implementing: the graphical terminal, with commands that work at the widget level.
Some aspects of the concept are not new. The TEK terminals could plot high-level graphics from commands they received over their standard serial stream interface, and X essentially has a stream interface too, just at a lower level.
The standard response you get when you complain that X is so low-level and contains no widgets itself is that this is a blessing, and that we would still be stuck with those rectangular, black-and-white Athena widgets had X specified a widget set.
This is a red herring, if you ask me. The proper solution is to create a clear path for terminal-side evolution and extensibility, rather than punting and shifting the whole problem to the host application.
If the host application can simply state its requirements (I need font service XYZ, version 14 or higher; I need OpenGL command set A, version 3 or higher; I need audio sample output, version 2 or higher; I need standard widgets 3, version 5 or higher); if there is a strict rule that an extension which breaks compatibility must get a new identifier, so that new and old versions can be used side by side; and if there is no way for host applications to interrogate the terminal's make and version, then you can put a well-defined network protocol between user interface elements and application without slowing incremental innovation to a standstill.
If, furthermore, the terminal connects to the application (rather than the other way around, as with X), then distributing applications over the internet becomes genuinely viable.
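To make this concrete, here is a rough sketch of what that capability negotiation might look like from a shell script. Every command name and the reply format are invented for illustration, since the actual protocol is still taking shape:

    #!/bin/sh
    # Hypothetical capability negotiation with the graphical terminal.
    # File descriptor 3 is assumed already open on the stream to it.
    echo 'require std-widgets-3 5'  >&3   # widget set 3, version 5 or higher
    echo 'require font-service 14'  >&3
    echo 'end-requirements'         >&3
    read -r reply <&3                     # e.g. "ok" or "missing std-widgets-3"
    if [ "$reply" != ok ]; then
        echo "terminal lacks: ${reply#missing }" >&2
        exit 1
    fi

The point is not the exact syntax, but that the application asks for capabilities by name and version and gets a yes or a no, never the terminal's make and version.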
I have spent some time designing this thing and have written a good amount of design notes and, lately, code. Hopefully I'll also find the time to put it on the web somewhere. In any case, you'll hear more about this.
I think it's the "right" solution, for the following reasons:
Imagine being able to write a simple shell script that can put up dialogs, ask simple questions, and so on, using nothing more than echo and read. Instead of the standard ASCII art after questions like "do you want to continue? [Y/n]", you now see the native widgets of the OS the client runs on: Aqua buttons on Mac OS X, Windows buttons on Windows, GTK+ buttons in GNOME, Qt buttons in KDE.
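As a sketch, with an invented command syntax, and with stdin/stdout assumed to be connected to the graphical terminal, such a script might look like this:

    #!/bin/sh
    # Put up a native dialog using nothing but echo and read; the
    # 'dialog' command and its one-word reply are invented for illustration.
    echo 'dialog question "Do you want to continue?" button:yes button:no'
    read -r answer                   # terminal reports which button was pressed
    if [ "$answer" != yes ]; then
        exit 0                       # user declined; stop here
    fi
    # ... the rest of the script carries on as usual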
You get your main loop back. This means you can go event-driven all the way, be completely sequential in your UI design, or do anything in between. You choose, depending on what's best for your particular application and on whether you're enhancing an existing application or writing a new one.
Communication between application and GUI happens through one abstract bi-directional stream. A TCP socket, a pair of pipes, a tty, it doesn't matter: in the application, the whole GUI is simply two open file descriptors you include in your select() loop, waiting on one for the things you are interested in, such as buttons pressed, menu entries picked, or dialogs closed, reading the strings you associated with the widgets you set up, and writing commands to the other. The flexibility of that cannot be overstated: telnet, ssh, pipes, any transport you currently use to log in to a remote system can suddenly carry complete user interfaces, requiring no new transport protocols, no new firewall policies, nothing. It simply runs on top of the stream services that are already there.
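In a C program that means two more descriptors in your select() mask; from a shell script, a plain read loop already gives you an event-driven program. A sketch, again with invented widget commands and an invented "<widget> <event>" reply format:

    #!/bin/sh
    # Create two buttons, then handle events from the GUI stream as
    # they arrive; command and event syntax invented for illustration.
    echo 'button save label:"Save"'
    echo 'button quit label:"Quit"'
    while read -r widget event; do   # blocks until the terminal reports an event
        case "$widget" in
            save) echo 'label status text:"Saved."' ;;  # update a status label
            quit) exit 0 ;;
        esac
    done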
In principle, you can offload simple interaction entirely to the terminal by allowing widgets, when activated, to write arbitrary strings back to the terminal itself instead of to the host application. A scroll bar with up and down arrows next to a long document could then operate without any round trips to the host application at all. This does wonders for the responsiveness of applications that actually run on an overloaded server at the other end of the world!
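For instance, and once more with invented syntax, the scroll buttons could be bound to commands that the terminal interprets locally instead of forwarding:

    #!/bin/sh
    # The target:terminal attribute (hypothetical) asks the terminal to
    # handle clicks itself, so scrolling costs no network round trip.
    echo 'textview doc rows:40'      # long document shown in a text view
    echo 'button up   label:"^" target:terminal action:"scroll doc -1"'
    echo 'button down label:"v" target:terminal action:"scroll doc +1"'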
In short, this allows every developer familiar with
writing the simplest Unix programs to create direct interaction over
the web,
taking full advantage of the elegance that Unix brought: abstract,
multiplexed
stream I/O.
Incidentally, I think the Unix concept (select, seek, read, write, ioctl) should be used not just for letting applications talk to GUIs, but also for letting the kernel talk to device drivers, and even to filesystems. After all, it is just proven digital electronics design applied to software: synchronisation (select), addressing (seek/mmap), in-band data (read/write/mmap) and control data (ioctl) are what string almost all components of a digital system together. The Hurd uses similar ideas; user-space servers populate the filesystem, and the filesystem is the rendezvous point for all services.
The big guys at Bell Labs also seemed to think that the core Unix concepts should be used more, not less, as Unix evolves; see Plan 9. Let's be aware of our great foundation, our Unix heritage, and not reinvent the OS badly, or reinvent a bad OS, in our GUI toolkits. Let's not create the same ad-hoc, dynamic-library-based mess that everyone and their dog has created so far.
A good UTF-8-based graphical terminal application will allow graphical Unix applications to become real Unix applications again. And hopefully, it will also allow more Unix applications to gain good GUIs.