A Partial Tour Through the UNIX Shell

Geoff Collyer
Usenix Conference Proceedings, Winter 1989

I printed this paper’s pages from a PDF of the whole conference proceedings.

Geoff Collyer gives a rundown of some things he learned while doing a “protracted surgery” on the Ninth Edition Unix shell. Collyer writes that it’s derived from the System V release 2 shell.

Apparently this shell was very close to the Bourne Shell, in that it did its own memory management by catching SEGV signals. This has been widely noted as a very advanced and difficult thing to do. The paper explains a consequence of that style of memory management: once the signal handler has obtained more memory, the instruction that caused the segmentation violation has to be restarted. Apparently the Motorola 68010 and 68020 CPUs used in Sun Microsystems workstations of the day could not restart instructions in this situation. Collyer reworked how this shell did memory allocation so that instruction restarts weren’t mandatory.
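To make the restart requirement concrete, here is a minimal C sketch of grow-on-fault allocation. It is not the shell’s code: the real shell reportedly grew its data segment from the handler, while this sketch swaps in mmap and mprotect so that it can run on a present-day Linux-like system. Every name in it (arena_init, on_fault, arena_demo, ARENA_SIZE) is hypothetical, and calling mprotect from a signal handler is an assumption that holds in practice on Linux rather than something POSIX promises.

```c
/* Sketch only, assuming Linux-like mmap/mprotect/sigaction behavior.
   All identifiers are hypothetical; none come from the shell's source. */
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define ARENA_SIZE ((size_t)1 << 20)      /* reserve 1 MiB of address space */

static char *arena_base;                  /* start of the reserved region */
static size_t page_size;
static volatile sig_atomic_t faults_handled;

/* On a fault inside the arena, make the touched page usable and return. */
static void on_fault(int sig, siginfo_t *info, void *context)
{
    char *addr = (char *)info->si_addr;
    (void)sig;
    (void)context;
    if (addr < arena_base || addr >= arena_base + ARENA_SIZE)
        _exit(1);                         /* a real bug, not arena growth */
    size_t offset = (size_t)(addr - arena_base);
    char *page = arena_base + offset - offset % page_size;
    if (mprotect(page, page_size, PROT_READ | PROT_WRITE) != 0)
        _exit(2);
    faults_handled++;
    /* Returning re-executes the faulting instruction, which now succeeds. */
}

/* Reserve the arena with no access rights and install the fault handler.
   Returns the arena base, or NULL on failure. */
char *arena_init(void)
{
    struct sigaction sa;
    long ps = sysconf(_SC_PAGESIZE);
    void *p;

    if (ps <= 0)
        return NULL;
    page_size = (size_t)ps;
    assert(ARENA_SIZE % page_size == 0);  /* arena must be whole pages */

    p = mmap(NULL, ARENA_SIZE, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    arena_base = p;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    /* Some systems report this kind of fault as SIGBUS, so catch both. */
    if (sigaction(SIGSEGV, &sa, NULL) != 0 || sigaction(SIGBUS, &sa, NULL) != 0)
        return NULL;
    return arena_base;
}

/* Write to two untouched pages; return how many faults were absorbed,
   or -1 if setup failed or the stored bytes did not survive. */
int arena_demo(void)
{
    volatile char *a = arena_init();
    if (a == NULL)
        return -1;
    a[0] = 'x';                           /* faults once, then is retried */
    a[3 * page_size] = 'y';               /* a different page: faults again */
    if (a[0] != 'x' || a[3 * page_size] != 'y')
        return -1;
    return (int)faults_handled;
}
```

Calling arena_demo() from a main function exercises the whole path. The stores in arena_demo only complete because the processor retries them after on_fault returns, which is the dependency the paper describes Collyer removing.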

In the process of explaining the intricacies of the Bourne shell memory management, Collyer notes a few things:

  • “One can think of the shell as a macro processor which also interprets commands.”
  • The Bourne shell “places no arbitrary limits on lengths of strings or input lines”.
  • The Bourne shell “makes no use of the C library beyond system calls”.
  • Here documents were added to the 7th Edition Unix shell with code that is “careless about error checking and performance”.

Thinking of the Unix shell as a macro processor with command interpretation thrown in is a viewpoint distinct from Louis Pouzin’s original Multics SHELL. Pouzin’s document talks about a shell that only interprets commands. Unfortunately, Collyer doesn’t elaborate on this comment. One could see a macro processor in Unix shells generally because of their extensive use of string interpolation and wildcard expansion. Traditional macro processors often have looping constructs that can produce similar output text for a list or array of values. Unix shells traditionally implement that capability with for loops. Collyer worked at Bell Labs before doing this protracted surgery and writing this paper, so I have to consider his a valid view of Unix shells.

Placing no arbitrary limits on the lengths of inputs and strings is of interest. In 2015, Stephen Bourne discussed his shell and noted that “strings are first class and only citizen”. Collyer does note other places in the shell’s code, though, where small, fixed-size buffers were used to copy “here documents”. Collyer’s explanation of the effort that goes into composing and handling strings inside the shell reinforces Bourne’s “first class citizen” assertion.
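As an illustration of what “no arbitrary limits” asks of the code, here is a small C sketch of a string buffer that grows on demand instead of truncating at a fixed size. It is a stand-in, not the shell’s mechanism: it leans on realloc from the C library, which the Bourne shell by Collyer’s account avoided, and the names strbuf, strbuf_append and strbuf_free are hypothetical.

```c
/* Sketch only: a growable string buffer. The names are hypothetical and
   realloc stands in for whatever allocator a real shell would use. */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct strbuf {
    char *data;     /* NUL-terminated contents, or NULL while empty */
    size_t len;     /* bytes stored, not counting the terminator */
    size_t cap;     /* bytes allocated */
};

/* Append n bytes from src, doubling the capacity as needed.
   Returns 0 on success, -1 if memory runs out (buffer left unchanged). */
int strbuf_append(struct strbuf *sb, const char *src, size_t n)
{
    size_t need;

    assert(sb != NULL && src != NULL);
    if (n > SIZE_MAX - sb->len - 1)
        return -1;                        /* length would overflow size_t */
    need = sb->len + n + 1;               /* +1 for the terminator */
    if (need > sb->cap) {
        size_t cap = sb->cap ? sb->cap : 64;
        char *p;
        while (cap < need) {
            if (cap > SIZE_MAX / 2) {     /* doubling would overflow */
                cap = need;
                break;
            }
            cap *= 2;
        }
        p = realloc(sb->data, cap);
        if (p == NULL)
            return -1;
        sb->data = p;
        sb->cap = cap;
    }
    memcpy(sb->data + sb->len, src, n);
    sb->len += n;
    sb->data[sb->len] = '\0';
    return 0;
}

/* Release the storage and reset the buffer to its empty state. */
void strbuf_free(struct strbuf *sb)
{
    assert(sb != NULL);
    free(sb->data);
    sb->data = NULL;
    sb->len = sb->cap = 0;
}
```

A caller starts from a zeroed struct strbuf and appends as input arrives. The only limit on length is available memory, and that failure is reported to the caller instead of being silently ignored.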

Making no use of the C library is intriguing in at least a historical sense. Collyer mentions that the Bourne Shell worked on PDP-11 computers that had split instruction and data spaces: 64 KB of instructions and 64 KB of data, the latter shared by the process stack and heap. That’s not a whole lot. One way to cut executable size is to avoid very general code, and C library code is extra-general.

The “here document” notes are interesting because they show that the graybeards of Unix, Stephen Bourne, Ken Thompson and Dennis Ritchie among others, made dumb coding errors too. Collyer notes a couple of problems with the here document implementation that would be considered potential Denial of Service security vulnerabilities today. Collyer notes that some other code dealing with “negated character classes”, I believe used in file name globbing, “was incorrect and only worked by chance”. This is all doubly interesting because Collyer notes that debugging the shell itself is quite difficult. “Initial debugging was largely by inspired guesses and tedious experimentation” during his protracted surgery. Collyer mentions inserting magic numbers into each instance of each relevant data structure, which sounds a lot like the struct tagging that Tom Van Vleck says was used to improve Multics file system code. Collyer doesn’t cite or mention Van Vleck or Multics. Perhaps this is an independent invention of the same technique.
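The magic number technique is easy to sketch in C. The paper’s actual structures and values aren’t reproduced here, so the struct names, fields, and tag values below are all hypothetical; the point is only that each kind of structure carries its own tag, and code checks the tag before trusting a pointer.

```c
/* Sketch only: tagging structures with magic numbers for debugging.
   Struct names, fields, and tag values are hypothetical. */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define WORD_MAGIC  0x574f5244u   /* "WORD" in ASCII, an arbitrary choice */
#define REDIR_MAGIC 0x52454449u   /* "REDI" in ASCII, also arbitrary */

struct word {
    uint32_t magic;               /* always the first member */
    const char *text;
    struct word *next;
};

struct redirect {
    uint32_t magic;               /* always the first member */
    int fd;
    const char *path;
};

/* Return nonzero if p is non-NULL and begins with the expected tag. */
int has_magic(const void *p, uint32_t expected)
{
    uint32_t found;
    if (p == NULL)
        return 0;
    memcpy(&found, p, sizeof found);      /* read the tag without aliasing */
    return found == expected;
}

struct word *word_new(const char *text)
{
    struct word *w = malloc(sizeof *w);
    if (w != NULL) {
        w->magic = WORD_MAGIC;
        w->text = text;
        w->next = NULL;
    }
    return w;
}

struct redirect *redirect_new(int fd, const char *path)
{
    struct redirect *r = malloc(sizeof *r);
    if (r != NULL) {
        r->magic = REDIR_MAGIC;
        r->fd = fd;
        r->path = path;
    }
    return r;
}

/* Check the tag, then clear it so that a later use of the stale
   pointer is more likely to fail the same check. */
void word_free(struct word *w)
{
    assert(has_magic(w, WORD_MAGIC));
    w->magic = 0;
    free(w);
}
```

With tags in place, a pointer to the wrong kind of structure fails has_magic at the point of use, which is far easier to diagnose than corruption that surfaces much later.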

Sources

I could only find the Winter 1989 Usenix Conference Proceedings on archive.org. Usenix used to be so good at making all their archives accessible to everyone, even non-members. Now, they’re just not offering access to any conference proceedings from before 1993. Another indicator of the fall of civilization.