This document pertains to version 1.1 of cl. You can download version 1.6 of cl here; version 1.6 is substantially similar to v1.1, but I wrote a separate document specifically for v1.6.
cl (Combinatory Logic) interprets a simple programming language similar to Haskell Curry's Combinatory Logic formal system. It includes enough built-in combinators to use {S, K, I}, {B, C, K, W} or {B, M, T, K} as bases.
The programming language has no input or output facilities, depending solely on the user's ability to attach meaning to normal forms of Combinatory Logic expressions.
You could use cl to understand or experiment with Combinatory Logic. It may help you with Raymond Smullyan's book To Mock a Mockingbird.
-B alg | Choose a default bracket abstraction algorithm, one of "curry", "turner", "tromp", "grz" or "btmk" |
-C N | Treat N (one of S, K, I, B, C, W, M, T) as a non-primitive combinator |
-d | Print debug output during graph reduction |
-e | Print elaborate output during graph reduction |
-L filename | Load and interpret a file named filename |
-m | On exit, print a memory usage summary on stderr |
-N number | Perform no more than number reductions, then return to the top-level prompt |
-p | Do not prompt for user input (prompting is the default) |
-s | Single-step reductions |
-T number | Evaluate input expressions for number seconds, then return to the top-level prompt |
-t | Trace reductions |
The -L flag allows a user to "pre-load" one or more files full of cl code before the interpreter starts its main interactive loop.
Using a -C flag with an argument of one of "S", "K", "I", "B", "C", "W", "M" or "T" turns off the use of that letter as a built-in primitive. If the letter then appears in a context where it constitutes an identifier, it gets recognized as a non-primitive "combinator", and performs no reductions. You can use more than one -C flag on a given command line to turn off more than one combinator.
The -B flag allows you to choose a default bracket abstraction algorithm. Ordinarily, the plain Curry-Feys algorithm gets used as the default, but using -B with "tromp", "turner", "grz" or "btmk" makes the interpreter use the selected algorithm in bracket abstractions where no algorithm appears explicitly.
Nearly all of the flags have a corresponding interpreter command that allows the users of cl to invoke that behavior from inside the interpreter.
This document describes an interpreted computer programming language closely resembling Combinatory Logic. It also describes the design and implementation of the interpreter.
This document ends up as a conflation of man page, README and INSTALL files, ex post facto design document, and bibliography. At least it's all in one place.
The world needs another programming language like it needs a hole in the head. Why on earth did I choose to write this?
During late 2005 and the first half of 2006, I wrote an interpreter for a language close to untyped lambda calculus. I know many people have written lambda calculators, and made them available. Most of those implementations have crippling bugs, and I wanted the understanding of lambda calculus that I believed would come from writing an interpreter for it.
My lambda calculator ran fairly slowly. In looking around for ways to speed it up, I discovered graph reduction. A desire to understand graph reduction led me to write this interpreter.
I intend to get the lambda calculator in a form I don't feel ashamed to give away. At that point, I will make its source code publicly available. This interpreter will have to do for now.
I chose to license cl under the GNU General Public License (GPL).
If you want to use cl, you have to download the source code, compile an executable, and (possibly) copy it somewhere.
Click on the following link to download source code (v1.1 of the software).
Click on the following link to download source code (v1.4 of the software). You should probably use v1.4, as it has some bug fixes and some user interface improvements.
cl source code will probably only compile under Solaris, Linux and BSD operating systems. You may have some luck under other operating systems. cl will probably not build or work under any "Windows" variant.
You have to compile cl from its source. I do not provide any pre-compiled packages.
To do this you will need the following command line programs installed:
./runtests (Tests 45 and 46 run 30 seconds each.)
The unit test inputs, in directory tests.in, actually contain
some amusing examples, including Church numerals, Scott numerals,
and several examples of Y combinators.
The interpreted language has similarities to Combinatory Logic formal systems, at least the "non-illative" part, the part analogous to lambda calculus. It does not have quantifier operators like Ξ, or implication operators like P, so the interpreted language does not contain first order predicate calculus.
I wrote "systems" above, as not every author describes exactly the same thing. Usually, "Combinatory Logic" (or sometimes "Combinator Calculus") consists of a set of "combinators" (for example the set {S, K}) and an applicative structure where adjacent combinators get treated as either operators or operands. Individual combinators have some set number of arguments, and once that number of arguments appear, the combinator acts by deleting, duplicating or rearranging the arguments. See Michael J. O'Donnell's The SKI Combinator Calculus a universal formal system for an example of a formal definition, together with an informal definition.
Different authors choose different "bases", or sets of combinators to use. Modern authors tend to use {S, K} or {S, K, I} as a basis. In the past, {B, C, K, W}, {B, T, M, K} or even {I, J} got used. Authors also differ in allowing non-basis combinators to appear in expressions. Some disallow non-basis combinators on strictly formal grounds: Combinatory Logic does not need them. Some authors allow them even formally. Other authors disallow non-basis combinators formally, but use them informally as abbreviations or place holders.
I intend people to use cl interactively. It does read from stdin and write to stdout, so you could use it as a "filter", but I didn't really design it to work that way. In the following example session, what the user types appears in a bold typeface.
The interpreter prints a CL> prompt when waiting for the user's input.
4:06PM 5 % ./cl
CL> S I I x
S I I x Interpreter prints parsed expression
x x Interpreter prints normal form of expression
CL> W I x
W I x
x x
CL> def myT (C I)
CL> myT a b
C I a b Abbreviation expanded
b a
CL> Ctrl-D End-of-file causes interpreter to exit
4:07PM 6 %
Note that each of the CL expressions the user enters gets reduced to normal form right away, and the interpreter prints that normal form. The user must assign "meaning" to an expression and its normal form. No input or output facilities exist in the language interpreted.
The language interpreted consists of lines of ASCII-encoded textual input. Each line gets subdivided into tokens. Tokens consist of:
- identifiers, including combinator names
- parentheses
- the bracket abstraction notation [x], the only place brackets can appear
- the interpreter command keywords timer, timeout, count, debug, step, trace, elaborate, abstraction and load

Input expressions can contain as many matched pairs of parentheses as the user deems necessary, with the condition that each parenthesized pair has to contain at least two combinators. The parser gives syntax errors on expressions like (S).
Each input expression can contain pre-defined names. The interpreter inserts a pre-constructed parse tree for the abbreviation. This makes an input expression into a "context" as the denotational semantics people talk about.
The interpreter parses an input expression, usually a single line of text, delimited by a "return" or newline. Users can back-slash the newline and have a multi-line expression. Parsing the expression creates a (possibly enormous) binary tree internally. At the receipt of a newline, the interpreter destructively "reduces" the parse tree, using an algorithm referred to as "graph reduction".
Graph reduction changes the parse tree, eliminating, duplicating or re-associating pieces of the parse tree. When no more reductions can take place, the interpreter deems the parse tree in normal form, and prints a nominally human-readable version of it.
Some "interpreter commands" exist, like timer, timeout,
etc. These commands, described in a later section,
do not trigger a graph reduction, and do not require a Combinatory Logic input
expression.
Lexer code recognizes identifiers according to this pattern: [a-zA-Z][a-zA-Z_0-9]*.
Identifiers consist of a letter (upper or lower case) followed by any number of letters, digits or underscores.
Lexically, identifiers look like C or C++ or Java variable names.
Semantically, the class of identifiers consists of primitive combinators (S, K, I, B, C, W, M, T) and non-primitive combinators. All combinators have left-associative application.
Some identifiers constitute built-ins, or primitives. They have special properties.
The interpreted language has eight primitive combinators, special terms that do things other than just sit there. It has the usual S, K and I combinators. It has B, C, W, M and T combinators as well. Since this interpreter started off as a test bed for graph reduction, it made sense to include B and C combinators, so as to allow for Turner's bracket abstraction.
I put in W to allow users to experience Curry's favorite basis, {B, C, K, W}.
I put in M and T combinators after reading To Mock a Mockingbird. I wanted the ability to work with the {B, T, M, K} basis.
The combinators rewrite terms according to the following rules:
I p → p
K p q → p
S p q r → p r (q r)
B p q r → p (q r)
C p q r → p r q
T p q → q p
M p → p p
W p q → p q q
Note that all the combinators require a particular number of arguments before they "activate". For example, K p q → p, but K p remains K p.
Internally to the interpreter, upon finding a particular combinator,
the parse tree (or "graph") gets modified. The S,
W and M combinators don't copy the duplicated
terms, but rather add a reference to the duplicated
sub-tree and increase its reference count.
The language interpreted by cl can include "bracket abstractions", notations that cause the interpreter to create Combinatory Logic expressions that (given correct arguments) will eventually reduce to a specified expression. Refer to the section below for algorithm details.
Actual bracket abstractions have syntax like this:
CL> [x] (x x x)
S (S I I) I
S (S I I) I
CL>
The [x] notation means something like "calculate an expression
free of variable x, using the default algorithm".
cl calculates that combinatory logic expression,
and uses it in the current reduction. Above, cl
prints the expression free of variable x,
reduces that expression,
then prints its normal form.
You can nest bracket abstractions:
CL> [x] [y] x y
S (S (K S) (S (K K) I)) (K I)
S (S (K S) (S (K K) I)) (K I)
CL>
You can store the expression resulting from a bracket abstraction:
CL> def D [x] x x
CL> D a
S I I a
a a
CL>
In short, a bracket abstraction can appear anywhere a single combinator can appear.
You can use different abstraction algorithms. The following shows what the Grzegorczyk algorithm gives as an expression for the S combinator's action:
CL> [p]grz [q]grz [r]grz p r (q r)
B (B W) (B B C)
B (B W) (B B C)
CL>
cl has Curry-Feys, Tromp, Turner and Grzegorczyk bracket abstraction algorithms. By default, you get the Curry-Feys algorithm if you don't specify which algorithm a [x] abstraction operator should use. The -B <algorithm> command line flag allows a user to choose a default algorithm at interpreter start up.
The cl interpreter does allow you to use abbreviations, single-identifier names that substitute previously-defined expressions for the abbreviation.
The keywords "define" and "def" (the latter for human convenience) allow a cl user to create symbolic names for parse trees. These names, when used at the top level of the interpreter, cause substitution of the previously created parse tree for the name.
Typing in define omega (S I I)(S I I)
at the CL> prompt causes the interpreter to keep an unreduced
parse tree. Using the identifier omega in another expression
causes the interpreter to put a copy of the unreduced parse tree in the
location where omega would appear in the parse tree that results
from the other expression.
Substitution for a previously-defined abbreviation only happens during parsing. This means that a def can't change a previously defined abbreviation. For example:
CL> def X S m m r        user defines "X"
CL> def m (C K K)        user defines "m", which appears in previous def
CL> X
S m m r                  interpreter prints what previously got defined
m r (m r)                interpreter prints normal form
CL> m x                  user types new expression with "m" in it
C K K x                  interpreter prints expression, with "m" substituted for
x                        interpreter prints normal form
CL>
The reduce keyword causes the interpreter to perform a
graph reduction on the parse tree of the expression immediately following it.
reduce some expression returns a parse tree,
so the keyword-and-expression can appear anywhere a combinator can appear.
In particular, it can appear in a def or define
line. define doesn't ordinarily cause a reduction, which
constitutes a valid behavior for a user to want, but occasionally one
wants an expression reduced to normal form before storing against
an identifier.
This means that reduce constitutes a part of the interpreted
language. You can put the reduce keyword in front of any
combinatory logic expression:
CL> reduce S (reduce I) (reduce I) x
x x                      note that the parsed expression already appears in normal form
x x
CL>
I included reduce in the interpreter to allow a user to
store the normal form of an expression:
CL> def twoX (reduce S I I x)
CL>
CL> twoX
x x                      what the interpreter stored - normal form
x x                      normal form identical to what got stored
CL>
Judicious use of reduce can actually pervert the reduction process,
changing it from "normal order" to "applicative order".
You can use reduce inside a bracket abstraction's specified
expression. The reduced expression gets bracket-abstracted.
cl has the usual sort of functional-language-interpreter read-eval-print loop. Identifiers in the input that lexically match previously defined "abbreviations" get replaced by copies of the parse tree stored against that "abbreviation". This makes the top-level loop into a "context", as the functional programming people say. The parse tree for the abbreviation gets substituted during parsing, not during evaluation, so abbreviations really only exist at the top level.
The top level loop performs this series of steps repeatedly:
You can interrupt (usually control-C on the keyboard) the top level loop to exit the interpreter, or you can give it an end-of-file (usually control-D). No explicit "exit the interpreter" command exists.
Due to the location of the magic error production in the yacc grammar, a syntax error can cause what the human user perceives as a single expression to get parsed as two expressions. Two reductions get performed, which can confuse the user.
Some expressions (like W W W or M M) do not have a normal form. The graph reduction algorithm will not terminate given those expressions as input. The timeout or count interpreter commands give the graph reduction algorithm a finite amount of time to work or a fixed number of reductions to perform, interrupting it at that point. In the case of an interrupt, the interpreter will not print an expression's normal form. It will print whatever the graph reduction algorithm has created so far.
load - read in and interpret a file of cl input lines.
Usage: CL> load "some.file" (users must double-quote the name of the file to load).
timer - toggle the printing of elapsed time of both graph reduction and bracket abstraction.
timeout - set a length of time for the graph reduction algorithm to run. Usage: CL> timeout 15
step - toggle single-stepping, actually "single-reductioning". Enter trace or debug to enable output
during single-stepping.
trace - toggle the interpreter's ability to perform a run-time trace of graph reduction.
A "trace" shows the expression after each reduction, with no further annotations.
debug - toggle the interpreter's ability to perform a run-time trace of graph reduction.
Lots of information about what the graph reduction algorithm does gets printed, as well as a somewhat
annotated expression before and after each reduction.
elaborate - toggle printing copious data about each node in the parse tree during debugging.
Goes well with debug.
count - perform a finite number of reductions, then stop. Usage: CL> count 1000
abstraction - choose default bracket abstraction algorithm. Usage: CL> abstraction tromp

The step interpreter command causes cl to "single step" through a graph reduction. Each "step" consists of a built-in primitive performing its action. cl does not consider examination of an application node of the parse tree as a "step".
The trace, debug and elaborate keywords control what output occurs during single-stepping, with trace producing the least-detailed output. Confusingly, cl gives no output during single-stepping unless you tell it how much output you want with one of these three keywords.
The step command causes the interpreter to print a prompt (continue?) before every reduction takes place. What the interpreter does depends on the user's input:
trace has the most utility when you want to see how a particular
expression gets reduced. debug has more utility when trying
to figure out what went wrong with the graph reduction algorithm,
and elaborate can help figure out reference counting
problems. step along with trace or debug
probably has the most use when trying to watch a large expression get
reduced.
You can issue these three commands without an accompanying step command, and you will see the same output you would during single-stepping.
The abstraction <algorithm> interpreter command allows a user to switch default bracket abstraction algorithms during the lifetime of an interpreter. "curry", "turner", "tromp", "grz" and "btmk" constitute all the valid values of algorithm.
After invoking this command, the interpreter will print an elapsed time for all graph reductions and bracket abstraction calculations.
Set a time limit on how long the reduction of an expression can proceed. Prevents the interpreter from running forever reducing an expression with no normal form.
Allow the interpreter to perform only a finite number of combinator reductions before terminating graph reduction. This can keep the interpreter from running forever on expressions with no normal form.
The count command requires a number as an argument:
CL> count 100
CL> M M
M M
Reduction limit
M M
CL> count 0
Setting the "maximum reduction count" to 0 (zero) turns counted reduction limiting off.
The -L command line flag causes the interpreter to
read in and interpret some outside file full of cl input.
More than one -L flag can appear: the files get read in
in order of appearance on the command line.
Syntax errors in the input do not cause the interpreter to exit,
nor do they "roll back" what a file's input does: once a def
or reduce gets interpreted, its effect lasts.
A user can access the same function via the load "filename"
command during interaction with the interpreter. At any CL>
prompt, a user can enter a command like:
CL> load "tests.in/test.048"
You must double-quote the file name. Non-absolute file names get opened relative to the interpreter's current working directory.
Since actual parsing of input text makes up such a small part of the run-time of any given reduction, I used old-time compiler construction favorites lex (actually flex) and yacc (actually bison or byacc), to tokenize and parse respectively. Using flex or lex limits the interpreter to ASCII input.
The yacc productions structure the input code. Input code essentially constructs a binary tree representing the parsed input expression.
Flex (or lex) does the hard part of reading input bytes. Both lexer-generators create code that understands the difference between "human user input from a keyboard" and "input from a file", and buffer input appropriately.
I chose the grammar to make the language as much like textbook Combinatory Logic as possible: combinators as single letters, whitespace separating token names. The grammar allows flexible parenthesizing. Since non-primitive combinators can appear, and can have more than one character, whitespace or parentheses must separate tokens.
You can't redefine key words or use them as plain combinators. This constitutes the biggest wart in the "programming language", outside of its complete lack of I/O facilities.
Comments consist of a '#' (octothorpe) and any number of characters, to the end of the line on which the octothorpe appears. Lexer code recognizes comments, and treats them as whitespace. Nothing about comments appears in the Yacc productions below because of that.
Some productions exist to handle errors. Some productions exist to make it easier to type things in.
1. program → stmnt
2. program → program stmnt
3. program → error
4. stmnt → expression end-of-line
5. stmnt → "def"|"define" IDENTIFIER expression end-of-line
6. stmnt → interpreter_command
7. stmnt → end-of-line
8. interpreter_command → "timer" end-of-line
9. interpreter_command → "elaborate" end-of-line
10. interpreter_command → "debug" end-of-line
11. interpreter_command → "trace" end-of-line
12. interpreter_command → "step" end-of-line
13. interpreter_command → "load" "file name" end-of-line
14. interpreter_command → "timeout" number end-of-line
15. interpreter_command → "count" number end-of-line
16. interpreter_command → "abstraction" algorithm name end-of-line
17. expression → application
18. expression → term
19. expression → "reduce" expression
20. expression → bracket_abstraction abstraction_algorithm expression
21. abstraction_algorithm → algorithm name
22. abstraction_algorithm → ε
23. application → term term
24. application → application term
25. bracket_abstraction → '[' IDENTIFIER ']'
26. term → constant
27. term → IDENTIFIER
28. term → '(' expression ')'
29. constant → PRIMITIVE
Lexer code says that the usual single-letters constitute
combinators (S, K, I,
B, C, T, M, W), and that
Yacc-generated code should consider them to have a PRIMITIVE
token type. See production 29. Command line flags can "turn off" the treatment
of those single letters as combinators, in case you want to ensure
that you only use S and K combinators,
or some such asceticism.
The grammar includes explicit "end-of-line" tokens so that the lexer can pass that token when it reads a real newline character, or when it finds an octothorpe-prefixed comment. The lexer consumes backslashed-real-newlines and does not pass an end-of-line to the parser. A semicolon character could also cause an end-of-line, which would allow multiple expressions per physical line of input.
Production 1 allows recognition of the first statement a user inputs,
production 2 recognizes every other statement.
Production 3 allows Yacc or Bison to write out code to "unwind"
to a stmnt when a syntax error occurs.
Production 7 allows empty lines to occur without giving a syntax error.
Production 5 allows you to save a parse tree to the environment,
an implicit dictionary of named parse trees.
The code in the action for Production 27 looks up all IDENTIFIER
tokens in the environment. When the code finds an IDENTIFIER
in the environment,
a copy of the previously-saved parse tree gets used as a term non-terminal.
The code in the actions for productions 4 and 19 passes a parse tree to the graph reduction function. This occurs when the lexer sends an end-of-line token to the Yacc-generated code. The keyword "reduce" (production 19) exists to trigger a graph reduction before the "def"/"define" keywords save the parse tree to the environment.
Bracket abstraction (production 20) actually gives back a parse tree, so a bracket-abstraction expression can appear anywhere a regular term can.
Productions 21 and 22 exist to allow the use of other than the default bracket abstraction algorithm during run time. Production 16 allows you to change the default algorithm during run time.
Productions 17, 18, 23, 24, 26 through 29 define an "applicative structure" where juxtaposition implies the application operation, and application associates to the left.
Single-stepping, time out and reduction counting all leave the graph reduction algorithm via a setjmp() call and three siglongjmp() function calls. These calls share the use of a single jmp_buf structure.
Single-stepping gives the cl user the choice of bailing out
of the current graph reduction at every step.
If the user chooses to quit reduction, a longjmp()
returns the user to the
interpreter's top level loop.
Time outs on graph reduction happen via a call to the alarm() system call. When the time out expires, the operating system gives the process a SIGALRM, involving an "upcall" to the signal handler function.
The signal handler calls longjmp()
to return to the
interpreter's top level loop.
This ties the graph reduction function to the Yacc-generated parsing code via signal handling.
The yacc-generated parser ends up composing a binary tree
of struct node elements.
struct node {
	int sn;                    /* serial number, immutable, for debugging */
	enum nodeType typ;         /* APPLICATION or COMBINATOR */
	enum combinatorName cn;    /* which primitive, if any */
	const char *name;          /* identifier text, as an Atom */
	struct node *left;
	struct node *right;
	struct node **updateable;  /* used during graph reduction */
	int branch_marker;         /* used during graph reduction */
	int examined;              /* used during graph reduction */
	int refcnt;                /* reference count */
};
struct node instances make up both the interior and
the leaf nodes of a parse tree.
When traversing the graph, the interpreter makes a
distinction based on the value of the
enum nodeType typ; element:
The Yacc-types defined by the grammar, rather than the combinator types, prevent using different C structs for "leaf" and "interior" nodes.
enum nodeType { UNTYPED, APPLICATION, COMBINATOR };
APPLICATION instances should appear only as interior nodes
of the parse tree, while COMBINATOR nodes should appear
only as leaf nodes. UNTYPED exists to allow
struct node instances on a free list
to have a type that
should not appear in a correct parse tree. Any appearance of an
UNTYPED node in a parse tree constitutes an error.
In accordance with the usual nomenclature of Combinatory Logic, all
terms, whether they have names of built-in combinators
(S, K, I,
B, C, T, M, W) or not
have type COMBINATOR. Mathematical writings about
Combinatory Logic seem split over the issue of including
non-built-in-combinators
in the official language describing Combinatory Logic.
Some authors allow non-built-in-combinators, some limit the language
to S and K only,
some allow I as well.
Practically, every
author who deals with Combinatory Logic uses non-combinators (with names like 'P', 'Q', 'R',
'x', 'y', etc) to at least denote arbitrary Combinatory Logic terms.
So, the enum combinatorName cn field of the struct gets used to
determine if the combinator in question does a reduction or not.
The three fields struct node **updateable, int branch_marker
and int examined
all get used during graph reduction.
See spine stack notes.
The field int refcnt comprises a typical reference count:
a nominal "freeing" of one of these structs decreases the reference count.
When the reference count goes to zero, the code puts the struct on
a free list, for reuse later. This constitutes extremely explicit
custom memory management.
The field int sn contains a "serial number", immutable through a particular reduction. It exists solely to aid debugging. Except for the memory address of a struct, no other way exists to tell one I combinator from another, which can confuse the human doing the debugging.
The way that I and K combinators rewrite expressions drives the use of a "dummy root node".
Expressions like I a b have a parse tree like this:
Arrows represent the left and right elements
of struct node instances.
The reduction of I a to a just changes the left-hand pointer
of the application node to the a leaf node. Reference counts change as
well, ultimately looking like this:
If a node's reference count falls to zero, the reference counts of its left and right elements get decremented. A zero-reference-count node gets put on the struct node free list.
The left element of the struct node that comprises
application node 1 in the image above changes. Instead of containing a pointer
to the I a application node, it contains a pointer to the a
combinator node.
For an expression like I a, a dummy root node has to exist
to allow the code to not check every time for whether it has to
change a pointer, or do something special for the (non-dummy)
root node.
Expressions like K a b have the same problem as I a.
As I read Philip Koopman's work, I think he left extra I nodes around, so as to not have to deal with any irregularities like dummy nodes during reduction. The expression printing routine does have to deal with the dummies. I did not find dummy root nodes especially onerous.
This interpreter uses the "Strings as Atoms" concept from David Hanson's book C Interfaces and Implementations.
The lexer code looks up all input strings in a hash table.
The hash table lookup
returns a pointer to a permanent (malloc'ed) copy of the string.
Should a lexically-identical
string already exist in the hash table, the hash table-insert returns
a pointer to the existent string in the hash table.
This is analogous to Java's String.intern().
No strcmp() calls appear anywhere; the "==" pointer equality test suffices for two potentially lexically equal "Atoms".
The code does not have to explicitly deallocate strings except in the hash table deallocation.
The code does have to pay attention to the read-only nature of "Atoms".
The main benefit of this occurs in the bracket abstraction algorithms. Checking a given combinator for lexical equality with an abstracted variable only involves an arithmetic equality test.
I used a C implementation of Per-Ake Larson's dynamically expandable hash table. You can find a description of this hash table in "Dynamic Hash Tables" by Per-Ake Larson, Communications of the ACM, April 1988, volume 31, number 4.
A single instance of this hash table holds both strings (as
Atoms) and parse trees (which have pointers to struct node
as roots) named by strings.
Strings which never get used in a define appear in the
hash table without an associated parse tree.
This structure constitutes a single-chaining hash table with a power-of-two number of chains ("buckets"). A power-of-two bucket count makes it really easy to determine which chain to start looking in, at the cost of doubling the number of buckets whenever the number of items per chain gets too high.
I used Dan Bernstein's DJB2 hashing algorithm. The hash table uses ASCII-Nul-terminated C strings as keys, so DJB2 seems like a good choice.
Memory management comprises one of those areas of programming
that causes consternation.
I ended up with a three-level memory management scheme.
On the first level, the interpreter uses good ol' malloc and free
to obtain heap memory. On the second level, it uses a stack of
"arenas", (at least) page-sized blocks of memory. After every graph reduction,
the interpreter "resets" the arenas. Arenas give out smaller allocations
of memory, but you never bother freeing them: you just reset the arenas.
On the third level, the most-frequently-allocated C-language data structure,
struct node has reference counts, and its own free-list
based reuse.
struct spine_stack, possibly expensive-to-allocate
structures, have their own free-list based reuse.
struct memory_arena {
char *first_allocation;
char *next_allocation;
int sz; /* max free size */
int remaining; /* how many bytes remain in allocation */
struct memory_arena *next;
int usage_info;
int sn; /* serial number: malloc order */
};
"Heap" memory gets allocated on demand into (multiples of) one-page sized "arenas" as per the David Hanson paper Fast Allocation and Deallocation of Memory Based on Object Lifetimes, rather than the implementation given in his later book C Interfaces and Implementations, chapter 6. I had read both, and didn't have either on hand when I wrote the arena code, so I ended up with a paper-like, rather than book-like implementation.
The list of arenas gets reset at the conclusion of every read-eval-print loop.
Note the use of "reset" and not "deallocated". Once a cl interpreter
allocates a page-sized block of heap memory, it retains a reference to it until
the interpreter exits. The cl interpreter keeps a list of arenas.
When the interpreter needs a new chunk of memory, it walks the list of arenas
in order of their allocation,
searching for an unused piece of arena. Should the search fail, the interpreter allocates a new arena
using malloc().
Each arena keeps a pointer to the unused remainder of its allocation
(next_allocation field), and two numbers:
the size of the allocation (sz field) and the number of bytes still unused
(remaining field).
A "search" involves comparing the requested amount of memory (in bytes) with the free space
each arena has. This implementation makes no attempt at "packing" chunks of
memory based on requested size.
It gives back requested memory first-come, first-served.
Instances of struct node
and struct spine_stack get allocated and discarded almost constantly during
graph reduction. The pairs of functions that allocate and deallocate
those two structs cooperate by using a free list.
During input parsing and graph reduction, the functions that give back "new" instances
of struct node
and struct spine_stack actually try to use a previously-allocated instance
from a free list. Should that struct's free list not have any entries,
the functions get an allocation from an arena (for struct node) or
malloc() (for struct spine_stack).
Deallocation functions for these data structures just push a "free" structure
on a simple, linked-list based, LIFO stack. The function free_node(struct node *)
decrements reference counts, pushing the "free" structure on the stack
only when its reference count reaches zero. If the struct node instance
in question has type of APPLICATION, its left and right child nodes
get "freed".
Parsing input produces a binary tree, with all interior nodes representing function application, and all leaf nodes representing combinators, either built-in primitives or arbitrary non-primitives. To calculate the normal form of an expression, the cl interpreter destructively re-writes the binary tree, a process known as "graph reduction".
I implemented graph reduction in a single non-recursive function,
void reduce_graph(struct node *graph_root);.
The keys to this function lie in the following data structure:
struct spine_stack {
struct node **stack; /* contiguous array of node pointers */
int top; /* index of next free slot */
int maxdepth;
int size; /* allocated size of the stack array */
int sn; /* serial number */
struct spine_stack *prev;
};
The struct node **stack, top and size
elements comprise a last-in, first-out "spine stack".
Graphically and informally,
the left-hand side of a parsed Combinatory Logic expression
makes up the "spine" of the expression. The cl interpreter
iteratively pushes application-type struct nodes on the "spine stack",
following the left pointer of each application-type
node in turn.
The interpreter changes the graph when it has pushed all the
application-type nodes on the stack and has reached a combinator-type
node.
After performing whatever reduction the combinator-type node specifies, the graph reduction "pops" one or more nodes off the spine stack. The depth of the spine stack may disallow a given reduction, and also prevents "popping" past the top of the spine stack.
The top and size elements of struct spine_stack
allow resizing of the memory pointed to by the stack element.
This gets done in the pushnode() function, when adding a new element
to the spine stack would cause an overflow.
Function pushnode() allocates heap memory for stack
using malloc() and realloc(), standard C library functions.
pushnode() uses these library functions to guarantee
contiguous blocks - stacks can get very deep.
The prev element performs two functions: during reduction it chains
spine stacks together into the stack-of-spine-stacks, and between uses it
links unused instances on the free list.
struct spine_stack instances get allocated using malloc(); the
free list keeps them around for more than the reduction of a single expression.
The free list for struct spine_stack instances lasts the entire life
of a cl process. This differs from the free list for struct node
instances, which lasts for the reduction of a single expression.
I suspect that a malloc() based allocation scheme has a greater run-time
cost than the arena-based allocation scheme used for nodes in the parse tree.
Keeping the spine stack free list around for an entire interpreter run amortizes this
run-time cost over many graph reductions, just as the arenas' allocation time
gets amortized.
The examined, branch_marker
and updateable fields of struct node only make sense
in the context of stack-of-spine-stacks. Consider this expression: S (I a).
An instance of struct node with
typ == APPLICATION has the address of the application node of the
(I a) subexpression as the value of its right element.
Since the S combinator has only one argument (the (I a) subexpression),
cl cannot perform graph reduction using S. It can reduce (I a).
Pushing right on the spine stack won't work: the I reduction would
end up updating the left element of the application node to point to a.
Enter the use of the examined, branch_marker
and updateable fields of struct node.
Any reductions will change stack_top->updateable to point to the new sub-graph.
Setting node->updateable = &node->left
happens in a normal, left-node push onto the spine stack.
Setting node->updateable = &node->right
happens when the graph reduction creates a new spine stack to
traverse the tree pointed to by a right element
of an application-type node.
Having S, M and W combinators create shared subexpressions rather than copying them leads to the ability of cl to do a single reduction that appears in multiple subexpressions.
Tracing reduction of S I I (M I I) shows that the M reduces once appearing in 2 subexpressions, and the (I I I) reduces once also.
CL> S I I (M I I)
S I I (M I I)
I (M I I) (I (M I I))    (M I I) subexpression shared, not duplicated by S-reduction
M I I (I (M I I))
I I I (I (I I I))        M-reduction shows up in both apparent subexpressions
I I (I (I I))            (I I I) reduces to (I I) in two places as well
I (I (I I))
I (I I)
I I
I
CL>
The standard, obvious algorithm.
I believe that I have only implemented a subset of the real Turner algorithm. I found hints in the literature about a more elaborate Turner algorithm, but I have no access to back issues of the expensive Software: Practice & Experience to verify.
The existence of B and C combinators in this algorithm prompted me to include them as distinct primitives in cl
John Tromp created a 9 rule algorithm which optimizes for the minimum number of S and K combinators in the abstracted expression. Tromp wants to get a small self-interpreter, and he confines himself to just S and K combinators. For example, the "S K M → S K" rule could change to "S K M → I", if you have an I primitive. Also, the "(M L)(N L) → S M N L" rule undoes a reduction that will get re-done at reduction-time anyway. This seems like a classic time versus space tradeoff in the small, to me.
I don't entirely understand what phrases like M, N combinators means in rules 6, 7 and 8. I chose to interpret those phrases as "M, N represent single, built-in primitives". I also implemented rule 8 by doing a depth-first traversal of parts of parse trees to determine if the potential "L" portions of the rule really do equate. A faster method of doing that may exist.
cl includes an implementation of M. Price and H. Simmons's reconstruction of Grzegorczyk's "g" algorithm, using only B, C, K, W, I combinators.
If you consider W K as a replacement for I, you can get away with only using B, C, K and W.
I put in an algorithm that only uses B, T, M and K combinators. I devised it, but presumably someone else has come up with this same algorithm in the past. The {B, T, M, K} basis is not unknown.
Perhaps after reading To Mock a Mockingbird, you want to add the L combinator. Steps to follow:
1. Add COMB_L to enum combinatorName, which appears in node.h, and add COMB_L to new_combinator(), which appears in node.c.
2. Add a case to the switch in reduce_graph(), appearing in file graph.c.
3. Add L_as_combinator to file grammar.y. Fix up command line parsing and the usage function, too.
Of these steps, (2), adding code to the switch,
will cause the most work. The L combinator reduces like this:
L x y → x (y y)
Just like W, L can only cause
a reduction if the spine stack has a depth greater than 3.
In fact, the code for L should look similar to
the code that exists for W.
The code for the new COMB_L block will have to allocate
application nodes, and de-allocate the L x y
parse tree. The y node will end up with an incremented
reference count, as the action of L duplicates it.
To add a new bracket abstraction algorithm, follow these steps:
1. Add an entry to struct afmp[]: a string naming the algorithm, for use in the interpreted language, and a pointer to the function that allows entry to the algorithm.
2. Have the lexer return TK_ALGORITHM_NAME when it recognizes the string.
Every new "algorithm name" becomes a keyword in the interpreted language, so have a care in selecting names.
If you just find an expression for the action of S with your chosen basis and use it in the Curry-Feys algorithm, you end up with grossly inefficient bracket abstraction: the resulting expression will contain vastly more primitives than it needs to. You want to take advantage of the primitives in your chosen basis to do special things, like B and C in the Turner bracket abstraction algorithm.
Note that this section merely explains how to derive, it does not describe an algorithm. The "working backwards" part is not algorithmic.
You need rules for abstracting:
- x from x. This usually ends up as [x] x → I.
- x from N, an expression that does not have x in it. This usually ends up as [x] N → K N.
- x from (P Q), where x doesn't appear in P. Your chosen basis may have an efficiency advantage here.
- x from (P Q), where x doesn't appear in Q. More efficiency may appear here.
- x from (P Q), where P and Q might comprise applications themselves.
This case constitutes the heart of the matter, and you may end up with cases where P
contains x, but Q does not, and vice versa. Recursion usually shows up
in this case, as [x]P and [x]Q show up in the right-hand-side of the
transformation rule(s).
You want to figure out some expression R so that R(([x]P) ([x]Q)) x
ends up reducing to ([x]P) x (([x]Q) x). Note that the S combinator does this
exact reduction: S P Q x → P x (Q x).
The best way I've discovered to do this: start with the expression P x (Q x) and "work backwards"
to some expression that contains P and Q, and, when applied to x
reduces to P x (Q x).
Turner's algorithm and Tromp's algorithm show that handling a few special cases may make a big difference. In Turner's case, he used B and C combinators to handle the different sub-cases of term application more efficiently.
Chapter 11, "Birds Galore" of To Mock a Mockingbird and section 2.2 of Combinatory Logic in Programming give an explanation of "working backwards". The mental sensation of working backwards bears kinship to solving ordinary differential equations, or deciding what form of integration to use.
As an example of "working backwards", here's my example of getting to Grzegorczyk's bracket abstraction for applications.
| Step | Expression | Justification |
|---|---|---|
| 1. | P x (Q x) | initial expression |
| 2. | C P (Q x) x | Prefix with C and reverse its action. |
| 3. | (C P) (Q x) x | Make explicit parenthesizing. |
| 4. | B (C P) Q x x | Prefix with B and reverse its action. |
| 5. | (B (C P) Q) x x | Add explicit parentheses. |
| 6. | W (B (C P) Q) x | Prefix with W and reverse its action. |
When you get to the expression
W (B (C P) Q) x, you can get to the
bracket abstraction rule (#6, above) by using [x]P for plain P,
and [x]Q for plain Q. Note that merely putting in S
expressed in terms of B, W, C, K would give you a substantially less
efficient bracket abstraction, at least for applications where x appears in
both terms.
You only add combinators as prefixes.
You must explore parenthesizing alternatives in light of the goal.
In the example above, I have the goal of moving both appearances of x in the initial
expression to the end of the expression.
Once that happens, figure out how to consolidate the occurrences
into a single x.
Given the limited number of options available at any one juncture, this might be a good place to try some genetic programming type of thing, or a tree-search.
I performed all of the following testing on a 1350 MHz AMD Athlon based motherboard, with 512 MB of PC133 non-ECC memory and 2931852 KB of swap space. I used Linux kernel 2.6.22.1. Compiler versions as follows.
I used gcc -I. -g -O -Wall to compile 4 (CVS tagged) versions:
I did tests based on adding Church Numerals, represented in Combinatory Logic. I did not use the usual "add" function, which cleverly does not require recursion. Rather, I wrote an addition function that recurses using a fixed-point combinator.
The calculations performed for the tests:
| Test | Calculation |
|---|---|
| A | Add Church numeral for 63 to Church numeral for 0 |
| B | Add Church numeral for 0 to Church numeral for 63 |
| C | Add Church numeral for 63 to Church numeral for 63 |
| D | Add Church numeral for 126 to Church numeral for 0 |
| E | Add Church numeral for 126 to Church numeral for 126 |
| A | B | C | D | E | |
|---|---|---|---|---|---|
| 1 | 0.070 | 0.000 | 2.503 | 11.213 | > 1200 |
| 2 | 0.157 | 0.002 | 4.556 | 22.167 | > 130 |
| 3 | 0.102 | 0.001 | 3.335 | 19.630 | > 150 |
| 4 | 0.055 | 0.000 | 0.057 | 0.392 | 0.383 |
| 5 | 0.060 | 0.000 | 0.061 | 0.429 | 0.425 |
| 6 | 0.065 | 0.001 | 0.065 | 0.455 | 0.447 |
The elapsed times don't constitute reliable statistical indicators: they represent only a single run. Nevertheless, the consistency exhibited does give me some confidence.
All elapsed times in seconds. I lost patience on test E with versions 1, 2 and 3, and interrupted them from the keyboard.
Three things stand out:
reduce_graph() maybe causing cache misses.

I'd have to say that the iterative graph reduction probably has a bigger constant than the recursive graph reduction, so any performance benefits probably don't show up until it does a huge graph reduction like test E. I found iterative graph reduction far easier to get correct than recursive graph reduction, so I'd take it even with some performance loss.
The free-lists seem like the biggest performance increase. Arena allocation actually
has benefits in preventing memory leaks, even though it doesn't seem like a big
performance booster. Since spine stacks and parse-tree nodes have different
free-lists, and spine stacks get allocated with malloc(),
I could do some more work here to decide which caused the biggest
performance gain.
I compiled the version tagged Version1_0 with 3 different compilers:
The row for "gcc -O" should compare directly with row 6 in the "cl version timings" table above.
| A | B | C | D | E | |
|---|---|---|---|---|---|
| tcc | 0.145 | 0.002 | 0.144 | 1.028 | 1.025 |
| lcc | 0.118 | 0.002 | 0.116 | 0.886 | 0.825 |
| gcc | 0.156 | 0.002 | 0.158 | 1.105 | 1.123 |
| gcc -O | 0.072 | 0.001 | 0.073 | 0.504 | 0.543 |
| gcc -O2 | 0.063 | 0.001 | 0.063 | 0.436 | 0.431 |
Compilers do seem to produce different quality code, with respect to execution speed on this set of tests.
The algorithm improvements section shows how cl got better/worse with changes in the underlying algorithm. Showing pure reduction speed has some merit when comparing the executables produced by various compilers. Since cl has a -N <number> option, I can time a certain number of reductions of an expression. The easiest way to time a large number of reductions uses an expression with no normal form, but also one that cycles through a number of non-normal forms.
Naively, expressions like S I I (S I I)
might seem to work. They don't, as the second I combinator
doesn't get reduced in pure normal-order evaluation:
S I I (S I I)
I (S I I) (I (S I I))
S I I (I (S I I))
I (I (S I I)) (I (I (S I I)))
I (S I I) (I (I (S I I)))
S I I (I (I (S I I)))
I (I (I (S I I))) (I (I (I (S I I))))
I (I (S I I)) (I (I (I (S I I))))
I (S I I) (I (I (I (S I I))))
S I I (I (I (I (S I I))))
...
Not only does it not have a normal form, it increases in size during "reduction". Increasing in size means that cl spends time allocating new arenas for the increasing number of nodes in the expression.
I want a combinatory logic expression that uses all the primitives possible, and cycles through one or more intermediate expressions. The following fit that description to a lesser or greater degree:
| Cycling expression | Period | Pattern | |
|---|---|---|---|
| 1. | M M | 1 | M |
| 2. | W W W | 1 | W |
| 3. | W I (W I) | 2 | WI |
| 4. | W T (W T) | 2 | WT |
| 5. | B I M (B I M) | 3 | BIM |
| 6. | W (W K) (W (W K)) | 3 | WWK |
| 7. | W (C K K) (W (C K K)) | 3 | WCK |
| 8. | S T I (S T I) | 4 | STI |
| 9. | S T (I I) (S T (I I)) | 4 | STII |
| 10. | W (B (T M) K) (W (B (T M) K)) | 4 | WBTK |
| 11. | B (T M) K (M (B (B (T M) K) M)) | 5 | BBTKM |
| 12. | B (K (S K K) y) (K M z) (B (K (S K K) y) (K M z)) | 6 | BKSKKM |
| 13. | W (B ((C (W K)) M) K) (W (B ((C (W K)) M) K)) | 6 | WBCWKK |
| 14. | C (S (C C) (C C)) (C (S (C C) (C C))) (C (S (C C) (C C))) | 6 | CSCCCC |
| 15. | B (K (S K K) y) (K (I M) z) (B (K (S K K) y) (K (I M) z)) | 7 | BKSKKIM |
| 16. | C C (S (C C) (C C)) (C C) (C C (S (C C) (C C)) (C C)) (C C (S (C C) (C C)) (C C)) | 9 | CCCCSCCCC |
| 17. | B W (W (B (B (C (W K))))) (B W (W (B (B (C (W K)))))) (C K K) | 10 | BWWBBCWKCK |
| 18. | B B C (C C) (C C) (C C (B (B W) (B B C) (C C) (C C)) (C C)) (C C (B (B W) (B B C) (C C) (C C)) (C C)) (C C (B (B W) (B B C) (C C) (C C)) (C C)) | 14 | BBCCCCCCCCCBBW |
| 19. | W (B (C (C C) (C (C C) (C (C C) (C C)))) (C (C C) (C (C C) (C C)))) (C (C (C C) (C (C C) (C C))) (W (B (C (C C) (C (C C) (C (C C) (C C)))) (C (C C) (C (C C) (C C)))))) (C (C (C C) (C (C C) (C C))) (W (B (C (C C) (C (C C) (C (C C) (C C)))) (C (C C) (C (C C) (C C)))))) | 30 | WBCCCCCCCCCCCCCCCCCCCCCCCCCCCC |
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |
|---|---|---|---|---|---|---|---|---|---|
| Period | 1 | 1 | 2 | 2 | 3 | 3 | 3 | 3 | 4 |
| tcc | 0.275 | 0.271 | 0.242 | 0.304 | 0.285 | 0.279 | 0.306 | 0.323 | 0.323 |
| lcc | 0.218 | 0.211 | 0.194 | 0.251 | 0.231 | 0.232 | 0.261 | 0.267 | 0.243 |
| gcc | 0.289 | 0.286 | 0.260 | 0.328 | 0.320 | 0.296 | 0.330 | 0.341 | 0.311 |
| gcc -O | 0.136 | 0.120 | 0.118 | 0.152 | 0.138 | 0.131 | 0.146 | 0.154 | 0.141 |
| gcc -O2 | 0.106 | 0.108 | 0.098 | 0.125 | 0.119 | 0.111 | 0.124 | 0.131 | 0.117 |
| 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | Mean | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Period | 4 | 5 | 6 | 6 | 6 | 7 | 9 | 10 | 14 | 30 | |
| tcc | 0.317 | 0.329 | 0.329 | 0.310 | 0.390 | 0.295 | 0.385 | 0.329 | 0.366 | 0.376 | 0.318 |
| lcc | 0.258 | 0.268 | 0.255 | 0.248 | 0.307 | 0.246 | 0.310 | 0.263 | 0.294 | 0.300 | 0.256 |
| gcc | 0.345 | 0.360 | 0.344 | 0.330 | 0.396 | 0.343 | 0.389 | 0.346 | 0.383 | 0.381 | 0.336 |
| gcc -O | 0.153 | 0.163 | 0.147 | 0.149 | 0.181 | 0.146 | 0.189 | 0.164 | 0.175 | 0.175 | 0.151 |
| gcc -O2 | 0.132 | 0.134 | 0.129 | 0.124 | 0.156 | 0.120 | 0.153 | 0.137 | 0.164 | 0.147 | 0.128 |
It appears that for cl, gcc without optimization produces worse code than lcc or tcc do. Both lcc and tcc compile cl far faster than gcc with any set of options I found, so I tended to use them during development to shorten the edit-compile-test-think cycle. The gdb debugger does not work on executables produced by tcc or lcc, so I did have to revert to compiling with gcc on occasion.
A general trend of increasing time with period of the cycling expression exists, probably due to the increased proportion of pointer chases with the larger expressions that have longer periods.
The I combinator appears cheaper to reduce than T: cycling expressions 3 and 4 differ only in executing I and T. Reducing a T requires creating a new application node in the parse tree, while I only involves swinging a pointer. Comparing expressions 8 and 9 reinforces this: 9 reduces a higher proportion of I combinators than 8. cl does 500000 reductions of 9 faster than 8.
The difference between compilers seems more dramatic than I would have thought, the slowest cl doing about 1.5 million reductions per second, and the fastest about 3.9 million.
I wrote the interpreter in ANSI C. Why use ANSI C in 2007, the post-object-oriented-paradigm era? First, C implementations of anything typically have at least decent, and usually good performance. Second, many quality implementations exist. Third, pointer arithmetic helps you keep your chops up. Fourth, I always have fun programming in C.
Finally, this interpreter does not consist of all that many files or lines of code. It does not fit the "programming in the large" paradigm. I don't want to deal with all the "object-oriented sit-ups" that using Java or even C++ would mandate. Essentially, I've traded-off any benefits that data hiding, modularization, inheritance and exception handling might confer for the absence of abstraction-induced complexity.
Despite my assertion above that this project does not constitute programming in the large, I did find several opportunities for code reuse. The hash table implementation, code for abbreviations, and the strings-as-atoms code all came virtually unchanged from an earlier interpreter I wrote. I find this particularly gratifying.
I chose not to use GNU autotools. Even at the beginning of this project, I knew I wanted to use non-GNU tools, like yacc and byacc, and compilers like lcc and tcc. Once you put software into the "autotools" paradigm, you make the whole thing more portable, but only to systems with GNU tools and compilers. The code for cl does not do anything fancy, so using "autotools" constitutes some overkill.
I gave statCVS a try, to see what information hides in version numbering and check ins.
statCVS output - not terrifically informative as far as bugs fixed, because I don't track that during development. I don't have a specification to count a bug against, after all. This remains interesting nonetheless. Note the "step function" appearance of the lines-of-code graph. I only checked in sets of changes that pass all my unit tests. Also, I rarely get to work more than 15 minutes at a time on my personal projects.
The following data got generated using David A. Wheeler's 'SLOCCount':
| ANSI C | 1770 (65.05%) |
| yacc | 447 (16.43%) |
| lex | 283 (10.40%) |
| perl | 140 (5.15%) |
| makefile | 57 (2.09%) |
| sh | 24 (0.88%) |
Total Physical Source Lines of Code (SLOC) = 2,721
Development Effort Estimate, Person-Years (Person-Months) = 0.57 (6.87)
 (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months) = 0.43 (5.20)
 (Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule) = 1.32
Total Estimated Cost to Develop = $ 77,287
 (average salary = $56,286/year, overhead = 2.40).
SLOCCount says that development took 6.87 mythical man-months over 5.2 calendar months. statCVS says that I took 17 calendar months.
I used lcov to generate coverage output. The script-driven testing executes 69% of the code.
After a few easy, interactively-driven tests, lcov shows 80% coverage. Only built-in testing code (triggered by a currently undiscovered bug) and malloc failure handling code remain uncovered.
Careful examination of the lcov output during development revealed a few blocks of dead code. It also revealed need for unit tests to trigger previously-un-unit-tested blocks, and an actual bug introduced by changing Yacc productions.
lcov, SLOCCount and statCVS differ on the size of the code:
| lcov | SLOCCount | statCVS | |
|---|---|---|---|
| Lines | 1452 | 2721 | 5125 |
This implies a comment ratio of (5125 - 2721)/5125 = 0.47 which isn't too shabby, except that it includes the wordy GPL copyright notice in every file, and test inputs and validated outputs.
I developed this using no particular methodology or process. I didn't have a "design spec" or a schedule to meet, and I only had one person doing coding.
I don't particularly believe in the efficacy of process, but I do believe in empirically improving programming. I did these things:
Just for the sake of explicitly enumerating them, I present here a list of software tools I used. Also, I feel that "methodology" gets too much emphasis, while tools get too little.
(#!/bin/sh invokes bash under Slackware.)

For the most part, I did development of cl under Slackware linux, on an x86 CPU. I did test early versions on Solaris 9/SPARC, a big-endian system. I compiled and ran cl as both 32- and 64-bit executables, with identical results. I tested nearly-final versions under Red Hat Enterprise Linux 3.0 running on a little-endian 64-bit AMD Opteron platform. I seriously doubt that any endianness, alignment or pointer size assumptions exist in the code.
I used flex/lex to recognize tokens, and yacc/byacc/bison to parse those tokens into grammatical forms. I suspect that one could write the textbook Combinatory Logic grammar as some kind of recursive descent parser, rather than using yacc/byacc/bison. This would nominally increase the speed of the interpreter, but it would probably qualify as misplaced effort.
Lexing and parsing takes little CPU and runs rapidly compared to graph reduction. Using yacc/byacc/bison allowed easy addition of "interpreter commands" (like timeout, load etc) as they occurred to me. A hand-written, recursive-descent parser would have cost work to add these in any fashion, ad hoc or not.
The tradeoff involves abstraction-induced complexity, again. Using yacc/bison gives you the ability to easily change the grammar. It takes away the ability to easily emit error messages that relate to locations in the input. It also makes doing anything outside of lexing and parsing one file/standard input harder. The tradeoff worked favorably for me this time around.
I arrived at the interpreter's memory management design
after discovering that simple malloc()/free()
memory management cost a huge amount of run-time. This discovery
led me to trying an arena or slab-style of allocation.
Using arena allocation, I found I could easily implement explicit
reference counted management of struct node instances,
which further increased speed. Performance concerns pushed me to
more elaborate memory management.
As a side effect, situations which would cause malloc()/free()
style memory management to leak (time out, keyboard interrupts, etc)
only cause problems with built-in checking of free list consistency.
In essence, I created a "CustoMalloc" type allocator by gradual evolution.
The struct node and struct spine_stack
free lists bear a direct analogy to CustoMalloc's "front end allocator".
The arena bears a relationship to CustoMalloc's "back end allocator".
My experience and the measurements above
seem to directly contradict Berger, Zorn and McKinley's
Reconsidering Custom Memory Allocation findings.
cl may indeed consume a lot of extra memory, given
that a single struct node can trigger the allocation
of an entire,
memory-page-sized arena. cl retains arenas throughout
its lifetime, as well.
I did script-driven regression testing after compiling new code. This helps prevent "regressions" from creeping in.
I used 3 different compilers, and 2 different implementations of "yacc". Not only do different compilers give different error messages, using different compilers should prevent compiler dependencies from slipping in. It should also promote standard conformance.
I ran the regression testing for builds with every compiler, ensuring that no compiler dependencies slip in.
I also ran valgrind in memory-leak-checking mode on all test cases used in the script-driven regression testing, and on interactive sessions as well. This uncovers memory leaks outside the arena allocation and free list use.
I used randomly-generated combinatory logic expressions to do "fuzz" testing of graph reduction and bracket-abstraction algorithms.
In general, I think it helps any program to have some built-in "sanity" testing, rather than relying solely on black-box unit tests. In programs that compose large trees of data structures, or have algorithms that recurse mercilessly, I regard built-in testing as a necessity.
Code that traverses parse trees for output checks for "single-ply" looping.
Addresses of child nodes (named by left and right
elements of struct node) should not numerically equate
to the address of the parent application node. The traversal code prints an error
message, and refuses to follow the looping pointers.
This finds simple errors in reduction algorithms, but still fails on
"loops" of more than 1 element in depth.
The code for allocation and deallocation of struct node and
struct spine_stack counts total numbers of allocations and deallocations.
After reducing an expression to normal form, the interpreter can
check to see if the counts of allocations and deallocations match.
This simple and cheap form of built-in-test rapidly uncovers problems with
reference counting and free list implementations.
Allocation and deallocation counts may not match for some nominally "normal" occurrences, like syntax errors, and timeout interruptions.
Allocation and deallocation functions for struct node
alert for deallocation of structs with a zero reference count.
The arena-allocation code uses the first few bytes of the arena-allocation
itself as the header, an instance of struct memory_arena.
Instances of struct memory_arena have a "serial number" field.
The implementation of arenas keeps instances of struct memory_arena
in order of increasing serial number.
Freeing the arena's contents (which does not deallocate the arena) checks to see that the arena serial numbers stay in order. This alerts one to the presence of grosser forms of buffer overflows. The same function checks for one-element "loops" in the chain of allocations by checking that the serial numbers increment by one each time it follows a pointer.
The above graph should give some sense of the development time line. It has a "step" appearance, as I have a habit of only checking in file sets that actually work, which means "pass the script-driven regression testing".
The long fallow period from March to December of 2007 represents the time I spent writing this document.
The built-in primitives have single-step reductions that look almost exactly like yacc productions. Why not just write 7 or so yacc productions that mimic reductions and be done with it, letting yacc generate efficient code to do reductions?
As attractive as this seems initially, I don't think you can do it
without a great deal of Yacc (or Bison) jiggery-pokery. The first problem
lies in the implicit "type" of a token given to Yacc. You can certainly
have flex/lex give tokens of type "combinator" back to Yacc. A given
combinator's reduction doesn't care what "type" its arguments have,
but Yacc's reductions definitely do care what type lex gave to a token.
So, you have to have multiple Yacc productions for each (logically simple)
combinator reduction. For example, the I
combinator's single reduction would end up as 8 Yacc productions:
COMB_I COMB_S: COMB_S;
COMB_I COMB_K: COMB_K;
COMB_I COMB_I: COMB_I;
COMB_I COMB_B: COMB_B;
COMB_I COMB_C: COMB_C;
COMB_I COMB_W: COMB_W;
COMB_I COMB_T: COMB_T;
COMB_I COMB_M: COMB_M;
The S combinator would end up with
a huge number of Yacc productions (512).
Handling parenthesized subexpressions would cause an even
greater explosion of productions.
Another problem involves getting Yacc to recognize normal forms. It may prove impossible to write Yacc productions to do this. I didn't get past the explosion in Yacc productions caused by the typing issue.
Willem van der Poel's article, The Mechanization of Lambda Calculus contains a description of a combination lambda calculus and Combinatory Logic interpreter. van der Poel has substitution for abbreviations occurring at the time of reduction of the abbreviation, not during parsing, as cl does it.
Substitution at reduction time has the advantage of leaving abbreviations in the human-readable, printed form of the expression. It also seems to have the disadvantage of having "free variables" in the abbreviation itself, with all the attendant problems they cause. If an abbreviation contains an identifier unbound at the time of abbreviation, does binding that identifier later cause the reduction time substitution to use the later binding, or leave the identifier free?
An alternative implementation could treat combinator struct node
instances differently than application struct node
instances: use a hash table to track combinator nodes by combinator name. Only a single instance
of any given combinator node would exist.
Another implementation might serve you better. I found these other Combinatory Logic interpreters:
Unfortunately, the standard reference volume on Combinatory Logic probably went out of print in the early or mid 60s. A copy of Haskell Curry's Combinatory Logic fetches about $300 now, and it bears a 1958 copyright. Those without access to many a quaint and curious volume of forgotten lore have to make do with mostly on-line resources. Strangely, only two of Haskell Curry's original papers appear on-line, as far as I can tell.
I used the following during development and testing. They represent some fundamentals of programming in the meta-language (C, flex, Yacc) rather than in Combinatory Logic itself.