Using and programming in Lisp (or Scheme) requires a very different way of thinking than does using and programming in Pascal, C, C++, or many other "procedural" languages. The most obvious difference is that Lisp is (customarily) interactive. When working in most procedural languages, you write a program in a text editor, get out of the editor, compile the program, run the program, find some bugs, and go back to the editor to fix them. In Lisp, you type commands one at a time and you see the responses to them immediately; the programming process consists of typing particular commands that have the effect of defining new commands. You can put such definitions into a file with a text editor and then load it as a unit, and you can mix this way of doing things freely with the more usual command-response way. Indeed, the way you load a file is to type a command which has the effect of reading in a specified file just as though you were typing it.
I've used the word "command" in the above paragraph. In fact, a better word would be "expression": Lisp expressions are primarily thought of as having a value (for example, the expression (+ 3 5) has the value 8), and only secondarily as doing something. This is another big difference between Lisp and the procedural languages you've seen before: in Pascal, each statement does something -- it assigns a variable, or reads a number, or calls a procedure -- whereas in Lisp, each expression (like (+ 3 5) has a value, but only some special ones (like (define x 5)) do anything as a side effect.
The name "Lisp" stands for "List Processing". Accordingly, as you might expect, one of Lisp's strong points is the easy handling of variable-length lists (and dynamic data structures in general). Following is a list of some of Lisp's strengths and weaknesses: note that Lisp is capable of doing each of the things in the "weaknesses" column, but that they are not the most natural way of working in Lisp.
| Strengths | Weaknesses |
|---|---|
| Variable-sized lists, trees, etc. | Fixed-sized arrays |
| Symbols and programs | Numbers and characters |
| Specifying program behavior | Specifying program behavior |
| by properties (axiomatically) | by process (procedurally) |
| Recursion | Sequence |
| Parameters and function application | Variables and assignment statements |
| Flexible control constructs | Concise, efficient control constructs |
| Quickly getting it working | Getting it working quickly |
I urge budding Lisp/Scheme programmers to study this list and use the strengths, not the weaknesses! This involves at least two kinds of choice: choose Lisp or Scheme only for tasks that call for what Lisp and Scheme do well, and when you are using Lisp or Scheme, select a programming style appropriate to the language. For example, in typical C and Pascal programs, the single most common kind of statement is the assignment (e.g. "x := y + foo(z);"), while experienced Lisp programmers may write pages of code without ever using an assignment statement.
Numbers, symbols, and strings are collectively referred to as atoms, so called because they cannot be broken down into smaller meaningful pieces. Lists, on the other hand, are the most common kind of complex expressions, i.e. expressions made up of smaller pieces which would have meaning by themselves. One of Lisp's strong points is the ease with which it allows the programmer to manipulate lists; indeed, Lisp's name stands for "list processing". Lists are typed in, and printed out, in a standard form: a pair of parentheses around a sequence of list elements, each separated by at least one space from the previous and next list elements.
Practically, a symbol looks like a variable name: it
consists of a sequence of one or more letters, numbers, and punctuation
marks (most often hyphens and underscores) without intervening space.
Philosophically, a symbol is a data object which has no inherent meaning
(as opposed to a number, which has various built-in behaviours with
respect to arithmetic). Symbols can be used as
the names of variables, or they can be used as data in their own right.
For example, if you were keeping track of birds you had seen, you might
type
(define birdlist '(crow raven pigeon starling sparrow hawk))
This defines a new symbol named "birdlist" as a variable,
whose value is a list containing the symbols named "crow",
"raven", etc.
These symbols are all distinct from one another, but they have no
built-in properties; indeed, the only meaning ascribed to any of them is
that birdlist is a variable name whose current value is
such-and-such a list. (I'll explain the apostrophe before the list of
bird names in a moment.)
These typeless variables present both advantages and disadvantages for the programmer. The obvious advantage is the saving of time figuring out and typing declaration statements. A less obvious advantage is the ability to write a function that works equally well on a variety of different types of data. (For example, in Pascal it is impossible to write a general "matrix multiplication" function that works for all sizes of matrices; if you want to work with 2x2, 3x3, 4x4, and 10x10 matrices, you need to write four versions of the same function. In Lisp, you could easily write a single matrix-multiplication function that looked at its arguments to see what size they were.) On the other hand, Lisp will silently allow you to assign a floating-point number to a variable you thought contained a list of character-strings, thus perhaps creating a hard-to-find bug, while C or Pascal would give you a compiler error message before you had a chance to even run the buggy program.
Each kind of expression is evaluated in a different way:
For example, let's trace what happens when you type in
(+ y 4)
where y is a variable currently holding the value 6.
For another example, let's trace what happens when you type in
(define z (+ y 4))
where y is still a variable holding the value 6.
(list 4 5 6 7) ;Value: (4 5 6 7)Note that the list function takes as many arguments as you choose to give it, and constructs a list with that many elements.
(list 5 y birdlist '(a b c)) ;Value: (5 6 (crow raven pigeon starling sparrow hawk) (a b c))In this example, some of the elements in the list were variables, and they were replaced by their values. list is not a special form: it evaluates its arguments in a perfectly ordinary way before putting them together into a list. Note also that a list can be an element of another list.
To get individual elements from a list, you can use the built-in functions first, second, ... ,tenth:
(first birdlist) ;Value: crow (second birdlist) ;Value: raven (third birdlist) ;Value: pigeonOnly the first ten such functions are defined, and in practice most Scheme programmers only use first, second, and perhaps third. The reason is that a list is not an array, and should not be used for the same purposes as an array: if you really want to get at the i-th element of a list, you're better off with a real array. However, lists make it easy to work with "the rest of" a list: indeed, some Scheme interpreters have a built-in function rest that gives you "all but the first element". Most Scheme programmers, however, use the traditional name "cdr":
(cdr birdlist) ;Value: (raven pigeon starling sparrow hawk)These more traditional Scheme programmers also tend to use "car" for "first":
(car birdlist) ;Value: crowThe names "car" and "cdr" are something of an embarrassment: they refer to two machine instructions on a particular model of IBM computer popular in the late 1950's, standing for "contents of the address register" and "contents of the data register". In practice, you should think of them as meaning "first" and "rest". They can be combined to get the equivalents of "second", "third", etc.
(car (cdr birdlist)) ;Value: raven (car (cdr (cdr birdlist))) ;Value: pigeon (car (cdr (cdr (cdr birdlist)))) ;Value: starlingIndeed, before "first", "second", "third", etc. were invented as more meaningful names, people used these combinations of "car" and "cdr" so much that they started defining single functions for the common combinations: "cadr" stands for "(car (cdr ...", "cddar" stands for "(cdr (cdr (car ..."), and so on:
(car birdlist) ;Value: crow (cadr birdlist) ;Value: raven (caddr birdlist) ;Value: pigeon (cadddr birdlist) ;Value: starling
So "car", "cdr", and their relatives (including "first", "second", etc.) allow you to take a list apart. Scheme also provides functions to put a list together. The simplest one is cons, which takes two arguments (an object and a list) and constructs a new list with the object inserted in front of the list:
(cons 'swan birdlist) ;Value: (swan crow raven pigeon starling sparrow hawk) (cons 4 '(5 6 7)) ;Value: (4 5 6 7Note that cons does not change any of its arguments (indeed, most Scheme functions don't):
birdlist ;Value: (crow raven pigeon starling sparrow hawk) (cons 'swan birdlist) ;Value: (swan crow raven pigeon starling sparrow hawk) birdlist ;Value: (crow raven pigeon starling sparrow hawk)The simplest possible list is called "the empty list" or "the null list".
() ;Value: () '() ;Value: () (cons 'lonely ()) ;Value: (lonely) (cons 'a (cons 'b ())) ;Value: (a b)Note that cons-ing something with the empty list () produces, as you would expect, a list starting with the "something", but not containing any other elements. A list with one element is very different from that element itself!
'lonely ;Value: lonely (list 'lonely) ;Value: (lonely) (cons 'lonely '()) ;Value: (lonely) (cons birdlist birdlist) ;Value: ((crow raven pigeon starling sparrow hawk) crow raven pigeon starling sparrow hawk)The last example illustrates that cons does not care whether its first argument is itself a list; cons treats it as a single object to be inserted into another list. If you want to combine two or more lists, you can use the append function:
(append birdlist birdlist) ;Value: (crow raven pigeon starling sparrow hawk crow raven pigeon starling sparrow hawk) birdlist ;Value: (crow raven pigeon starling sparrow hawk)Note that append, too, does not change its arguments.
Here's a simple example:
(define (add1 x) (+ x 1))) ;Value: add1This defines a new function named "add1" which takes one parameter, "x", and returns the value of the expression "(+ x 1)". After typing this into a Scheme interpreter, we can use "add1" freely, e.g.
(add1 4) ;Value: 5 (define y 6) ;Value: y (add1 y) ;Value: 7and even use it in other functions that you define:
(define (add3 x) (add1 (add1 (add1 x)))) ;Value: add3 (add3 y) ;Value: 9 y ;Value: 6For another example, suppose we want to build a list with three copies of something:
(define (threecopies x) (list x x x)) ;Value: threecopies (threecopies 4) ;Value: (4 4 4) (threecopies birdlist) ;Value: ((crow raven pigeon starling sparrow hawk) (crow raven pigeon starling sparrow hawk) (crow raven pigeon starling sparrow hawk))
((lambda (x) (+ x 1)) 6) ;Value: 7This is of course an unrealistic example, since any reasonable Scheme programmer would say
(+ 1 6)instead. But the function "threecopies" above provides a more realistic example. If you didn't plan to use the operation "make a list with three copies of x" in the future, you might skip defining "threecopies" and just say
((lambda (x) (list x x x)) birdlist) ;Value: ((crow raven pigeon starling sparrow hawk) (crow raven pigeon starling sparrow hawk) (crow raven pigeon starling sparrow hawk))or
((lambda (x) (list x x x)) '(a b c d e f g)) ;Value: ((a b c d e f g) (a b c d e f g) (a b c d e f g))In this case, using an unnamed function would be shorter and less error-prone than writing
(list '(a b c d e f g) '(a b c d e f g) '(a b c d e f g))
So what is this "lambda" thing, anyway? "lambda" is another special form used for building functions. The first argument to "lambda" is a parameter list, and the second argument is an expression which is the body of the function. In fact, although we defined the "add1" function above with the syntax
(define (add1 x) (+ x 1))some Scheme programmers prefer an alternate syntax:
(define add1 (lambda (x) (+ x 1)))This way of stating things is longer and less convenient, but makes clearer what's happening: we define a new symbol named "add1" and give it, as its value, a function which takes an argument which it calls "x" and returns x+1. Thus typing "(add1 5)" is exactly equivalent to typing "((lambda (x) (+ x 1)) 5)", since "add1" has the function "(lambda (x) (+ x 1))" as its value. Similarly, the following two function calls are exactly equivalent:
(threecopies '(a b c d e f g)) ;Value: ((a b c d e f g) (a b c d e f g) (a b c d e f g)) ((lambda (x) (list x x x)) '(a b c d e f g)) ;Value: ((a b c d e f g) (a b c d e f g) (a b c d e f g))
x ;Value: 3 y ;Value: 6 (> y x) ;Value: #t (< y x) ;Value: () (if (> y x) 'raven 'bluebird) ;Value: raven (if (< y x) 'raven 'bluebird) ;Value: bluebirdThe "if" special form can also be used with only two argument: if the first argument is nonempty, it evaluates and returns the second, and if not, it returns no value at all. This version of "if" is normally used when the arguments are to be evaluated for their side effects, e.g.
(if (< y x) (display 'hello)) ;No value (if (> y x) (display 'hello)) hello ;No value
In C and Pascal, it's common to write a sequence of statements like "if... then... else if... then... else if... then... else...". The same can be done in Scheme, but the parentheses start nesting very deeply:
(if (< x y) 'x-less-than-y
(if (= x y) 'x-equals-y
(if (= x (+ y 1)) 'x-one-more-than-y
'x-at-least-two-more-than-y)))
A more convenient, and incidentally more general, way to do this is with
the "cond" special form. "cond" takes one or more
arguments, each of which is a list of expressions. It evaluates the
first expression in the first list: if it's non-null, it then evaluates
the rest of the expressions in the first list and returns the value of
the last one; if it's null, on the other hand, "cond" goes on
to the second list and does the same thing: evaluate the first
expression, see whether it's null, and so on. So the following "cond"
form has the same effect as the above three nested "if"s:
(cond ((< x y) 'x-less-than-y)
((= x y) 'x-equals-y)
((= x (+ y 1)) 'x-one-more-than-y)
(#t 'x-at-least-two-more-than-y))
Recursion, in a programming language, simply means a function calling
itself. Of course, if the function f(x) is computed by calling
f(x) again, the computer will go into an infinite loop, asking
the same question over and over again, so it's important that each
recursive call be on a simpler version of the same question.
Perhaps the most well-known example is the factorial function,
(define factorial (lambda (n)
(if (<= n 0) 1
(* n (factorial (- n 1)))))
That is, the factorial of n is 1 if n <= 0, and
otherwise it can be computed by first computing the factorial of
n-1 and multiplying the result by n. Thus
I usually use the following step-by-step approach to writing recursive functions.
I recommend writing the following functions as exercises in recursive programming. These are not homework assignments, to be turned in for a grade, but I might put a recursion problem on your next exam.
One implication of this is that in Lisp and Scheme, you're not limited to the control constructs the language designer thought would be useful; you can make up your own. In Pascal, for example, the predefined control constructs are
This property is necessary if the language is to be cleanly defined or
written in itself, which is nice because you only have to learn one
language to understand what's going on. For example, perhaps the most
important function in Lisp or Scheme is eval, which
takes a Lisp object representing an expression, evaluates the
expression, and returns the result. (The Scheme version of
eval takes a second "environment" parameter; we'll
discuss this later.) So one could write a Scheme interpreter
(at least, the read-eval-print loop) in a few lines of Scheme:
(define scheme (lambda ()
(print (eval (read) (the-environment)))
(scheme)))
Suppose, for another example, you were trying to program a process with twenty steps, each depending on the previous ones, and you didn't know until run-time how many of them to perform. In Pascal:
case n of 1: step1; 2: begin step1; step2; end; 3: begin step1; step2; step3; end; ... 20: begin step1; step2; ... step20; end; end;or perhaps
if (n >= 1) then
begin
step1;
if (n >= 2) then
begin
step2;
if (n >= 3) then
begin
...
end;
end;
end;
or maybe even
for i := 1 to n do
case i of
1: step1;
2: step2;
...
20: step20;
end;
All of these methods are inelegant, repetitive, and prone to
typographical errors, and none of them really makes clear that what's
going on is "doing the first n steps of a sequence of steps". But
worse, what if "doing the first n steps" turned out to be a common
situation in your program? You'd need to write all this code over again
each time it happened, and knowing that you'd gotten one copy of it
right would give you no assurance that you had typed the next copy
correctly.
In Lisp or Scheme, you could abstract the idea of "doing the first n steps" into a function, do-first-n, write, test, and debug it once, and use it every time you needed to do the first n steps of something. An added advantage is that the steps themselves, as well as n, can be decided at run-time, and simply passed in to do-first-n.
(define do-first-n (lambda (n steplist)
(cond ((null? steplist) ())
((= n 0) ())
(#t (eval (car steplist) (the-environment))
(do-first-n (- n 1) (cdr steplist))))))
Of course, "do-first-n" is not a terribly common control construct in
most programs. But there are many other "cliches" of programming that
have to be written out in full every time they're used in a procedural
language, but can be abstracted in Lisp. Here are some examples. Each
of these happens to have a predefined Scheme function that does it for
you, but you could have written any one of them yourself in a few lines,
like the above recursive definition of do-first-n:
If we're going to be passing code all over the place as function arguments, how do we specify it? Well, most code comes in the form of defined functions, so we can just pass the function name, which Scheme treats as a variable whose value is the function itself:
(define add3 (lambda (x) (+ x 3))) ADD3 (map add3 '(3 5 7 2 9 -5)) (6 8 10 5 12 -2)But in a procedural language, you're not restricted to passing arguments that happen to live in variables already; you can pass literals (like 3 or "hello there") too. In the same way, you can discuss code in a literal form, without bothering to define a new function. Since add3 was defined above as essentially a variable whose value was the function (lambda (x) (+ x 3)), why not just pass the function itself?
(map (lambda (x) (+ x 3)) '(3 5 7 2 9 -5)) (6 8 10 5 12 -2)In fact, perhaps a majority of the uses of these "cliche" functions are with lambda-expressions, rather than named functions, as arguments.
First, let me describe something that doesn't sound like a flow-of-control structure; indeed, it's often used just as a way of declaring local variables. Here's an example:
(let ((x 3) (y 6) (z 'bluebird)) (list x y z z y)) ;Value: (3 6 bluebird bluebird 6) (let ((x '(a b c d e f g))) (list x x x)) ;Value: ((a b c d e f g) (a b c d e f g) (a b c d e f g)) (let ((n 5000)) (* n n n n) ;Value: 625000000000000 n ;Unbound variable: n"let" expects its first argument to be a list of lists: each smaller list starts with a variable name, optionally followed by a value to give that variable. "let" declares all the variables temporarily, gives them the corresponding values (if any), evaluates the remaining arguments, returns the last one, and discards the variables it just declared.
A good way to think about this is to read it as "let x be 3, y be 6, and z be bluebird for the moment. Now evaluate the expression `(list x y z z y)'. OK, you can forget about x, y, and z now."
There are several variants on "let", with names like "let*", "letrec", "fluid-let", and "named let", but I don't want to go into the differences now; you can read about them in info scheme ref.
The "do" special form resembles "let" -- it declares and initializes some local variables, then evaluates some expressions -- but it does all this repeatedly until a specified condition is met. "do" expects its first argument to be a list of lists, each consisting of a variable name, an optional initial value, and an optional "update expression". The second argument to "do" should be a list of expressions analogous to those in "cond": if the first one evaluates to non-null, it evaluates the rest and returns the last one, but otherwise, "do" goes on to evaluate its third, fourth, fifth, etc. arguments, then replaces each local variable with the value of its "update expression" and repeats the whole process. For example:
(do ((x 1 (+ x 1)) (sum 0 (+ sum x)))
((> x 10) sum)
(display x)
(display " "))
1 2 3 4 5 6 7 8 9 10
;Value: 55
This is roughly equivalent to the C code
{ int x,sum;
for (x=1, sum=0; x <= 10; sum += x++)
printf ("%d ",x);
return sum;
}