
Don Knuth, 1974 ACM Turing Award Recipient – Part 1


My name is Edward Feigenbaum. I’m a professor
at Stanford University, in the Computer Science Department. I’m very pleased to have the opportunity to interview
my colleague and friend from 1968 on, Professor Don Knuth of the Computer Science Department. Don and
I have discussed the question of who we think the viewers and readers of this oral history will be. We’re orienting
our questions and comments to several groups of you readers and viewers. First, the generally intelligent
and enlightened science-oriented person who has seen the field of computer science explode in the past half
century and would like to find out what is important, even beautiful, and what some of the field’s
problems have been. Second, the student of today who would like orientation and motivation toward computer
science as a field of scholarly work and application, much as Don and I had to do in the 1950s. And third, those
of you who maybe are not yet born, the history of science scholar of a dozen or a hundred years from now,
who will want to know more about Donald Knuth, the scientist and programming artist, who produced a memorable
body of work in the decades just before and after the turn of the millennium. Don and I share several things
in our past in common. Actually, many things. We both went to institutes of technology, I to Carnegie Institute
of Technology, now Carnegie Mellon University, and Don to Case Institute of Technology, now Case Western
Reserve. We both encountered early computers during that college experience. We both went on to take
a first job at a university. And then our next job was at Stanford University, in the new Computer Science
Department, where we both stayed from the earliest days of the department. I’d like to ask Don to describe
his first encounter with a computer. What led him into the world of computing? In my case, it was an
IBM 701, learned from the manual. In Don’s case, it was an IBM Type 650 that had been delivered to Case Institute
of Technology. In fact, Don even dedicated one of his major books to the IBM Type 650 computer. Don, what
is the story of your discovery of computing and your early work with this intriguing new artifact? Okay. Thanks, Ed. I guess I want to add that
Ed and I are doing a team thing here; next week, I’ll be asking Ed the questions that he’s asking me today.
We thought that it might be more fun for both of us and, also for people who are listening or watching or reading
this material, to see the symmetrical approach instead of having a historian interviewing us. We’re
colleagues, although we work in different fields. We can give you some slants on the thing from people who sort
of both have been there. We’re going to be covering many years of the story today, so we can’t do too much
in-depth. But we also want to do a few things in depth, because the defining thing about computer science
is that computer science plunges into things at low levels as well as staying at high levels. Since we’re going
to cover so many topics, I’m sure that I won’t sleep tonight because I’ll be saying to myself, “Oh,
I should’ve said such and such when he asked me that question”. So I think Ed and I also are going to maybe add
another little thing to this oral interview, where we might want to add a page or two of afterthoughts that
come to us later, because then I don’t have to be so careful about answering every question that he asks me now.
The interesting thing will be not only the wrong answer that pops first in my mind, but also maybe a slightly
corrected thing. One of the stories of my life, as you’ll probably find out, is I try to get things
correct. I probably obsess about not making too many mistakes. Okay. Now, your question, Ed, was how did
I get into the computing business. When the computers were first built in the ‘40s I was ten years old, so
I certainly was not a pioneer in that sense. I saw my first computer in 1957, which is pretty late in
the history game as far as computers are concerned. On the other hand, programming was still pretty much a
new thing. There were, I don’t know, maybe 2,000 programmers in the world at that time. I’m not sure how to
figure it, but it was still fairly early from that point of view. I was a freshman in college. Your question was:
how did I get to be a physics student there in college. I grew up in Milwaukee, Wisconsin. Those of you who
want to do the math can figure out, I was born in 1938. My father was the first person among all his
ancestors who had gone to college. My mother was the first person in all of her ancestors who had gone to a
year of school to learn how to be a typist. There was no tradition in our family of higher education at all.
I think [that’s] typical of America at the time. My great-grandfather was a blacksmith. My grandfather
was a janitor, let’s say. The people were pretty smart. They could play cards well, but they didn’t have
an academic background. I don’t want to dwell on this too much, because I find that there’s lots of discussion
on the internet about the early part of my life. There’s a book called Mathematical People, in which
people asked me these questions at length — how I got started. The one thing that stands out most, probably,
is when I was an eighth grader and there was a contest run by a local TV station for a candy called Zeigler’s Giant Bar. They said, “How many words can you make out of the letters in ‘Zeigler’s Giant Bar’?” Well, there’s
a lot of letters there. I was kind of intrigued by this question, and I had just seen an unabridged dictionary.
So I spent two weeks going through the unabridged dictionary, finding every word in that dictionary that
could be made out of the letters in “Zeigler’s Giant Bar”. I pretended that I had a stomachache, so I stayed
home from school those two weeks. The bottom line is that I found 4500 words that could be made, and the
judges had only found 2500. I won the contest, and I won Zeigler’s Giant Bars for everybody in my
class, and also got to be on television and everything. This was the first indication that I would obsess about
problems that would take a little bit of — what do you call it? — long attention span to solve. But my main
interest in those days was music. I almost became a music major when I went to college. Our high school was
not very strong in science, but I had a wonderful chemistry and physics teacher who inspired me. When I got
the chance to go to Case, looking back, it seems that the thing that really turned it was that Case was a
challenge. It was supposed to have a lot of meat. It wasn’t going to be easy. At the college where I had been
admitted to be a music major, the people, when I visited there, sort of emphasized how easy it was going to
be there. Instead of coasting, I think I was intrigued by the idea that Case was going to make me work hard.
I was scared that I was going to flunk out, but still I was ready to work. I worked especially hard as
a freshman, and then I coasted pretty much after that. In my freshman year, I started out and I found out
that my chemistry teacher knew a lot of chemistry,
but he didn’t know physics or mathematics. My physics teacher
knew physics and chemistry, but he didn’t know much about mathematics. But my math teacher knew all
three things. I was very impressed by my math teacher. Then in my sophomore year in physics, I had to take a
required class of welding. I just couldn’t do welding, so I decided maybe I can’t be a physicist. Welding
was so scary. I’ve got these thousands of volts in this stuff that I’m carrying. I have to wear goggles.
I can’t have my glasses on, I can’t see what I’m doing, and I’m too tall. The table is way down there. I’m
supposed to be having these scary electrons shooting all over the place and still connect X to Y. It was terrible.
I was a miserable failure at welding. On the other hand, mathematics! In the sophomore year for mathematicians,
they give you courses that are what we now call discrete mathematics, where you study logic
and things that are integers instead of continuous quantities. I was drawn to that. That was something, somehow,
that had great appeal to me. Meanwhile, in order to support myself, I had to work for the statisticians
at Case. First this meant drawing graphs and sorting cards. We had a fascinating machine where you put in
punch cards and they fall into different piles, and you can look at what comes out. Then I could plot the numbers
on a graph, and get some salary from this. Later on in my freshman year there arrived a machine that,
at first, I could only see through the glass window. They called it a computer. I think it was actually called
the IBM 650 “Univac”. That was a funny name, because Univac was a competing brand. One night a guy showed
me how it worked, and gave me a chance to look at the manual. It was love at first sight. I could sit all night
with that machine and play with it. Actually, to be exact, the first programs I wrote for the machine were
not in machine language but in a system called The Bell Interpreter System. It was something like
this. You have an instruction, and the instruction would say, “Add the number in cell 2 to the number in cell
15 and put the result in cell 30.” We had instructions like that, a bunch of them. This was a simple way to
learn programming. In fact, I still believe that it might be the best way to teach people programming, instead
of teaching them what we call high-level language right now. Certainly, it’s something that you could
easily teach to a fourth or fifth grader who hasn’t had algebra yet, and get the idea of what a machine is.
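Here is a minimal sketch, in Python, of that style of three-address interpretation; the opcodes and cell numbers are made up for illustration and are not the actual Bell system’s.

```python
# Minimal three-address interpreter in the spirit described above.
memory = [0.0] * 32                  # numbered storage cells
memory[2], memory[15] = 7.0, 35.0    # some initial data

program = [
    ("ADD", 2, 15, 30),   # "add the number in cell 2 to the number
                          #  in cell 15 and put the result in cell 30"
    ("MUL", 30, 30, 31),  # square cell 30 into cell 31
]

for op, a, b, dest in program:
    if op == "ADD":
        memory[dest] = memory[a] + memory[b]
    elif op == "MUL":
        memory[dest] = memory[a] * memory[b]

print(memory[30], memory[31])        # 42.0 1764.0
```

I was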
pledging a fraternity, and one of my fraternity brothers didn’t want to do his homework assignment where he
was supposed to find the roots of a fifth-degree equation. I looked at some textbooks, and it told me how
to solve a fifth-degree equation. I programmed it in this Bell Interpretive Language. I wrote the program.
My memory is that it worked correctly the first time. I don’t know if it really gave the right answers,
but miraculously it ground out these numbers. My fraternity brother passed his course, I got into the
fraternity, and that was my first little program. Then I learned about the machine language inside the 650.
I wrote my first program for the 650 probably in the spring of my freshman year, and debugged it at night. The
first time I wrote the program, it was about 60 instructions long in machine language. It was a program
to find the prime factors of a number. The 650 was a machine that had decimal arithmetic with ten-digit numbers.
You could dial the numbers on the console of the machine. So you would dial a ten-digit number, and my
program would go into action. It would punch out cards that would say what are the factors of this number that
you dialed in there. The computer was a very slow computer. In order to do a division instruction, it took
five milliseconds. I don’t know, is that six orders of magnitude slower than today’s machines, to do division?
Of course, the way I did factoring was by division. To see if a number was divisible by 13, I had to divide
by 13. I divided by 15 as well, 17, 19. It would try to find everything that divided. If I started out
with a big ten-digit number that happened to be prime — no divisors at all — I think it would take 15 or 20 minutes for my program to decide.
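In modern terms, the method was simple trial division. Here is a rough sketch in Python of the same computation; it is a loose modern restatement, not a reconstruction of the 650 code, which also tried divisors such as 15.

```python
def prime_factors(n: int) -> list[int]:
    """Factor n by repeated trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:   # divide out each factor as often as it occurs
            factors.append(d)
            n //= d
        d += 1
    if n > 1:               # whatever remains at the end is prime
        factors.append(n)
    return factors

# 2**32 yields 32 identical factors; at 8 ten-digit fields per 80-column
# punched card, printing them would have needed four cards.
print(prime_factors(2**32))
```

Not only did my program have about 60 or so instructions when I started,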
they were almost all wrong. When I finished, it was about 120, 130 instructions. I made more errors
in this program than there were lines of code! One of the things that I had to change, for example, that took
a lot of work, was I had originally thought I could get all the prime factors onto one card. But a card had
80 columns, and each number took ten digits. So I could only get eight factors on a single card. Well, you
take a number like 2 to the 32nd power, that’s going to take four cards. Because it’s two times two times
two times two [and so on]. I had to put in lots of extra stuff in my program that would handle these cases. So
my first program taught me a lot about the errors that I was going to be making in the future, and also about
how to find errors. That’s sort of the story of my life, is making errors and trying to recover from them. Did
I answer your question yet? No. No. Don, a couple questions about your early career,
before Case and at Case. It’s very interesting that you mentioned the Zeigler’s Giant Bar, because
it points to a really early interest in combinatorics. Your intuition at combinatorics is one of the things
that impresses so many of us. Why combinatorics, and how did you get to that? Do you see combinatorics
in your head in a different way than the rest of us do? I think that there is something strange about
my head. It’s clear that I have much better intuition about discrete things than continuous things. In
physics, for example, I could pass the exams and I could do the problems in quantum mechanics, but I couldn’t
intuit what I was doing. I didn’t feel right being able to get an “A” on an exam without ever having any idea of how I would have thought up the questions that the person who made up the exam had posed. But on the other
hand, in my discrete math class, these were things that really seemed part of me. There’s definitely something
in how I had developed by the time I was a teenager that made me understand discrete objects, like zeros
and ones of course, or things that are made out of alphabetical letters, much better than things like Fourier
transforms or waves — radio waves, things like this. I can do these other things, but it’s like a dog
standing on his hind legs. “Oh, look, the dog can walk.” But no, he’s not walking; he’s just a dog trying to walk.
That’s the way it is for me in a lot of the continuous or more geometrical things. But when it comes to marks
on papers, or integer numbers like finding the prime factors of something, that’s a question that appealed
to me more than even finding the roots of a polynomial. Don, a question about that. Sorry to interject
this question, behaving like a cognitive psychologist. This is what you’re paid to do, right? Right. Herb Simon — Professor Simon, of Carnegie
Mellon University — once did a set of experiments that kind of separated thinkers into what he called
“visualizers” and “symbolizers”. When you do the combinatorics and discrete math that you do, which so amazes
us guys who can’t do it that well, are you actually visualizing what’s going on, or is it just
pure symbol manipulation? Well, you know, I’m visualizing the symbols.
To me, the symbols are reality, in a way. I take a mathematical problem, I translate it into formulas, and
then the formulas are the reality. I know how to transform one formula into another. That should be the subtitle
of my book Concrete Mathematics: How to Manipulate Formulas. I’d like to talk about that a
little. It started out… My cousin Earl, Earl Goldschlager [ph?], who has since died, was an engineer; he eventually went to Southern California, but I knew him in the Cleveland area. When I was in second grade he went to Case. He was
one of the people who sort of influenced me that it may be good to go to Case. When I was visiting him in
the summer, he told me a little bit about algebra. He said, “If you have two numbers, and you know that the sum
of these numbers is 100 and the difference of these numbers is 20, what are the two numbers?” He said,
“You know how you can solve this, Don? You can say X is one of the numbers and Y is one of the numbers. X plus
Y is 100. X minus Y is 20. And how do you find those numbers?” he says. “Well, you add these two equations
together, and you get 2X=120. And you subtract the equation from each other, and you get 2Y=80. So X must
be 60, and Y must be 40. Okay?” Wow! This was an “aha!” thing for me when I was in second grade.
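In modern notation, the little system works out like this:

```latex
\begin{aligned}
x + y &= 100, \qquad x - y = 20 \\
\text{adding:}\quad 2x &= 120 \;\Rightarrow\; x = 60 \\
\text{subtracting:}\quad 2y &= 80 \;\Rightarrow\; y = 40
\end{aligned}
```

I liked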
symbols in this form. The main thing that I enjoyed doing, in seventh grade, was diagramming sentences.
NPR had a show during the last year about a woman who published a book on diagramming sentences, “The Lost Art of Diagramming Sentences”. This is where you take a sentence of English and you find its structure. It
says, “It’s a noun phrase followed by a verb phrase.” Let’s take a sentence here, “How did you get to be a
physics student?” Okay. It’s not a noun phrase followed by a verb phrase; this is an imperative sentence. It
starts with a verb. “How did you get…” It’s very interesting, the structure of that sentence. We had a textbook
that showed how to diagram simple English sentences. The kids in our class, we would then try to apply this
to sentences that weren’t in the book, sentences that we would see in newspapers or in advertisements. We
looked at the hymns in the hymnal, and we couldn’t figure out how to diagram those. We spent hours and hours
trying to figure this out. But we thought about structure of language, and trying to make these discrete
tree structures out of English sentences, in seventh grade. My friends and I, this turned us on. When we
got to high school, we breezed through all our English classes because we knew more than the teachers did.
They had never studied this diagramming. So I had this kind of interest in symbols and diagramming early
on — discrete things, early on. When I got into logic as a sophomore, and saw that mathematics involved
a lot of symbol manipulation, then that took me there. I see punched cards in this. I mean, holes in cards
are nice and discrete. The way a computer works is totally discrete. A computer has to stand on its hind
legs trying to do continuous mathematics. I have a feeling that a lot of the brightest students don’t go
into mathematics because — curious thing — they don’t need algebra at the level I did. I don’t think I was
smarter than the other people in my class, but I learned algebra first. A lot of very bright students today
don’t see any need for algebra. They see a problem, say, the sum of two numbers is 100 and the difference is
20, they just sort of say, “Oh, 60 and 40.” They’re so smart they don’t need algebra. They go on seeing lots
of problems and they can just do them, without knowing how they do it, particularly. Then finally they get to
a harder problem, where the only way to solve it is with algebra. But by that time, they haven’t learned the
fundamental ideas of algebra. The fact that they were so smart prevented them from learning this important
crutch that I think turned out to be important for the way I approach a problem. Then they say, “Oh,
I can’t do math.” They do very well as biologists, doctors and lawyers. You’re recounting your interest in the structure
of languages very early. Seventh grade, I think you said. That’s really interesting. Because among
the people — well, the word “computer science” wasn’t used, but we would now call it “information technology”
people — your early reputation was in programming languages and compilers. Were the seeds of that planted
at Case? Tell us about that early work. I mean, that’s how I got to know you first. Yeah, the seeds were planted at Case in the
following way. First I learned about the 650. Then, I’m not sure when it was but it probably was in the summer
of my freshman year, where we got a program from Carnegie — where you were a student — that was written
by Alan Perlis and three other people. “IT”. The IT program, “IT”, standing for Internal
Translator. Yeah, it was Perlis, [Harold] van Zoeren,
and Joe Smith. In this program you would punch on cards an
algebraic formula. You would say, “A=B + C.” Well, in IT, you had to say, “X1=X2 + X4.” Because you
didn’t have a plus sign, you had to say, “A” for the plus sign. So you had to say, “X1 Z X2 A X4.” No, “S,”
I guess, was plus, and “A” was for absolute value. But anyway, we had to encode algebra in terms of a small
character set, a few letters. There weren’t that many characters you could punch on a card. You punch this
thing on a card, and you feed the card into the machine. The lights spin around for a few seconds and then — punch,
punch, punch, punch — out come machine language instructions that set X1 equal to X2 + X4.
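As a toy illustration of what such a translator does, here is a Python sketch that turns a one-operator formula in the restricted card alphabet into a short instruction sequence. The opcode names are invented for the example, and it follows the reading above that “Z” marks the “=” and “S” is plus; the real IT was far more elaborate.

```python
# Toy translator for one-operator formulas like "X1 Z X2 S X4",
# meaning X1 = X2 + X4 ("Z" marking "=", "S" standing for plus).
OPS = {"S": "ADD"}

def translate(card: str):
    dest, marker, left, op, right = card.split()
    assert marker == "Z", "expected the '=' marker after the destination"
    return [("LOAD", left),       # accumulator := left operand
            (OPS[op], right),     # accumulator := accumulator + right
            ("STORE", dest)]      # destination := accumulator

print(translate("X1 Z X2 S X4"))
# [('LOAD', 'X2'), ('ADD', 'X4'), ('STORE', 'X1')]
```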
Automatic programming coming out of an algebraic formula. Well, this blew my mind. I couldn’t understand
how this was possible, to do this miracle where I had just these punches on the card. I could understand how
to write a program to factor numbers, but I couldn’t understand how to write a program that would convert
algebra into machine instructions. It hadn’t yet occurred to you that the computer
was a general symbol-manipulating device. No. No, that occurred to Lady Lovelace, but
it didn’t occur to me. I’m slow to pick up on these things, but then I persevere. So I got hold of the source
code for IT. It couldn’t be too long, because the
650 had only 2,000 words of memory, and some of those words of memory had to be used to
hold the data as well as the instructions. It’s probably, I don’t
know, 1,000 lines of code. The source code is not hard to find. They published it in a report and I’ve seen
it in several libraries. I’m pretty sure it’s been on the internet for a long time. I went through every line of that program. During the summer we have a family get-together on a beach on Lake Erie. I would spend some of the time
playing cards and playing tennis. But most of the time I was going through this listing, trying to find out the
miracle of how IT worked. Ok, it wasn’t impossible after all. In fact, I thought of better ways to do it than
were in that program. Since we’re in a history museum, we should also mention that the program had originally
been developed when Perlis was at Purdue, before he went to Carnegie, with three other people there. I
think maybe Smith and van Zoeren came with Alan to Carnegie. But there was Sylvia Orgel and several other people
at Purdue who had worked on a similar project, for a different computer at Purdue. Purdue also
produced another compiler, a different one. It’s not as well-known as IT. But anyway, I didn’t know this at
the time, either. The code, once I saw how it happened, was
inspiring to me. Also, the discipline of reading other people’s programs was something good to learn early. Through my life I’ve had a love of reading source materials — reading something that pioneers had written and trying to understand what their thought
processes were in order to write this out. Especially when they’re
solving a problem I don’t know how to solve, because this is the best way for me to put into my own brain
how to get past stumbling blocks. At Case I also remember looking at papers that Fermat had written
in Latin in the 17th century, in order to understand
how that great number theorist approached problems. I have to rely on friends to help
me get through Sanskrit manuscripts and things now, but I
still…. Just last month, I found, to my great surprise, that the concept of orthogonal Latin squares, which
we’ll probably talk about briefly later on, originated in North Africa in the 13th century. Or was it the
14th century? I was looking at some historical documents and I
came across accounts of this Arabic thing. By reading it in French translation I was
able to see that the guy really had this concept, orthogonal Latin
squares, that early. The previous earliest known example was 1724. I love to look at the work of pioneers and
try to get into their minds and see what’s happening. One of the things worth observing — it’s
off the track but as long as we’re talking about history — is that our current generation, and generations of
students, don’t even know the history of their own field. They’re constantly reinventing things, or thoughtlessly
disregarding things. We’re not just talking about history going back in time hundreds of years. We’re
talking about history going back a dozen years, or two-dozen years. Yeah, I know. It’s such a common failing.
I would say that’s my major disappointment with my teaching career. I was not able to get across to any of my students this love for that kind of scholarship, this reading of source material. I was a complete failure
at passing this on to the people that I worked with the most closely. I don’t know what I should’ve
done. When I came to Stanford from Caltech, I had been researching Pascal. I couldn’t find much about Pascal’s
work in the Caltech library. At Stanford, I found two shelves devoted to it. I was really impressed by that.
Then I came to the Stanford engineering library, and everything was in storage if it was more than
five years old. It was a basket case at that time, in the ’60s. I’ve got to restrain myself from telling
too much about the early compiler. But anyway, after IT, I have to mention that I had a job by this time
at the Case Computing Center. I wasn’t just drawing graphs for statisticians anymore. Case was one of the
very few institutions in the country with a really
enlightened attitude that undergraduate students were allowed to touch the computers by themselves,
and also write software for the whole campus. Dartmouth
was another place. There was a guy named Fred Way who set the policy at Case, and instead of going the way
most places go, which would hire professionals to run their computer center, Case hired its own students
to play with the machines and to do the stuff everybody was doing. There were about a dozen of us there,
and we turned out to be fairly good contributors to the computing industry in the long range of things.
I told all of my friends how this IT compiler worked, and we got together and made our own greatly improved
version the following year. It was called RUNCIBLE. Every program in those days had to have an acronym
and this was the Revised Unified New Compiler Basic Language Extended, or something like this. We found
a reason for the name. But we added a million bells and whistles to IT, basically. All on the 2000-word drum. All on the 2000-word drum. Not only that,
but we had four versions of our compiler. One of them would compile to assembly language. One would compile
directly into machine language. One version would use floating point hardware. And one version would
use floating point attachment. If you changed 613 instructions, you would go from the floating
point attachment to the floating point hardware version. If you changed another 372 instructions, it would
change from the assembly language version to the machine language version. If we could figure out a
way to save a line of code in the 373 instructions in one version, then we had to figure out a way to correspondingly
save another line of code in the other version. Then we could have another instruction available to
put in a new feature. So RUNCIBLE went through the stages of software development that have since become
familiar, where there is what they call “creeping featurism”, where every
user you see wants a new thing to be added to the software. Then you put that in, and pretty soon the thing gets… you have a user manual that gets harder and harder. That is the way software always has been. We got our experience
of this. It was a group of us developing this; about, I don’t know, eight of us worked together on different
parts of it. But my friend, Bill Lynch, and I
did most of the parts that were the compiler itself. Other people were working on the subroutines
that would support the library, and things like that.
Since I mentioned Bill Lynch, I should also, I guess… I wrote a paper about the way RUNCIBLE worked inside,
and it was published in the Communications of the ACM during my senior year, because we had seen other articles
in this journal that described methods that were not as good as the ones that were in our compiler. So
we thought, okay, let’s put it to work. But I had no idea what scientific publishing was all about. I had
only experienced magazines before, and magazines don’t give credit for things, they just tell the news. So I
wrote this article and it explained how we did it in our compiler. But
I didn’t mention Bill Lynch’s name or anything in the article. I found out to my great surprise afterwards
that I was getting credit for having invented these things, when actually it was a complete team effort. Mostly
other people, in fact. I had just caught a few bugs and done a lot of things, but nothing really very original.
I had to learn about scholarship, about scientific publishing and things as part of this story.
So we got this experience with users, and I also wrote the user manual for this machine. I am an undergraduate.
Case allows me to write the user manual for RUNCIBLE, and it is used as a textbook in classes. Here I’ve
got a class that I am taking; I can take a class and I wrote the textbook for it already as an undergraduate.
This meant that I had an unusual visibility on campus, I guess. The
truth is that Case was a really great college for undergraduates, and it had superb teachers.
But it did not have very strong standards for graduate
studies. It was very difficult to get admitted to the undergraduate program
at Case, and a lot of people would flunk out. But in graduate school it wasn’t so hard to get in. I
noticed this, and I started taking graduate courses, because there was no competition. This impressed my teachers
— “Oh, Knuth is taking graduate courses” — not realizing that this was the line of least resistance, so
that I could do other things like write compilers as a student. I edited a magazine and things like that, and
played in the band, and did lots of activity. Now [to] the story, however: What about compilers? Well, I got
a job at the end of my senior year to write a compiler for Burroughs, who wanted to sell their drum machine
to people who had IBM 650s. Burroughs had this computer called the
205, which was a drum machine that had 4000 words of memory instead of 2000, and they needed a compiler
for it. ALGOL was a new language at the time. Somebody heard that I knew something about how to write compilers,
and Thompson Ramo Wooldridge [later TRW Inc.] had a consulting branch in Cleveland. They approached me early
in my senior year and said, “Don, we want to make a proposal to Burroughs Corporation that we will write
them an ALGOL compiler. Would you write it for us if we got the contract?” I believe what happened is that
they made a proposal to Burroughs that for $75,000 they would write an ALGOL compiler, and they would pay
me $5,000 for it, something like this. Burroughs turned it down. But meanwhile I had learned about the 205
machine language, and it was kind of appealing to me. So I made my own proposal to Burroughs. I said I’ll write
you an ALGOL compiler for $5,000, but I can’t implement all of ALGOL. I think I told them I can’t implement
all of ALGOL for this; I am just one guy. Let’s leave out procedures — subroutines. Well, this is a
big hole in the language! Burroughs said, “No, no — you got to put in procedures.” I said, “Okay, I will
put in procedures, but you got to pay me $5,500.” That’s what happened. They paid me $5,500, which was a
fairly good salary in those days. I think a college professor was making eight or nine thousand dollars a year
in those days. So between graduating from Case and going to
Cal Tech, I worked on this compiler. As I drove
out to California, I drove 100 miles a day and I sat in a motel and wrote code. The coding form on which I
wrote this code I have now donated to the Computer History Museum, and you can see exactly the code that I wrote.
I debugged it, and it was Christmas time [when] I had the compiler ready for Burroughs to use. So I was interested; by then there were two compilers for which I knew all the code. Then I learned about other
projects. When I was in graduate school some people came to me and said, “Don, how about writing software
full time? Quit graduate school. Just name your price. Write compilers for a living, and you will have
a pretty good living.” That was my second year of graduate school. In what department at Cal Tech? I was at Cal Tech in the math department.
There was no such thing as a computer science department anywhere. Right. But you didn’t do physics. I didn‘t do physics. I switched into math
after my sophomore year at Case, after flunking welding. I switched into math. There were only seven
of us math majors at Case. I went to Cal Tech, and that’s another story we’ll get into soon. I’m in my second
year at Cal Tech, and I was a consultant to Burroughs. After finishing my compiler for Burroughs, I joined
the Product Planning Department. The Product Planning Department was largely composed of people
who had written the best software ever done in the world up to that time, which was a Burroughs ALGOL compiler
for the 220 computer. That was a great leap forward for software. It was the first software that used list processing
and high level data structures in an intelligent way. They took the ideas of Newell and Simon and
applied them to compilers. It ran circles around all the other things that we were doing. I wanted to get
to know these people, and they were by this time in the Product Planning Group, because Burroughs was doing
its very innovative machines that are the opposite of RISC. They tried to make the machine language look like
algebraic language. I joined this group at Burroughs as a consultant. So I had a programming hat when
I was outside of Cal Tech, and at Cal Tech I am a mathematician taking my grad studies. A startup company,
called Green Tree Corporation because green is the color of money, came to me and said, “Don, name your price.
Write compilers for us and we will take care of finding computers for you to debug them on, and assistance for
you to do your work. Name your price.” I said, “Oh, okay.
$100,000”, assuming that this was an impossible number. In that era this was not quite at Bill Gates’s level today, but it was sort of out there. The guy didn’t blink.
He said, “Okay.” I didn’t really blink either. I said, “Well, I’m not going to do it. I just thought this was
an impossible number.” At that point I made the decision in my life that I wasn’t going to optimize my
income; I was really going to do what I thought I could do for… well, I don’t know. If you ask me what makes me
most happy, number one would be somebody saying “I learned something from you”. Number two would be
somebody saying “I used your software”. But number infinity would be… Well, no. Number infinity minus one
would be “I bought your book”. It’s not as good as “I read your book”, you know. Then there is “I bought
your software”; that was not high in my own personal value system. So that decision came up. I kept up with the literature
about compilers. The Communications of the ACM was where the action was. I also worked with people on trying
to debug the ALGOL language, which had problems with it. I published a few papers; “The Remaining Trouble Spots in ALGOL 60” was one of the papers that I worked on. I chaired a committee called “Smallgol”
which was to find a subset of ALGOL that would work on small computers. I was active in programming languages. Was McCarthy on Smallgol? No, I don’t think he was. Or Klaus Wirth? No. There was a big European group, but this
was mostly Americans. Gosh, I can’t remember. We had about 20 people as co-authors of the paper. It was
Smallgol 61? I don’t know. It was so long ago I can’t remember. But all the authors are there. You were still a graduate student. I was a graduate student, yeah. But this was
my computing life. What did your thesis advisors think of all
this? Oh, at Case they thought it was terrible that
I even touched computers. The math professor said, “Don’t dirty your hands with that.” You mean Cal Tech. No, first at Case. Cal Tech was one of the
few graduate schools that did not have that opinion, that I shouldn’t touch computers. I went to Cal
Tech because they had this [strength] in combinatorics. Their computing system was incredibly arcane, and
it was terrible. I couldn’t run any programs at Cal Tech. I mean, I would have to use punched paper tape. They
didn’t even have punch cards, and their computing system was horrible unless you went to JPL, Jet Propulsion
Laboratory, which was quite a bit off campus. There you would have to submit a job and then come back a
day later. You couldn’t touch the machines or anything. It was just hopeless. At Burroughs I could go into what
they called the fishbowl, which was the demonstration computer room, and I could run hands-on every night,
and get work done. There was a program that I had debugged one night at Burroughs that was solving a problem
that Marshall Hall, my thesis advisor, was interested in. It took more memory than the Burroughs machine
had, so I had to run it at JPL. Well, eight months later I had gotten the output from JPL and I had also
accumulated the listings that were 10 feet high in my office, because it’s a one- or two-day turnaround
time and then they give you a memory dump at the end of the run. Then you can say, “Oh, I’ll change this
and I’ll try another thing tomorrow.” It was incredibly inefficient, brain-damaged computing at Cal Tech in the
early ‘60s. But I kept track with the programming languages community and I became editor of the programming
languages section of the Communications of the ACM and the Journal of the ACM in, I don’t know, ’64,
’65, something like that. I was not a graduate student, but I was just out of graduate school in the ‘60s.
That was definitely the part of computing that I did by far the most in, in
those days. Computing was divided into three categories. By the time I came to Stanford, you were either a numerical analyst, an artificial intelligence person, or a programming languages person. We had three qualifying exams and there was a tripartite division of the
field. Don, just before we leave your thesis advisor:
your thesis itself was in mathematics, not in computing, right? Yes. Tell us a little bit about that and what your
thesis advisor’s influence on your work was at the time. Yeah, because this is combinatorial, and it’s
definitely an important part of the story. Combinatorics was not an academic subject at Case. Cal Tech
was one of the few places that had it as a graduate course, and there were textbooks that began to be written.
I believe at Stanford, for example, George Dantzig introduced the first class in combinatorics probably
about 1970. It was something that was low on the totem pole in the mathematics world in those days. The high
on the totem pole was the Bourbaki school from France, of highly abstract mathematics that was involved with
higher orders of infinities and things. I had colleagues at Cal Tech to whom I would say, “You and I intersect
at countable infinity, because I never think of anything that is more than countable infinity, and you never
think of anything that is less than countable infinity.” I mostly stuck to things that were finite in my own
work. At Case, when I was a senior, we had a visiting professor, R. C. Bose from North Carolina, who was a very
inspiring lecturer. He was an extremely charismatic guy, and he had just solved a problem that became front
page news in the New York Times. It was to find orthogonal Latin squares. Now, today there is a craze called
Sudoku, but I imagine by the time people are watching this tape or listening to this tape that craze will
have faded away. An N-by-N Latin square is an arrangement of N letters so that every row and every column has all N of the letters. A pair of orthogonal Latin squares is two Latin squares with the property that if you put them next to each other, so that each cell has a symbol from the first and a symbol from the second, the N squared cells you get show all N squared possibilities: A will occur with A somewhere, A will occur with B somewhere, Z will occur with Z somewhere.
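For a concrete case, here is a standard pair of orthogonal 3-by-3 Latin squares, with digits standing in for letters, and a quick check of the definition in Python; this small example is illustrative, not from the interview.

```python
# Two orthogonal Latin squares of order 3: every row and column of each
# square contains 0, 1, 2 exactly once.
A = [[0, 1, 2],
     [1, 2, 0],
     [2, 0, 1]]
B = [[0, 1, 2],
     [2, 0, 1],
     [1, 2, 0]]

n = 3
# Superimpose them and collect the (A-symbol, B-symbol) pair in each cell.
pairs = {(A[i][j], B[i][j]) for i in range(n) for j in range(n)}
print(len(pairs) == n * n)   # True: all 9 possible pairs occur, so A and B
                             # are orthogonal
```

A famous paper, from 1783, I think, by Leonhard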
Euler had conjectured that it was impossible to find orthogonal Latin squares that were 10 by 10,
or 14 by 14, or 18 by 18, or 6 by 6 — all the cases that were twice an odd number. This conjecture was believed
for 170 years, and had even been “proved” three times, but people found holes in the proofs. In 1959 R. C. Bose and two other people found that it was wrong, and they constructed orthogonal Latin squares that were 10 by 10 and 14 by 14. They showed that in all those cases it was actually possible to find orthogonal Latin squares.
I met Bose. I was taking a class from him. It was a graduate class, and I was taking graduate classes.
He asked me if I could find some 12 by 12 orthogonal Latin squares. It sounded like an interesting problem, so I wrote a program and presented him with the answer the next morning. He was happy
and impressed, and we found five mutually orthogonal Latin squares of order 12. That became a
paper. There are some interesting stories about that, but I won’t go into them. The main thing is that he was on the cutting
edge on this research. I was at an undergraduate place where we had great teaching, but we did not have
cutting edge researchers. He could recommend me to graduate school, and he could also tell me that Marshall Hall was very good at combinatorics. He gave me a good plug for going to Cal Tech. I had visited California
with my parents on summer vacations, and so when I applied to graduate school I applied to Stanford, Berkeley
and Cal Tech, and no other places. I got admitted to all three. I took
Cal Tech because I knew that they had a good combinatorial attitude there, which was not really true at Stanford.
In fact, [at] Stanford I wouldn’t have been able to study Latin squares at all. While we’re at it, I might
as well mention that I got fellowships. I got a National Science Foundation Fellowship, Woodrow Wilson Foundation
Fellowship, to come to these places, but they all had the requirement that you could not do anything
except study as a graduate student. I couldn’t be a consultant to Burroughs and also have an NSF fellowship.
So I turned down the fellowships. Marshall Hall was then my thesis advisor. He was a world class mathematician,
and had done, for a long time, pioneering work in combinatorics. He was my mentor. But it was a funny thing,
because I was such in awe of him that when I was in the same room with him I could not think straight. I wouldn’t
remember my name. I would write down what he was saying, and then I would go back to my office so that
I could figure it out. We couldn’t do joint research together in the same room.
We could do it back and forth. It was almost like farming my programs out to JPL to be
run. But we did collaborate on a few things. The one thing
that we did the most on actually never got published, however, because it turned out that it just didn’t
lead to the solution. He thought he had a way to solve
the Burnside problem in group theory, but it didn’t pan out. I learned a lot in the process of doing all the computation, but none of those programs have ever appeared in print or anything. It taught me how to deal with tree structures inside a machine, and
I used the techniques in other things over the years. He also was an extremely good advisor, better in some ways than I was with my students. He seemed to keep track of me to make sure I was not slipping. When I was working
with my own graduate students, I was pretty much in a mode where they would bug me instead of me bugging
them. But he would actually write me notes and say, “Don, why don’t you do such and such?” Now, I chose a thesis topic which was to find a certain kind of what they call block designs.
I will just say: symmetric block designs with parameter Lambda equals 2. Anybody could look that up and find
out what that means. I don’t want to explain it now. At the time I did this, I believe there were six
known designs of this form altogether. I had found a new way to look at those designs, and so I thought maybe
I’ll be able to find infinitely many more such designs. They would be of mostly academic interest, although
statisticians would justify that they could use them somehow. But mostly, just, do they exist or not? This was
the question. Purely intellectual curiosity. That was going to be my thesis topic: to see if I could find
lots of these elusive combinatorial patterns. But one morning I was looking at another problem entirely, having
to do with finite projective geometry, and I got a listing from a guy at Princeton who had just computed
32 solutions to a problem that I had been looking at with respect to
a homework problem in my combinatorics class. He had found that there are 32 solutions of Type A, and 32 solutions
of Type B, to this particular problem. I said, hmm, that’s interesting, because the 32 solutions of Type
A, one of those was a well known construction. The 32 of Type B, nobody had ever found any Type B solutions
before for the next higher up case. I remember I had just gotten this listing from Princeton, and I
was riding up on the elevator with Olga Todd, one of our professors, and I said, “Mrs. Todd, I think
I’m going to have a theorem in an hour. I’m going to look at these two lists of 32 numbers. For every one
on this page I am going to find a corresponding one on this page. I am going to psyche out the rule that
explains why there happen to be 32 of each kind.” Sure enough, an hour later I had seen how to get from each
solution on the first page to the solution on the second page. I showed this to Marshall Hall. He said, “Don,
that’s your thesis. Don’t worry on this Lambda equals 2 business. Write this up and get out of here.”
So that became my thesis. And it is a good thing, because since then only one more design with Lambda equals
2 has been discovered in the history of the world. I might still be working on my thesis if I had stuck to
that problem. I felt a little guilty that I had solved my PhD problem in one hour, so I dressed it up with
a few other chapters of stuff. The whole thesis is 70 some pages long. I discovered that it is now on the internet,
probably for people’s curiosity, I suppose: what did he write about in those days? But of all the
areas of mathematics that I’ve applied to computer science, I would say the only area that I have never applied
to computer science is the one that I did my thesis in. It just was good training for me to exercise my brain
cells. Yeah. In fact, for your colleagues, that thesis is kind of a black hole in their knowledge and understanding of you. The thesis, yeah. Well, I was going to say
the reason that it is not used anymore is because these designs turn out… Okay, we can construct
them with all this pain and careful, deep analysis. But
it turned out later on that if we just work at random, we get even better results. So
it was kind of pointless from the point of view of that application,
except for certain codes and things like that. Don, just a footnote to that story. I intended
this would come up later in the interview, but it’s just so great a point to bring it in. When I’ve
been advising graduate students, I tell them that the really hard part of the thesis is finding the right problem.
That’s at least half the problem. Yeah. And then the other half is just doing it.
And that’s the easy part of it. So I am not impressed by this one hour. I mean, the hard part went into
finding the problem, not in the solving of it. We will get to, of course, the great piece of work that you did
on The Art of Computer Programming. But it’s always seemed to me that the researching and then writing the
text of The Art of Computer Programming was a problem generator for you. The way you and I have expressed it in
the past is that you were weaving a fabric and you would encounter holes in the fabric. Those would
be the great problems to solve, and that’s more than half the work. Once you find the problems you can go
get at them. Do you want to comment on that? Right. Well, yeah. We will probably comment
on it more later, too. But I guess one of the blessings and curses of the way I work is that I don’t
have difficulty thinking of questions. I don’t have too much difficulty in the problem generation phase
— what to work on. I have to actively suppress stimulation so that I’m not working on too
many things at once. But you can ask questions that are… The hard thing, for me anyway, is not to find a problem,
but to find a good problem. To find a problem that has some juice to it. Something that will not just
be isolated to something that happens to be true, but also will be something that will have spin offs. That once
you’ve solved the problem, the techniques are going to apply to many other things, or that this will be
a link in a chain to other things. It’s not just having a question that needs an answer. It’s very easy to…
There’s a professor; I might as well mention his name, although I don’t like to. It would be hard to mention
the concept without somebody thinking of his name. His name is [Florentin] Smarandache. I’ve never met
him, but he generates problems by the zillions. I’ve never seen one of them that I thought had any merit whatsoever.
I mean, you can generate sequences of numbers in various ways. You can cube them and remove the middle
digit, or something like this. And say, “Oh, is this prime?”, something like that. There are all kinds of
ways of defining sequences of numbers or patterns of things and then asking a question about it. But if one
of my students said “I want to work on this for a thesis”, I would have to say “this problem stinks”. So
the hard thing is not to come up with a problem, but
to come up with a fruitful problem. Like the famous problem of
Fermat’s Last Theorem: can there be A to the N, plus B to the N equals C to the N,
for N greater than 2. It has no applications. Suppose you found A, B and C; it doesn’t really matter to anything. But in the course of working on this problem, people discovered
beautiful things about mathematical structures that have solved uncountably many practical applications as
a spin-off. So that’s one. The thesis problem that I solved was probably not extremely interesting in that sense either. It answered a question of whether there existed projective geometries of certain orders
that weren’t symmetrical. All the cases that people had ever thought of were symmetrical, and I thought
of unsymmetrical ways to do it. Well, so what? But the technique that I used for it led to some insight and
got around some other blocks that people had in other theories. I have to worry about not getting bogged down
in every question that I think of, because otherwise I can’t move on and get anything out the door. Don, we’ve gotten a little mixed up between
the finishing of your thesis and your assistant professorship at Caltech, but it doesn’t matter. Around
this time there was the embryonic beginnings of a multi-volume work which you’re known for, “The Art of Computer
Programming.” Could you tell us the story about the beginning? Because soon it’s going to be the middle of
it, you were working on it so fast. This is, of course, really the story of my
life, because I hope to live long enough to finish it. But I may not, because it’s turned out to be such
a huge project. I got married in the summer of 1961, after my first year of graduate school. My wife finished
college, and I could use the money I had made — the
$5000 on the compiler — to finance a trip to Europe for our honeymoon. We had four months
of wedded bliss in Southern California, and then a man from Addison-Wesley
came to visit me and said “Don, we would like you to write a book about how to write compilers.”
The more I thought about it, the more I decided, “Oh yes, I’ve got this book inside of me.” I sketched out that
day — I still have the sheet of tablet paper on which I wrote — I sketched out 12 chapters that I thought ought
to be in such a book. I told Jill, my wife, “I think I’m going to write a book.” As I say, we had four months
of bliss, because the rest of our marriage has all been devoted to this book. Well, we still have
had happiness. But really, I wake up every morning and I still haven’t finished the book. So I try to — I
have to — organize the rest of my life around this, as one main unifying theme. The
book was supposed to be about how to write a compiler. They had heard about me from one of their editorial advisors, that I knew something
about how to do this. The idea appealed to me for two main reasons. One is
that I did enjoy writing. In high school I had been editor of the weekly paper. In college I was editor
of the science magazine, and I worked on the campus paper as copy editor. And, as I told you, I wrote the manual
for that compiler that we wrote. I enjoyed writing, number one. Also, it was Addison-Wesley who were asking me to do this book; my favorite textbooks had been published by Addison-Wesley. They had done
the books that I loved the most as a student. For them to come to me and say, ”Would you write a book for
us?”, and here I am just a second-year graduate student — this was a thrill. Another very important reason at the
time was that I knew that there was a great need for a book about compilers, because there were a lot
of people who even in 1962 — this was January of 1962 — were starting
to rediscover the wheel. The knowledge was out there, but it hadn’t been explained. The people who had
discovered it, though, were scattered all over the world and they didn’t know of each other’s work either, very
much. I had been following it. Everybody I could think of who could write a book about compilers, as far
as I could see, they would only give a piece of the
fabric. They would slant it to their own view of it. There might be four people who could
write about it, but they would write four different books. I could
present all four of their viewpoints in what I would think was
a balanced way, without any axe to grind, without slanting it towards something that
I thought would be misleading to the compiler writer for the
future. I considered myself as a journalist, essentially. I could be the expositor, the tech writer, that could
do the job that was needed in order to take the work of these brilliant people and make it accessible to
the world. That was my motivation. Now, I didn’t have much time to spend on it then, I just had this page
of paper with 12 chapter headings on it. That’s all I could do while I’m a consultant at Burroughs and doing
my graduate work. I signed a contract, but they said “We know it’ll take you a while.” I didn’t really begin
to have much time to work on it until 1963, my third year
of graduate school, as I’m already finishing up on my thesis. In the summer of ’62, I guess
I should mention, I wrote another compiler. This was for Univac;
it was a FORTRAN compiler. I spent the summer, I sold my soul to the devil, I guess you say, for three months
in the summer of 1962 to write a FORTRAN compiler. I believe that the salary for that was $15,000, which
was much more than an assistant professor made. I think assistant professors were getting eight or nine thousand
in those days. Well, when I started in 1960 at [University
of California] Berkeley, I was getting $7,600 for the nine-month year. Yeah, so you see it. I got $15,000 for a summer
job in 1962 writing a FORTRAN compiler. One day during that summer I was writing the part of the
compiler that looks up identifiers in a hash table. The method that we used is called linear probing. Basically
you take the variable name that you want to look up, you scramble it, like you square it or something like this,
and that gives you a number between one and, well in those days it would have been between 1 and 1000,
and then you look there. If you find it, good; if you don’t find it, go to the next place and keep on going
until you either get to an empty place, or you find the number you’re looking for. It’s called linear probing.
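Here is a minimal sketch of linear probing in Python, with a 1000-slot table as in the description above; the scrambling function is just an assumption standing in for whatever the compiler actually used.

```python
TABLE_SIZE = 1000
table = [None] * TABLE_SIZE

def probe(name: str) -> int:
    """Return the slot holding `name`, inserting it if absent.
    Assumes the table never completely fills up."""
    i = hash(name) % TABLE_SIZE          # the "scramble" step
    while table[i] is not None and table[i] != name:
        i = (i + 1) % TABLE_SIZE         # occupied by someone else: go to
                                         # the next place, wrapping around
    if table[i] is None:
        table[i] = name                  # reached an empty place: insert
    return i

for ident in ["ALPHA", "BETA", "GAMMA", "ALPHA"]:
    print(ident, probe(ident))           # the repeated ALPHA finds its
                                         # original slot again
```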
There was a rumor that one of Professor Feller’s students at Princeton had tried to figure out how fast
linear probing works and was unable to succeed. This was a new thing for me. It was a case where I was doing
programming, but I also had a mathematical problem that would go into my other [job]. My winter job was
being a math student, my summer job was writing compilers. There was no mix. These worlds did not intersect
at all in my life at that point. So I spent one day during the summer while writing the compiler looking
at the mathematics of how fast does linear probing work. I got lucky, and I solved the problem. I figured
out some math, and I kept two or three sheets of paper with me and I typed it up. [“Notes on ‘Open’ Addressing”, 7/22/63] I guess that’s on the internet now, because this became really the genesis of my main research work, which developed to be not about compilers, but about what they call analysis of algorithms,
which is, have a computer method and find out how good is it quantitatively. I can say, if I got so
many things to look up in the table, how long is linear probing going to take. It dawned on me that this was
just one of many algorithms that would be important, and each one would lead to a fascinating mathematical
problem. This was easily a good lifetime source of rich problems to work on. Here I am then, in the middle
of 1962, writing this FORTRAN compiler, and I had one day to do the research and mathematics that changed my life
for my future research trends. But now I’ve gotten off the topic of what your original question was. We were talking about sort of the… You talked
about the embryo of The Art of Computing. The compiler book morphed into The Art of Computer Programming,
which became a seven-volume plan. Exactly. Anyway, I’m working on a compiler
and I’m thinking about this. But now, after I finish this summer job, I began to do
things that were going to relate to the book. One of the things I knew I had to have in the book was
an artificial machine, because I’m writing a compiler book but machines are changing faster than I can write
books. I have to have a machine that I’m totally in control of. I invented this machine called MIX, which
was typical of the computers of 1962. In 1963 I wrote a simulator for MIX so that I could write sample
programs for it, and I taught a class at Caltech on how to write programs in assembly language for this
hypothetical computer. Then I started writing the parts that dealt with sorting problems and searching
problems, like the linear probing idea. I began to write those parts of the book, which are parts of a
compiler. I had several hundred pages of notes accumulating for those chapters of The Art of Computer Programming.
Before I graduated, I’ve already done quite a bit of writing on The Art of Computer Programming. I met George
Forsythe about this time. George was the man who inspired both of us [Knuth and Feigenbaum] to come to Stanford
during the ’60s. George came down to Southern California for a talk, and he said, “Come up to Stanford.
How about joining our faculty?” I said “Oh no, I can’t do that. I just got married, and I’ve got to finish this
book first.” I said, “I think I’ll finish the book next year, and then I can come up [and] start thinking
about the rest of my life, but I want to get my book done before my son is born.” Well, John is now 40-some
years old and I’m not done with the book. Part of my problem is that I lack
any good estimation procedure as to how long projects are going to take. I way underestimated how much needed to be written about in this
book. Anyway, I started writing the manuscript, and I went merrily along writing pages of things that
I thought really needed to be said. Of course, it didn’t take long before I had started to discover a few things
of my own that weren’t in any of the existing literature. I did have an axe to grind. The message that I was
presenting was in fact not going to be unbiased at all. It was going to be based on my own particular slant
on stuff, and that original reason for why I should write the book became impossible to sustain. But the
fact that I had worked on linear probing and solved the problem gave me a new unifying theme for the book.
I was going to base it around this idea of analyzing algorithms, and have some quantitative ideas about how
good methods were. Not just that they worked, but that they worked well: this method worked 3 times better than
this method, or 3.1 times better than this method. Also, at this time I was learning mathematical techniques
that I had never been taught in school. I found they were out there, but they just hadn’t been emphasized
openly, about how to solve problems of this kind. So my book would also present a different kind of mathematics
than was common in the curriculum at the time, one that was very relevant to the analysis of algorithms. I
went to the publishers, I went to Addison Wesley, and said “How about changing the title of the book from
‘The Art of Computer Programming’ to ‘The Analysis of Algorithms’.” They said that will never sell; their focus
group couldn’t buy that one. I’m glad they stuck to the original title, although I’m also glad to see that
several books have now come out called “The Analysis of Algorithms”, 20 years down the line. But
in those days, The Art of Computer Programming was very important because I’m thinking of the aesthetic: the
whole question of writing programs as something that has artistic aspects in all senses of the word.
One sense of “art” means artificial, and the other means fine art. All these are long stories,
but I’ve got to cover it fairly quickly. I’ve got The Art of Computer Programming started out, and I’m
working on my 12 chapters. I finish a rough draft of all 12 chapters by, I think it was like 1965. I’ve
got 3,000 pages of notes, including a very good example of
what you mentioned about seeing holes in the fabric.
One of the most important chapters in the book is parsing: going from somebody’s algebraic formula and
figuring out the structure of the formula. Just the way I had done in seventh grade finding the structure
of English sentences, I had to do this with mathematical sentences. Chapter ten is all about parsing
of context-free language, [which] is what we called it at the time. I covered what people had published
about context-free languages and parsing. I got to the end of the chapter and I said, well, you can combine
these ideas and these ideas, and all of a sudden you get a unifying thing which goes all the way to the limit.
These other ideas had sort of gone partway there. They would say “Oh, if a grammar satisfies this condition,
I can do it efficiently.” ”If a grammar satisfies this condition, I can do it efficiently.” But now, all of
a sudden, I saw there was a way to say I can find the most general condition that can be done efficiently without
looking ahead to the end of the sentence. That you could make a decision on the fly, reading from left to
right, about the structure of the thing. That was just a natural outgrowth of seeing the different pieces of
the fabric that other people had put together, and writing it into a chapter for the first time. But I felt
that this general concept, well, I didn’t feel that I had surrounded the concept. I knew that I had
it, and I could prove it, and I could check it, but I couldn’t really intuit it all in my head. I knew it
was right, but it was too hard for me, really, to explain it well. So I didn’t put it in The Art of Computer Programming.
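[Editors’ note: to make “deciding on the fly, reading from left to right” concrete, here is a small illustrative shift-reduce trace; the grammar is ours, not an example from the 1965 paper. Grammar: E -> E + T | T and T -> T * a | a, parsing a + a * a with one symbol of lookahead:]

    shift a                  reduce T -> a        stack: T
    lookahead +, so          reduce E -> T        stack: E
    shift +, shift a         reduce T -> a        stack: E + T
    lookahead *, so shift    (do not reduce E -> E + T yet)
    shift a                  reduce T -> T * a    stack: E + T
    lookahead end, so        reduce E -> E + T    stack: E     accept

[The whole structure is decided without ever looking past the next input symbol.]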
I thought it was beyond the scope of my book. Textbooks don’t have to cover everything when you get
to the harder things; then you have to go to the literature. My idea at that time [is] I’m writing this book
and I’m thinking it’s going to be published very soon, so any little things I discover and put in the book
I didn’t bother to write a paper and publish in the journal because I figure it’ll be in my book pretty
soon anyway. Computer science is changing so fast, my book is bound to be obsolete. It takes a year for
it to go through editing, and people drawing the illustrations, and then they have to print it and bind it and
so on. I have to be a little bit ahead of the state-of-the-art if my book isn’t going to be obsolete when it
comes out. So I kept most of the stuff to myself that I had, these little ideas I had been coming up with. But
when I got to this idea of left-to-right parsing, I said “Well here’s something I don’t really understand
very well. I’ll publish this, let other people figure out what it is, and then they can tell me what I should
have said.” I published that paper I believe in 1965, at the end of finishing my draft of the chapter, which
didn’t get as far as that story, LR(k). Well now, textbooks of computer science start with LR(k) and take
off from there. But I want to give you an idea of… Don, for historical reasons, tell the audience
where the LR(k) paper was published so they can go look it up. It was published in the journal called Information
and Control, which has now changed its name to Information and Computation. In those days,
you can see why they called it Information and Control. It was the journal that had had the best papers on
parsing of languages at the time. It’s a long paper, and difficult. It’s also reprinted in my book
“Selected Papers on Computer Languages”, with a few corrections to the original. In the original, I drew the
trees with the root at the bottom. But everybody draws trees with the root at the top now, so the reprint has
trees drawn in a more modern notation. I’m trying to give the flavor of the way things were in 1965. My
son was born in the summer of ’65, and I finished this work on LR(k) at Christmastime in ’65. Then I had, I
think, one more chapter to write. But early in ’66 I had all 3000 pages of the manuscript ready. I typed chapter
one. My idea was, I looked at these pages — the pages were all hand-written — and I tried to guess from
my handwriting how many words there were on a page. I had chapter
one and I typed it and I sent it to the publishers, and they said “Don, what have you written? This book is
going to be huge.” I had actually written them a letter earlier as I’m working on sorting. I said to the guy
who signed me up, the one I signed a contract with; by this time, he had been promoted. No, I’m not sure about
that, but anyway, I wrote to him in ’63 or ’64 saying, “You know, as I’m working on this book on compilers,
there’s a few things that deserve more complete treatment than a compiler writer needs to know. Do you
mind if I add a little bit here?” They said “Sure, Don, go right ahead. Whatever you think is good to write
about, do it.” Then I send them chapter one a few years later. By this time, I guess the guy’s promoted, and
he’s saying “Oh my goodness, what are we going to do? Did you realize that this book is going to be more
than 2,000 pages long?”, or something like this. No, I didn’t. I had read a lot of books, and I thought I understood
about things. I had my typed pages there, and I was figuring five typed pages would go into one
page of text. It just looked to me, to my eyes, if I had five typewritten pages — you know, the letters
in a textbook are smaller. But I should have realized that the guys at the publishing house knew something
about books too. They told me “No, no, one and a half typewritten pages of text makes a book [page].” I didn’t
believe it. So I went back to my calculus book, which was an Addison Wesley book, and I typed it out.
Sure enough, they were absolutely right. One and a half typewritten pages made one book page, so my book was going to be three times longer than I’d thought. No wonder
it had taken me so long to get chapter one done! I’m sitting here with much, much more than I thought I
had. Meanwhile computer science hasn’t been standing still, and I knew that more still has to be written as
I go. I went to Boston, and I happened to catch a glance at some notes that my editor had written to himself
for the meeting that we were going to have with his bosses, and one of the comments on there was “Terrific
cost bind” or something like that. Publishing houses all have their horror stories about a professor who
writes 12 volumes about the history of an egg, or something like this, and it never sells, and it just is a
terrible thing that they have a contract that they’ve signed. So they have to figure out how to rescue something
out of this situation, with this monster book coming. We thought at first we would package it into
three volumes instead of one. Then they sent out chapter one to a dozen readers in a focus group, and they got
comments on it. Well, the readers liked what they saw in that chapter, and so at least I had some support
from them. Then after a few more months we decided to package it. They figured out that of the 12 chapters there
were seven of them that would sell, and we could stick the other five in some way that would make a fairly
decent seven-volume set. That was what was finally announced in 1966 or something: that it would come out
in seven volumes. After typing chapter one I typed chapter two, and so on. I kept working on it. All the time
when I’m not teaching my classes at Caltech, I’m typing up my notes and polishing the hand-written notes
that I had made from these 3000 pages of rough draft. That sets the scene for the early days of The Art of
Computer Programming. What year are we at now? What happened is, I’m at Caltech. I’m a math
professor. I’m teaching classes in algebra and once in a while combinatorics at Caltech. Also one or
two classes connected with computing, like sorting, I think I might have taught one quarter. But most of
the things I’m teaching at Caltech are orthogonal to The Art of Computer Programming. My daughter is born
in December of ’66. I’ve got the entire manuscript of volume one to the publisher, I think, during ’66. I’m working
on typing up chapters three and four at the beginning of ’67. I think this is approximately the way things
stand. I was trying to finish the book before my son was born in ’65, and what happened is that I got… I’m
seeing now that… Volume one actually turned out to be almost 700 pages, which means 1,000 type-written pages.
You can see why I said that my blissful marriage wasn’t quite so blissful, because I’m working on this a lot.
I’m doing most of it actually watching the late late show on television. I have also some earplugs for
when the kids are screaming a little bit too much. Here I am, typing The Art of Computer Programming when
the babies are crying, although I did also change diapers and so on. I think that what we need to do is talk about…
This is December ’66, when your daughter was born. Yeah. That leads sort of directly into this magical
year of 1967, which didn’t end so magically. Let’s continue on with 1967 in a moment. Okay. Don, once you told me that 1967 was your most
creative year. I’d like to get into it. You also said you had only a very short time to do your research
during that year, and the year didn’t end so well for you. Let’s talk about that. Well, it’s certainly a pivotal year in my
life. You can see in retrospect why I think things were building up to a crisis, because I was just
working at high pitch all the time. I think I mentioned I was editor of ACM Communications, and ACM Journal,
in the programming languages sections. I took the editorial duties very seriously. A lot of people were
submitting papers, and I would write long referee reports in many cases, as well as discussing with referees
all the things I had to be doing. I was a consultant to Burroughs on innovative machines. I was consumed with
getting The Art of Computer Programming done, and I had children, and being a father, and husband.
I would start out every day and I would say “Well, what am I going to accomplish today?” Then I would stay up
until I finished it. I used to be able to do this. When I was in high school and I was editor of the paper,
I would do an all-nighter every week when the paper came out. I would just go without sleep on those occasions.
I was sort of used to working in this mode, where I didn’t realize I was punishing my body. We didn’t
have iPods and things like that, but still I had the TV on. That was enough to kill the boredom while I had
to do the typing of a lot of material. Now, in 1967, is when things came to a head. Also, it was time for
me to make a career decision. I was getting offers. I think I was offered full professorships at North Carolina
in Chapel Hill, and also at Purdue, I think. I had to make a decision as to what I should do. I was promoted
to Associate Professor at Caltech surprisingly early. The question is, where should I spend the rest
of my life? Should I be a mathematician? Should I be a computer scientist? By this time I had learned that
it was actually possible to do mathematical work as a computer scientist. I had analysis of algorithms to
do. What would be a permanent home? I visited Stanford. I gave a talk about my left-to-right parsing. I discovered
a theorem about it sitting in one of the student dormitories, Stern Hall, the night I gave
the lecture. I came up there, I liked George Forsythe very much, I liked the people that I met here very much.
I was thinking Stanford would be a nice place, but also there were other places too that I wanted to check
out carefully. I was also trying to think about what to do long-term for my permanent home. I don’t like
to move. My model of my life was going to be that I was going to make one move in my lifetime to a place
where I had tenure, and I would stay there forever. I wanted to check all these things out, so I was confronted
with this aspect as well. I was signed up to be an ACM lecturer, ACM National Lecture Program, for
two or three weeks in February of 1967, which meant that I give a list of three talks. Each ACM chapter or university
that wants to have a speaker, they coordinate so that I have a schedule. I go from city to city every
day. You probably did the same thing about then. Yep. Stanford and Berkeley were on this list, as
well as quite a few schools in the east. That was three weeks in February where I was giving talks,
about different things about programming languages, mostly. When I’m at Caltech, I’ve got to be either preparing
my class lectures, or typing my book and getting it done. I’m in the middle of typing chapter four at this
time, which is the second part of volume two. I’m about, I don’t know, one third of the way into volume two.
That’s why I don’t have time to do research. If I get a new idea, if I’m saying “Here’s a problem that ought
to be solved”, when am I going to do it? Maybe on the airplane. As you know, when you’re a lecturer every
day goes the same way. You get up at your hotel, and you get on the plane. Somebody meets you at noon and you
go out to lunch and then they have small talk. They ask you the same questions; “Where are you going to be
tomorrow, Don”, and so on. You give your lecture in the afternoon, there’s a party in the evening, and then you
go to your hotel. The next morning you just go off to the next city. After three weeks of this, I was really
not in very good shape. I skipped out in one case. There was a snowstorm in Atlanta, so I skipped my talk in Atlanta
and I stayed an extra day. I’m trying to give you the flavor of this. But on this trip in February, also,
it turned out to be very fruitful because one of my stops was in Cornell, where Peter Wegner was a visiting
professor. We went out for a hike that weekend to talk about the main topic in programming languages in those
days: how do you define the semantics of a programming language. What’s a good way to formalize the meaning
of the sentences in that language? When someone writes a string of symbols, we wanted to say exactly what that
means, and do it in a way that we can prove interesting results about, and make sure that we’ve translated
it correctly. There were a lot of ideas floating in the air at the time. I had been thinking of how I’m presenting
it in The Art of Computer Programming. I said, well, you know, there were two basic ways to do this.
One is top down, where you have the context telling you what to do. You start out and you say, “Oh, this
is supposed to be a program. What does a program mean?” Then a program tells the things inside the program
what they’re supposed to mean. The other is bottom up, where you just start with one symbol, this is a number
one, and say “this means one”, and then you have a plus sign, and one plus two, and you build up from the
bottom, and say “that means three”. So we have a bottom-up version of semantics, and a top-down version
of semantics. Peter Wegner says to me “Don, why don’t you use both top-down and bottom-up? Have the synthesized
attributes from the bottom up and the inherited attributes that come down from the environment.” I said
“Well, this is obviously impossible. You get into circular reasoning. You can’t define something in terms
of itself.” We were talking about this, and after ten minutes I realized I was shouting to him, because
I was realizing that he was absolutely right. You could do it both ways, and define the things in a way that
they would not interfere with each other; that certain aspects of the meaning could come from the top, and other
aspects from the bottom, and that this actually made a beautiful combination. Don, we were speaking about semantics of programming
languages and you were shouting at Peter Wegner. I’m shouting at Peter Wegner because it
turns out that there’s a beautiful way to combine the top-down and bottom-up approaches simultaneously when
you’re defining semantics. This is happening on a weekend as we’re hiking at Cornell in a beautiful park
by frozen icicles. I can remember the scene because this was kind of an “aha” moment that doesn’t happen
to you very often in your life. People tell me now no one’s allowed in that park in February because it’s too risky
that you’re going to slide and hurt yourself. It was when all of a sudden it occurred to me that this might
be possible. But I don’t have time to do research. I have to go on and give more lectures. Well, I find myself
the next week at Stanford University speaking to the graduate students. I gave one of my regular lectures,
and then there was an hour where the students ask questions to the visitor. There was a student there named
Susan Graham, who of course turned out to be a very distinguished professor at Berkeley and editor
of Transactions on Programming Languages and Systems, and she asked me a question. “Don, what do you think
would be a good way to define semantics of programming languages?” In the back of my mind through
that week I had been tossing around this idea that Peter and I had talked about the week before. So I said, “Let’s
try to sketch out a simple language and try to define its semantics”. On the blackboard, in response
to Susan’s questions, we would erase, and try things, and some things wouldn’t work. But for the next 15
or 20 minutes I tried to write down something that I had never written down before, but it was sort of in
the back of my mind: how to define a very simple algebraic language and convert it into a very simple
machine language which we invented on the spot to be an abstract but very simple computer. Then we would try
to write out the formal semantics for this, so that I could write a few lines in this algebraic language, and
then we could parse it and see exactly what the semantics would be, which would be the machine language program.
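[Editors’ note: the combination Knuth is describing can be sketched in a few lines of Python. The toy language and toy machine below are ours, not what went on the blackboard: the environment env is an inherited attribute, handed down from the context, while the generated code is a synthesized attribute, built up from below:]

    # an expression is ("num", 3), ("var", "x"), or ("add", e1, e2)
    def compile_expr(e, env):                  # env: inherited attribute
        if e[0] == "num":
            return [("PUSH", e[1])]            # code: synthesized attribute
        if e[0] == "var":
            return [("LOAD", env[e[1]])]       # the context supplies x's address
        if e[0] == "add":
            return compile_expr(e[1], env) + compile_expr(e[2], env) + [("ADD",)]

    print(compile_expr(("add", ("var", "x"), ("num", 1)), {"x": 0}))
    # [('LOAD', 0), ('PUSH', 1), ('ADD',)]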
Of course there must have been a lot of bugs in what we wrote on the blackboard, but this is the way I had to do research at that time.
I had a chance while I’m in front of the students to think about the research problem that was just beginning
to jell. Who knows how bad it was fouled up, but on the other hand, being a teacher, that’s when
you get your thoughts in order best. If you’re only talking to yourself, you don’t organize your thoughts
with as much discipline. It probably was also not a bad way to do research. I didn’t get a chance to think
about it when I got home to Caltech because I’m typing up The Art of Computer Programming when I’m at home, and
I’m being an editor, and I’m teaching my classes the rest of the time at Caltech. Then in April I happened
to be giving a lecture in Grenoble, and a Frenchman, Louis Bolliet, asked me something about how one might define
semantics, in another sort of a bull session in Grenoble in France. That was my second chance to think
about this problem, when I was talking with him there. I was stealing time from the other things. That
wasn’t the only thing going on in ’67. I wasn’t only thinking of what to do with my future life, and editing
journals and so on, I’m also teaching a class at Caltech for sophomores. It’s an all year class, sort
of an introduction to abstract mathematics. While I was looking at a problem, we had a visitor at Caltech named
Trevor — what’s his last name — Evans, Trevor Evans. He and I were discussing how to work from axioms, and to
prove theorems from axioms. This is a basic thing in abstract mathematics. Somebody sets down an axiom,
like the associative law; it says that (ab)c is equal to a(bc). That’s
an axiom. I was looking at other axioms that were sort of random. One of the things I asked my students
in the class was, I was trying to teach the sophomores how to do mini research problems. So I gave them
axioms which I called the “axioms of a grope.” They were supposed to develop “grope theory” — they were
supposed to grope for theorems. Of course the well-developed mathematical theory is a “group”, which I had been
teaching them; axioms of groups. One of them is the associative law. Another axiom of groups is that an element
times its inverse is equal to the identity. Another axiom is that the identity times anything, identity
times “X”, is “X”. So groups have axioms. We learned in the class how to derive consequences of these axioms
that weren’t exactly obvious at the beginning. So I said, okay, let’s make a “grope.” The axiom for
a grope is something like x(yx) = y. I give them this axiom, and I say to the class,
what can you derive? Can you find all gropes that have five elements? Can you prove any theorems about
normal subgroups, or whatever it is? Make up a theory.
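[Editors’ note: one theorem the class could grope for, worked out here for the curious, starting from the single axiom x(yx) = y:]

    x(yx) = y              the grope axiom
    (yx)(x(yx)) = x        the axiom with yx substituted for x, and x for y
    (yx)y = x              the inner x(yx) reduces to y by the axiom

[So the one axiom already forces a mirror-image law.]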
As a class we came back in a week saying what theorems could you come up with. We tried
to imagine ourselves in the shoes of an inventor of a
mathematical theory, starting with axioms. Well, Trevor Evans was there and he showed me how to define what
we called the “free grope,” which is the set of all… It can be infinite, but you take all strings of letters,
all formulas. Is it possible to tell whether one formula can be proved equal to the other formula just
by using this one axiom of the grope, x(yx) = y? He showed me a very nice way to solve that
problem, because he had been working on word problems in what’s called universal algebra, the study of axiom
systems. While I was looking at Trevor Evans’ solution to this problem — it arose in connection
with my teaching of the class — I realized that
I could develop an actual method that would work with axioms in general, one that a machine could carry
out without thinking. The machine could start out with the axioms of group theory, and after a small amount of
computation, it could come up with a set of 10 consequences of those axioms that would be enough to decide
the word problem for free groups. And the machine was doing it. We didn’t need a mathematician there to
prove, to say, “Oh, now try combining this formula and this formula.” With the technique I learned from Trevor Evans,
and then with a little extra twist that I put on it, I could set the machine going on axioms and it would
automatically know which consequences of these things, which things to plug in, would be potentially fruitful.
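[Editors’ note: the loop Knuth describes next, applying reductions until none applies, is easy to sketch in Python. The rule set below is the classic one for free-group words on generators a, b with inverses written A, B (adjacent inverse pairs cancel); it is not the ten reductions his method derived, just the flavor:]

    RULES = [("aA", ""), ("Aa", ""), ("bB", ""), ("Bb", "")]

    def normal_form(word):
        changed = True
        while changed:                       # keep rewriting until no rule applies
            changed = False
            for lhs, rhs in RULES:
                if lhs in word:
                    word = word.replace(lhs, rhs, 1)
                    changed = True
        return word

    print(normal_form("abBA"))   # '' : the word reduces to the identity

[With a complete set of reductions, two words are provably equal exactly when they reach the same normal form.]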
If we were lucky, like we were in the case of group theory axioms, it would finally get to the end and
say, “Now, there’s nothing more that can be proved. I’ve got enough. I’ve got a complete set of reductions. If
you apply these reductions and none of them applies, you’ve got it.” It relates to AI techniques of expert
systems, in a way. This idea came to me as I’m teaching this basic math class. The students in this class were
supposed to do a term paper. In the third quarter, everybody worked on this. One of the best students in
the class, Peter Bendix, chose to do his term paper by implementing the algorithm that I had sketched
on the blackboard in one of the lectures at that time. So we could do experiments during the spring of
’67, trying out a whole bunch of different kinds of axioms and seeing which ones the machine would solve
and which ones it would keep spinning and keep generating more and more reductions that seemed to go without
limit. We figured out in some cases how we could introduce new axioms that would bring the whole thing back
down again. So we’re doing a lot of experiments on that kind of thing. I don’t have time to sit down at
home and work out the theory for it, but I knew it had lots of possibilities. Here I had attribute grammars
coming up in February, and these reductions systems coming up in March, and I’m supposed to be grinding out
Volume Two of The Art of Computer Programming. The text of volume one had gone to Addison-Wesley the previous
year, and the copy editor had sent me back corrections and told me, “Don, this isn’t good writing. You’ve
got to change this,” and he’d teach me Addison-Wesley house style. The page proofs started coming. I started
going through galley proofs, but now it was time to get page proofs for volume one. Volume one was published in
January of 1968, but the page proofs started to be available in the spring also. So it’s layer, upon layer, upon layer. Right. There’s a conference in April in
Norway on simulation languages; that was another of the things that I’d been working on at Burroughs. We
had a language called SOL, Simulation Oriented Language, which was an improvement of the state-of-the-art in
systems simulation, in what they called discrete simulation languages. There was an international conference
held in Norway by the people who had invented the Simula language, which wasn’t very well known.
They organized this conference and I went to that, visiting Paris and Grenoble on my way because Maurice Nivat and
I had also become friends. His thesis was on theory of context-free grammars, and no one in France would
read it. He found a guy in America who would appreciate his work, so he came out and we spent some time together
in ’66 getting to know each other and talking about context-free grammar research. I visited him in Paris
and then I went to Grenoble, and then went to Norway for this conference on simulation languages where I
presented a paper about SOL, and learned about Simula, and so on. My parents and Jill’s parents are taking
care of our kids while we’re in Europe during this time in April. I’m scheduled in June to lecture at a summer
school in Copenhagen, an international summer school. I’m giving lectures about how to parse, what’s called
top-down parsing. “LL(k)” is the terminology that developed after these lectures. This was a topic that I did
put in my draft of Chapter 10. It was something that I understood well enough that I didn’t have to publish
it at the time. I gave it for the first time in these lectures in June in Copenhagen. That was a one-week series
of lectures with several lectures every day, five days, to be given there. The summer school met for two
weeks, and I was supposed to speak in the second week of that summer school. All right. What happened then
in May is I had a massive bleeding ulcer, and I was in the hospital. My body gave out. I was just doing
all this stuff, and it couldn’t take it. I learned about myself. I had a wonderful doctor who showed me his
textbook about ulcers. At that time they didn’t know that ulcers are related to bacteria. As far as they
were concerned, it was just acid. Stress. Yeah. People would get operations so that
their stomachs wouldn’t produce so much acid, and things like that. Anyway, he showed me his textbook, and
his textbook described the typical ulcer patient; what other people call the “Type A” personality.
It just described me to a “T”, all of the things that were there. I was an automaton, I think, basically. I had
been all my life pretty much a test-taking machine. You know, I saw a goal and I put myself to it, and I worked
on it and pushed it through. I didn’t say no to people when they said, “Don, can you do this for me?”
At this point I saw, I could all of a sudden get to understand, that I had this problem; that I shouldn’t
try to do the impossible. The doctor, I say he’s so wonderful because doctors usually talk down to patients
and they keep their secrets to themselves. But here he let me look at this textbook so I could know that
he wasn’t just telling me something to make me feel good. I had access to anything I wanted to know about
my condition. So I wrote a letter to my publisher, framed in black, saying, “I’m not going to be able to get
the manuscript of volume two to you this year. I’m sorry. I’m not supposed to work for the next three weeks.”
In fact, you can tell exactly where this was. I was writing in a part of volume two when the ulcer happened,
when it started to burst or whatever. I was working out the answer to a problem about greatest common
divisors that appears about in the middle of volume two. It was an exercise where the answer had a lot of cases
to it, so it takes about a page and a half to explain the answer. It was a problem that needed to be
studied and nobody had studied before, and I was working at it. All of a sudden, bingo. The reason you can
find it is if you look in the index to volume two under “brute force,” it refers you to a page, an answer
page. I was solving this problem by brute force, and so you look at that page, you can see exactly what exercise
I was working on. Then I put it away. I only solved half of the exercise before I could work on it again
a few weeks later. I went into the hospital. It wasn’t too bad, but the blood supply… I took iron pills
and got ready. I could still go to Copenhagen to give my lectures in June. However, the first week was supposed
to be lectures by Niklaus Wirth, and the second week was supposed to be lectures by me. But Klaus had just gone
on an around the world tour with his wife and had come down with dysentery in India and was extremely
ill, and had to cancel his lectures. So I was supposed to go on in the first week instead. But I was stealing
time so bad, I hadn’t really prepared my lectures. I said, oh, I have a week. I’ll go to Copenhagen, listen
to Klaus and I’ll prepare my lectures. I hadn’t prepared. So I’m talking about stuff that has never been written
down before, never been developed with the students. I get to Copenhagen with one day to prepare for this
week of lectures. Well, one thing in Copenhagen, there’s wonderful parks all over the city. I sat down
under a big tree in one of those parks on the first day, and I thought of enough things to say in my first
two lectures. On the second day I gave the lectures, and I sat down under that tree and I worked out the
lectures for the next day. These lectures became my paper called “Top-Down Syntax Analysis.”
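[Editors’ note: a compact sketch of the top-down style those lectures treated: a recursive-descent parser in Python, deciding everything with one symbol of lookahead. The grammar, E -> T { + T } and T -> a | ( E ), is ours, chosen for brevity:]

    def parse_E(s, i):
        i = parse_T(s, i)
        while i < len(s) and s[i] == "+":     # one symbol of lookahead decides
            i = parse_T(s, i + 1)
        return i

    def parse_T(s, i):
        if i < len(s) and s[i] == "a":
            return i + 1
        if i < len(s) and s[i] == "(":
            i = parse_E(s, i + 1)
            assert i < len(s) and s[i] == ")", "missing )"
            return i + 1
        raise SyntaxError("expected 'a' or '(' at position %d" % i)

    assert parse_E("a+(a+a)", 0) == 7         # consumes the whole string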
That was the story of the first part of June. The second part of June I’m going to a conference in Oxford, one of the first
conferences on discrete mathematics. There I’m presenting my paper on the new method that I had discovered,
now called the Knuth-Bendix algorithm, about the word problems in universal algebra. After I finished my
lectures at Copenhagen I had time to write the paper that I was giving at Oxford the following week. There
at Oxford, I meet a lot of other people and get more stimulated about combinatorial research, which I can’t
do. Come back to Caltech and I’m working as a consultant as well. I resigned from ten editorial boards at this
time. No more ACM Journal, no more Communications. I gave up all of the editorships that I was on in order
to cut down my work load. I started working again on volume two where I left off at the time of the ulcer,
but I would be careful to go to sleep and keep a regular schedule. In the fall I went to a conference in Santa
Barbara, a conference on combinatorial mathematics. That was my first chance to be away from Caltech, away
from my teaching duties, away from having to type The Art of Computer Programming. That’s where I had
three days to sit on the beach and develop the theory of attribute grammars, this idea of top-down and bottom-up.
I cut out of the whole conference. I didn’t go to any of the talks. I just sat on the beach and worked
on the theory of attribute grammar. As it turned out, I wasn’t that interested in most of the talks, although
I met people that became lifelong friends at the meals and we talked about things off-line. But the formal
talks themselves, I was getting disappointed with mathematical talks. I found myself, in most lectures on
mathematics that I heard in 1966 and ’67, I sat in the back row and I said, “So what? So what?” Computer
science was becoming much more exciting to me. When I finally made my career decision as to where to go, I had
four main choices. One was stay at Caltech. They offered me full professor of mathematics. I could go to Harvard
as a full professor in applied science, which meant computer science. That was as close as you could get
to computer science there. At Harvard my job would have been to build up a computer science department there.
Harvard was, in Floyd’s term, an advanced backwater at that point in time for computer science, and Caltech
was as well. Because Caltech and Harvard are so good at physics and chemistry and biology, they were
thinking of computers because they can help chemists and physicists and biologists. They didn’t think
of it as having problems of its own interest. Stanford, by contrast, already had the best group of computer scientists
in the world, computer science there clearly had a great future, and the best students in
the world were there to work with; the program was already built up. I could come to Stanford and be one of
the boys and do computer science, instead of arguing for computer science and trying to do barnstorming. Berkeley
was the fourth place. I admired Berkeley very much as probably the greatest all around institution for covering
everything. Everything Stanford covered it covered well, but it didn’t have a professor of Sanskrit,
and Berkeley had a professor of Sanskrit, that sort of thing. But I was worried about Berkeley because Ronald
Reagan was governor. Stanford was a private school and wouldn’t be subject to the whims of politicians so much
as the University of California. Stanford had this great other thing where the faculty can live on campus,
so I knew that I could come to Stanford and the rest of my life I would be able to bike to work; I wouldn’t
have to do any commuting. And Forsythe was a wonderful person, and all the group at Stanford were great, and
the students were the best. So it was almost a no-brainer, why I finally came to Stanford. My offer from Stanford
came through in February of ’68, which was the last one. The other three had already come in earlier, but
I was waiting for Stanford before I made my final decision. In February of ’68 I finally got the offer
from Stanford. It was a month after volume one had been published, and George said, “Oh yes, everybody’s
all smiles now.” Everyone was all smiles because they had gone
out on a limb to offer you a full professorship? No, because the committees were saying, “This
guy is just 30 years old.” You know, I was born in ’38 and this was January of ’68. But when they looked
at the book, they said, “Oh, there’s some credibility here.” That helped me. I got through ’67 and learned
how to slack off a little bit, right? I’ve always felt after that, hearing many other stories of people
of when they got these special insights that turned out to be important in their research, that it was
very rarely in a settled time of their life, where they had comfortable living conditions and good – the
word is escaping me now – but anyway, luxury; a nice office space set up, good lighting, and so forth.
No, people are working in a garret, they’re starving, they’ve got kids screaming, there’s a war going
on or something. But that’s when they get a lot of their most… almost every breakthrough idea. I’ve always wondered,
if you wanted to set up a think tank where you were going to get the most productivity out of your scientists,
wouldn’t you have to, not exactly torture them, but deprive them of things? It’s not sustainable. Still,
looking back, that was a time when I did as much science as I could, as well as try to fulfill all my other
obligations. Don, to go back to the Stanford move. A couple
of questions come up, because I was around. I remember sitting in George Forsythe’s office, just
a handful of us people considering the appointment of this young guy from Caltech who had this wonderful outline
of books. One of the things that we were discussing was [that] Don Knuth wanted us to also hire Bob
Floyd. It turns out that hiring Bob Floyd was a wonderful idea. Bob Floyd was magnificent. But it hadn’t
occurred to us until you brought it up, and then we did it. Can you go into that story? Yeah, because Bob was
a very special person to me throughout this period. As I said, I’d been reading the literature about programming
languages avidly. When I was asked to write a book about it in ’62, I knew there were these people who
had written nice papers, but nobody knew how to sort out the chaff from
the wheat. In the early days, like by 1964, my strong opinion was that only five good papers
about programming languages had ever been written,
and four of them were by Bob Floyd. I met Bob the first time in summer of ’62 when I was working on this
Fortran compiler for Univac. At the end of the summer I went to the ACM conference in Syracuse, New York, and
Bob was there. We hit it off very well right away. He was showing me his strange idea that you could prove a
computer program correct, something that had never occurred to me. I said I was a programmer in one room, and
I was a mathematician in another room. Mathematicians prove things. Programmers write code and they hope
it works, and they twiddle it until it works. But Bob’s saying, no, you don’t have to twiddle; you can take
a program and you can give a mathematical proof that it works. He was way ahead of me. There were very few people
who had ever conceived of putting those two worlds together at that time. [John] McCarthy was one
of them, though. McCarthy, exactly, right. I don’t know if there was anybody
in Europe yet who had seen this right. Bob tells me his thoughts about this when I meet him in this conference
in Syracuse. Then I went to visit him a year later when I was in Massachusetts at the crisis meeting with
my publishers. He lived there, and I went and spent a couple of days in Topsfield where he lived. We shared
ideas about sorting. Then we had a really exciting correspondence over the following period, where letters went back and
forth, each one trying to trump the other by coming up with a better idea about something that’s now called
sorting networks. Bob and I developed a theory of sorting networks between us in the correspondence.
We were thinking at the time, this looks like Leibniz writing to Bernoulli in the old days of scientists trying
to develop a new theory. We had a very exciting time working on these letters. Every time I would send
a letter off to Bob, thinking, “Okay, now this is the last result,” he would come back with a brand new idea and
make me work harder to come up with the next step in our development of this theory. We weren’t talking
only about programming languages; we were talking also about a variety of algorithms. We found that we had
lots of common interests. He came out to visit me a couple of times in California, and I visited him. So
when I was making my career decision, I said, “Hey Bob, wouldn’t it be nice if we could both end up at the
same place?” I wrote him a letter, probably the same letter where I was describing to him my idea about left-to-right
parsing. As soon as I had come up with the idea, I had written immediately to Bob, a 12-page letter with my ideas on left-to-right
parsing. He comes back and says, “Oh, bravo, and did you think about
this,” and so on. So we had this going on. Then at the beginning of ’67 I said, “You know, Bob, why don’t
we think about trying to get into the same place together? What is your take on the different places in the world?”
At that time he was at Carnegie. He had left Computer Associates and spent, I think, two years at Carnegie.
He was enjoying it there, and he was teaching and introducing new things into the curriculum there. He wrote
me this letter assessing all of the schools at the time, the way he thought their development of computer science
was. When I quoted him a minute ago saying Harvard was an advanced backwater, that comes out of that
letter that he was describing the way he looked at things. At the end of the letter —
I had already mentioned that Stanford was my current number one but I wasn’t totally sure — he ended up
concurring. He said if I would go there and he could go there, chances are he would go there, too. I presented
this to Forsythe, saying why don’t we try to make it a package deal. This meant they had to give
up two professorships to make room for us. They couldn’t get two new billets for us, and so it was a lot of work
on Stanford’s part, but it did develop. Except that you had to lose two other good people, but I think Bob
and I did all right for the department. Maybe your first great service to our department was
recruiting Bob Floyd. Well, I don’t know. I did have to work a
little bit the year after I got here. To my surprise they had appointed him as an associate professor but
me as a full professor. It was understandable because he didn’t have a Ph.D. He had been a child prodigy,
and I think he had gotten into graduate school at something like age 17, and then dropped out to become a full
time programmer. So he didn’t have the academic credentials, although he had all the best papers in the
field. I had to meet with the provost and say it’s time to promote him to full professor. The thing that clinched
it was that he was the only person — this was 1969 — that had been
invited to give keynote addresses in two sessions of the International Congress
in Ljubljana. In ’71. ’71, yeah. That helped. That was IFIP. Yeah, IFIP: Information Processing. Don, maybe we could just say a little more
about Bob [Floyd] and his life at Stanford. Right. As it turned out, when we got together
we couldn’t collaborate quite as well as when we were writing letters. I noticed this was true in
other cases. Like sometimes I could advise my students better when I was on sabbatical than when we were
having weekly meetings. It’s not easy to work face-to-face all the time; sometimes it’s better to work offline instead
of online. I told you my experience with Marshall Hall — that I couldn’t think in his presence. I have
to confess that there are some women computer scientists that when I’m in their presence, I think only of their
brown eyes. I love their research, but I’m wired in certain ways that mean that we should write our joint papers
by mail, or by fe-mail. Anyway. We did a lot of joint work in the early ‘70s, but also
it turned out that when Bob became chair of the department… I’m not sure exactly when that was; probably right after my sabbatical. I think 1972. Yeah. I went on leave of absence for a year
in Norway and then I came back and Bob was chair of the department. He took that job extremely seriously,
and worked on it to such an extent that he couldn’t do any research very much at all during those three
or four years when he was chair. I don’t know how many years, five years. I started being chair in ’76, so like four
years. Okay, so it was four years. That included
very detailed planning of all aspects of our new building. When he came back, then he had two years of sabbatical.
That’s one credit that you get. So there was a break in our joint collaboration. Afterwards, he never
quite caught up to the leading edge of the same research topics that I was in. We would work on things occasionally,
but not at all the way we had done previously. We wrote a paper that we were quite pleased with at
the end of the ‘80s, but it was not the kind of thing that we imagined originally, that would always be
in each other’s backyard. In fact, I’m a very bad coworker. You can’t count on me to do anything, because
it takes me a while to finish stuff and I think of something else. So how can anybody rely on me as being able
to go with their agenda? Bob, during the ‘70s, came up with a lot of ideas, like his method for half-tone, for
making gray-level pictures, that is in all the printers of the world now. That was done completely independently.
I didn’t even know about it until a couple years after he had come up with these inventions. But I’m
dedicating a book to Bob. My collected works are being published in eight volumes. The seventh volume is selected
papers on design of algorithms. That one is dedicated to Bob Floyd, because a lot of the joint papers,
joint work we did, occurs in that volume. He was one of the few people in my life that I really consider one
of my teachers, the gurus that inspired me. Don, I’m going to call that the end of your
first period of Stanford. I wanted to move into some questions about what I call your second Stanford
period. This is very different. I’ve sort of delineated this as a very different time. I saw you shifting
gears, and I couldn’t believe what was happening. You became, in a solitary way, the world’s greatest programmer.
It was your engineering phase. This was TeX and METAFONT. All of a sudden, you disappeared into just
miles of code, and fantastic coding ideas just pouring out, plus your engineering. We were in the new building
and you were running back and forth from your office to where this new printing machine was installed. You’d
be debugging it with your eyes and with your symbols and pulling your hair out of your head, because
it wasn’t working right, and all that. You were just what the National Academy of Engineering would call
an engineer. Tell me about that period in your life. Okay, well, it ties in with several things.
There was a year that you didn’t see me when I was up at McCarthy’s lab… Well, I heard all about it. …starting this. One of the first papers
that I collaborated with Bob Floyd on in 1970 [had] to do with avoiding go-to statements. There was a revolutionary
new way to write programs that came along in the ‘70s, called structured programming. It was a different
way than we were used to when I had done all my compilers in the ’60s. Bob and I, in a lot of our
earliest conversations at Stanford, were saying, “Let’s get on the bandwagon for this. Let’s understand structured
programming and do it right.” So one of our first papers was to do what we thought was a better approach
to this idea of structured programming than some people had been taking. Some people had misunderstood that
if you just get rid of go-to statements you had a structured program. That’s like saying zero population
growth; you have a numerical goal, but you don’t change the structure. People were figuring out a way
to write programs that were just as messy as before, but without using the word “go-to” in them. We said
no, no, no; here’s what the real issues are. Bob and I were working on this. This is going on, and we’re teaching
students how to write programs at Stanford, but we had never really written more than textbook code ourselves
in this style. Here we are, being full professors, telling people how to do it, having never done it
ourselves except in really sterile cases with not any real world constraints. I probably was itching… Thank
you for calling me the world’s greatest programmer. I was always calling myself that in my head. I love programming,
and so I loved to think that I was doing it as well as anybody. But the fact is, the new way of programming
was something that I didn’t have time to put much effort into. The emphasis in my comment was on the solitary.
You were a single programmer doing all this. No team. That’s right. As I said, it’s hard for
me to have somebody else doing the drumming. I had to march to my… I had The Art of Computer Programming,
too. I could never be a reliable part of a team that I wasn’t the head of, I guess. I did first have to
get into that mode, because I was forced to. I was chair of the committee at Stanford for our university reports.
We put out lots and lots of reports from all phases of the department through these years. We had a big
mailing list. People were also trading their reports with us. We had to have a massive bookkeeping system just
to keep the correspondence, so that the secretaries in charge of it could know who had paid for their reports,
who we were sharing with. All this administrative type of work had to be done. It seemed like just a
small matter of programming to do this. I had a grad student who volunteered to do this as his master’s project:
to write a program that would take care of all of the administrative chores of the Stanford tech
reports distribution. He turned in his term paper and I looked at it superficially and I gave him an A on it,
and he graduated with his master’s degree. A week later, the secretary called me up and said, “Don, we’re
having a little trouble with this program. Can you take a look at it for us?” The program was running up
at the AI lab, which I hadn’t visited very often. I went up there and took a look at the program. I got to page
five of the program and I said, “Hmmm. This is interesting. Let me make a copy of this page. I’m going to
show it to my class.” [It was] the first time I saw where you change one symbol on the page and you can
make the program run 50 times faster. He had misunderstood a sorting algorithm. I thought this was great.
Then I turned to the next page and he has a searching algorithm there for binary search. I said, “Oh, he
made a very interesting error here. I’ll make a copy of this page so I can show my class next time I teach about
the wrong way to do binary search.” Then I got to page eight or nine, and I realized that the way he had written
his program was hopelessly wrong. He had written a program that would only work on the test case that
he had used in his report for the master’s thesis, that was based on a database of size three or something like
this. If you increased the database to four, all the structures would break down. It was the most weird thing.
I would never conceive of it in my life. He would assume that the whole database was being maintained by
the text editor, and the text editor would generate an index, the way the thing did. Anyway, it was completely
hopeless. There was no way to fix the program. I thought I was going to spend the weekend and give it to
the secretary on Monday and she could work on it. There was no way. I had to spend a month writing a program that
summer — I think it was probably ’75, ’76 — to cover up for my terrible error of giving this guy an A
without seeing it. The report that he turned in made it look like his program was working. But it only worked on
that one case. It was really pathetic. So I said, “Okay, I’ll use structured programming. I’ll do it right.
This is my chance to do structured programming. I’ll get a learning experience out of it.” I got a good appreciation
for writing administrative-type programs. I used to think [it] was trivial, [but] there was a lot to it.
After a month I had a structured program that would do Stanford reports, and I could install that and get
back to the rest of my life. Meanwhile, I’d been up at the AI lab and I met the people up there. I got to know
Leland Smith, who is a great professor of music. Leland Smith told me about a problem that he had. He was
typesetting music. He says, “I’ve got a piece of music and it maybe has 50 bars of music. I have to decide
when to turn the page. I know how many notes are in each bar of the music, and I know how much can fit on
the page. But I like to have the breaks come out right. Are there any algorithms that could work for this?”
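[Editors’ note: the dynamic programming attack Knuth goes on to describe can be sketched in Python. The quadratic “badness” for leftover space on a page, and the numbers, are our invention, purely to make the idea concrete:]

    import functools

    def best_breaks(bars, per_page):
        # bars[i] = notes in bar i; choose page breaks minimizing total badness,
        # where a page's badness is its unused capacity, squared.
        # Assumes every bar fits on a page by itself.
        @functools.lru_cache(None)
        def cost(i):                           # best way to break bars[i:]
            if i == len(bars):
                return 0, ()
            best = None
            total = 0
            for j in range(i, len(bars)):      # try ending the page after bar j
                total += bars[j]
                if total > per_page:
                    break
                sub, breaks = cost(j + 1)
                c = (per_page - total) ** 2 + sub
                if best is None or c < best[0]:
                    best = (c, (j + 1,) + breaks)
            return best
        return cost(0)

    print(best_breaks((3, 2, 4, 1, 3), 6))     # (9, (2, 3, 5))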
He described the problem to me. He had the sequence of numbers, how many notes there are, and wanted to find
a way to break it into lines and pages in a decent way. I looked at the problem and said, “Hey Leland, this
is great. It’s a nice application of something we in computer science call the dynamic programming method.
Look, here’s how dynamic programming can be used to solve this problem.” Then I’m teaching Stanford’s
problem seminar the next fall, and it came up in class. I would show the students, “Look how we had this
music problem, and we can solve it with dynamic programming.” One of the students, I don’t remember who it was,
raised his hand and said, “You know, you could also apply that to text, to printing books. Instead
of fitting notes into bars, you’ve got letters and words to fit into lines, and you could make paragraphs
by choosing good line breaks that way.” I said, “Hey, that’s cool. You’re right.”
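As a rough sketch of the idea being discussed, and not Knuth’s actual code (the paragraph-breaking algorithm he later published with Michael Plass for TeX is considerably more refined), a dynamic program for this kind of breaking problem might look like the following Python fragment. The item widths, the line capacity, and the squared-slack “badness” measure are all illustrative assumptions.

    # A minimal dynamic-programming sketch for breaking a sequence of
    # items (bars of music, or words) into lines of a fixed capacity.
    # Badness of a line is the square of its leftover space; the last
    # line is free. All details here are illustrative assumptions.
    def break_into_lines(widths, capacity):
        n = len(widths)
        INF = float("inf")
        best = [INF] * (n + 1)   # best[i] = min total badness for items i..n-1
        best[n] = 0              # nothing left to set costs nothing
        nxt = [n] * (n + 1)      # nxt[i] = index just past the line starting at i

        for i in range(n - 1, -1, -1):
            used = 0
            for j in range(i, n):    # try ending the line after item j
                used += widths[j]
                if used > capacity:
                    break
                slack = capacity - used
                cost = (0 if j == n - 1 else slack * slack) + best[j + 1]
                if cost < best[i]:
                    best[i], nxt[i] = cost, j + 1

        lines, i = [], 0             # recover the chosen break points
        while i < n:
            lines.append(widths[i:nxt[i]])
            i = nxt[i]
        return lines

    # e.g. break_into_lines([3, 2, 2, 4, 1, 3, 2], capacity=7)
    # -> [[3, 2], [2, 4, 1], [3, 2]]

The dynamic-programming observation is the one Knuth describes: the best way to set everything from item i onward depends only on where the line starting at i ends, so each suffix of the sequence needs to be solved only once, rather than once per combination of earlier breaks.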
Then comes, in the mail, the proof sheets for the second edition of volume two. I had changed a lot of pages in volume two of The Art of
Computer Programming. I got page proofs for the new edition. During the ‘70s, printing technology changed
drastically. Printing was done with hot lead in the ‘60s, but they switched over to using film in the ‘70s.
My whole book had been completely retypeset with a different technology. The new fonts looked terrible!
The subscripts were in a different style from the large letters, for example, and the spacing was very bad.
You can look at books printed in the early ‘70s, and it turns out that unless the material was simple, almost
everything looked atrocious in those days. I couldn’t stand to see my books so ugly. I spent all this time working
on it, and you can’t be proud of something that looks hopeless. I’m tearing out my hair. I went
to Boston again and they said, “Oh, well, we know these people in Poland. They can imitate the fonts that you
had in the old hot lead days. It’s probably not legal, but we can probably sneak it through without…”
You know, the copyright problems of the fonts. “They’ll try to do the best they can, and do better”. Then they
come back to me, at the beginning of ’77, with the new version done with these Polish fonts which are supposed
to solve the problem. They are just hopelessly bad. At the very same time, February of ’77, I’m on Stanford’s
comprehensive exam committee, and we’re deciding what the reading list is going to be for next year’s
comp. Pat Winston had just come out with a new book on artificial intelligence, and the proofs of it were just
being done at III Corporation [Information International, Incorporated] in Southern California; at [Ed]
Fredkin’s company. They had a new way of typesetting using lasers. All digital, all dots of ink. Instead
of photographic images and lenses, they were using algorithms, bits. I looked at these galley proofs of Winston’s
book. I knew it was just bits, but they looked gorgeous. They looked absolutely as good as anything
I’d ever seen printed by any method. By this time I was working at the AI lab, where we had the Xerox Graphics
Printer, which did bits at about 120 dots per inch. It looked interesting, but it didn’t look beautiful
by any stretch of the imagination. Here, at III, with what I think was 1,000 dots per inch, you couldn’t
tell the difference. It was like: I come from Wisconsin, and in Wisconsin we never ate margarine. It was
illegal to bring margarine into the State of Wisconsin unless it was uncolored. I was raised on butter.
It’s the same thing here. With typography, I’m thinking: okay, digital typography would have to be like margarine.
It couldn’t be the real thing. But, no! Our eyes don’t see any difference when you’ve got enough
dots to the inch. A week later, I’m flying down with Les Earnest to Southern California to III, and finding out
what’s going on there. How can we get this machine and do it? Meanwhile, I planned to have my sabbatical
year in ‘77-’78. I was going to spend my sabbatical year in Chile. Don, can I interrupt you just a second? Yeah. I don’t know if Fredkin was still involved
with III at that time. But III never gets enough credit for those really revolutionary ideas. That’s right. Not just those ideas, but the high speed graphics
ideas. Oh yeah. That’s when I met Rich Sherpel
[ph?] down there, and he was working on character recognition problems. They had been doing it actually
for a long time on microfilm, before doing Winston’s book. This was the second generation. First they had been
using the digital technology at really high resolutions on microfilm. And so many other things [were]
going on. Fredkin is a guy who– Right at the beginning, Fredkin revolutionized
film reading, using the PDP-1. Anyway, I interrupted you. You were telling about your Chile plans. Ed’s life is ten times as interesting as
mine. I’m sure that every time I hear more about Ed, it adds just another… He’s an incredible person.
We’ve got to get 20 oral histories. I think Ed may be a subject for one of these
oral histories of the Computer History Museum. Yeah, we’ve got to do it. Anyway, I cancelled
my sabbatical plan for Chile. I wrote to them saying I’m sorry; instead of working on volume four during
my sabbatical, I’m going to work on typography. I’ve got to solve this problem of getting typesetting
right. It’s only zeros and ones. I can get those dots on the page, and I’ve got to write this program. That’s
when I became an engineer. I’m going to let you go on with this, but
I just wanted to ask a question in the middle here, just related to myself, actually. How much of this
motivation to do TeX was related to your just wanting to get back to being a programmer? Life was going on in
too abstract a way, and you wanted to get back to being a programmer and learning what the problems
were, or the joy of programming. It’s a very interesting hypothesis, because
you really can see that I had this urge. The way I approached the CS reports problem the year before was an
indication of this; that I did want to sink my teeth into something other than a toy problem. It wasn’t real
large, but it wasn’t real small either. It’s true that I probably had this craving. But I had a stronger craving
to finish volume four. I did sincerely believe that it was only going to take me a year to do it. Maybe volume four wasn’t quite ready. Maybe… Oh, this is true. …it was still cooking. No, no, absolutely. You’re absolutely right.
In 1975 and ’76, you can check it out. Look at the Journal of the ACM. Look at the SIAM Journal on Computing.
Look at, well, there’s also SIAM Review and there’s math journals, combinatorial journals, Communications
of the ACM, for that matter. You’ll find more than half of those articles are things that belong in volume
four. People were discovering things right and left that I knew deserved to be done right in volume four.
Volume four is about combinatorial algorithms. Combinatorial algorithms was such a small topic in 1962
when I made that chapter seven of my outline that Johan Dahl asked me, when I was in Norway, “How did you ever
think of putting in a chapter about combinatorial algorithms in 1962?” I said, “Well, the only reason
was, that was the part I thought was most fun.” I really enjoyed writing programs, like this one for Bose that I did overnight.
It was a combinatorial program. So I had to have this chapter just for fun. But there was almost
nothing known about it at the time. People talk about combinatorial algorithms nowadays [and] they
usually use “combinatorial” in a negative way, in a pejorative sense, instead of the way I look at it. They
say, “Oh, the combinatorics is going to kill you.” “Combinatorial” means “it’s exploding;
you can’t handle it; it’s a huge problem.” The way I look at it is, combinatorial means this is where you’ve
got to use some art. You’ve got to be really skillful, because one good idea can save you six orders of magnitude
and make your program run a million times faster. People are coming up with these ideas all the time. For
me, the combinatorial explosion was the explosion of research. Not the problems exploding, but the ideas
were exploding. So there’s that much more to cover. It’s true that, in the back of my mind, I was also scared
stiff that I couldn’t write volume four anymore. So maybe I was waiting for it to simmer down. Somebody did say to
me once, after I solved the problem of typesetting, maybe I would start to look at binding or something, because
I had to have some other reason [to delay]. I’ve certainly seen enough graduate student procrastinators
in my life. Maybe I was in denial. Anyway, you headed into this major engineering
problem. As far as I knew, though, it was going to
take me a year. I was going to work and I was going to enjoy having a year of writing this kind of a program.
The program was going to be just for me and my secretary, Phyllis; my super-secretary, Phyllis. I was going to
teach her how to do it. She loved to do technical typing. I could write my books and she could make them,
dotting the I’s and crossing the T’s, adding the spit and polish that she always put on my math papers when
she typed them.
