Schank, Roger C. (1991) Where's the AI? (Technical Report No. 16). Northwestern University, Institute for the Learning Sciences.

Where's the AI?

Roger C. Schank
The Institute for the Learning Sciences
Northwestern University


Technical Report 16, 1991
 

For a variety of reasons, some of which we will discuss in this paper, the newly formed Institute for the Learning Sciences has been concentrating its efforts on building high quality educational software for use in business and elementary and secondary schools. We have, in the two years we have been in operation, created quite a few prototypes that go beyond the kind of things that have traditionally been built under the heading of educational software. Behind these new designs are a number of radically different "teaching architectures" that change the nature of how students interact with computers from what we have been calling "the page-turning architecture" to architectures based on simulation-driven learning by doing, story or case based teaching, and Socratic dialogues (Schank, 1990a; Schank & Jona, 1991).

These prototypes have been very successful, and our sponsors have been, for the most part, quite impressed with the results, acknowledging that they have never seen anything quite like them before. Nevertheless, we have been plagued by a curious kind of question when we show these prototypes to certain audiences. The question we keep hearing is the title of this paper: "Where's the AI?"

At first, I found this question puzzling. Everything we do is AI. But, apparently, AI has a specific definition for many people, and these programs didn't fit that definition. My concern was to figure out what this definition was. And, while we are at it, maybe it would be helpful if the field of AI itself understood the answer to this question in a more profound way. It is not that AI needs definition; it is more that AI needs substance, a distinction that I will discuss below.

Four viewpoints on AI

When someone who is not in AI asks where the AI is, what assumptions about AI are inherent in the question? There seem to be at least four prevailing viewpoints that I have to deal with, so this question assumes at least one of the following four things:
 

  1. AI means magic bullets
  2. AI means inference engines
  3. AI means getting a machine to do something you didn't think a machine could do (the "gee whiz" view)
  4. AI means having a machine learn

The magic bullet view of AI is this: Intelligence is actually very difficult to put into a machine since intelligence is very knowledge-dependent. Since the knowledge acquisition process is very complex, one way to address it is to finesse it. Let the machine be very efficient computationally so that it can connect things to each other without having to explicitly represent anything. In this way, the intelligence will come for free as a byproduct of unanticipated connections that the machine makes.

An alternate version of this view is that AI is something that one could, in principle, discover in one's garage. What the form of this discovery might be remains a mystery, but one would drop the discovered item or technique into the machine and it would become intelligent. This view is held, quite firmly, by many people who write me letters after having read my name in a magazine article, as well as by many venture capitalists (and possibly some connectionists).

The inference engine view of AI was brought forth by a variety of people on the West Coast (e.g., the MYCIN (Shortliffe, 1976) and DENDRAL (Buchanan & Feigenbaum, 1978) projects at Stanford and the PROSPECTOR (Duda, Gaschnig, & Hart, 1979) project at SRI). Their view was, I suspect, an answer to this very same question. When the expert systems world began to explode, AI experts were expert at finding out what an expert knew and writing down that knowledge in rules that a machine could follow (this process came to be known as knowledge engineering (Feigenbaum, 1977)). While such "expert systems" could in fact make some interesting decisions, business types who were called upon to evaluate the potential of such systems probably asked: "Where's the AI?" The real answer was that the AI was in the ability of AI people to find out what the experts knew and represent that information in some reasonable way, but this answer would not impress a venture capitalist. There had to be something that could be labeled "AI" and that something came to be known as an "inference engine."[1] Of course, much of the AI world understood that inference was an important part of understanding, so it made sense that an expert system would need to make inferences too, but to label the "inference engine" as the AI was both misleading and irrelevant. The business world believed that it needed this label, however, as we shall discuss later on.

So, it came to pass that business people who read about AI, and who assumed that AI and expert systems were identical, began to expect inference engines in anything they saw on a computer that "had AI in it." This left many other people in AI in a quandary as to how to explain that what they did was also AI, which led to questions like "What is AI anyway?" This question never was easy to answer without discussing the nature of intelligence, a subject best left undiscussed in a business meeting.

The next view I term "the gee whiz view." This view maintains that, for a particular task, if no machine ever did it before it must be AI. Two important types of programs to discuss within this conception of AI are optical character readers and chess playing programs. Are these AI? Today most people would say that they are not. Years ago they were.[2] What happened?

The answer is that they worked. They were AI as long as it was unclear how to make them work. When all the engineering was done, and they worked well enough to be used by real people, they ceased to be seen as AI. Why? The answer to this is threefold. First, "gee whiz" only lasts so long. After a while people get used to the idea that a computer can do something they didn't know it could do. Second, people tend to confuse getting a machine to do something intelligent with getting it to be a model of human intelligence, and surely these programs aren't very intelligent in any deep sense. And, third, the bulk of the work required to transform an AI prototype into an unbreakable computer program looks a lot more like software engineering than like AI to the programmers who are working on the system, so it doesn't feel like you are doing AI even when you are.

The fourth view of AI is one that I myself have espoused. It is a very long term view. It says that intelligence entails learning. When your dog fails to understand his environment and improve on mistakes he has made, you refer to him as "dumb." Intelligence means getting better over time. No system that is static, that fails to change as a result of its experiences, looks very smart. Real AI means a machine that learns. The problem with this definition is that according to it, no one has actually done any AI at all, although some researchers have made some interesting attempts.[3] According to this view there is no AI, at least not yet. This means that "Where's the AI?" is a tough question to answer.

Knowing something helps, knowing a lot helps more

Let's consider the SAM/FRUMP/ATRANS experience. In 1976, a group of my students built SAM, a program that summarized, paraphrased, translated, and answered questions about newspaper stories (Cullingford, 1978, 1981). The program was very slow and cumbersome, and was capable of reading only a few stories in a few domains. Nevertheless, it was the first program to perform this task, and we were proud of it. No one asked where the AI was, since no one had ever seen anything like it. It qualified as AI by the "gee whiz" criterion.

In 1978, we attempted to improve on SAM in some ways with FRUMP (DeJong, 1979a). FRUMP was very fast in comparison to SAM. By this time we had faster machines of course, but the major speedups were accomplished by limiting inferencing, and attempting to understand only the gist of any given story. In addition, we were able to blur the distinction between inferencing and parsing in a way that made interesting theoretical claims (DeJong, 1979b).

We were very excited by FRUMP. It worked on roughly fifty different domains and succeeded on about 75% of the stories that were in those domains. The branch of the Defense Department that had sponsored this work wanted to use FRUMP for a particular task. We were put in the strange position (for a university research lab) of having to hire people to code domain knowledge into FRUMP. After a year of this, FRUMP's robustness had only improved slightly and the sponsor was annoyed since they had actually wanted to use the program. Further, I was concerned about having to hire more and more non-AI people to make a piece of software really work. I had always thought that our research money was for supporting graduate students to do research. I was uneasy about actually trying to develop a product. And it didn't interest me all that much to try. "What did it mean to be doing AI?" I wondered. It meant building theories and programmed examples of those theories, it seemed to me, in the attempt to understand the nature of mind, as part of the ultimate quest for the learning machine of the fourth viewpoint above. In 1978, I was not interested in building products.

By 1982 though, I had become interested in trying to build AI products. Why? Probably the main thing that had changed was AI. The expert systems hype had hit and the venture capitalists were everywhere. I spoke loudly and often against this overhype and ran a panel that year at the National Conference on Artificial Intelligence (held at Carnegie-Mellon University in Pittsburgh) on the coming AI winter that was likely to result from ridiculous expectations.[4] At the same time, I had begun to think more and more about the FRUMP experience. I realized that AI people really had to produce something that worked at some point, that we couldn't simply write papers about ideas that might work or worked on a few examples and satisfied the sponsors who were pouring real money into AI. With the increase in money coming from the commercial world, the die was cast. We had to produce something then, I figured, or we might not get another chance. So, I too started a company, called Cognitive Systems, with FRUMP as the intended product.

Why FRUMP never was the product of Cognitive Systems is a business discussion not relevant here. What we did wind up building was something called ATRANS, which was a program that read international bank money transfer messages (Lytinen & Gershman, 1986). The program is working today in various international banks. It parses sentences that all have very similar content about how and where money is to be transferred from country to country. In essence ATRANS is a single domain (and a very narrow domain at that) FRUMP. A version of this program is also used by the Coast Guard. At a Defense Department meeting three years ago, ATRANS was identified as one of only three AI programs actually in use for Defense.

Why am I telling you all this? Because there is one very important fact to know about ATRANS. It took something like thirty man-years to make it work. That number is in addition to any of the SAM/FRUMP work. There is an important lesson to be learned here.

To make AI -- real AI -- programs that do something someone might want (which is, after all, the goal of AI for those who fund AI and for many of those in AI), one must do a great deal of work that doesn't look like AI. Any of Cognitive Systems' programmers would have been justified in complaining that they had come to work there to do AI and all they were doing was working on endless details about determining various abbreviations for bank names. They also asked, "Where's the AI?"

The lesson to be learned from ATRANS is simple enough. AI entails massive software engineering. To paraphrase Thomas Edison: "AI is 1% inspiration and 99% perspiration." We will never build any real AI unless we are willing to make the tremendously complex effort involved in making sophisticated software work.

But this still doesn't answer the original question. Sponsors can still honestly inquire about where the AI is, even if it is in only 1% of the completed work.

From Three Examples to Many

The answer to "Where's the AI?" is, "it's in the size." Let me explain. For years, AI people specialized in building systems that worked on only a few examples or in a very limited domain (or micro-world). Sometimes these systems didn't really work on any examples at all, it just seemed plausible that they would work. The practice of getting AI programs to work on a few examples is so rooted in the history of AI that it is rarely discussed. There are many reasons for this, but the simplest explanation is that until the creation of the various venture capital backed companies in the early 80s, almost all AI programs were Ph.D. theses that "proved" the concept of the thesis with a few examples. It was almost never anyone's job to "finish" the thesis.[5] Often it wasn't clear what that would mean anyway since these theses were rarely directed at real problems for which there was a user waiting. In any case, even if there was a ready use for the project, no one wanted to tackle the inherently uninteresting task of doing all that software engineering -- at least no one in an AI lab wanted to do that. And, even if someone did want to do it, there was no one who wanted to pay for it or who seriously understood how much it would really cost to do the other 99%.

Nevertheless we had otherwise intelligent people claiming that Winograd's (1972) SHRDLU program that worked on 31 examples had solved the natural language problem, or that MYCIN (Shortliffe, 1976) had solved the problem of getting expertise into a computer.[6] Prior to 1982, it is safe to say that no one had really tried to build an AI program that was more than simply suggestive of what could be built. AI had a real definition then, and it was the "gee whiz" definition given above.

But underlying even that definition was the issue of "scale up." AI people always had agreed among themselves that this was the true differentiation of what was AI from what was not AI. This measure of AI was one of those things that was so clearly a defining characteristic of the field that there was no need to actually define it on paper; at least, I am unaware of any specific statement of this view of what was AI in the sixties and seventies that we all adhered to. So I will say it now. And, true to form, I will say it in the form of a story:

When I arrived at Yale in 1974, there was a junior faculty member there who, I was told, was in AI. I looked into what he was doing, and decided that he was not in AI. Here was what happened. He was working on speech recognition. He had developed an algorithm for detecting which of the numbers one through ten had been spoken into a computer. As far as I could tell, the program did what it claimed to do. I asked, as any AI person would, how the program had accomplished the task. The question was not what he had done, but how he had done it. Here's why.

Suppose his program had determined which of the numbers one through ten had been said by comparing the sound that was received to a prototype of what each of the ten numbers should sound like and then determined the best match of the features of the incoming sound to the features of the prototype. This seems like a reasonable method for performing this task and it should work. Why then isn't it AI?

It isn't AI because it is unlikely that it would scale up. The key concepts here are "scale up" and "unlikely." If the problem that this man was working on was the "detection of ten numbers" problem, it really wouldn't matter how he did it. Any algorithm that worked would be terrific if someone wanted a program to do that task. But, AI has always been concerned with the solution of the deep problem behind the small problem. The issue in this part of AI was how a program would detect any word that was spoken, and the solution being proposed, in order to be an "AI solution," had to address that issue. In other words, the question that I asked myself was whether what he had done for ten words would work for any word. That is the AI question.

Now, obviously, I decided that his solution (which was indeed the one given above) would not scale up. This was an easy conclusion to come to since in order for it to scale up, he would be proposing matching any new word to a database of hundreds of thousands of prototypes. He had not even begun to think about this because he was really not an AI person in any sense of the word. He was simply working on a problem that looked like AI to an outsider.
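
To see just how little machinery that method involves, here is a minimal sketch in Python. The feature vectors, the three-number encoding, and the stored prototypes are all invented for illustration; this is not his program, just the shape of the idea:

    # A toy nearest-prototype recognizer for ten spoken digits: illustrative only.
    # Some acoustic front end (not shown) reduces each utterance to a small
    # feature vector; each digit has one stored prototype vector, and
    # recognition is simply "find the closest prototype."
    import math

    PROTOTYPES = {            # hypothetical prototype features for "one".."ten"
        "one": [0.2, 0.9, 0.1],
        "two": [0.8, 0.3, 0.5],
        # ... entries for the remaining eight digits ...
        "ten": [0.6, 0.6, 0.9],
    }

    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def recognize(features):
        # Return the digit whose stored prototype is nearest to the input.
        return min(PROTOTYPES, key=lambda word: distance(features, PROTOTYPES[word]))

    print(recognize([0.25, 0.85, 0.15]))   # -> "one" for these toy numbers

The sketch solves the "detection of ten numbers" problem well enough, but nothing in it says anything about the ten-thousandth word, which is exactly the scale up question.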

Suppose he had been thinking about doing speech recognition as a step-by-step differentiation process. At the time, I would have said that this was silly, that what was needed was a theory of recognition of phonemes in the context of predictions derived from language understanding in real situations. This is where "unlikely" comes in. I was, after all, simply guessing about whether his system would scale up. It was a good guess, since he hadn't thought about anything except ten words and it is unlikely that his solution for ten words was going to happen to work for all words. What bothers me about this story today is that while I would still like to see speech recognition driven by a good theory of language understanding, it is nevertheless now possible to conceive of a program that really did store all the words of English and compared one to another. Would this be AI?

Certainly there are those who would say that it was AI. Avoiding that debate for the moment, it is possible to come to a point of view that defines AI as being one of two things. Either an AI program is one based upon a theory that is likely to scale up, or it is a program based upon an algorithm that is likely to scale up. Either way, we are talking about best guesses about promises for the future.

The point here is that the AI is in the size, or at least the potential size. The curiosity here is that when the size gets big enough, this all ceases to matter. To see this we merely need to look at AI work in chess. Chess was one of the original AI problems. Getting a machine to do something that only smart people can do seemed a good area for AI to work on. Now, many years later, you can buy a chess playing program in a toy store and no one claims that it is an AI program. What happened?

For many years the popular wisdom was that AI was a field that killed its own successes. If you couldn't do it, the wisdom went, it was AI, and when you did it, it no longer was. This would seem to be what happened in chess, but the reality is somewhat more subtle. Chess was an AI problem because it represented one form of intelligent behavior. The task in AI was to create intelligent machines, which meant having them exhibit intelligent behavior. The problem was, and is, that exactly what constitutes intelligent behavior is not exactly agreed upon. Using my "scale up" measure, the idea in looking at a chess program would have been to ask how its solution to the chess problem scaled up. Or, to put this another way, were people writing chess programs because they wanted computers to play chess well, or were they writing them because they "scaled up" to other problems of intelligent behavior?

It would seem that no one would want a chess program for any real purpose, but that, after all, is a marketing question. If the toys made money, well, fine. In that case it doesn't matter how they work -- it matters that they work. But, if we care about how they work there are only two possible questions. First we need to know if the solution to making them work tells us anything at all about human behavior. Second, we would want to know if it tells us something that we could use in any program that did something similar to, but more general than, playing chess.

Brute force high speed search through a table of possible pre-stored moves is unlikely to be anything like the human method for chess playing. Chess players do seem to understand the game they are playing and are able to explain strategies, and so on, after all. So, the only other question is whether one could use those same methods to help tell us something about how to solve problems in general. In fact, the original motivation to work on chess in AI was bound up with the idea of a general problem solver (e.g., Newell & Simon's (1963) GPS system). The difficulty is that what was learned from that work was that people are really specific problem solvers more than they are general problem solvers, and that the real generalizations to be found were in how knowledge is to be represented and applied in specific situations. Brute force chess programs shed no light on that issue at all, and thus are usually deemed not to be AI.

Thus, the scale up problem can refer to scale up within a domain as well as to scale up in the greater domains that naturally embody smaller domains. But the chess work didn't scale up at all, so the reasonableness of doing such work is simply a question of whether this work was needed by anybody. If there had been a need for chess programs then chess would have been seen as a success of AI, but probably not as actual AI. In the end, AI was never supposed to be about need, however. In the 1960s and 1970s, most AI people didn't care if someone wanted a particular program or not. AI was research.

The correct AI question had to do with the generality of a solution to a problem and there was a very good reason for this. It is trivial to build a program to do what, say, Winograd's (1972) SHRDLU program did for 31 sentences. Just match 31 strings with 31 behaviors. It would take a day to program. People believed that Winograd's program was an AI program because they believed that his program "did it right." They believed it would scale up. They believed that it would work on more than 31 sentences. (In fact, so did he. See Winograd, 1973). When I was asked, at the time, my opinion of Winograd's work, I replied that it would never work on a substantially larger number of sentences nor would it work in different domains than the one for which it was designed. I did not reply that his program was not AI, however.

The fact that a program does not scale up does not necessarily disqualify it from being AI. The ideas in Winograd's program were AI ideas; they just weren't correct AI ideas, in my opinion. What, then, does it mean for a program to have AI ideas within it? This is, after all, a key question in our search to find the location of the AI.

So, to summarize the argument so far, an AI program exhibits intelligent behavior, but a non-AI program could as well. An AI program should scale up, but many do not. And, an AI program has AI ideas in it. Further, an AI program is not intended to accomplish a particular task, but rather to help shed light on solutions for a set of tasks. This was, more or less, the standard view of AI within AI, prior to 1980.

The great expert system shell game

At the beginning of the last section, I said that the answer to the title question was in the size. When building FRUMP, we realized that the difference between FRUMP and something somebody wanted to actually use involved a great deal of engineering of some rather dull information. To build ATRANS we had to bite that bullet. Little by little it became clear to me that in order to build an AI program that someone wanted, an idea that was virtually a contradiction in terms in the seventies, someone would have to stuff a machine with a great deal of knowledge. Smart is nice, but ignorant tends to obviate smart. Gradually it was becoming clear that AI would have to actually work and that making it work might mean, paradoxically, by our own definitions, not doing AI at all.

This was all brought to a head by the great expert system shell game. When the venture capitalists discovered AI, they brought more than just money to the table. They also brought with them the concept of a product. Now, it took me a long time to understand what the word product was supposed to mean, so don't assume (if you are in AI) that you understand it. In fact, I am still not sure I understand it in a profound way. I thought, for example, that a conceptual parser (e.g., ELI (Riesbeck & Schank, 1976) or CA (Birnbaum & Selfridge, 1979)) might be a product. It isn't. I thought that FRUMP might be a product. It wasn't. An expert system shell is a product. Unfortunately, it's not always a very profitable one (which is, not surprisingly, the standard measure of goodness). I don't intend to explain here what a product is. What is important to understand is why the venture capitalists insisted that their AI people build expert system shells.

When the expert systems folks went into business I was skeptical that they could build anything that anyone really wanted. After all, no one had ever done that in AI before, and, as I have said, it was kind of out of the bounds of AI to even try. But, I figured, they were smart people, and quite good with LISP (the programming language used by many AI researchers), and maybe they could pull it off. When I heard about the plan that the venture people had for them, however, I knew they were doomed.

The plan went like this. If the expert systems people were given a hard problem, enough of them could build a program that could solve that problem, within reasonable constraints. But, no one solution would have been a product. For a certain amount of money, they could build a program to solve problem A, but they wouldn't be a great deal closer to solving problem B. No venture capitalist would ever invest in a custom software development company. They want a company that makes one thing and then sells that thing 100,000 times. A company that makes one thing and sells it once isn't much of an investment. What the expert systems people knew how to do was build custom software. But there isn't much money in doing that. What they were asked to do instead was build a shell, an environment for developing software. This made sense to the venture guys, and the AI guys had to go along. The problem was, while the shell might be great to sell, first, it would be wrong, and second, where was the AI?

Addressing the first issue first, why would it be wrong? The assumption of the venture capitalists was that, given the right tool, any programmer could build an expert system. This was a marketable idea, so it worked in their terms. Of course, it wasn't an idea that had a lot of reality in it. Building a complex AI system is certainly made easier by having a good environment for programming, but knowing what to program is the real issue. So one is left addressing the second issue, namely, where is the AI? This was addressed from a business point of view with the concept of an "inference engine." The idea was that there was a piece of magic that was the AI, and that this magic, plus a software development environment that made it easy to build these things, was salable. And it was, at least initially. The problem was that very little of a really complex nature could be built with these shells. Or, in other words, a programming environment plus an inference engine doesn't comprise all there is to building an expert system, which may not have been the thing to be building in the first place.

Where was the AI? It wasn't in the inference engine at all. These inference engines were, after all, pretty simple pieces of software that tested to see if the logic of the rules that the knowledge engineers wrote came up with any conclusions. The AI in complex expert systems was in the organization and representation of knowledge, the attempt to understand the domain under study, and to crystallize what was important in that domain and how experts in that domain reasoned. Now I was saying at that time (see, e.g., Schank et al., 1977) that the AI was also in collecting the actual experiences of the experts and indexing them so that reminding and hence learning could take place, but the expert systems folks were in no mood to pay attention. Those that did were usually not involved in the shell game. To put this another way, the AI was where it always was, in the attempt to understand the intelligent behavior in the system being modeled.
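
To see how simple such a piece of software really is, here is a minimal sketch of a forward-chaining rule interpreter. The rules, about an imaginary loan-approval domain, are invented for illustration; no actual shell is being quoted:

    # A toy forward-chaining "inference engine": illustrative only.
    # A rule is a set of premise facts and a conclusion fact; the engine
    # keeps firing rules whose premises are all known until nothing new
    # can be concluded.
    RULES = [
        ({"income-verified", "good-credit"}, "low-risk"),
        ({"low-risk", "small-loan"}, "approve"),
    ]

    def forward_chain(facts, rules):
        facts = set(facts)
        changed = True
        while changed:
            changed = False
            for premises, conclusion in rules:
                if premises <= facts and conclusion not in facts:
                    facts.add(conclusion)
                    changed = True
        return facts

    print(forward_chain({"income-verified", "good-credit", "small-loan"}, RULES))
    # prints a fact set that now includes "low-risk" and "approve"

Everything that makes a real expert system useful, namely deciding which facts and rules to write down and how to organize them, lives outside this loop; that, in a sentence, is the argument above.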

The fact that no expert ever experiences anything in his domain of expertise without learning something from the experience and changing in some way was easily ignored.[7] The hope was that static systems would model an expert's reasoning ability at any moment. But experts don't remain experts for long if all they do is blindly apply rules. The very act of creating a shell with which non-AI people could write rules to be run by an inference engine was, in a sense, an act of not doing AI. What had been left out was the very skill that AI people had been learning all this time, the skill of figuring out what was going on in a domain and getting a machine to model the human behavior that occurred in that domain. What had been left out were the AI ideas.

Some companies tried to remedy this situation by training the users of these new shells to become "knowledge engineers." But AI is kind of a funny subject. For years we have been taking graduates from the best institutions in the country and teaching them AI. Quite often, even after a few years, they just don't get it. What don't they get? They don't get what the issues are, how to attack an issue, how to have an idea about an unsolved problem, and how to build programs that embody these ideas. AI is a way of looking at very complex problems, and often it is quite difficult to learn to do. It seemed to me that it was hopeless to try to teach this to programmers in a few weeks when years often didn't help.

So, my claim is that while some expert systems might well have been AI, few of those built with inference engines were likely to be, unless one changed the then current definition of AI. The change is easy enough to describe. AI had been the creation of new ideas about how to represent and use complex knowledge on a computer. Now, the definition was changing towards AI being programs that utilized these AI ideas in some application that someone wanted.

Where was the AI in the expert system shells? It was in the assertion that rules would effectively model expertise and in the programs that attempted to implement that assertion. The only problem was that for complex domains -- that is, for AI-like domains of inquiry -- the assertion was wrong.

The larger the better

One of the real issues in AI, as I mentioned earlier, is size. When we talk about "scale up," we are, of course, talking about working on more than a few examples. What every AI person knows is that a program that works on five examples is probably not one-tenth the size of one that works on fifty. Outsiders imagine that it is maybe the same size or, possibly one-half the size. But what is really the case is that for real domains, the size changes work the other way. Once you have to account for all the myriad possibilities, the complexity is phenomenal. It is critical, if AI is to mean applications of AI ideas rather than simply the creation of those ideas, that size issues be attacked seriously. No one needs a program that does five examples. This worked in 1970 because AI was new and glossy. It will not work any longer. Too much money has been spent. AI has to dive headlong into size issues.

Now, as it turns out, while this may seem to be annoying and possibly dull as dust, the truth is that it is size that is at the core of human intelligence. In my recent book, Tell Me a Story (Schank, 1990b), I argue that people are really best seen as story telling machines, ready to tell you their favorite story at a moment's notice. People rarely say anything that is very new or something they have never said before. Looked at in this way, conversation is dependent on the art of indexing, of finding the right thing to say at the right time.[8] This is a pretty trivial problem for someone who only has three stories to tell. Much like our speech recognition friend above, it is simply a problem of differentiation. Many a grandfather has survived many a conversation on the same few stories.

But, when the numbers get into the thousands, one has to be clever about finding, and finding quickly, germane stories to tell. We cannot even begin to attack this problem until the numbers are large enough. In order to get machines to be intelligent they must be able to access and modify a tremendously large knowledge base. There is no intelligence without real, and changeable, knowledge.
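
A minimal sketch of the kind of machinery this implies looks like an index from descriptive labels to stories, so that retrieval never scans the whole story base. The labels and stories below are invented; choosing real indices is a content-theory problem, not a data-structure problem:

    # A toy content index over stories: illustrative only.
    # Each story is filed under a few descriptive labels; retrieval scores
    # stories by how many labels of the current situation they share.
    from collections import defaultdict

    class StoryBase:
        def __init__(self):
            self.index = defaultdict(set)   # label -> ids of stories filed under it
            self.stories = {}               # id -> story text

        def add(self, story_id, text, labels):
            self.stories[story_id] = text
            for label in labels:
                self.index[label].add(story_id)

        def retrieve(self, situation_labels):
            scores = defaultdict(int)
            for label in situation_labels:
                for story_id in self.index.get(label, ()):
                    scores[story_id] += 1
            if not scores:
                return None                 # nothing germane comes to mind
            return self.stories[max(scores, key=scores.get)]

    base = StoryBase()
    base.add("sponsor", "A sponsor once wanted to use our prototype...",
             {"sponsors", "prototypes", "scale-up"})
    base.add("shell", "The venture capitalists wanted a shell...",
             {"venture-capital", "products", "expert-systems"})
    print(base.retrieve({"prototypes", "scale-up"}))   # -> the sponsor story

The data structure is trivial; deciding what the labels should be, so that the right story comes to mind at the right time, is the hard part, and that is where the AI is.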

And this thought, of course, brings us back to the original question. If a system is small, can there be AI in it? In the seventies, AI systems were all, in essence, promises for the future. They were promises about potential scale up, promises that the theoretical basis of the program was sound enough to allow scale up. Now, the question is, how big is big enough to declare a system an AI system?

It is fairly clear that while this is an important question, it is rather difficult to answer. This brings us back to the implicit issue in all this, the concept of an AI idea. When I said earlier that certain programs have AI ideas within them, I was declaring that even programs that did not work very well, and never did scale up, were AI programs. What could this mean?

AI is about the representation of knowledge. Even a small functioning computer program that someone actually wanted could be an AI program if it was based on AI ideas. If issues of representation of knowledge were addressed in some coherent fashion within a given program, AI people could claim that it was an AI program. But, the key point is that this question, "Where's the AI?" is never asked by AI people; it is asked by others who are viewing a program that has been created in an AI lab. And the important point is that to these people, it simply shouldn't matter. The answer to the question about where the AI is to be found in a program that does a job that someone wanted done is that the AI was in the thinking of the program's designers and is represented in some way in the programmer's code. However, if the reason that they wanted this program in some way depends upon the answer to this question, that is, if they wanted "AI" in the program they were sponsoring, they are likely to be rather disappointed.

Five issues to think about before you try to do real AI

If this answer to the question about where the AI is makes no sense to an outsider, as it surely will not, I hope that it makes sense to people who are in AI. AI is in a rather weird state these days. AI people are hoping to live the way they lived in the seventies, but, for a variety of reasons, those days are over. We cannot continue to build programs that we hope will scale up. We must scale them up ourselves.

It may be that one can argue that we are not ready to face the "scale up" issue just yet, that the fundamental problems have not been solved, that we don't know all there is to know about the mind and how to model it. This seems fair enough, not to mention true enough. Nevertheless, due to the realities of the nineties, we must give it a try. There are things we can do, and there are some very good reasons to try. For one thing, sponsors will expect it. For another thing, the problems of AI demand it -- we simply must start to look at the scale up problems for the sound theoretical reason that these considerations will force us to address many real AI problems. But, I think the most important reason is that this is where the action has to take place. The sheer amount of difficulty present in the creation of a functioning piece of software from a prototype that worked on a few examples is frightening. We simply must learn how to deal with these issues or there never will be any AI. AI people cannot keep expecting that non-AI people will somehow magically turn their prototypes into reality. This simply will never happen. The worst effect of the shell game is that it got AI people believing what venture capitalists wanted to believe.

If you buy what I am saying, then the following five issues represent some practical problems that must be faced before you do any real (scaled up) AI:

1. Real problems are needed for prototyping. We cannot keep working in toy domains. Real problems identify real users with real needs. This considerably changes what the interactions with the program will be, and those interactions must be part of the original design.


2. Real knowledge that real domain experts have must be found and stored. This does not mean interviewing them and asking for the rules that they use and ignoring everything else that fails to fit. Real experts have real experiences, contradictory viewpoints, exceptions, confusions, and the ability to have an intuitive feel for a problem. Getting at these issues is critical. It is possible to build interesting systems that do not know what they know. Expertise can be captured in video, stored and indexed in a sound way, and retrieved without having to fully represent the content of that expertise (e.g., the ASK TOM system (Schank, Ferguson, Birnbaum, Barger, & Greising, 1991)). Such a system would be full of AI ideas, interesting to interact with, and not wholly intelligent but a far sight better than systems that did not have such knowledge available.

3. Software engineering is harder than you think. I can't emphasize strongly enough how true this is. AI had better deal with the problem.

4. Everyone wants to do research. One serious problem in AI these days is that we keep producing researchers instead of builders. Every new Ph.D. recipient, it seems, wants to continue to work on some obscure small problem whose solution will benefit some mythical program that no one will ever write. We are in danger of creating a generation of computationally sophisticated philosophers. They will have all the usefulness and employability of philosophers as well.

5. All that matters is tool building. This may seem like an odd statement considering my comments about the expert system shell game. However, ultimately we will not be able to build each new AI system from scratch. When we start to build useful systems, the second one should be easier to build than the first, and we should be able to train non-AI experts to build them. This doesn't mean that these tools will allow everyone to do AI on their personal computers. It does mean that certain standard architectures should evolve for capturing and finding knowledge. From that point of view the shell game people were right; they just put the wrong stuff in the shell. The shell should have had expert knowledge about various domains in it, available to make the next system in that domain that much easier to build.

These five issues are real and important to think about. They are practical points, not theoretical ones. A little practicality may help the field get to the next level of theory.

OK, but what do you really think?

AI depends upon computers that have real knowledge in them. This means that the crux of AI is in the representation of that knowledge, the content-based indexing of that knowledge, and the adaptation and modification of that knowledge through the exercise of that knowledge.

What I really think is that case-based reasoning (Riesbeck & Schank, 1989; Jona & Kolodner, 1991) is a much more promising area than expert systems ever were and that within the area of case-based reasoning the most useful and important (and maybe even somewhat easier) area to work in is case-based teaching. Building real, large, case bases, and then using them as a means by which users of a system can learn, is a problem we can attack now that has enormous import for both AI and the users of such systems.
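
A minimal sketch of the shape of such a system (not the design of any actual Institute program; the cases, labels, and matching scheme are all invented) couples a case base to the student's situation: read off some description of what the student is doing, find the stored case whose index best overlaps it, tell that story, and remember the episode so the case base grows with use:

    # A toy case-based teaching step: illustrative only.
    # A case is a remembered experience plus the index features under which
    # it should come to mind; the tutor retrieves the best-matching case,
    # presents its story, and records the new episode as a case of its own.
    CASES = [
        {"index": {"simulation", "repeated-failure", "same-error"},
         "story": "Another student kept making this mistake; here is what happened..."},
        {"index": {"simulation", "asks-why"},
         "story": "An expert once explained the principle behind this step..."},
    ]

    def best_case(student_state, cases):
        return max(cases, key=lambda case: len(case["index"] & student_state))

    def tutor_step(student_state):
        case = best_case(student_state, CASES)
        print(case["story"])                        # tell the germane story
        CASES.append({"index": set(student_state),  # the case base grows with use
                      "story": "A student in this situation was told: " + case["story"]})

    tutor_step({"simulation", "repeated-failure"})

Each of the problems listed below is, in effect, about filling in what this sketch leaves out: what the index features should be, how a student's state is read off his or her behavior, and when a story is germane enough to interrupt with.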

Case-based teaching depends upon solving the following problems:

indexing memory chunks
setting up tasks based on indexing
matching student state to an index
anticipating the next question
knowledge navigation
problem cascades
Socratic teaching
button based interaction

I will not bother to explain each of these as I have done so elsewhere (Schank, 1991). The major point is that even though getting machines to tell what they know at relevant times is a simpler form of AI than the full blown AI problem, it is not simple. Even the kind of practical, developmental form of AI that I am proposing is full of enough complex problems to keep many a theoretician busy. I would just like theoretically minded AI people to stop counting angels on the head of a pin.

So, where is the AI? It is in the size, in the ideas, and in the understanding of what is significant that contributes to the behavior of intelligent beings.

References

Birnbaum, L., & Selfridge, M. (1979). Problems in conceptual analysis of natural language. Technical Report #168, Computer Science Department, Yale University, New Haven, CT.

Buchanan, B.G., & Feigenbaum, E.A. (1978). DENDRAL and Meta-DENDRAL: Their applications dimension. Artificial Intelligence, 11, 5-24.

Carbonell, J.G. (1986). Derivational analogy: A theory of reconstructive problem solving and expertise acquisition. In R.S. Michalski, J.G. Carbonell, & T.M. Mitchell (Eds.), Machine learning: An artificial intelligence approach, Volume II (pp. 371--392). Los Altos, CA: Morgan Kaufmann.

Carbonell, J.G., & Gil, Y. (1990). Learning by experimentation: The operator refinement method. In Y. Kordratoff & R. Michalski (Eds.), Machine learning: An artificial intelligence approach, Volume III (pp. 191--213). San Mateo, CA: Morgan Kaufmann.

Cullingford, R. (1978). Script application: Computer understanding of newspaper stories. Ph.D. Thesis. Technical Report #116, Computer Science Department, Yale University, New Haven, CT.

Cullingford, R. (1981). SAM. In R.C. Schank & C. K. Riesbeck (Eds.), Inside Computer Understanding, (pp. 75--119). Hillsdale, NJ: Lawrence Erlbaum.

Davis, R., & King, J.J. (1977). An overview of production systems. In E. Elcock & D. Michie (Eds.), Machine intelligence 8 (pp. 300--332). Chichester, England: Ellis Horwood.

DeJong, G.F. (1979a). Skimming stories in real time: An experiment in integrated understanding. Ph.D. Thesis. Technical Report #158, Computer Science Department, Yale University, New Haven, CT.

DeJong, G.F. (1979b). Prediction and substantiation: a new approach to natural language processing. Cognitive Science, 3, 251--273.

DeJong, G., & Mooney, R. (1986). Explanation-Based Learning: An Alternative View. Machine Learning, 1, 145-176.

Dreyfus, H.L. (1979). What computers can't do: The limits of artificial intelligence (Revised edition). New York: Harper and Row.

Duda, R., Gaschnig, J., & Hart, P.E. (1979). Model design in the PROSPECTOR consultant system for mineral exploration. In D. Michie (Ed.), Expert systems in the micro-electronic age (pp. 153-167). Edinburgh: Edinburgh University Press.

Feigenbaum, E.A. (1977). The art of artificial intelligence: Themes and case studies of knowledge engineering. In Proceedings Fifth International Joint Conference on Artificial Intelligence (pp. 1014-1029). Los Altos, CA: Morgan Kaufmann.

Goel, V., & Pirolli, P. (1989). Motivating the Notion of Generic Design within Information-Processing Theory: The Design Problem Space. AI Magazine, 10(1), 18--36.

Hammond, K.J. (1989). Case-based planning: Viewing planning as a memory task. Boston, MA: Academic Press.

Hull, J.J. (1987). Character recognition: The reading of text by computer. In S.C. Shapiro (Ed.), Encyclopedia of Artificial Intelligence (1st ed., pp. 82--88). New York: John Wiley & Sons.

Jona, M.Y., & Kolodner, J.L. (1991). Case-based reasoning. In Encyclopedia of Artificial Intelligence (2nd ed.). New York: John Wiley & Sons.

Klein, G.A., & Calderwood, R. (1988). How do people use analogues to make decisions? In J. Kolodner (Ed.), Proceedings: Case-Based Reasoning Workshop (DARPA), (pp. 209--218). San Mateo, CA: Morgan Kaufmann.

Lancaster, J. S., & Kolodner, J. L. (1988). Varieties of learning from problem solving experience. In Proceedings of the Tenth Annual Conference of the Cognitive Science Society, (pp. 447--453). Hillsdale, NJ: Lawrence Erlbaum.

Lenat, D.B. (1983). The role of heuristics in learning by discovery: Three case studies. In R. S. Michalski, J. G. Carbonell, & T. M. Mitchell (Eds.), Machine learning: An artificial intelligence approach (pp. 243-306). Palo Alto, CA: Tioga.

Lytinen, S.L., & Gershman, A. (1986). ATRANS: Automatic Processing of Money Transfer Messages. Proceedings Fifth National Conference on Artificial Intelligence, (pp. 1089--1093). Los Altos, CA: Morgan Kaufmann.

McDermott, D. (1981). Artificial intelligence meets natural stupidity. In J. Haugeland (Ed.), Mind Design, (pp. 143--160). Montgomery, VT: Bradford Books.

McDermott, D., Waldrop, M.M., Schank, R., Chandrasekaran, B., & McDermott, J. (1985, Fall). The dark ages of AI: A panel discussion at AAAI-84. The AI Magazine, 6(3), 122--134.

Mitchell, T.M. (1982). Generalization as search. Artificial Intelligence, 18, 203-226.

Mitchell, T.M., Keller, R.M., & Kedar-Cabelli, S.T. (1986). Explanation-based generalization: A unifying view. Machine Learning, 1, 47-80.

Newell, A., & Simon, H.A. (1963). GPS, a program that simulates human thought. In E.A. Feigenbaum & J. Feldman (Eds.), Computers and thought, (pp. 279--293). New York: McGraw-Hill.

Newell, A., Shaw, J.C., & Simon, H.A. (1963). Chess-playing programs and the problem of complexity. In E.A. Feigenbaum & J. Feldman (Eds.), Computers and thought, (pp. 39--70). New York: McGraw-Hill.

Quinlan, J.R. (1986). Induction of decision trees. Machine Learning, 1, 81-106.

Redmond, M. (1989). Combining explanation types for learning by understanding instructional examples. Proceedings of the Eleventh Annual Conference of the Cognitive Science Society, (pp. 147--154). Hillsdale, NJ: Lawrence Erlbaum.

Riesbeck, C., & Schank, R.C. (1976). Comprehension by computer: Expectation-based analysis of sentences in context. In W.J.M. Levelt & G.B. Flores d'Arcais (Eds.), Studies in the perception of language (pp. 247--294). Chichester, England: John Wiley & Sons.

Riesbeck, C.K., & Schank, R.C. (1989). Inside case-based reasoning. Hillsdale, NJ: Lawrence Erlbaum.

Schank, R.C. (1990a). Teaching Architectures. Technical Report #3, The Institute for the Learning Sciences, Northwestern University, Evanston, IL.

Schank, R.C. (1990b). Tell me a story: A new look at real and artificial memory. New York: Charles Scribner's Sons.

Schank, R.C. (1991). Case-based teaching: Four experiences in educational software design. Technical Report #7, The Institute for the Learning Sciences, Northwestern University, Evanston, IL.

Schank, R.C. et al. (1977). Panel on natural language processing. Proceedings Fifth International Joint Conference on Artificial Intelligence (pp. 1007--1008). Los Altos, CA: Morgan Kaufmann.

Schank, R.C., Ferguson, W., Birnbaum, L., Barger, J., & Greising, M. (1991). ASK TOM: An experimental interface for video case libraries. Technical Report #10, The Institute for the Learning Sciences, Northwestern University, Evanston, IL.

Schank, R.C., Osgood, R., et al. (1990). A content theory of memory indexing. Technical Report #2, The Institute for the Learning Sciences, Northwestern University, Evanston, IL.

Schank, R.C., & Jona, M.Y. (1991). Empowering the student: New perspectives on the design of teaching systems. The Journal of the Learning Sciences, 1, 7-35.

Shortliffe, E.H. (1976). Computer-based medical consultations: MYCIN. New York: Elsevier-North Holland.

Sussman, G.J. (1975). A computer model of skill acquisition. New York: American Elsevier.

Waterman, D.A., & Hayes-Roth, F. (Eds.) (1978). Pattern-directed inference systems. New York: Academic Press.

Winograd, T. (1972). Understanding natural language. New York: Academic Press.

Winograd, T. (1973). A procedural model of language understanding. In R. Schank & K. Colby (Eds.), Computer models of thought and language, (pp. 152--186). San Francisco: W.H. Freeman.


[1] Which is just another name for a production system; see Davis & King (1977) and Waterman & Hayes-Roth (1978).

[2] For example, in Computers and Thought, one of the early seminal AI books, there appears a chapter entitled "Chess-playing programs and the problem of complexity" (Newell, Shaw, & Simon, 1963). In the entry for optical character recognition in the Encyclopedia of Artificial Intelligence, Hull (1987) states: "The reading of text by computer has been an AI topic for more than 25 years" (p. 82).

[3] For example, Carbonell (1986), Carbonell & Gil (1990), DeJong & Mooney (1986), Hammond (1989), Lenat (1983), Mitchell (1982, 1986), Quinlan (1986), and Sussman (1975).

[4] Another panel on this same subject, in which I also participated, entitled "The Dark Ages of AI," was held two years later at the 1984 National Conference on Artificial Intelligence. For a transcript, see McDermott et al. (1985).

[5] See McDermott's (1981) essay lamenting this fact and its effect on the progress of AI.

[6] Dreyfus (1979) criticizes these and other overly ambitious claims of progress made by AI researchers.

[7] Now we have evidence that experts do indeed learn from and use their experiences in their daily reasoning and decision-making (see, e.g., Klein & Calderwood, 1988). For example, both expert and novice car mechanics were found to use their past experiences to help generate hypotheses about what kind of problem a car might have (Lancaster & Kolodner, 1988; Redmond, 1989). Architects and mechanical engineers were observed using old design plans while creating new ones (Goel & Pirolli, 1989).

[8] For further discussion on indexing, and the indexing problem, see Jona & Kolodner (1991), Riesbeck & Schank (1989), and Schank et al. (1990).