Of Minds and Language (69 page)

Authors: Massimo Piattelli-Palmarini; Juan Uriagereka; Pello Salaburu


The next question is the one that Janet brought up and is about the computational system. Here some clarification should be made: there's a lot of talk about recursion and it's not a mystical notion; all it means is discrete infinity. If you've got discrete infinity, you've got recursion. There are many different ways of characterizing that step, but they are all some sort of recursive operation. Recursion means a lot more than that, but that's the minimum it means. There are different kinds of recursion – partial recursive, general recursive – but we don't need to worry about them. The core notion is that if you have a discrete infinity you have to have some device to enumerate its elements, and in the case of language, what are the objects that you want to enumerate? Here there's confusion and it leads to trouble. From the start, say the early 1950s, all of us involved in this took for granted that the objects you want to enumerate are expressions, where expressions are structured objects. So an expression is something that has a hierarchy, has interrelations, and so on. That's illustrated by the example that I gave earlier. If you take the sentence:

(1) Mary saw the man leaving the store.

it's three expressions, not one expression. There are three structural interpretations that give you three semantic interpretations, and they separate when you raise the wh- word; you only get one of them. Just about every sentence is like that. There is a string formula, which is just the sequence of those words (Mary-saw-the-man-leaving-the-store), but that's a very impoverished version of the three expressions.
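As a rough illustration of one word string carrying three structures, the sketch below writes three labeled bracketings of sentence (1) as nested tuples and checks that they all flatten to the same sequence of words. The particular bracketings and node labels (`RelClause`, `Clause`, `Adjunct`) are assumptions made for this sketch, not analyses given in the discussion.

```python
def terminal_yield(tree):
    """Flatten a labeled tree (label, child, ...) into its list of words."""
    if isinstance(tree, str):          # a leaf is just a word
        return [tree]
    _label, *children = tree
    return [word for child in children for word in terminal_yield(child)]

THE_MAN = ("NP", "the", "man")
THE_STORE = ("NP", "the", "store")

# Three hypothetical structures for the same seven words.
analyses = {
    # "the man who was leaving the store"
    "relative": ("S", "Mary", ("VP", "saw",
                 ("NP", THE_MAN, ("RelClause", "leaving", THE_STORE)))),
    # Mary saw the event of the man leaving the store
    "complement": ("S", "Mary", ("VP", "saw",
                   ("Clause", THE_MAN, ("VP", "leaving", THE_STORE)))),
    # Mary, while leaving the store, saw the man
    "adjunct": ("S", "Mary", ("VP", ("VP", "saw", THE_MAN),
                ("Adjunct", "leaving", THE_STORE))),
}

strings = {" ".join(terminal_yield(tree)) for tree in analyses.values()}
print(len(set(analyses.values())), "structures,", len(strings), "string")
```

The word string alone cannot tell the three apart; only the structured objects do.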

If we talk about generation of language, there are two kinds of generation: one is called strong generation, where you generate the expression including the objects with their structures, and that yields the meaning and gives the basis for semantic and phonetic interpretation; and there's weak generation, where you just generate the string. Weak generation has no clear meaning; strong generation has a clear meaning – it's a biological phenomenon. There is a class of structured expressions and you can figure out what they are. Weak generation is highly problematic. First of all there's no obvious interest: there's no reason why we should be interested in an impoverished version of the object that's generated. It's uninteresting as well as unclear what the class is; you can draw the boundaries anywhere you want. We talk about grammatical and ungrammatical but that's just an intuitive distinction and there's no particular reason for this; normal speech goes way beyond this. Often the right thing to say goes outside of it; there are famous cases of this, like Thorstein Veblen, a political economist who was deeply critical of the consumer culture that was developing a century ago, who once said that culture is indoctrinating people into “performing leisure.”11 We understand what that means and that's just the right way to say it but from some point of view it's ungrammatical. Rebecca West was once asked some question about possible worlds, and she said that “one of the damn thing is enough.”12 We know exactly what that means and that it's a gross violation of some grammatical rule, but there isn't any better way of saying it; that's the right way to say it.

Are such things proper strings or not proper strings? Well, it makes no difference what answer you give – they are what they are, they have the interpretation they have – it's given to you by the linguistic faculty, but they're all strongly generated. That's the only real answer.

This should be straightforward but it got to be problematic because it was intermingled with philosophical interpretations. This mostly comes from Quine, who insisted from the late 1940s that language has to be a class of well-formed formulas.13 It's obvious that the model is formal systems, but in the case of a formal system it's unproblematic because you stipulate the well-formed formulas; you have a mode of generating the well-formed formulas and that stipulates them. And that's why Quine, for example, raised questions about the reality of phrase structure. He denied it because he said that if you have noun-verb-noun you could break it up into noun-verb versus noun or noun versus verb-noun, so it's all arbitrary and you have no evidence and so on. That's what the issue looks like if you formulate it as generating well-formed formulas or strings, but that doesn't make any sense. You're generating structures; the structure N versus VP is different from the structure NV versus object and you have empirical evidence to tell you which it is. This doesn't make it definitive, but the same is true in physics: nothing's definitive, it's just evidence.
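The noun-verb-noun point can be made concrete with two toy grammars that generate exactly the same strings but assign them different structures. This is only a sketch: the grammars `GRAMMAR_A` and `GRAMMAR_B`, the three-word lexicon, and the helper names are all invented for illustration.

```python
from itertools import product

def trees(grammar, symbol):
    """Yield every labeled tree that `symbol` derives (finite grammars only)."""
    if symbol not in grammar:                      # a terminal word
        yield symbol
        return
    for rhs in grammar[symbol]:
        child_options = [list(trees(grammar, s)) for s in rhs]
        for children in product(*child_options):
            yield (symbol, *children)

def terminal_yield(tree):
    """Flatten a labeled tree (label, child, ...) into its list of words."""
    if isinstance(tree, str):
        return [tree]
    _label, *children = tree
    return [word for child in children for word in terminal_yield(child)]

LEXICON = {"N": [["dogs"], ["cats"]], "V": [["chase"]]}
# N versus VP: [N [V N]]
GRAMMAR_A = {"S": [["N", "VP"]], "VP": [["V", "N"]], **LEXICON}
# NV versus object: [[N V] N]
GRAMMAR_B = {"S": [["NV", "N"]], "NV": [["N", "V"]], **LEXICON}

trees_a = set(trees(GRAMMAR_A, "S"))
trees_b = set(trees(GRAMMAR_B, "S"))
strings_a = {" ".join(terminal_yield(t)) for t in trees_a}
strings_b = {" ".join(terminal_yield(t)) for t in trees_b}

print("same strings:", strings_a == strings_b)
print("any shared structure:", not trees_a.isdisjoint(trees_b))
```

Looking only at the strings, the two grammars are indistinguishable; looking at the structures, they have nothing in common, which is where empirical evidence can decide between them.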

The other problem that led to massive confusion about this, which goes on right until today (and is related to things we've talked about earlier in this conference), is that a sort of mathematical theory came along for trying to select properties of these generative systems. That's Phrase Structure Grammar (PSG), and that theory made sense in the early 1950s. For one thing it made mathematical sense because it was an adaptation of more general ideas. At that time it had come to be understood that there were a number of ways of characterizing recursive functions (theory of algorithms): Turing machine theory, Church's lambda calculus, and others. All tried to capture the notion of mechanical procedure, and they were shown to be equivalent, so it was assumed – the assumption is called Church's thesis14 – that there is only one such system and umpteen notations. One of the notations, by a logician named Emil Post,15 happened to be very well adapted to looking at the specific properties of language. If you took Post's system, and you started putting some conditions on it, you got context-sensitive grammars, context-free grammars, and finite automata. Within the framework of the theory of algorithms, you did very naturally get these three levels. Why look at those three levels? It's just pure mathematics, you can get anything you want. But why look at these three levels? Because they captured some properties of language. Context-free grammars did capture the property of nested dependency (the third example that came up in the discussion of Angela Friederici's presentation – see page 191), and that's just a fact about language. So if you look at agreement:

(2) The men are tall.

and you put another sentence in between:

(3) The men who John sees are tall.

you have agreement on the inner one nested within agreement on the outer one. You can't say:

(4) *The men who John see is tall.

There's no obvious reason for that; for parsing it would be great. The first noun comes along, you take the first verb and that would work great. But it's so impossible it's almost impossible to say it. And all over language you find case after case of nesting: things stuck inside other things. And that property is captured by a context-free phrase structure grammar; a context-sensitive grammar, a richer one, does it with contextual conditions. So that looked like it was capturing something about language, which is what you want from a mathematical model. A mathematical model doesn't capture the system, it just captures some properties of it.
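A minimal sketch of how nesting in examples (2) to (4) can be checked with push-down storage: each subject noun pushes its number onto a stack, and each verb must agree with the most recently opened subject. The tiny lexicon and the function name `nested_agreement_ok` are assumptions for illustration; this is not a parser, and it only looks at the nouns and verbs it knows.

```python
# Hypothetical toy lexicon: grammatical number of subject nouns and verbs.
NOUNS = {"men": "pl", "man": "sg", "John": "sg"}
VERBS = {"are": "pl", "is": "sg", "see": "pl", "sees": "sg"}

def nested_agreement_ok(sentence):
    """Check that every verb agrees with the innermost still-open subject."""
    stack = []                                   # push-down storage
    for word in sentence.split():
        if word in NOUNS:
            stack.append(NOUNS[word])            # open a dependency
        elif word in VERBS:
            if not stack or stack.pop() != VERBS[word]:
                return False                     # verb disagrees with its subject
    return not stack                             # every subject found its verb

for example in ("The men are tall",
                "The men who John sees are tall",
                "The men who John see is tall"):
    print(example, "->", nested_agreement_ok(example))
```

The starred sentence (4) fails because the first verb is checked against the inner subject, not against the first noun that came along.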

The reason for going down to finite automata was just because they were fashionable, so fashionable in fact that they were taken to be universal. What was taken to be universal was a very particular subcase of finite automata, namely the ABAB type that Angela talked about, very local finite automata where you don't use the full capacities. The finite automata that I mentioned do allow unbounded dependencies, but nobody ever looked at them that way because the background was associationism (associating adjacent things), so nobody looked at the kind of finite automata which did yield unbounded dependencies. These narrow ones, which become Markov sources if you add probabilities, were taken to be universals for behavior altogether, so it was worth taking a look at those. That's the motivation for this hierarchy but no more than that, and one shouldn't be misled by it. A phrase structure grammar strongly generates a structure, such that you get the hierarchy and different levels and so on, but you can say that it weakly generates the things at the bottom, the terminal elements; it's not interesting, but you can say it. However, weak generation turns out to be mathematically feasible; strong generation is mathematically unfeasible as it's too complicated. Then comes the whole field of mathematical linguistics (automata theory and so on), ending up being a small branch of mathematics, which studies weak generation exclusively; all the theorems and everything else are weak generation. I worked in it too, mainly because it was fun, but it had no linguistic significance as far as I could see.
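To see that a finite automaton can hold a dependency across an unbounded distance, here is a small sketch. The language (strings of the form a x...x a or b x...x b), the state names, and the transition table are invented for illustration; the point is only that the state remembers the first symbol no matter how much material intervenes.

```python
# Hypothetical finite automaton: the last symbol must match the first,
# with any number of "x" symbols in between.
TRANSITIONS = {
    ("start", "a"): "expect_a",
    ("start", "b"): "expect_b",
    ("expect_a", "x"): "expect_a",   # loop: the state keeps remembering "a"
    ("expect_a", "a"): "accept",
    ("expect_b", "x"): "expect_b",   # loop: the state keeps remembering "b"
    ("expect_b", "b"): "accept",
}

def accepts(string):
    """Run the automaton; a missing transition means rejection."""
    state = "start"
    for symbol in string:
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            return False
    return state == "accept"

print(accepts("a" + "x" * 1000 + "a"))
print(accepts("a" + "x" * 1000 + "b"))
```

Nothing here associates adjacent symbols: the dependency between the first and last symbol is carried by the state, which is exactly what the very local ABAB-type automata do not exploit.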

In fact, of all the work in mathematical linguistics, I know of only one theorem that has any linguistic significance and that's about strong generation: it's a theorem that says that context-free grammars are strongly equivalent to nondeterministic push-down storage automata.16 That's a theorem and it actually has meaning, because just about every parsing system is a push-down storage automaton, and it says that there's an equivalence between these and context-free grammars. If you take a look at parsing systems they're using variants of that theorem. It is a very uninteresting mathematical theorem, so if you look at books on mathematical linguistics or computer science, they'll have a lot of stuff about context-free grammars and so on, but they'll never mention that theorem, which is the only interesting one for linguistics.17

All of that has again been misleading. We can get back to the starlings and you can see this. These systems (context-free grammar and finite automata) were there for a reason, but in between these, there's any number of other possible systems. One of the systems that is in between finite automata and context-free grammars is finite automata with counters. That's one of the systems that is between these two levels, but there's no point describing it. For one thing, it has no mathematical motivation and it has no empirical motivation – people don't use it, so who cares? But it's there.

When you look at the starling experiment (Gentner et al. 2006), there's every indication that this is exactly what they're using: they're just counting. What the experiment shows is that the starlings can count to three, which doesn't seem very surprising. Randy Gallistel was telling us about jays that can count to many thousands (see page 61), and if I remember correctly there was work by Ivo Kohler back around 1940, who had jays counting up to seven (if you put seven dots they'll go to the seventh cup, or something like that). And my suspicion is that if the starling people pursued their experiment a little further, they'd get up to close to seven. And there's a good reason for that: it was shown by George Miller fifty years ago in his famous paper called “The magical number seven, plus or minus two.”18 He covered a lot of literature across species, and it turns out that short term memory is right about that range. If they do this experiment they'll probably find the same thing: the starlings will get up to five, or eight, or something like that. They think it's a context-free grammar because it's above the finite automata level in that hierarchy, but that doesn't tell you anything.
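A hedged sketch of what "just counting" amounts to: a finite control plus a single counter is enough to accept strings consisting of some number of A's followed by the same number of B's, with no phrase structure involved. The symbols `A` and `B` and the function name `counter_accepts` are assumptions for illustration, not a model of the actual experiment.

```python
def counter_accepts(string):
    """Accept n A's followed by n B's (n >= 1) using one counter, no stack."""
    count = 0
    seen_b = False
    for symbol in string:
        if symbol == "A":
            if seen_b:              # an A after a B breaks the pattern
                return False
            count += 1
        elif symbol == "B":
            seen_b = True
            count -= 1
            if count < 0:           # more B's than A's
                return False
        else:
            return False
    return seen_b and count == 0

for s in ("AABB", "AAABBB", "ABAB", "AAB"):
    print(s, "->", counter_accepts(s))
```

The counter records how many A's there were, not which A goes with which B, so passing this test says nothing about nested dependencies.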

It does in the third example that came up in the discussion of Angela's presentation: when you get nesting, then you're using the properties of context-free grammar. And if you experiment with human processing by reading people sentences with nestings, you can get up to about seven before the capacity breaks down. George Miller and I wrote about this in the Handbook of Mathematical Psychology in the early 1960s.19 We didn't do the experiments, we just tried it out on ourselves: you can force yourself up to about seven and it's still intelligible, but beyond that it's just totally unintelligible. You know you have the capacity because if you add time and space – like a Turing machine – then you get the right answer. For example, in your head you can multiply up to, say, 23. That doesn't mean you don't know the rules of multiplication beyond that; you just need extra time and memory. Then the same rules apply. So you know it all but you can't use it beyond that. So any simple kind of performance experiment is probably not going to distinguish humans from animals. If you can get some animal that can do nested dependencies, you're not going to be able to show the difference between them and humans by elementary performance experiments, even though the difference almost certainly is there.
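The separation between knowing the rule and having the memory to apply it can be sketched as a nesting recognizer with an explicit memory bound. Brackets stand in for nested dependencies, and `memory_limit` and the three result strings are assumptions made for this sketch. The rule is the same at every depth; only the available storage changes.

```python
def recognize(string, memory_limit):
    """Check bracket nesting with a stack that holds at most `memory_limit` items."""
    stack = []
    for symbol in string:
        if symbol == "(":
            if len(stack) == memory_limit:
                return "overflow"        # the rule still applies; memory ran out
            stack.append(symbol)
        elif symbol == ")":
            if not stack:
                return "reject"
            stack.pop()
        else:
            return "reject"
    return "accept" if not stack else "reject"

deep = "(" * 10 + ")" * 10
print(recognize(deep, memory_limit=7))    # bounded memory gives out
print(recognize(deep, memory_limit=10))   # same rule, more memory
```

Raising the limit changes nothing in the procedure itself, which is the sense in which adding time and space yields the right answer.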

This is something that has been totally missed in the connectionist literature. One of the most quoted connectionist papers is Jeffrey Elman's work on how you can get two nested dependencies.20 This is true, you can write a program that will do that. But Elman's program totally breaks down when you get to three, and you have to rewrite the whole program. In Turing machine terms, the control unit has to be totally changed, which means you're not capturing the rules. And to make things worse, his approach also works for crossing dependencies, so in the case of the example earlier:

(4) *The men who John see is tall.

it works just as well for those. It's not capturing the dependencies, it's just using brute force to go up to two things that you remembered. And that kind of work is never going to get anywhere. There's no point modeling performance that is bounded by time and space, just as you can't study arithmetic knowledge that way.
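The contrast between capturing nesting and merely remembering items can be sketched as follows. This is not a reconstruction of Elman's network; both checkers and the token encoding are hypothetical, chosen only to show that a procedure which just remembers what is still pending cannot tell nested dependencies from crossing ones.

```python
def nested_only(tokens):
    """Stack discipline: a dependency must close the most recently opened one."""
    stack = []
    for kind, index in tokens:
        if kind == "open":
            stack.append(index)
        elif not stack or stack.pop() != index:
            return False
    return not stack

def order_blind(tokens):
    """Just remembers which items are pending, ignoring the order they opened in."""
    pending = set()
    for kind, index in tokens:
        if kind == "open":
            pending.add(index)
        elif index in pending:
            pending.remove(index)
        else:
            return False
    return not pending

nested = [("open", 1), ("open", 2), ("close", 2), ("close", 1)]
crossing = [("open", 1), ("open", 2), ("close", 1), ("close", 2)]

print("stack:      ", nested_only(nested), nested_only(crossing))
print("order-blind:", order_blind(nested), order_blind(crossing))
```

The order-blind checker accepts both patterns, so its success on the nested case is no evidence that it has captured nesting.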

A last point about this: if you look at Logical Structure of Linguistic Theory, the 1955 manuscript of mine21 (it was written just for friends, not for publication; you couldn't publish in this field then as it didn't exist), it was supposed to be about language and there was no mention of finite automata at all because they just don't capture enough to be worth mentioning. There is an argument against phrase structure grammar, but it's not an argument based on impossibility, like you can give for finite automata; it's an argument based on being wrong. It just gives the wrong results because it doesn't express the natural relationships or capture the principles. And that's still the main argument against it.
