
(3) When there is a choice to be made between grammars that are both (all) compatible with the available input sample, and the language licensed by one is a proper subset of the language licensed by the other, do not adopt the superset language.

SP is essential for learning without negative data. Without it, incurable overgeneration errors could occur. So it is evident that learners have some effective way of applying it. Our job is to find out how they do it – or even how they might do it, overcoming the technical snags that evaluation seems to face.
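
By way of illustration only, here is a minimal Python sketch of what applying SP directly would involve, under the unrealistic assumption that each candidate grammar can be paired with the full (finite) set of sentences it licenses, so that whole languages can be compared; the function name and the toy grammars are invented for the example.

```python
def sp_filter(candidates):
    """Discard any candidate whose language is a proper superset of
    another candidate's language (the Subset Principle in (3))."""
    return [(g, lang) for g, lang in candidates
            if not any(other < lang for _, other in candidates)]

# Toy domain: G2 licenses a proper superset of G1's language, so SP
# forbids adopting G2 while G1 is still compatible with the input.
candidates = [("G1", {"s1", "s2"}),
              ("G2", {"s1", "s2", "s3"}),
              ("G3", {"s2", "s4"})]
print([g for g, _ in sp_filter(candidates)])   # ['G1', 'G3']
```

Even this toy version exposes the snag: the comparison ranges over whole languages, not over anything the learner could plausibly compute from the sentences actually encountered.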

17.5 Enumeration of grammars

To get started, I must take you on another historical detour back to the 1960s. The work of Gold (1967) provides a straightforward and guaranteed solution to the problem of applying SP. Gold, a mathematical learning theorist, was not concerned with psychological reality, and you may well find his approach hopelessly clunky from a psychological point of view. Certainly it has not been taken seriously in any treatment of SP with psycholinguistic aspirations. But since it works, it is worth considering why it works and whether we can benefit from it. I will suggest that we can. Gold's approach needs a certain twist in order to make it psychologically plausible, but then it can solve not only the problem of how to apply SP but also another quite bizarre learnability problem that has never been noticed before: that under some very familiar assumptions, obeying SP can cause a learner to fail to arrive at the target grammar (Fodor and Sakas 2005).

Gold assumed an enumeration of all the possible grammars, in the sense of a total ordering of them, meeting the condition that a grammar that licenses a subset language is earlier in the ordering than all grammars licensing supersets of it. All the other grammars, not involved in subset–superset relations, are interspersed among these in an arbitrary but fixed sequence. (I will assume here that each grammar appears in the ordering just once.) The learner's hypotheses must respect this ordering. The learner proceeds through the list, one grammar at a time, moving on to consider a new grammar only when the preceding one has been disconfirmed by the input. The learner thereby obeys SP, without having to actively apply it or to know what the competing grammars for a given input sentence are. No decoding is required. The learner simply takes the next grammar in the sequence and finds out whether or not it can license (parse) the current sentence. Of course, learning in this fashion is a very slow business in a domain of a billion or more grammars, as the learner plods through them one by one. Steven Pinker wrote a very instructive paper in 1979 in which he admonished against trying to create psychology out of enumeration-based learning techniques. He wrote (p. 227): "The enumeration procedure … exacts its price: the learner must test astronomically large numbers of grammars before he is likely to hit upon the correct one." After reviewing some possible enhancements to a Gold-style enumeration he concluded (p. 234), "In general, the problem of learning by enumeration within a reasonable time bound is likely to be intractable." From our CoLAG perspective, enumeration-based learning is an especially frustrating approach because it extracts so little goodness from the input. It has no room for parametric decoding at all. It proceeds entirely by trial and error, considering grammars in an invariant and largely arbitrary sequence that has no relation whatsoever to the sentences the learner is hearing. It is also rather mysterious where this ordering of grammars comes from. It must presumably be innate, but why or how humans came to be equipped with this innate list is unclear.
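
A Gold-style learner of the kind just described can be sketched in a few lines of Python, assuming for illustration that each grammar is represented by the finite set of sentences it licenses and that the enumeration, with subset languages ordered before their supersets, is supplied in advance; the names and toy data are not Gold's.

```python
def enumeration_learner(enumeration, input_stream):
    """Work through the fixed list of (grammar, language) pairs, moving on
    only when the current grammar fails to license (parse) the current
    sentence. Assumes some grammar in the list licenses every input."""
    index = 0
    for sentence in input_stream:
        while sentence not in enumeration[index][1]:
            index += 1                    # disconfirmed: advance to the next grammar
        yield enumeration[index][0]       # hypothesis after this input

# Toy enumeration: G1's language is a subset of G2's, so G1 comes first.
enumeration = [("G1", {"a"}), ("G3", {"b", "c"}), ("G2", {"a", "b"})]
print(list(enumeration_learner(enumeration, ["a", "b", "a", "b"])))
# ['G1', 'G3', 'G2', 'G2']
```

SP compliance falls out of the ordering alone: the learner never compares grammars and never decodes the input, which is what makes the procedure safe but also so slow.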

17.6 From enumeration to lattice

Despite all of these counts against it, I want to reconsider the merits of enumeration. Our CoLAG research has tried to hold onto its central advantage (fully-reliable SP application without explicit grammar comparisons) while improving its efficiency. You may find the question of its origin just as implausible for our version as for the classic enumeration, but if I can persuade you to restrain your skepticism for a little while, I will return to this point before we are through. We have taken the traditional enumeration and twisted it around into a lattice (or strictly into a poset, a partially ordered set) which represents the subset–superset relations among the grammars, just as Gold's enumeration did, but in a more accessible format. The lattice is huge. The 157 grammars depicted in Fig. 17.1 constitute about one-twentieth of our constructed domain of languages. The domain is defined by 13 parameters, it contains 3,072 distinct languages, and in all there are 31,504 subset–superset relations between those languages. (The real-world domain of natural languages is of course much more complex than this, which is why we have to seek an efficient mechanism to deal with it.)

This is how a learner could use the lattice. At the top of the lattice, as illustrated in Fig. 17.1, are nodes that denote the superset languages, with lines running downward connecting each one to all of its subsets, so that at the bottom there are all the languages that have no subsets. We call these smallest languages, and by extension the grammars that generate them are smallest grammars. These are the only safe (SP-permitted) hypotheses at the beginning of the learning process, and the learner may at first select only from among these. Because they have no subsets, the learner thereby obeys SP. As learning proceeds, these smallest grammars are tried out on input sentences and some of them fail. When this happens, they are erased from the lattice. That is: when a grammar is disconfirmed, it disappears from the learner's mental representation of the language domain, and it will not be considered again. This means the lattice gets smaller over time. More importantly: the pool of legitimate grammars at the bottom of the lattice gradually shifts. Some of the grammars that started out higher up in the lattice because they had subsets will trickle their way down to the bottom and become accessible to the learner, as the grammars beneath them are eliminated. They qualify then as smallest languages compatible with the learner's experience, so they have become legitimate hypotheses that the learner is permitted to consider.

Fig. 17.1. A fragment (approximately 5%) of the subset lattice for the CoLAG language domain. Each node represents one grammar. Each grammar is identified as a vector of 13 parameter values, but the grammar labels are suppressed here because of the scale. Superset grammars are above subset grammars.
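
One way such a lattice learner might be organized is sketched below, again under the toy assumption that each grammar comes paired with the set of sentences it licenses so that subset relations can be computed directly; the class name, control flow, and data are illustrative and are not the CoLAG implementation.

```python
class LatticeLearner:
    """Toy lattice-based learner: hypothesize only 'smallest' grammars
    (those with no surviving subset) and erase grammars that fail."""

    def __init__(self, grammars):
        self.live = dict(grammars)    # grammar name -> set of licensed sentences
        self.current = None

    def smallest(self):
        """Live grammars with no live proper subset: the SP-safe frontier."""
        return [g for g, lang in self.live.items()
                if not any(other < lang for h, other in self.live.items() if h != g)]

    def observe(self, sentence):
        """Process one input sentence, erasing disconfirmed smallest grammars."""
        if self.current in self.live and sentence in self.live[self.current]:
            return self.current                       # current hypothesis survives
        while True:
            frontier = self.smallest()
            if not frontier:
                raise ValueError("no live grammar licenses this sentence")
            fits = [g for g in frontier if sentence in self.live[g]]
            if fits:
                self.current = fits[0]                # adopt an SP-safe hypothesis
                return self.current
            for g in frontier:                        # all smallest grammars fail:
                del self.live[g]                      # erase them, exposing supersets

# Toy chain of languages L1 ⊂ L2 ⊂ L3, plus an unrelated L4.
learner = LatticeLearner([("G1", {"a"}), ("G2", {"a", "b"}),
                          ("G3", {"a", "b", "c"}), ("G4", {"d"})])
for s in ["a", "b", "c"]:
    print(s, "->", learner.observe(s))   # a -> G1, b -> G2, c -> G3
```

Erasure is what distinguishes this from an enumeration: when every grammar on the SP-safe frontier fails on a sentence, the grammars just above them become accessible, so the learner climbs the subset hierarchy only as far as the input forces it to.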

This lattice representation of the domain provides a built-in guarantee of SP-compliance just like a classic enumeration, but it is much more efficient than an enumeration because there is no need for the learning device to work through every language on the way between the initial state and the target language. All it has to work through are all the subsets of the target language (beneath it in the lattice), which is exactly what SP requires. Our reorganization of the domain has cleared away the intervening arbitrarily ordered grammars which merely get in the way of SP in the one-dimensional enumeration. The lattice-based approach has other good features too. The erasure of grammars incompatible with the input makes syntax learning similar to phonological learning, where it is well established that infants start by making a great many phonetic distinctions which they gradually lose with exposure to their target language, retaining only those relevant to the phonological categories that are significant in the target. Also, the lattice-based model solves the other dire problem that I mentioned earlier: the fact that, although obeying SP is essential to avoid fatal overgeneration errors, it can itself lead to fatal errors of undergeneration.

17.7 Incremental learning and retrenchment

This disagreeable effect of SP stems from the assumption of incremental learning, that is, that the learner makes a decision about the grammar in response to each sentence it encounters. After each input sentence, an incremental learner chooses either to retain its current grammar hypothesis or to shift to a new one. It does not save up all the sentences in a long-term database, to compare and contrast, looking for general patterns. Only the current grammar (the parameter values set so far) and the current input sentence feed into its choice of the next grammar, so it can forget all about its past learning events; it does not retain either sentences previously encountered or a record of grammars previously tested. Incremental learning thus does not impose a heavy load on memory, making it plausible as a model of children. Incremental learning was clearly implied in the original parameter-setting model, and was regarded as one of its many assets. However, SP and incremental learning turn out to be very poor companions. To avoid overgeneration, SP requires the learner to postulate the smallest UG-compatible language consistent with the available data. But when the available data consists of just the current input sentence, the smallest UG-compatible language consistent with it is likely to be very small indeed, lacking all sorts of syntactic phenomena the learner had acquired from prior sentences. Anything that is not universal and is not exemplified in the current sentence must be excluded from the learner's new grammar hypothesis. We call this retrenchment. SP insists on it, because if old parameter settings weren't given up when new ones are adopted, the learner's language would just keep on growing, becoming the sum of all of its previous wrong hypotheses, with overgeneration as the inevitable result. SP thus makes an incremental learner over-conservative, favoring languages that are smaller than would be warranted by the learner's whole cumulative input sample to date. That can lead to permanent undershoot errors in which the learner repeatedly guesses too small a language, and never attains the full extent of the target. This doesn't always happen, but we observe undershoot failures in about 7 percent of learning trials in our language domain.

An example will illustrate the point. Suppose a child hears "It's bedtime." There is no topicalization in this sentence, so if the child is an incremental SP-compliant learner, there should be no topicalization in the language he hypothesizes in response to it (assuming that topicalization is something that some languages have and some do not). Similarly for extraposition, for passives, for tag questions, for long-distance wh-movement, and so on. Even if the child had previously encountered a topicalized sentence and acquired topicalization from it (had set the appropriate parameter, or acquired a suitable rule in a rule-based system), that past learning is now lost. To make matters worse, this is the sort of sentence that the child is going to hear many times. So even if during the day he makes good progress in acquiring topicalization and extraposition and passives, every evening he will lose all that knowledge when he hears "It's bedtime." This is obviously a silly outcome, not what happens in real life, so we must prevent it happening in our model.
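
The effect can be made vivid with a deliberately crude sketch, on the simplifying assumption (mine, not the authors') that a grammar is just the set of optional constructions it licenses and that each input sentence is annotated with the constructions it exemplifies.

```python
def retrench(current_grammar, sentence_constructions):
    """Strict incrementality plus SP: adopt the smallest grammar consistent
    with the current sentence alone. The current grammar is deliberately
    ignored: that is what retrenchment means in this toy setting."""
    return set(sentence_constructions)

grammar = set()
day = [{"topicalization"},               # a topicalized sentence
       {"topicalization", "passive"},    # a topicalized passive
       set()]                            # "It's bedtime": nothing optional
for constructions in day:
    grammar = retrench(grammar, constructions)
    print(sorted(grammar))
# ['topicalization'], ['passive', 'topicalization'], []
# The bare bedtime sentence wipes out everything acquired during the day.
```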

The guilty party once again is the ambiguity of (many) triggers. If the natural language domain were tidy and transparent, so that there was no ambiguity as to which language a sentence belongs to, a learner would be able to trust her past decisions about parameter settings, and hold on to them even if they aren't exemplified in her current input. Then even a strictly incremental learner could accumulate knowledge. A parameter value once set could stay set, without danger of discovering later that it was an error. But the natural language domain is not free of ambiguity, so a learner can't be sure that her past hypotheses weren't erroneous. Hence previously adopted parameter values cannot be maintained without current evidence for them; retrenchment is necessary. But then the puzzle is how learners avoid the undershoot errors that retrenchment can lead to.

17.8 The lattice limits retrenchment

It seems that the familiar assumption of incremental learning may be too extreme. Incrementality is prized because it does not require memory for past learning events. But even an incremental learner could profit by keeping track of grammars it has already tested and found inadequate. Then it could avoid those grammars in future, even when the evidence that disconfirmed them is no longer accessible to it. Making a mental list of disconfirmed grammars would do the job, though it would be very cumbersome. But an ideal way to achieve the same end is provided by the erasure of disconfirmed grammars from the grammar lattice, which we motivated on independent grounds earlier. Erasing grammars will block repeated retrenchment to languages that are smaller than the target. The smallest language compatible with "It's bedtime" is at first very small. But as time goes on, the smallest of the smallest languages will have been erased from the lattice, and then some larger smallest languages may be erased, and so on. As time goes by, the languages that the learner is allowed to hypothesize, the accessible ones at the bottom of the lattice, will actually include some quite rich languages. Hearing "It's bedtime" won't cause loss of topicalization and extraposition once all the grammars that don't license topicalization and extraposition have disappeared, eliminated by earlier input. Note that keeping track of disconfirmed grammars by erasing them from the innate lattice is a very economical way of providing memory to an incremental learner. The learner doesn't have to keep a mental tally of all the hundreds or thousands of languages he has falsified so far, a tally that consumes more and more memory as time goes on. Instead, memory load actually declines as learning progresses. To summarize: Like a traditional enumeration, the lattice model offers a fail-safe way to impose SP on learners' hypotheses; if combined with erasure of disconfirmed grammars it also provides a safeguard to ensure that SP doesn't get out of hand and hold the learner back too severely.
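
Reusing the toy LatticeLearner sketched earlier (an illustrative construction, not the authors' simulation), the point is easy to see: once earlier input has erased the small grammars, a bare sentence like "It's bedtime" can no longer drag the learner back down.

```python
# Three nested toy languages: a bare one, one adding topicalization,
# and one adding passives as well. Assumes the LatticeLearner class
# from the earlier sketch is in scope.
learner = LatticeLearner([("G-min",  {"bedtime"}),
                          ("G-top",  {"bedtime", "topic"}),
                          ("G-full", {"bedtime", "topic", "passive"})])
for s in ["topic", "passive", "bedtime"]:
    print(s, "->", learner.observe(s))
# topic -> G-top     (G-min erased)
# passive -> G-full  (G-top erased)
# bedtime -> G-full  (nothing is lost: the small grammars are already gone)
```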
