Using a Sentence Generator as an Aid in Introductory Linguistics

Greg Lessard, French Studies

Michael Levison, Computing and Information Science

Queen's University, Kingston, Ontario

1.0 Overview

One of the principal concepts of contemporary linguistic theory is that language is a rule-based creative device: many (if not most) of our utterances are not pre-stored in our memory, but rather produced as we need them. A frequent method of driving this point home consists in pointing out that the vast majority of the sentences we produce or interpret are sentences which we are dealing with for the first time in our lives.

One of the consequences of this is that no corpus-based description can ever provide a complete description of the language under study (although it may prove invaluable in giving us starting points and in confirming hypotheses, as shown in Lessard, Levison and Olsen 1990). A language description is better seen as a generative model, a finite set of rules which generate (or interpret) an in principle infinite number of possible utterances.

In most current grammars, such models are formulated in terms of phrase structure rules, in which a given element is rewritten as a string of elements, as for example:

1. S -> NP VP

NP -> DET N

NP -> DET N ADJ

VP -> V

These rules state that a sentence is to be seen as the combination of a noun phrase plus a verb phrase, that a noun phrase is composed either of a determiner plus a noun, or of a determiner plus a noun plus an adjective, and that a verb phrase is composed of a verb. In introductory linguistics courses, students often have some difficulty grasping the significance of such rules. They tend not to appreciate their power, and they frequently fail to imagine the sorts of output a given rule may produce.
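To make the generative reading of such rules concrete, here is a minimal Python sketch (not the system described in this paper) that expands S by choosing at random among the alternatives for each symbol. The NP alternatives follow the prose description above; the tiny lexicon is borrowed from the sample output given later in the paper, and the names RULES, LEXICON, expand and generate are purely illustrative.

```python
import random

# Rewrite rules: each symbol maps to a list of alternative expansions.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["DET", "N"], ["DET", "N", "ADJ"]],
    "VP": [["V"]],
}

# Toy lexicon (masculine singular forms only, so agreement is not an issue).
LEXICON = {
    "DET": ["le"],
    "N": ["chat", "canard", "chien"],
    "ADJ": ["malade", "blessé"],
    "V": ["meurt", "court", "arrive"],
}


def expand(symbol, rng):
    """Recursively rewrite a symbol until only words remain."""
    if symbol in LEXICON:
        return [rng.choice(LEXICON[symbol])]
    words = []
    for child in rng.choice(RULES[symbol]):
        words.extend(expand(child, rng))
    return words


def generate(rng=None):
    """Return one randomly generated sentence as a string."""
    rng = rng or random.Random()
    return " ".join(expand("S", rng))


if __name__ == "__main__":
    for _ in range(5):
        print(generate())
```

Even with three rules and eight words, repeated runs yield a variety of three- and four-word sentences, which is the point students tend to miss when they see the rules on paper only.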

In an attempt to remedy this situation, we began in 1989-90 to use a sentence generator to teach syntax in a second year introductory linguistics course. The course is taught in French and deals primarily with the linguistic description of French. In what follows, we will look briefly at the system itself, we will describe and comment upon the experience of using it for teaching in the past year, and we will examine some of the possibilities for future use.

2.0 Description of the System

Developed in the Department of Computing and Information Science at Queen's University (see Lessard et al. 1990 and Levison and Lessard 1990a,b for details), the generator, known as Vinci, lets a user describe the lexicon, morphology and syntax of a language in data files written with any word processor or editor, in a format familiar to linguists. The system reads and interprets these files and outputs a set of sentences or, more generally, strings. To take a very simple case, consider the syntax:

2. %include "att" {Tell the system where to find attribute info.}

S = {Define the sentence. }

choose {Choose a number, sing or plural, and give it }

N : Number; {the label "N" }

NP[N] VP[p3,N] {The sentence is made up of a noun phrase (NP)}

{and a verb phrase (VP) which share the same }

{number. The VP is in the 3rd person (p3). }

# {End of the first rule}

NP = {Define the NP}

inherit {Inherit the number chosen previously and }

N : Number; {label it "N". }

choose {Choose a gender, masculine or feminine, and }

G : Gender; {label it "G". }

( {Begin a choice among options. EITHER }

DET[G,N] N[G,N] {Choose a determiner "DET" and a noun }

{"N" having the same number and gender.}

| {OR }

DET[G,N] N[G,N] ADJ[G,N] {Choose a determiner, a noun and an }

{adjective sharing the same number }

{and the same gender. }

) {End the choice of options. }

# {End of the second rule. }

VP = {Define the verb phrase. }

inherit {Inherit the number and person specified }

N : Number, {previously and label them "N" and "P". }

P : Person;

choose {Choose a tense (present, past, etc.) and give}

T : Tense; {it the label "T". }

V[vi,N,P,T] {Choose an intransitive verb (vi) having the }

{appropriate number, person and tense. }

# {End of the third rule. }

This is essentially a more detailed version of the simple rewrite rule given in [1]. Combined with the appropriate attribute file:


3. Gender (masc, fem)

Number (sing, plur)

Person (p1, p2, p3)

Tense (pres, imperfect)


with an appropriate lexicon:

4. "le"|DET|déf|Genre,Nombre|$9|$50|"la"|"les"|"les"|"l'"||||||"the"|






















and with specifications of possible terminal symbols (N,V,DET,ADJ) and morphology files,

5. rule 12


{verbes réguliers en "-er"}

{exemple: chanter}

[prés] : @1 + er_présent(Nombre,Personne);

[imparf] : @1 + imparfait(Nombre,Personne);

[fut] : @1 + "er" + futur(Nombre,Personne);

[cond] : @1 + "er" + imparfait(Nombre,Personne);

[inf] : @0;

[impér,p2,sing] : @1 + "e";

[impér,p2,plur] : @1 + "ez";

[impér,p1,plur] : @1 + "ons";

[pp,masc,plur] : @5 + "s";

[pp,fém,sing] : @5 + "e";

[pp,fém,plur] : @5 + "es";

[pp] : @5;

[pprés] : @1 + "ant";

[subj] : @1 + subjonctif(Nombre,Personne);

[psimple] : @1 + er_simple(Nombre,Personne);

[] : @0 + "*Gardes:R12";


the generator will produce any number of sentences of the form:

6. Le chat malade meurt.

Le canard court.

Le chien blessé arrive.

One need only enter the user specification of desired output, as in:

7. %include "att" {Indicate the attributes to be used. }

10 of {Specify the number of sentences to produce. }

S {Indicate the formula to be used. }

# {End of the rule. }

and include all information as parameters to a general rule:

8. interp ter tab lex

in order to run the system. The results of each rule are sent to the screen or to a file.
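As a rough illustration of what a morphology rule such as [5] computes, the following Python sketch builds a few forms of a regular -er verb by adding an ending to a stem. It is not Vinci code. The ending tables are the standard French paradigms; the mapping of the stem to @1 and of the infinitive to @0 is an assumption based on the rule text, and the function name conjugate_er is hypothetical. Only four of the cases in the rule are covered, and spelling adjustments (as in mangeons) are ignored.

```python
# Person/number endings for regular -er verbs (standard French paradigms).
ER_PRESENT = {
    ("sing", "p1"): "e", ("sing", "p2"): "es", ("sing", "p3"): "e",
    ("plur", "p1"): "ons", ("plur", "p2"): "ez", ("plur", "p3"): "ent",
}
IMPARFAIT = {
    ("sing", "p1"): "ais", ("sing", "p2"): "ais", ("sing", "p3"): "ait",
    ("plur", "p1"): "ions", ("plur", "p2"): "iez", ("plur", "p3"): "aient",
}


def conjugate_er(infinitive, tense, number="sing", person="p3"):
    """Build a form of a regular -er verb from its stem plus an ending."""
    if not infinitive.endswith("er"):
        raise ValueError("not an -er verb: " + infinitive)
    stem = infinitive[:-2]  # assumed to correspond to @1 in the rule
    if tense == "inf":
        return infinitive  # assumed to correspond to @0 in the rule
    if tense == "prés":
        return stem + ER_PRESENT[(number, person)]
    if tense == "imparf":
        return stem + IMPARFAIT[(number, person)]
    if tense == "pprés":
        return stem + "ant"
    raise ValueError("tense not covered by this sketch: " + tense)


if __name__ == "__main__":
    print(conjugate_er("chanter", "prés", "plur", "p1"))
    print(conjugate_er("arriver", "imparf", "sing", "p3"))
```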

3.0 Student use and reaction

The system was developed on a SUN network, but since it was written in C, it was relatively easy to port to the university academic mainframe, an IBM 3081G running under VM/CMS. This option was chosen in order to allow students to use any of the terminal sites on campus. As a consequence of their enrollment in the linguistics course, all students received userids, which also gave them access to word processing tools and electronic mail. (Another requirement of the course was that students contact the professor by email at least once. About half the class used email after that.)

The class contained 35 students, 18 in their second year of the four year Honours concentration, 10 in their third year, and 7 in their final year. Students were registered either in the Language and Linguistics special field concentration, in the Translation special field concentration, or in a concentration in French literature. None had studied linguistics before.

The course itself lasted two terms: September 1989-April 1990. The computer portion of the course occurred in the second term, beginning in January.

Brief documentation was provided in class on the use of the mainframe (since many of the students had little or no computing experience). The assignments increased incrementally in difficulty.

3.1 First assignment

The first assignment, on adjective placement in French, involved no more than signing on to the computer and typing the single word command "exemple1", which called to the screen, or to a file, 10 strings defined by the syntax:

9. choose

Ca : Case, {Choose a case (human, physical, etc.) }

Ge : Gender, {Choose a gender }

No : Number; {Choose a number }

DET[def,Ge,No] {Pick a definite article with number and gender }

ADJ[Ca,Ge,No] {Pick an adjective with the same gender and number}

N[def,Ca,Ge,No] {Pick a noun with the same gender and number, and }

{the same case as the adjective }

The string of determiner, adjective, noun is common in French. However, in the syntax given above, since the adjective specifications make no mention of preposed or postposed forms, either can occur in the output. Here are examples of the output for a typical run of the generator:

10. *le naturel sourcil

le sombre poids

le second serpent

le vieux temps

le joli visage

*le vivant éléphant

*le semblable panier

?l'important pharmacien

Note that starred forms are unacceptable. The task of the students was to run the generator one or more times and to analyse the unacceptable forms, searching for a system which might underlie them. Subsequently, the problem was discussed in class, and constraints were proposed to explain preposed versus postposed adjectives in French.

Apart from the attention which this exercise focusses on problems of adjective placement, discussion of the output of the generator illustrates possible solutions which might be adopted to deal with such problems. The simplest solution, which is descriptively adequate (a key term in linguistic theory), is to mark certain adjectives as preposed,

11. second, joli

and others as postposed

12. vivant, naturel

However, this simple solution fails to capture several generalizations:

First, the set of preposed adjectives shares some formal and semantic traits. On the formal level, preposed adjectives tend to be monosyllabic, while on the semantic level, such adjectives tend to share the trait of indicating either an evaluation or a physical trait, and to be the most general forms in their system (to be the hyperonyms of other related forms). Thus, one says:

13. un grand livre


14. un livre gigantesque

Second, the distinction between preposed and postposed adjectives in French is not entirely clearcut. Some forms, such as pauvre, may occupy either position, while others, normally postposed, may be preposed when they carry special emphasis:

15. un voyage extraordinaire / un extraordinaire voyage

Third, the preposed/postposed distinction is influenced by other syntactic factors. Thus, one says:

16. un grand livre

but, with a complex adjectival phrase:

17. un livre grand comme la main

Without even rewriting the generator output, students are thus sensitized to the necessity of probing below purely descriptive adequacy and grappling with the complexities of linguistic problems.
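The "simplest solution" discussed above, marking each adjective as preposed or postposed in the lexicon, can be sketched in a few lines of Python. This is an illustration only, not the Vinci lexicon format; the dictionary ADJECTIVE_POSITION and the function noun_phrase are hypothetical names, and the entries are limited to adjectives cited in this section.

```python
# Lexical marking: each adjective carries its own position feature.
ADJECTIVE_POSITION = {
    "second": "pre",
    "joli": "pre",
    "grand": "pre",
    "vivant": "post",
    "naturel": "post",
    "gigantesque": "post",
}


def noun_phrase(det, adj, noun):
    """Order determiner, adjective and noun according to the lexical mark."""
    if ADJECTIVE_POSITION[adj] == "pre":
        return f"{det} {adj} {noun}"
    return f"{det} {noun} {adj}"


if __name__ == "__main__":
    print(noun_phrase("le", "second", "serpent"))
    print(noun_phrase("le", "naturel", "sourcil"))
```

A table of this kind is descriptively adequate for the forms it lists, but, as noted above, it says nothing about why the preposed adjectives pattern together, about forms such as pauvre which occupy both positions, or about complex adjectival phrases.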

3.2 Second assignment

The second assignment dealt with the problem of selectional restrictions. After signing on, students typed the command "exemple2", which generated 10 sentences having the formula:

18. choose

Ge : Genre,

No : Nombre;

PRON[pronpers,p3,plur,clit,suj] {clitic personal pronoun: ils}

V[vi,p3,hum.suj,plur,prés] {intransitive verb, human subject}

DET[déf,Ge,No] {definite article}

N[déf,phys,Ge,No] {noun referring to physical object}

Now in principle this formula should produce unacceptable sentences, since an intransitive verb is not usually followed by a noun phrase. This is confirmed by many of the output sentences:

19. *ils reviennent le riz

*ils pensent le poivre

*ils rient l'outil

However, not all sentences are unacceptable, as we can see from the following examples:

20. ils poussent le veston

ils rentrent le steak

The difficulty is that many verbs have two homonymous forms, one intransitive, the other transitive. An adequate description of the language must take account of this, and show that it is not sufficient to characterize verbs in and for themselves: we must also describe the environments in which they may appear.
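One way to picture the point about environments is a lexicon in which each verb lists the frames it allows. The Python sketch below is an assumption-laden illustration, not the Vinci lexicon: the frame labels vi and vtd are taken from the formulas in this paper, the verb entries reflect only the judgements in [19] and [20], and the names VERB_FRAMES and judge are hypothetical.

```python
# Each verb lists the frames it allows:
# "vi" = intransitive, "vtd" = transitive with a direct object.
VERB_FRAMES = {
    "revenir": {"vi"},
    "penser": {"vi"},
    "rire": {"vi"},
    "pousser": {"vi", "vtd"},
    "rentrer": {"vi", "vtd"},
}


def judge(verb, has_direct_object):
    """Return "*" if the verb is used in a frame it does not allow, else ""."""
    frame = "vtd" if has_direct_object else "vi"
    return "" if frame in VERB_FRAMES[verb] else "*"


if __name__ == "__main__":
    for verb in VERB_FRAMES:
        print(judge(verb, True) + verb + " + direct object")
```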

3.3 Third assignment

In recent years, there has been a resurgence of interest in agreement phenomena (see for example Barlow and Ferguson 1988). Such phenomena are particularly important in French, where agreement is marked between adjectives and nouns, between noun phrases and verbs, and between determiners and nouns, and they form a significant source of errors among second language learners of French. In the third assignment, students were introduced to some of these concepts through the examination of phrase structure rules in which agreement was explicitly expressed.

The assignment required students to use the XEDIT editor in the VM/CMS environment to write a simple userfile specifying a minimum six word sentence in French, in which appropriate agreement constraints and selectional restrictions were met, as for example:

21. %include "att" {Attribute specifications. }

10 of {Number of sentences to make. }

DET[def,masc,sing] {Masculine, singular definite article. }

N[def,hum,masc,sing] {Masculine, singular, human noun. }

ADJ[hum,masc,sing,post] {Masculine, singular, human adjective. }

V[vtd,pres,p3,hum.suj,phys.objd] {Transitive verb, 3rd person, present }

{tense, human subject, physical object.}

DET[def,fem,plur] {Feminine, plural, definite article. }

N[def,phys,fem,plur] {Feminine, plural, physical noun. }


Here, for the first time, students were obliged to use an editor to create files, run the files, debug them, and run them again until they were acceptable. Detailed step by step instructions were provided, as well as general models. At the same time, commands were simplified so that by typing "generer" followed by the name of the userfile, sentences would be produced.

The goal of this particular exercise was to demonstrate the complexity of selectional restrictions and agreement inherent in even a simple French sentence. Students were clearly aware of this at the end of the assignment. However, another benefit of the exercise was to show students the distinction between grammatically acceptable sentences and semantically and pragmatically appropriate ones. Thus, if we look at the output of a typical run, we note that some sentences are in some sense odd, although they respect the agreement constraints of French.

22. le garçon malade mange les tables

le père furieux adore les revues

le professeur riche mange les livres

Some students found this exercise quite easy, and in fact created more complex forms than required. Many, however, had considerable difficulty, primarily for three reasons:

(1) The XEDIT editor is not particularly user friendly, so students had some difficulty manipulating their data.

(2) The generation system itself, being still a prototype, is quite unforgiving of syntax errors, and provides little guidance when errors do occur.

(3) The process was explained to students, and one-on-one advice was given in many cases. However, no classroom time was scheduled in a computer lab with the entire class.

3.4 Student Reactions

At the end of the year, a questionnaire was circulated among the students concerning their experience with the system, and their views on its usefulness. Of the 35 students in the class, 26 returned their questionnaires. While we make no claim for statistical significance, the reactions to the various questions are enlightening. Questions and answers are translated here from French.


(1) Usefulness of the computer for understanding the concepts presented by means of computer exercises:

Very useful 2

Useful 12

Not very useful 8

Not at all useful 3

No response 1

(2) Ease of use of the computer:

Very easy 1

Easy 1

Not very easy 13

Hard 11

(3) Comprehensibility of the instructions provided:

Very easy to understand 3

Easy to understand 10

Not very easy to understand 11

Hard to understand 1

No response 1

(4) In principle, do you think it's a good idea to use computers to help in teaching linguistics?

Yes 16 No 10

(5) If you answered "yes" to question (4), do you think that more or less time should be devoted to computing aspects of the course?

More time 11

Less time 1

Same amount of time 5

(6) Would you rather that the program be available on diskette for use on a microcomputer?

Yes 20

No 3

No response 2

Don't know 1

(7) Suggestions for improvement?

(8) Please indicate your experience with computers before taking this course:

Word processing 14

Consulting library catalogue 15

Programming 4

Computing courses 10

Other 5

No experience 4

We can see from these results that a slim majority of the students see the value of these sorts of exercises and would like to see them continued and even expanded.

On the other hand, a not insignificant number of students have serious reservations. Many of these likely stem from the nature of the environment used. Thus, almost all found the mainframe environment difficult to use, and most would prefer to use microcomputers. At the same time, the instructions provided were seen as easy to follow by about half the students, not easy by the other half. A further key to this reaction is provided in the suggestions for improvement, where the almost unanimous comment was that some classes be conducted in a computer lab, with the instructor guiding the whole class through exercises.

This seems to be reasonable, if one looks at the computing background of the students surveyed. Barely more than half (14/26) have done word processing, and less than 2/3 (15/26) have consulted the menu-driven library catalogue. Some (10) have taken high school or university computing courses, but at the same time, a few (4) have never touched a computer.
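The proportions quoted in this discussion can be checked directly against the questionnaire counts. The short Python tally below uses only the figures reported above; the helper name share is illustrative.

```python
RESPONDENTS = 26  # questionnaires returned


def share(count, total=RESPONDENTS):
    """Percentage of respondents, rounded to one decimal place."""
    return round(100 * count / total, 1)


in_favour = share(16)                # question (4), "Yes"
found_it_hard = share(13 + 11)       # question (2), "Not very easy" + "Hard"
instructions_clear = share(3 + 10)   # question (3), two "easy" answers
prefer_micro = share(20)             # question (6), "Yes"

if __name__ == "__main__":
    print(in_favour, found_it_hard, instructions_clear, prefer_micro)
```

This gives 61.5% in favour in principle, 92.3% finding the computer not very easy or hard to use, 50.0% finding the instructions easy or very easy to understand, and 76.9% preferring a microcomputer version.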

The best scenario for future use of the system seems clear: use microcomputers, present all material in a computer lab, and make the environment as simple as possible. We are currently working toward these three goals: a prototype of the system has been ported to the IBM PC, and we are working on a unified interface which combines a text and record editor for datafiles, a command language to run the generator, and detailed error analysis and help files. Finally, there are now available on our campus larger numbers of microcomputer labs which would allow the classroom presentation requested by the students.

4.0 Future Areas of Use

Despite the difficulties encountered in this first implementation, we are convinced that a generation program has an important role to play in the teaching of modern linguistics. In the last part of the paper, we will look briefly at some of the more promising areas for applications of the machine.

4.1 Phonological phenomena

Since the sentence generator makes no distinction between words, sentences or phonetic symbols, treating all as strings, it is possible to test concepts such as possible words and syllable division using the generator. For example, in some French words, we find two consonants followed by a vowel at the beginning: blond, trou, prix, station. Are there any constraints on this? To test this, we can run the sentence generator using the rule:

24. %include "att"

10 of

choose Oc1, Oc2 : Occlusion,

Ou : Ouverture;

C[Oc1] C[Oc2] V[Ou,oral]


This rule combines two consonants of any sort followed by an oral vowel. Output of the rule includes:

25. *z Z u

s l O

v r A

*r g EU

b r u

v r E

*l Z EU

*l z E

*l t a

*k S A

Starred forms are impossible combinations in French. Clearly, there are some constraints on the rule. Let us make it more precise:

26. %include "att"

10 of

choose Ou : Ouverture;

(C[occlusif] | C[fricatif]) C[liquide] V[Ou,oral]


The revised rule combines a first consonant which must be either occlusive or fricative with a following consonant which must be liquid, that is, either [r] or [l]. Output of this rule includes:

27. g l E

S r E

f r i

Z r EU

v r EU

v l O

*t l A

g l i

f l eu

Z l A

This is clearly much better. However, one impossible form is still being generated: [tlA]. The rule in French must exclude the strings [tl] and [dl], as follows:

28. %include "att"

10 of

choose Ou : Ouverture;

( C[fricatif] | C[occlusif,bilabial] | C[occlusif,dorsopalatal] )

C[liquide] V[Ou,oral]


This rule allows the first consonant to be fricative (of any sort) but if it is occlusive, it must be bilabial [p] or [b] or dorsopalatal [k] or [g]. Output of this rule is now acceptable in French:

29. g r i

b l i

g r O

z r EU

p l i

Z l EU

k r A

b l O

v r O

k r E

Additional work on the phonetic level might involve the use of transformations to model syllable divisions in French.
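The successive refinements in [24], [26] and [28] can be mimicked in Python by treating each rule as a filter over consonant pairs. This is a sketch under stated assumptions, not the Vinci attribute file: the consonant inventory is limited to symbols appearing in the output above, the place label "dental" for [t] and [d] is an assumption, and the names MANNER, PLACE, rule_24, rule_26, rule_28 and onsets are illustrative.

```python
import itertools

# Manner of articulation, using the paper's symbols (S and Z as in the output).
MANNER = {
    "p": "occlusif", "b": "occlusif", "t": "occlusif",
    "d": "occlusif", "k": "occlusif", "g": "occlusif",
    "f": "fricatif", "v": "fricatif", "s": "fricatif",
    "z": "fricatif", "S": "fricatif", "Z": "fricatif",
    "l": "liquide", "r": "liquide",
}
# Place of articulation, needed only for the occlusives.
PLACE = {
    "p": "bilabial", "b": "bilabial",
    "t": "dental", "d": "dental",  # label assumed for this sketch
    "k": "dorsopalatal", "g": "dorsopalatal",
}


def rule_24(c1, c2):
    """Any two consonants."""
    return True


def rule_26(c1, c2):
    """Occlusive or fricative followed by a liquid."""
    return MANNER[c1] in ("occlusif", "fricatif") and MANNER[c2] == "liquide"


def rule_28(c1, c2):
    """As rule_26, but an occlusive must be bilabial or dorsopalatal."""
    if MANNER[c2] != "liquide":
        return False
    if MANNER[c1] == "fricatif":
        return True
    return MANNER[c1] == "occlusif" and PLACE[c1] in ("bilabial", "dorsopalatal")


def onsets(rule):
    """All consonant pairs licensed by a rule, as two-character strings."""
    return sorted(c1 + c2 for c1, c2 in itertools.product(MANNER, repeat=2)
                  if rule(c1, c2))


if __name__ == "__main__":
    print(len(onsets(rule_24)), len(onsets(rule_26)), len(onsets(rule_28)))
```

Enumerating the pairs rather than sampling them makes the effect of each constraint easy to inspect. As transcribed here, the third rule also filters out [tr] and [dr], although the text cites trou as a French word; a finer version would exclude only a dental occlusive followed by [l].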

4.2 Word Formation Rules

Recent work on morphology (for example, Selkirk 1982) has demonstrated that many complex components in the lexicon of a language may be modelled by the application of phrase structure rules. The same observation applies to French, where, for example, there exists an open class of compound nouns formed on the model N -> V + N, as in ouvre-boîte 'can opener', brise-glace 'ice breaker', and so on.

It is quite simple to model this capacity by means of the following rule, which specifies a transitive verb taking a physical direct object followed by a noun which refers to a physical object.

30. %include "att"

10 of

V[vtd,p3,prés,sing,phys.objd] N[déf,phys,sing]


Among the strings generated by such a rule, we note the following, with their English glosses:

31. préfère carotte 'carrot preferer'

ouvre pâté 'pâté opener'

plie imperméable 'raincoat folder'

relie pont 'bridge joiner'

chasse eau 'water chaser'

chasse poulet 'chicken chaser'

remplit essence 'gas filler'

brûle ciment 'cement burner'

retient légume 'vegetable holder'

porte aspirine 'aspirin holder'

Of course, the next component of such an exercise consists in assigning a possible meaning to such strings. For example, it is interesting to wonder whether the implied subject of such formations need be human, or animate, or whether it might be a physical object.
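The V + N pattern can be illustrated with a small Python sketch that pairs verbs selecting a physical direct object with nouns carrying the trait 'physical'. It is not the Vinci rule itself: the word lists are a hand-picked subset of forms cited in this paper, the hyphenated spelling follows ouvre-boîte, and the names VERBS, NOUNS and compounds are hypothetical.

```python
import itertools

# (third person singular present form, trait required of the direct object)
VERBS = [("ouvre", "phys"), ("brise", "phys"), ("chasse", "phys"), ("porte", "phys")]
# (noun, semantic trait)
NOUNS = [("boîte", "phys"), ("glace", "phys"), ("eau", "phys"), ("professeur", "hum")]


def compounds():
    """All verb-noun compounds in which the noun satisfies the verb's restriction."""
    return [f"{verb}-{noun}"
            for (verb, wanted), (noun, trait) in itertools.product(VERBS, NOUNS)
            if wanted == trait]


if __name__ == "__main__":
    for form in compounds():
        print(form)
```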

4.3 Numeral Systems

Numeral systems have intrigued linguists for many years, partly because of their recursive nature. As Brainerd 1968:41 puts it:

32. when we employ thousand, million, billion in American English we can make 10^12 - 1 number names which would take about 100 years to list at 300 names per second. Yet any speaker of English can construct any given one of them in a few seconds.

In the case of French, the problem is particularly intriguing, since the language includes a 20-based component, in such forms as soixante-dix (70), literally 'sixty ten'. At the same time, there are morphological constraints which apply. For example, in French, the numbers from 20-90 take et when added to 1, but a hyphen when added to 2-9. Thus we say vingt et un but vingt-deux. This can be reflected in a fairly simple morphology. Applying these rules can generate any particular set of numbers, as in the following example, generated by a run of the system:

33. cent deux

quatre-vingt-dix-neuf

quatre-vingt-deux

cinquante et un

soixante et un



mille deux

sept cent

soixante-neuf
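The et/hyphen constraint and the forms built on soixante and quatre-vingt can be captured in a short Python sketch. This is an independent illustration, not the Vinci morphology used to produce the output above; the function names below_100 and number_name are hypothetical. It follows traditional spelling, in which 81 and 91 are written without et (quatre-vingt-un, quatre-vingt-onze), and it is limited to the range 1 to 9999.

```python
UNITS = ["", "un", "deux", "trois", "quatre", "cinq", "six", "sept", "huit",
         "neuf", "dix", "onze", "douze", "treize", "quatorze", "quinze", "seize"]
TENS = {2: "vingt", 3: "trente", 4: "quarante", 5: "cinquante", 6: "soixante"}


def below_100(n):
    """Name of a number from 1 to 99."""
    if n <= 16:
        return UNITS[n]
    if n < 20:
        return "dix-" + UNITS[n - 10]
    if n < 70:
        tens, unit = divmod(n, 10)
        if unit == 0:
            return TENS[tens]
        if unit == 1:
            return TENS[tens] + " et un"       # vingt et un
        return TENS[tens] + "-" + UNITS[unit]  # vingt-deux
    if n < 80:
        # 70-79 are built on soixante plus the name of 10-19.
        if n == 71:
            return "soixante et onze"
        return "soixante-" + below_100(n - 60)
    if n == 80:
        return "quatre-vingts"
    # 81-99 are built on quatre-vingt plus the name of 1-19, without "et".
    return "quatre-vingt-" + below_100(n - 80)


def number_name(n):
    """Name of a number from 1 to 9999."""
    if not 1 <= n <= 9999:
        raise ValueError("out of range for this sketch")
    thousands, rest = divmod(n, 1000)
    hundreds, rest = divmod(rest, 100)
    parts = []
    if thousands == 1:
        parts.append("mille")
    elif thousands > 1:
        parts.append(UNITS[thousands] + " mille")
    if hundreds == 1:
        parts.append("cent")
    elif hundreds > 1:
        # "cent" takes a plural -s only when nothing follows it.
        parts.append(UNITS[hundreds] + (" cents" if rest == 0 else " cent"))
    if rest:
        parts.append(below_100(rest))
    return " ".join(parts)


if __name__ == "__main__":
    for n in (102, 99, 82, 51, 61, 1002, 769):
        print(n, number_name(n))
```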

4.4 Metaphor

The subject of metaphor has concerned linguists, philosophers of language and literature specialists for a very long time. One of the difficulties in dealing with the phenomenon is to control the data with which one is working. If one applies a simple semantic model based on semantic traits, as described by Lyons 1977, it is possible to generate utterances and to control the degree of semantic distance between terms. For example, given the string DET N ADJ, it is normally the case that the noun and the adjective share the same selectional restrictions: a human noun will be modified by a human adjective. But what happens if we relax these restrictions? A rule like the following allows us to do this:

34. %include "att"

10 of

choose Ge : Genre;

DET[indéf,Ge,sing] N[indéf,Ge,sing,phys] ADJ[Ge,sing,hum]


Here, a physical object noun is modified by an adjective which carries the trait 'human'. We reproduce below some of the strings generated by the rule:

35. un banc capable 'a competent bench'

une fraise étonnée 'an astonished strawberry'

une oeuvre morte 'a dead work'

un roman puissant 'a powerful novel'

un cahier désagréable 'a disagreeable notebook'

une horloge libre 'a free clock'

un livre timide 'a timid book'

une fourrure souriante 'a smiling fur'

un gâteau sympathique 'a sympathetic cake'

un disque décontracté 'a relaxed record'

Among other things, examples of this sort sensitize us to the ease with which we can transfer traits from a human activity to a physical object associated with that activity, as in un disque décontracté. They also allow us to explore the worlds in which such structures might plausibly be used.
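The idea of relaxing a selectional restriction while keeping gender agreement can be sketched in Python as follows. The lexicon is a small hand-made subset of words appearing in this paper, the trait labels phys and hum follow the formulas above, and the names NOUNS, ADJECTIVES and phrases are illustrative rather than part of Vinci.

```python
import itertools

DETERMINERS = {"masc": "un", "fem": "une"}

# noun -> (gender, semantic trait)
NOUNS = {
    "banc": ("masc", "phys"),
    "fraise": ("fem", "phys"),
    "livre": ("masc", "phys"),
    "horloge": ("fem", "phys"),
    "garçon": ("masc", "hum"),
}

# (masculine form, feminine form, semantic trait)
ADJECTIVES = [
    ("capable", "capable", "hum"),
    ("étonné", "étonnée", "hum"),
    ("timide", "timide", "hum"),
    ("souriant", "souriante", "hum"),
]


def phrases(noun_trait, adj_trait):
    """DET N ADJ strings for the requested traits, with gender agreement."""
    result = []
    for (noun, (gender, n_trait)), (masc, fem, a_trait) in itertools.product(
            NOUNS.items(), ADJECTIVES):
        if n_trait == noun_trait and a_trait == adj_trait:
            adjective = masc if gender == "masc" else fem
            result.append(f"{DETERMINERS[gender]} {noun} {adjective}")
    return result


if __name__ == "__main__":
    print(phrases("hum", "hum"))   # restriction respected
    print(phrases("phys", "hum"))  # restriction relaxed
```

Calling phrases with matching traits gives ordinary combinations, while mismatched traits give the metaphor candidates; the agreement logic is the same in both cases.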

4.5 Error Modelling

In recent years, linguists have begun to use grammars to model performance errors (Cutler 1982), first language acquisition errors (Fletcher and Garman 1986) or second language acquisition errors (Giacomi and Véronique 1986). Such grammars need not be extremely complex. For example, a frequent error among learners of French consists in having the verb agree with a preceding clitic pronoun, rather than with the subject, as in Il les voient (see Lessard 1990). At the same time, it is interesting to note that where the preceding object pronoun agrees in number with the subject pronoun, an erroneous rule can nevertheless produce correct results, as in Il le voit, a point made by Corder 1981.

Now, the syntax for this sort of error is quite simple:

36. %include "att"

10 of


Ge : Genre,

No : Nombre;





and the sorts of sentences generated resemble quite closely those produced by language learners. (Starred sentences are unacceptable.)

37. elle l' établit 'she establishes it'

*il les remarquent 'he notices them'

elle l' attache 'she attaches it'

elle la prépare 'she prepares it'

*il les boivent 'he drinks them'

*il les descendent 'he carries them down'

*il les portent 'he carries them'

elle la sert 'she serves it'
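The contrast between the correct rule and the learner's rule can be sketched in Python by letting a single parameter decide which pronoun controls verb agreement. This is an illustration under stated assumptions, not the Vinci syntax in [36]: the verb table is a small hand-made sample, and the names NUMBER, VERB_FORMS, sentence and is_error are hypothetical.

```python
# Grammatical number of each pronoun.
NUMBER = {
    "il": "sing", "elle": "sing", "ils": "plur", "elles": "plur",
    "le": "sing", "la": "sing", "les": "plur",
}

# Third person present forms, by number.
VERB_FORMS = {
    "voir": {"sing": "voit", "plur": "voient"},
    "porter": {"sing": "porte", "plur": "portent"},
    "boire": {"sing": "boit", "plur": "boivent"},
}


def sentence(subject, clitic, verb, agree_with="subject"):
    """Build subject + clitic + verb, agreeing with the subject or the clitic."""
    controller = subject if agree_with == "subject" else clitic
    return f"{subject} {clitic} {VERB_FORMS[verb][NUMBER[controller]]}"


def is_error(subject, clitic, verb):
    """True if the learner's rule yields a form different from the correct one."""
    return sentence(subject, clitic, verb, "clitic") != sentence(subject, clitic, verb)


if __name__ == "__main__":
    for subject, clitic in (("il", "les"), ("il", "le"), ("elle", "la")):
        star = "*" if is_error(subject, clitic, "voir") else ""
        print(star + sentence(subject, clitic, "voir", "clitic"))
```

When subject and clitic share the same number, the erroneous rule and the correct rule produce the same surface string, which is why such errors can go unnoticed.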

5.0 Conclusions

Despite the problems encountered using the first prototype of the system, we are confident that generation systems have an important role to play in teaching linguistics, on three conditions:

(1) They must be extremely user-friendly, incorporating all aspects of the work to be done in a single environment (preferably on a microcomputer), and must possess extensive help functions.

(2) They must be presented in teaching labs, so that problems may be dealt with by the entire group of learners.

(3) They must be compatible with the formalism normally used by linguists. So for example, it would be inappropriate to require linguistics students to learn PROLOG in order to generate sentences; rather, the generator must be seen as just another tool in the linguist's toolbox.


Barlow, M., Ferguson, C.A. (1988) Agreement in Natural Languages. Stanford: Centre for the Study of Language and Information.

Brainerd, B. (1968) A Transformational Generative Grammar for Rumanian Numerical Expressions. In Grammars for Number Names. Dordrecht: Reidel.

Corder, S.P. (1981) Error Analysis and Interlanguage. Oxford: Oxford University Press.

Cutler, A. (ed.) (1982) Slips of the Tongue. Amsterdam: Mouton.

Fletcher, P., Garman, M. (eds.) (1986) Language Acquisition: Studies in First Language Development. Cambridge: Cambridge University Press.

Giacomi, A., Véronique, D. (eds.) (1986) Acquisition d'une langue étrangère. Perspectives et recherches. Aix-en-Provence: Service de Publications, Université de Provence.

Lessard, G. (1990) Modelling Performance Errors in Advanced Learners of French. AILA Conference, Thessaloniki, Greece.

Lessard, G., Levison, M., Bastianutti, D., McDonald, J.K., Hurd, S., Smith, D. (1990) Vers un ELAO génératif: le projet VINCI. In CALL: Papers and Reports. La Jolla, CA: Athelstan Press.

Lessard, G., Levison, M., Olsen, M. (1990) Possible and Impossible Pronouns: The Role of Textbases and Natural Language Generation in Linguistic Research. ACH/ALLC Conference, Siegen, Germany.

Levison, M., Lessard, G. (1990a) A Transformation Mechanism for Natural Language Sentence Generation. Department of Computing and Information Science, Queen's University.

Levison, M., Lessard, G. (1990b) Application of Attribute Grammars to Natural Language Sentence Generation. Included in a forthcoming volume on attribute grammars, Springer Verlag.

Lyons, J. (1977) Semantics. Cambridge: Cambridge University Press.

Selkirk, E. (1982) The Syntax of Words. Cambridge, MA: MIT Press.