Experiments in Word Creation


Michael Levison, Department of Computing and Information Science


Greg Lessard, Department of French Studies


Queen's University, Kingston, Ontario.



1. Linguistic background


The use of word transformation rules to build words of a language from shorter words is common practice. In English, the suffix "-ly" converts many adjectives to adverbs; in French, "-age", "-ement", "-ure" and others are often used to make nouns from verbs. The resulting words may be attested in standard dictionaries of the language. On the other hand, a speaker unaware of an existing word, or seeking a particular nuance, may create unattested words, consciously or unconsciously, without interrupting the listener's understanding. Consider, for example, the following sentences:


Although his handwriting was poor, it was still readable.


Although the car had not been used for several weeks, it was still startable.


Each contains an adjective created from a transitive verb by appending the suffix "-able". The former word ("readable") appears in most English dictionaries; the latter ("startable") does not. Both will surely be accepted without hesitation by the majority of anglophone listeners.


We may ask, therefore, whether this suffix always produces acceptable words, or whether additional constraints limit its productivity. Clearly, this is a domain where quantitative data would be useful.


A number of approaches have been suggested to deal with this problem. Baayen and Lieber (1991), using corpus materials, propose a statistical model of productivity which, in essence, associates productivity with a higher proportion of hapax legomena (words occurring only once in the corpus). On the other hand, it would be useful to estimate productivity for a given population of users without first building a corpus to represent their language.
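For comparison, Baayen and Lieber's productivity measure is usually written P = n1/N. Here n1 is the number of hapax legomena formed with the affix (word types occurring exactly once), and N is the total number of tokens formed with it. Below is a minimal Python sketch. It assumes the tokens carrying the affix have already been extracted from some corpus; the token list is invented purely for illustration.

```python
from collections import Counter


def baayen_p(affix_tokens):
    """Return P = n1 / N for one affix.

    affix_tokens: every running word in the corpus that carries the affix.
    n1: number of distinct words with the affix that occur exactly once.
    N:  total number of such tokens.
    """
    if not affix_tokens:
        raise ValueError("no tokens carry this affix")
    frequencies = Counter(affix_tokens)
    n1 = sum(1 for count in frequencies.values() if count == 1)
    return n1 / len(affix_tokens)


# Invented toy data, not corpus counts:
tokens = ["readable", "readable", "readable", "startable", "washable"]
print(baayen_p(tokens))  # 2 hapaxes / 5 tokens -> 0.4
```

A higher P means that a larger share of the affix's occurrences are one-off forms, which is what links hapaxes to productivity in this approach.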


Some years ago, Aronoff and Schvaneveldt (1978) and Gorska (1982) proposed, and carried out on a limited scale, pen-and-paper tests in which subjects were asked to evaluate lexical creations. The results were encouraging, but no further work appears to have been done, possibly because of the logistical problems involved.


In this paper we describe some experiments, conducted with the help of first-language speakers of English, to determine the productivity of certain affixes in English. The purpose of the paper is not only to discuss the results obtained, but also to present a research methodology that allows questions of this type to be investigated.



2. Choice of affixes


For the purposes of these experiments, six affixes were chosen: three prefixes ("non-", "semi-" and "super-") and three suffixes ("-like", "-ish" and "-less"). All can be attached to a base noun (see Marchand, 1969). In addition, the prefixes share a common semantic base having to do with degree (including absence), while the suffixes all have to do with degree of similarity (including absence of a trait). Intuitively, we expected the three prefixes and the three suffixes each to form a series of decreasing productivity; in other words, we expected "non-" to prove more productive than "semi-", and so on.



3. The experimental data


Discussions of statistical design led to the decision that each subject would be asked to judge 180 words, 30 with each of the six affixes, and that no subject would see the same base word with more than one affix. The statistician originally proposed that six subjects might see the same 180 base words with different affixes, and that the data should be reused for further groups of six subjects. A plentiful supply of base words, together with computer programs to help in constructing the data, made this unnecessary.


The experiments were conducted within the ivi/VINCI program environment. VINCI is a program which, given a syntax, a lexicon and a morphology for a language, generates utterances in the language. The program has been fully described elsewhere (see, for example, Levison and Lessard, 1992). It also has the ability to create new words using the lexicon and some word formation rules known as lexical transformations. These features are embedded within an editor, ivi, which can edit various kinds of objects, including text and records. It would have been easy within this system to generate new words at random as subjects required them. The resulting words, however, would not have conformed to the distribution described earlier. Individual datafiles were therefore built in advance for each subject.


The base words were taken from the Computer Usable Version of the Oxford Advanced Learner's Dictionary (henceforth CUVOALD; see Mitton 1986). From it, we extracted the nouns marked K6% (singular count nouns whose plural is formed with "-s"). The % symbol indicates that a word is neither among the 500 most frequent in some common word lists nor particularly rare (in the opinion of Mitton). Proper nouns were deleted, leaving almost 12000 nouns in total.


A simple awk program was used to extract the nouns and build records of the form:


"chain"|N|||K6%|||r1|r2|


suitable for use as a VINCI lexicon. The items r1 and r2, inserted by awk in fields 8 and 9, denote uniformly distributed random numbers between 0 and 1 with no pairwise correlation. r1 was used, in effect, to determine both the subject who would see the word and the affix to be attached; r2 determined the order in which the words were presented to the subject.
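The extraction step can be sketched in Python as follows. This is not the authors' awk program. It assumes the CUVOALD entries have already been parsed into hypothetical (word, tag) pairs. It fills fields 8 and 9 with independent uniform random numbers, formatted to six decimals as in the later example record, and leaves the other fields empty exactly as shown above.

```python
import random


def make_lexicon_record(word, rng):
    """Build a record shaped like "chain"|N|||K6%|||r1|r2|.

    Field 1 holds the quoted word, field 2 the category N, field 5 the
    CUVOALD tag, fields 8 and 9 the random keys r1 and r2; fields 3, 4,
    6 and 7 stay empty.
    """
    r1, r2 = rng.random(), rng.random()  # independent draws from [0, 1)
    fields = [f'"{word}"', "N", "", "", "K6%", "", "", f"{r1:.6f}", f"{r2:.6f}"]
    return "|".join(fields) + "|"


def build_lexicon(entries, seed=1):
    """Keep only entries tagged K6% (the input format is a hypothetical stand-in)."""
    rng = random.Random(seed)
    return [make_lexicon_record(word, rng) for word, tag in entries if tag == "K6%"]


# "X" is a placeholder for any tag other than K6%.
sample = [("chain", "K6%"), ("hill", "K6%"), ("butter", "X")]
for record in build_lexicon(sample):
    print(record)
```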


In practice, the 12000 words were first sorted using r1 as key, so that they were ordered randomly. The first 600 were then processed using a VINCI lexical transformation to create records of the form:


"chainish"|ADJ|ish|?|0.667453|0.145922|


in which the new word, the result of adding suffix "-ish", appears in field 1, the suffix itself appears in field 3, the random numbers are copied to fields 5 and 6, and a ? symbol, later to be overwritten with the subject's response, is placed in field 4.
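The VINCI lexical transformation itself is not reproduced here. The Python stand-in below only illustrates the record layout just described. Two details are assumptions based on the single example record: the category chosen for each affix, and storing the bare affix (e.g. "ish", "non") in field 3.

```python
def derive_record(lexicon_record, affix, category, prefix=False):
    """Map a lexicon record onto a judgement record such as
    "chainish"|ADJ|ish|?|0.667453|0.145922|

    Field 1: new word; field 2: its category; field 3: the affix;
    field 4: '?' placeholder for the answer; fields 5-6: r1 and r2,
    copied from fields 8-9 of the lexicon record.
    """
    fields = lexicon_record.split("|")
    base = fields[0].strip('"')
    r1, r2 = fields[7], fields[8]
    word = affix + base if prefix else base + affix  # plain concatenation
    return "|".join([f'"{word}"', category, affix, "?", r1, r2]) + "|"


entry = '"chain"|N|||K6%|||0.667453|0.145922|'
print(derive_record(entry, "ish", "ADJ"))
# "chainish"|ADJ|ish|?|0.667453|0.145922|
```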


The resulting words were transferred 30 at a time to each of 20 subject files. A similar operation was carried out with suffix "-like" on the next 600, and so on, until all affixes had been used, and the subject files contained 180 words each. Subject files were later re-ordered using r2 as key, so that the subjects would encounter affixes randomly. The process of assigning words to affixes and subjects is effectively equivalent to a random draw without replacement.
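The assignment procedure can be simulated in a few lines. The Python sketch below assumes a plain list of base words (the placeholder names w0, w1 and so on are hypothetical) and draws r1 and r2 for each. It then follows the steps in the text:

- sort on r1;
- give each affix the next block of 600 words;
- deal 30 of them to each of the 20 subject files;
- reorder each file on r2.

```python
import random
from collections import Counter

AFFIXES = ["-ish", "-like", "non-", "semi-", "super-", "-less"]
N_SUBJECTS = 20
PER_AFFIX = 30                      # words per affix per subject
BLOCK = N_SUBJECTS * PER_AFFIX      # 600 words per affix


def assign(bases, seed=1):
    """Return one list of (r2, base, affix) tuples per subject, ordered by r2."""
    if len(bases) < BLOCK * len(AFFIXES):
        raise ValueError("need at least 3600 base words")
    rng = random.Random(seed)
    keyed = sorted((rng.random(), rng.random(), b) for b in bases)  # order by r1
    files = [[] for _ in range(N_SUBJECTS)]
    for a, affix in enumerate(AFFIXES):
        block = keyed[a * BLOCK:(a + 1) * BLOCK]   # next 600 words in r1 order
        for s in range(N_SUBJECTS):
            for _r1, r2, base in block[s * PER_AFFIX:(s + 1) * PER_AFFIX]:
                files[s].append((r2, base, affix))
    for f in files:
        f.sort()                                   # presentation order by r2
    return files


files = assign([f"w{i}" for i in range(12000)])
print(len(files), len(files[0]), Counter(affix for _, _, affix in files[0]))
```

Every base word falls into at most one block, and each block is split without overlap. As a result, no base word reaches two subjects or two affixes, which is the random draw without replacement described above.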


Since the VINCI morphology system is quite powerful, it would have been possible to write a complex rule for appending each affix in such a way as to ensure normal-looking orthography. In practice, this was considered unnecessary, so that words like "hilllike" (sic) occasionally appear. Subjects were advised that spelling alone should not disqualify a word. One of us (Lessard) scanned the lists quickly, modifying a few words where a "misspelling" was actually misleading. This led to an occasional doubling of a consonant before an added suffix.
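The affixes were attached without orthographic adjustment. The short Python sketch below offers a hypothetical aid to the kind of manual scan described above; the authors do not report using anything like it. It attaches affixes naively and flags any form containing the same letter three times in a row, such as "hilllike".

```python
import re

TRIPLE_LETTER = re.compile(r"(.)\1\1")  # any character repeated three times


def attach(base, affix):
    """Attach an affix written with its hyphen, e.g. "non-" or "-like"."""
    if affix.endswith("-"):
        return affix[:-1] + base          # prefix: "non-" + "chain"
    return base + affix.lstrip("-")       # suffix: "hill" + "-like"


def needs_review(word):
    """True for forms whose spelling a human should check, e.g. 'hilllike'."""
    return TRIPLE_LETTER.search(word) is not None


for base, affix in [("hill", "-like"), ("chain", "-ish"), ("chain", "non-")]:
    word = attach(base, affix)
    print(word, needs_review(word))
```

Such a check cannot handle cases that need linguistic judgement, such as whether to double the final consonant of "yob" before "-ish".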



4. The test environment


The experiments were administered using the record-editing features of the ivi editor. ivi permits each record in a record file to be viewed within a template determined by the "user" (in this case, the experimenters). We configured the editor so that two fields (1 and 4) would appear in the centre of the screen, with the others out of the way in a bottom corner:


Word: "chainish"


Acceptable (y/n) ?


The field initially containing ? was restricted to one character, and made overwritable only with y or n. All other fields were made unwritable. A function-key was set up to search for the next ? symbol.


A subject had merely to start the editor, fetch his/her own file, and press the function key. This displayed the first word, leaving the editor cursor on the ? symbol. The subject overtyped the ? with a judgement on that word and pressed the function key again to display the second word. After 180 words, a final message appeared, and the subject saved the file to disk, overwriting the original version.


If a subject wished to interrupt the session at any time, it was merely necessary to save the file to disk and leave. When the editor was next started, the search for the next ? symbol would cause the experiment to resume with the first unjudged word.
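The resume behaviour depends only on the ? placeholder in field 4. Below is a minimal Python model of it. It uses the record layout of Section 3 rather than ivi's function-key search, and the function names are hypothetical. The one-character y/n restriction is enforced in code.

```python
def next_unjudged(records):
    """Index of the first record whose field 4 is still '?', or None if all judged.

    Searching from the top each time is what lets an interrupted session
    resume at the first unjudged word.
    """
    for i, record in enumerate(records):
        if record.split("|")[3] == "?":
            return i
    return None


def judge(records, answer):
    """Overwrite the next '?' with the subject's answer; only 'y' or 'n' allowed."""
    if answer not in ("y", "n"):
        raise ValueError("answer must be 'y' or 'n'")
    i = next_unjudged(records)
    if i is None:
        raise IndexError("every word has already been judged")
    fields = records[i].split("|")
    fields[3] = answer
    records[i] = "|".join(fields)
    return i


# Hypothetical two-word file; the second record is invented.
session = ['"chainish"|ADJ|ish|?|0.667453|0.145922|',
           '"nonhill"|N|non|?|0.512300|0.301200|']
judge(session, "y")            # first sitting: one word judged, then saved
print(next_unjudged(session))  # a later sitting resumes at index 1
```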


Later, when each experiment had been completed, the authors used an awk program to tally that subject's results (one row of Table 1).
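A Python equivalent of that tally counts y and n answers per affix in one subject's file. It is a sketch, not the authors' awk script. Following the note to Table 1, an unanswered ? is counted as n. The sample records are invented.

```python
from collections import Counter


def tally(records):
    """Count (affix, answer) pairs in one subject file; '?' counts as 'n'."""
    counts = Counter()
    for record in records:
        fields = record.split("|")
        affix, answer = fields[2], fields[3]
        counts[(affix, "y" if answer == "y" else "n")] += 1
    return counts


subject_file = ['"chainish"|ADJ|ish|y|0.667453|0.145922|',
                '"hillish"|ADJ|ish|n|0.210000|0.530000|',
                '"nonchain"|N|non|?|0.880000|0.020000|']
print(tally(subject_file))
```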



5. The subjects


The subjects were upper-year students taking a science degree at a Canadian university. All claimed English as their native language. Most, if not all, were educated in Canada. No experience in the use of computers was necessary to carry out the experiment, though in fact all subjects happened to be regular computer users.


Typically they completed the experiment in one to two hours.



6. The results


The first set of results, obtained from nine subjects, is shown in Table 1.


Table 1. "Yes" (y) and "no" (n) judgements by subject and affix

affix    -ish    -like   non-    semi-   super-  -less    tot
          y   n   y   n   y   n   y   n   y   n   y   n     y
s1       15  15  20  10   2  28   4  26   4  26  12  18    57
s2        9  21  25   5  26   4  21   9  17  13  20  10   118
s3        4  26  13  17   4  26   7  23  11  19  12  18    51
s4       23   7  25   5   6  24  18  12  25   5  18  12   115
s5       15  15  27   3  14  16  22   8  24   6  22   8   124
s6       11  19   5  25  10  20   9  21  16  14  11  19    62
s7       16  14  16  14  15  15  19  11  18  12  23   7   107
s8       12  18  15  15   7  23  21   9  14  16  16  14    85
s9       12  18  19  11   8  22  11  19  10  20  11  19    71
tot     117     165      92     132     139     145       790



In each subject/affix cell, the pair of integers records the number of "yes" (y) and "no" (n) judgements. (In two cases, unanswered items were added to the "no" column.)

All totals are for "yes".


As previously noted, each subject saw 30 words for each of the six affixes, so subject totals are out of 180. Affix totals are out of 270 (9 subjects by 30 words). There were 1620 words in all, of which 790, just under half, were judged acceptable.


Several statistical analyses were carried out on the results. The null hypothesis tested in each case was that the level of acceptance was 50%. Overall, the difference between the actual (0.4877) and hypothetical levels is not statistically significant. This, however, is because significant results in opposite directions cancel each other out.


At 0.4333, with z-value -5.4, the acceptance level for "-ish" is significantly less than 50%, while the level for "non-" (0.3407, z-value -13.5) is very significantly less. (So much for the authors' intuition!)

At 0.6111, with z-value 9.2, the level for "-like" is significantly greater than 50%. For "-less", "semi-" and "super-", the ratios do not differ significantly from 50%.
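The acceptance proportions quoted above follow directly from the "yes" totals in Table 1, each out of 270. The short Python check below recomputes them. It does not reproduce the z-values, since the exact test procedure behind them is not described here.

```python
YES_BY_AFFIX = {"-ish": 117, "-like": 165, "non-": 92,
                "semi-": 132, "super-": 139, "-less": 145}
WORDS_PER_AFFIX = 9 * 30  # 9 subjects x 30 words


def acceptance_rates(yes_counts, n):
    """Proportion of 'yes' answers for each affix, each out of n words."""
    return {affix: y / n for affix, y in yes_counts.items()}


rates = acceptance_rates(YES_BY_AFFIX, WORDS_PER_AFFIX)
for affix, p in rates.items():
    print(f"{affix:7} {p:.4f}")
overall = sum(YES_BY_AFFIX.values()) / (WORDS_PER_AFFIX * len(YES_BY_AFFIX))
print(f"overall {overall:.4f}")  # 790 / 1620
```

It prints 0.4333 for "-ish", 0.6111 for "-like", 0.3407 for "non-" and 0.4877 overall, matching the figures in the text.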


Evidently, the subjects' acceptance rates vary widely, from very conservative (51/180 = 0.2833) to very liberal (124/180 = 0.6889). In the results shown here, too few subjects have been tested to determine whether these rates are normally distributed or have some other distribution. It would also be interesting to know whether there are discoverable factors affecting the rate (gender of the subject, education level, etc.).
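The per-subject rates come from the row totals of Table 1, each out of 180. The Python sketch below computes them and picks out the most conservative and most liberal subjects. It also reports the mean and sample standard deviation, a first descriptive step toward the distributional question raised above; nine values are far too few to settle it.

```python
import statistics

YES_BY_SUBJECT = {"s1": 57, "s2": 118, "s3": 51, "s4": 115, "s5": 124,
                  "s6": 62, "s7": 107, "s8": 85, "s9": 71}

rates = {s: y / 180 for s, y in YES_BY_SUBJECT.items()}
most_conservative = min(rates, key=rates.get)
most_liberal = max(rates, key=rates.get)

print(most_conservative, round(rates[most_conservative], 4))  # s3 0.2833
print(most_liberal, round(rates[most_liberal], 4))            # s5 0.6889
print(round(statistics.mean(rates.values()), 4),
      round(statistics.stdev(rates.values()), 4))
```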


More complete results will be presented in the paper.


A question which the existing data might answer is whether subjects become more accepting of new words as they gain practice. In the authors' personal experience, they do.


Since each subject's answers constitute a time sequence of y's and n's, these will be analysed to determine whether the proportion of y's increases as the experiment progresses.
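One simple way to look for a practice effect is to compare the proportion of y's across successive windows of a subject's answers, in presentation order. The Python sketch below illustrates this; it is not the analysis the authors plan, and a formal trend test would still be needed before drawing conclusions. The answer string in the example is invented.

```python
def windowed_proportions(answers, width=30):
    """Proportion of 'y' in consecutive, non-overlapping windows of `width`
    answers (the last window may be shorter)."""
    if width < 1:
        raise ValueError("width must be positive")
    windows = [answers[i:i + width] for i in range(0, len(answers), width)]
    return [w.count("y") / len(w) for w in windows]


def half_difference(answers):
    """Second-half minus first-half proportion of 'y'; positive suggests an upward drift."""
    if len(answers) < 2:
        raise ValueError("need at least two answers")
    mid = len(answers) // 2
    first, second = answers[:mid], answers[mid:]
    return second.count("y") / len(second) - first.count("y") / len(first)


answers = "nnynnnynyy" + "ynyyynyyyy"  # invented 20-answer sequence
print(windowed_proportions(answers, width=5))
print(half_difference(answers))
```

With 180 answers per subject and the default width of 30, each subject yields six windows.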



7. Conclusion


Perhaps more important than the results reported in the previous section is that the methodology described here offers a simple and painless way to obtain quantitative data about word formation. The subjects volunteered their time freely, and typically described their role as "fun". It would be simple to control the experiments not only for the first language of the subjects, but also for their gender, their level of education, and a wealth of other parameters. This degree of control, as well as the number of words created, is well beyond the capacity of a corpus-based study.


The Canadian background of the subjects caused one minor problem. Since CUVOALD was constructed in England, some of the base words themselves were not recognized by the subjects. For example, a participant in a preliminary test rejected the word "yob(b)ish", having never heard of the word "yob", though both are common in English newspapers.


In future experiments, we may modify the environment, asking a series of questions, the second and third appearing only if the previous answer was "Yes" (y):


Do you recognize the word "yob"? y


Would the word "yobish" be acceptable to you? y


Type a short sentence or phrase using it: He was accosted by an individual with a yobish appearance.
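The branching questionnaire above can be sketched as a small Python function. It is a hypothetical design, not part of ivi. The `ask` argument stands in for whatever screen collects the subject's reply, and the prompts are the ones in the example.

```python
def run_item(base, derived, ask):
    """Ask up to three questions, stopping after the first answer that is not 'y'.

    `ask` takes a prompt string and returns the subject's reply.
    Returns a dict of the replies actually collected.
    """
    replies = {"recognized": ask(f'Do you recognize the word "{base}"? ')}
    if replies["recognized"] != "y":
        return replies
    replies["acceptable"] = ask(f'Would the word "{derived}" be acceptable to you? ')
    if replies["acceptable"] != "y":
        return replies
    replies["sentence"] = ask("Type a short sentence or phrase using it: ")
    return replies


scripted = iter(["y", "y", "He was accosted by an individual with a yobish appearance."])
print(run_item("yob", "yobish", lambda prompt: next(scripted)))
```

Only items whose result contains an "acceptable" key yield a yes/no judgement. Reaching a fixed number of judgements per affix would therefore mean presenting extra words.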


The last question would be more time-consuming, but would add to our understanding of the listener's thought processes. It would, of course, be more complex to obtain exactly 30 yes/no answers for each affix, because some words would be discarded after the first question.



8. Acknowledgements


We are grateful to Drs Terry Smith and T.W.F. Stroud of Queen's STATLAB for their advice and assistance on statistical design and analysis.



9. References


Aronoff, Mark and Roger Schvaneveldt (1978), "Testing morphological productivity", Annals of the New York Academy of Sciences 318: 106-114.


Baayen, Harald and Rochelle Lieber (1991), "Productivity and English derivation: a corpus-based study", Linguistics 29: 801-843.


Gorska, Elzbieta (1982), "A way of testing the productivity of word formation rules (WFRs)?", Studia Anglica Posnaniensia 14/1: 169-174.


Levison, Michael and Greg Lessard (1992), "A System for Natural Language Sentence Generation", Computers and the Humanities 26: 43-58.


Marchand, Hans (1969), The Categories and Types of Present-Day English Word-Formation. Beck, Munich.


Mitton, Roger (1992), A Description of a Computer-Usable Dictionary File Based on the Oxford Advanced Learner's Dictionary of Current English. Computer-readable file, Oxford Text Archive (ota.ox.ac.uk).