vinci files and file commands

Language Description Files

Eight kinds of files may be used to describe an object language to vinci. They are:

The first three are mandatory for sentence generation; the rest need be specified only if the language description uses the features.

Three other types of files may be used in connection with the student-error checking process:

For vinci to make use of these files, they must be installed, using the commands provided in ivi/vinci for this purpose.

Before dealing with these, however, we shall mention some features common to all the files.

Common Features

The file installation commands share a subprocess which provides common features to all the previously mentioned files.

Some of the following have been mentioned in earlier sections:

  1. Character sequences within double quotes are strings of the object language (i.e. words, phrases, affixes, ... ), and vice-versa.
  2. The symbols " | { and } should not be used as characters of the object language.
  3. Alphabetic/digit strings outside double quotes are identifiers (i.e. attribute types and values, metavariables, tags, properties, ... ).
  4. An identifier may be any string of letters, digits or underscores, starting with a letter. Upper and lower case letters are distinct. There must be no clash between identifiers used for attribute types, values and variables. Certain other clashes do not cause problems, but they are not advisable.
  5. Certain identifiers are keywords. These are "tokenized" (turned into single bytes) by the common subprocess, and cannot be used to name any elements in the description. There is no problem with their appearance in object language strings.
  6. Blank spaces occurring outside double quotes separate identifiers (so that mascplur is not the same as, or related to, masc plur). They also play a role in macros. Apart from this, however, they are discarded and ignored. The same is true of line breaks, except that in a lexicon a line break terminates a lexicon entry. So each entry occupies one line. In other words, with the noted exceptions, the use of spaces and line breaks to improve readability does not affect the description. Beware, however, of the phantom line breaks, beloved of modern word processors, which appear in the display because of autowrap but are not really present in the file. These normally produce a space, but do not terminate lexicon entries.
  7. The common subprocess provides for comments. All material occurring within { } is regarded as comment and is discarded.
  8. The common subprocess warns about comments or strings which are very long, or which extend across a line boundary. These could result from omitted closing quotes or closing braces or the like. It also warns about vertical bars and opening comment braces inside comments, and about vertical bars and opening and closing braces inside double quotes.
  9. The process provides for file inclusion, which allows, say, a lexicon, to be assembled from sublexicons. If a file contains the command: %include <filename> the named file will be interpolated into the current file. Interpolated files may themselves include interpolations, to a depth of 6.
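
As an illustration of items 7 and 9, the following Python sketch shows one way comment removal and file inclusion could behave. It is not vinci's own code. The function names, the use of a dictionary in place of real files, and the choice to count the top-level file as depth 0 are all our assumptions.

```python
# Hypothetical model of two common-subprocess features; not vinci's code.

MAX_INCLUDE_DEPTH = 6  # nesting limit stated in item 9


def strip_comments(text):
    """Discard { ... } comments that occur outside double-quoted strings."""
    out = []
    in_string = False
    in_comment = False
    for ch in text:
        if in_comment:
            if ch == "}":
                in_comment = False
        elif in_string:
            out.append(ch)
            if ch == '"':
                in_string = False
        elif ch == "{":
            in_comment = True
        else:
            out.append(ch)
            if ch == '"':
                in_string = True
    return "".join(out)


def expand_includes(name, files, depth=0):
    """Interpolate %include lines; `files` maps file names to their text.

    Assumption: the top-level file is depth 0, so six nested levels are
    accepted and a seventh is rejected.
    """
    if depth > MAX_INCLUDE_DEPTH:
        raise ValueError("includes nested deeper than %d" % MAX_INCLUDE_DEPTH)
    lines = []
    for line in files[name].splitlines():
        if line.startswith("%include "):
            target = line[len("%include "):].strip()
            lines.extend(expand_includes(target, files, depth + 1))
        else:
            lines.append(line)
    return lines
```

Calling expand_includes("main", files) on a lexicon whose first line is %include sub returns the lines of sub followed by the remaining lines of main.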


There is a means of defining abbreviations, more commonly known in the computing field as macros. A macro is defined by:

 %define macroname <string>


The three sections are separated by single spaces. The macro is used (called) by the expression %macroname , terminated with a space, which is then replaced by the sequence being abbreviated.

An example. In a French lexicon, one might define the macro vtous:

    %define vtous Mode, Genre, Nombre, Personne, Temps

to abbreviate a list of attribute types. This may then be used in verb entries to abbreviate the attribute list:

    "parler"|V|vtd, auxa, ..., %vtous ||$er||

It is not advisable to define macros which contain one of a pair of double quotes, or an opening comment brace, or any other such tricky device. If you really have to know what will happen, please feel free to experiment.
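
The macro mechanism can be modelled by a few lines of Python. This is a sketch under assumptions, not a description of vinci's implementation: we assume that the macro body runs to the end of the %define line, that the space terminating a call is consumed along with the call, and that no attention is paid to quotes or comments. The function name expand_macros is ours.

```python
# Hypothetical model of %define and %macroname expansion; not vinci's code.

def expand_macros(lines):
    """Collect %define lines and expand macro calls in the other lines."""
    macros = {}
    output = []
    for line in lines:
        if line.startswith("%define "):
            # "%define macroname <string>": three sections, single spaces.
            _, name, body = line.split(" ", 2)
            macros[name] = body
            continue
        for name, body in macros.items():
            # A call is "%name" followed by one space (assumed consumed).
            line = line.replace("%" + name + " ", body)
        output.append(line)
    return output


lexicon = [
    "%define vtous Mode, Genre, Nombre, Personne, Temps",
    '"parler"|V|vtd, auxa, %vtous ||$er||',  # shortened entry, for illustration
]
print(expand_macros(lexicon)[0])
```

Under these assumptions the call %vtous is replaced by the five attribute types, and the entry continues directly with ||$er||.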

Note also that:

Installation Commands

The following commands read a file containing some component of a language description and set up the corresponding vinci data structures. Each requires a <filename> as parameter. In most cases, installing a file discards (uninstalls) any previously installed file of the same kind. Some fuller comments follow the table.

    ATtribute    Install attributes
    TMnls        Install terminals
    LExicon      Install lexicon
    VAddlex      Add to lexicon
    SYntax       Install "main" syntax
    USer         Install "user" syntax
    MOrphology   Install morphology rules
    TBls         Install morphology tables
    SMtransf     Install semantic transformations
    LXtransf     Install lexical transformations
    MRpherror    Install morphological variants
    IPa          Install phonological variants
    TGinfo       Install lexical variants

As usual in ivi, only the first two letters of the command are typed; ivi supplies the rest. The command is followed by the name of the file, and <RETURN> triggers the installation.

The use of the first two letters only is, in fact, the reason for the sometimes unfriendly names. At last count, ivi included 75 commands, using up many of the pronounceable initial pairs!

Terminology. For the sake of abbreviation, we often refer to files by the first two letters of the command which installs them: TM files, AT files, and so on.

The LExicon and VAddlex commands read a lexicon either in textual form or as a set of ivi records (or indeed as a mixture). The LExicon command discards an existing lexicon. The VAddlex command does not, simply adding new lexical entries to the ones already present. This may well be used after new words have been generated with lexical transformations.

The internal data structure used by the installed lexicon relies on the attribute and terminal data, and changing these files invalidates the structure. For this reason, the AT and TM files must be installed before the lexicon, and these commands uninstall any lexicon installed previously. These are just two of the dependencies between files; the complete set is given below.

The situation with SYntax and USer is more complicated. As we noted in the Syntax section, vinci provides for two layers of syntax files. One, the SY file, usually contains a library of rules which will be used for many different sets of generated sentences; the other, the US file, contains rules which specify the particular sentences to be generated on this occasion. (So typically a ROOT rule will be in a US file.) There is no difference in form between the two varieties, and either file may be installed by either command. The difference is found in the action of the commands. The SY command uninstalls all previous syntax, and installs its file. The US command uninstalls only the rules last installed by a US command, adding its rules to any which remain. It is the combination of the two sets which forms the current syntax. The combination may consist of one, or the other, or both. (In other words, there may be only an SY file, only a US file, or both.)

During generation, the rules are scanned from the last up. So a rule from the US file supersedes one from the SY file (or indeed, one of the same name higher up the US file), and so on.
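
The interplay of the two commands may be easier to follow in a small model. The Python sketch below is hypothetical: the class and method names are ours, and rules are represented simply as (name, body) pairs with placeholder bodies rather than in vinci's rule syntax.

```python
# Hypothetical model of the SY and US layers; not vinci's data structures.

class SyntaxStore:
    def __init__(self):
        self.sy_rules = []  # rules installed by the SY command
        self.us_rules = []  # rules installed by the most recent US command

    def install_sy(self, rules):
        # SY uninstalls all previous syntax, then installs its file.
        self.sy_rules = list(rules)
        self.us_rules = []

    def install_us(self, rules):
        # US discards only the rules last installed by US; SY rules remain.
        self.us_rules = list(rules)

    def lookup(self, name):
        # Rules are scanned from the last up, so later rules win.
        for rule_name, body in reversed(self.sy_rules + self.us_rules):
            if rule_name == name:
                return body
        return None


store = SyntaxStore()
store.install_sy([("NP", "library NP"), ("ROOT", "library ROOT")])
store.install_us([("ROOT", "exercise ROOT")])
print(store.lookup("ROOT"))  # the US rule supersedes the SY rule
print(store.lookup("NP"))    # only the SY file defines NP
```

A second install_us call replaces the exercise rules while leaving the library untouched, which is the behaviour that makes the US layer convenient for switching between sets of sentences.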

A caution in advance for the Preselections section. Preselection has been described in the Overview, and mention made of global and local preselections. It is important to realize that the terms global and local in that context refer to the time at which the preselections are made, not to the file in which the PRESELECT rule occurs. Global preselections are made once for many sentences; local ones are made over and over again for each individual sentence. It is very likely that the global PRESELECT rule will occur by itself in a US file, because if it hangs around and is not superseded by a local PRESELECT rule, it will be carried out again as if it were the local one, thus overriding the global selections. This, by the way, is why we have refrained from using the terms global and local for the two levels of syntax.

Warnings and error messages

Warnings and errors detected by the installation commands are reported in ivi's corefile 7, which also keeps a log of the installations. It should be appreciated that errors in some files may have serious repercussions for later ones. For example, errors in defining attributes may cause many lexicon entries to be rejected. Persons writing language descriptions are therefore advised to monitor corefile 7 during input.

During installation of language files, vinci displays a "rolling" progress message on the second-to-last line of the screen, indicating the various stages of the process. This is one of a number of transient messages, called progress messages, which vinci writes during installation, sentence generation, word creation, and so on. In vinci's early days, when some of these activities took minutes, these messages confirmed to the user that the operations were still progressing. Today, the user may remain blissfully unaware that most of them even exist, since the time between their posting and their removal may be very short, and if this takes place between successive screen refreshes, the message will not even be displayed.


Installed files may be uninstalled by the command RM_VFILES. This takes a parameter consisting of one or more of the command letter-pairs; for example:

    RM LE, MO, TB

uninstalls the lexicon, and the morphology rules and tables files. Any of the command letter-pairs may appear except VA. (vinci does not differentiate between lexicon entries installed by LE and VA.) In addition, the letter-pair GP is used to discard global preselections, and ALL to uninstall all entities.
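
The rules for the parameter can be summarized in a short Python sketch. It is an illustration only: the function name is ours, and we assume that the letter-pairs are written in upper case and separated by commas, as in the example above. It does not model the dependencies between files.

```python
# Hypothetical check of an RM_VFILES parameter; not part of ivi or vinci.

INSTALL_PAIRS = {"AT", "TM", "LE", "SY", "US", "MO", "TB",
                 "SM", "LX", "MR", "IP", "TG"}
EXTRA_PAIRS = {"GP", "ALL"}  # global preselections, and everything


def parse_rm_parameter(param):
    """Split a parameter such as "LE, MO, TB" and validate each item."""
    items = [part.strip() for part in param.split(",") if part.strip()]
    if not items:
        raise ValueError("at least one letter-pair is required")
    for item in items:
        if item == "VA":
            # Entries added by VA are not distinguished from those of LE.
            raise ValueError("VA is not accepted")
        if item not in INSTALL_PAIRS and item not in EXTRA_PAIRS:
            raise ValueError("unknown letter-pair: " + item)
    return items


print(parse_rm_parameter("LE, MO, TB"))
```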

We have already noted that some of the data structures resulting from installation rely on data previously installed, so that uninstallation (or new installation) of the latter requires uninstallation of the former. The dependencies (subject to review) are:

Parameter Also causes removal of

GP also discards local preselections as well as the most recently generated set of utterances. (The error-checking process, which might be called to operate on the current sentences, may fail if lexicon or preselections are no longer present.)

The files, including GP, actually discarded are reported in corefile 7.

Useful Hints

In order to keep track of the files which describe the sample languages we use for testing, research and teaching, we have found it convenient to give each language a short name:

    french, fairytale, ...

and to name each of the files (except for multiple US files) by this name plus a suffix. The suffix consists of the first two letters of the installation command: french.le, ...

There are, of course, many user files for each language, and some other meaningful names must be assigned to them, usually with the suffix .us.

To simplify installation we write an ivi procedure with the name of the language as file name. The procedure executes the installation commands. So the file french contains:

    LE french.le

Installation is then carried out by the ivi command:

    PR french

reducing work for the user, while ensuring that no files are missed, and that they are in the correct order. (Be sure, though, that the first command is on the first line of the procedure file; otherwise the line break on the first line will switch ivi to Typing Mode.)
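
Such a procedure file can also be generated mechanically. The following Python sketch is our own illustration, not an ivi facility. It assumes the naming convention described above (language name plus the two command letters in lower case) and checks only the ordering constraint mentioned earlier, namely that AT and TM come before LE. The function name and the choice of commands in the example are hypothetical.

```python
# Hypothetical generator for an installation procedure; not part of ivi.

def build_procedure(language, commands):
    """Return procedure text with one installation command per line."""
    if "LE" in commands:
        le_position = commands.index("LE")
        for needed in ("AT", "TM"):
            if needed in commands and commands.index(needed) > le_position:
                raise ValueError(needed + " must be installed before LE")
    lines = ["%s %s.%s" % (cmd, language, cmd.lower()) for cmd in commands]
    # No leading blank line: the first command sits on the first line.
    return "\n".join(lines) + "\n"


print(build_procedure("french", ["AT", "TM", "LE"]), end="")
```

Writing the returned text to a file named french would give a procedure that can be run with PR french, provided the files named in it exist.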