vinci: a gentle introduction


vinci is a natural language generation (NLG) environment embedded in an editing environment (ivi), which itself runs in a terminal window.

In what follows, we will step through some basic examples which will show you how to use ivi/vinci to call up and modify elements of a language specification, use them for generation, and save and manipulate the generated output.

As a prelude to this introduction, you should:

  1. download a version of the ivi editor for your operating system (Linux, MacOS, Windows, etc.). For details, see the Downloads page.
  2. read at least the ivi Basics documentation to understand how to use the ivi editor.

After you have finished with this introduction, you should be ready to read more detailed documentation and create and test grammars on your own.

A note on typographical conventions

Before we begin, please note the following conventions:

Operating systems and terminal windows

ivi/vinci runs on a variety of operating systems, but it requires a terminal window to do so. In what follows, we will provide you with a few basic instructions for working in these environments, but you should be aware that Unix-like terminal environments (including Linux, BSD, MacOS, the Windows 10 Linux shell and so on) are extremely powerful and complex, and make an enormous range of operations available. Many tutorials are available to help you become familiar with them. For help, do a web search for "linux command line" or "linux terminal", or something similar.

All of these environments have in common that they provide access to a terminal window in which commands may be entered and programs run. A typical terminal window looks like this:

In some of these environments, a terminal window will be open by default, but in others you will need to start one up using the commands appropriate to the operating system.

Creating a working directory

Once a terminal window is open, the first step is to create a working directory. In what follows, we will assume it's called MyVinci, but you are free to choose whatever name you wish.

To do this, first, go to your home directory by typing in the terminal window:

cd <Return>

You should see a prompt which looks something like this:


Next, create the MyVinci directory by typing:

mkdir MyVinci <Return>

Move to the newly created MyVinci directory by typing:

cd MyVinci <Return>

To check that you're in the correct directory type:

pwd <Return>

You should see something like this:


You are now ready to begin work using the ivi/vinci environment.

Downloading example datafiles

To help with the explanations to follow, a number of simple datafiles have been created by the authors of vinci. To download these, click here. Make sure they are placed in the MyVinci directory. Verify that this has been done by opening a terminal window, going to the MyVinci directory and typing:

ls <Return>

You should see the file tutorial_datafiles.tar.gz in the file listing. If it is there, you are safe to proceed. If it's not, there has been a problem with the download and you should repeat the steps above or consult a more experienced computer user.

Assuming that the file has been safely downloaded, you now need to uncompress it by typing:

gunzip tutorial_datafiles.tar.gz

followed by Return, and then

tar xvf tutorial_datafiles.tar

again followed by Return.

In the terminal window, enter the command ls to see the list of files. Among others, you should see the files and lex1.le.

If all is well, you are now ready to begin testing the generation environment. If there is a problem, repeat the previous steps, or consult a more experienced computer user.

Steps in generation

In the ivi/vinci environment, generation involves four steps:

  1. creating new language description files, or alternatively locating and possibly modifying existing files;
  2. installing language description files into the generator;
  3. generating output and examining it inside ivi;
  4. saving generated output as files.

In a single working session, you may go through these steps any number of times.

Examining, editing or creating language description files

We will begin by examining some of the files we have just downloaded. To do this, we start up ivi (by typing ./ivi). Our first goal is to examine a file which defines the parts of speech available for generation. In what follows, we will sometimes refer to these as terminals, and as a result, the file which describes them is called a terminals file. Files used by vinci may have any name composed of letters, digits or underscores. By convention, we use a one- or two-letter suffix to help in sorting files, but in creating new files, feel free to adopt whatever convention suits you.

Call in the terminals file for inspection using the FEtch command, by typing:

FE <Return>

The editor screen should now look like this:

Note that the first line of the file is enclosed in brace brackets. This is a comment; it is there for the human reader only. Anywhere in vinci language files, anything inside brace brackets is invisible to the generator. It is good practice to add many comments to your files to make them easier to interpret by others, or by you at some later date.
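To make the role of comments concrete, here is a minimal Python sketch which removes anything inside brace brackets from a line. It is not part of vinci and is not meant to show how the generator is implemented: the name strip_comments and the sample line are our own illustrations, and the sketch assumes that comments are not nested and that braces are used only for comments.

```python
import re

# Hypothetical helper for illustration only; vinci handles comments itself.
# Assumption: comments are not nested and braces only ever delimit comments.
COMMENT = re.compile(r"\{[^{}]*\}")


def strip_comments(line: str) -> str:
    """Return the line with every {...} comment removed and outer whitespace trimmed."""
    return COMMENT.sub("", line).strip()


# A made-up line in the style of a terminals file: a symbol followed by a comment.
print(strip_comments("N {noun}"))
```

What remains after stripping is what the generator would act on; the comment text is there for the human reader only.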

The next line of the file contains the letter N followed by a comment telling us that N stands for a noun, and the following line contains the sequence DET and a comment telling us that it stands for a determiner (sometimes called an article).

It should be clear that the terminals file defines the permissible parts of speech to be used by the generator. We will see later that other language description files will refer back to this information.

A note: the names of parts of speech are defined by you and may take any form, as long as they begin with a letter or underscore and contain no spaces. However, by convention, we always use only uppercase letters.

Now call in another file which describes a simple syntax rule. To do so, type:

FE <Return>

The file should look like this:

There are several important elements to the syntax file. The first is the existence of comments, just like in the terminals file.

The second element is the keyword ROOT. Keywords are defined within vinci and cannot be changed. ROOT tells the generator that a new syntax tree is to be started. It is followed by an equals sign and then the definition of the tree. In this instance, the tree has only one node, called N. This refers back to the N as defined in the terminals file we have just seen.

The third element is the percent sign, which tells the generation system that the rule has ended.

This is a very simple rule, which basically says that there is a syntax tree with a ROOT and one child.

Now let us read in another language description file which contains lexical information. To do so, type the command:

FE lex1.le <Return>

The file should look like this:

This file is a bit different in that it's formed of records, each on a single line. Each record defines a lexical entry. Records are composed of fields separated by vertical bars. The role of some fields is set by vinci, but the user may use others for a variety of purposes.

The first field gives the headword. Note that it's in double quotes. The second field gives the part of speech. It is not in quotes. Material in quotes belongs to the language being generated (the object language), while other symbols belong to the metalanguage.

The first entry in the lexicon is "cat"; its part of speech is N (standing for Noun, as defined in the terminals file; note how different files refer to common information).

The third and fourth fields are empty and the fifth contains the symbol #1. This is a simple morphology rule. For the moment, it is sufficient to know that #1 tells the generator to use the first field when this lexical entry is called in generation.
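The record layout just described can be pictured with a small Python sketch. This is only an illustration, not vinci's own reader: the function parse_record is hypothetical, the sample record is reconstructed from the description above, and we assume that vertical bars never occur inside a quoted headword.

```python
def parse_record(record: str) -> list[str]:
    """Split one lexicon record into its fields on vertical bars."""
    fields = record.rstrip("\n").split("|")
    # A record ending in a bar yields one empty trailing item; drop it.
    if fields and fields[-1] == "":
        fields.pop()
    return fields


# Sample record assumed from the description of the "cat" entry.
fields = parse_record('"cat"|N|||#1|')
headword = fields[0].strip('"')   # field 1: object-language material, in quotes
part_of_speech = fields[1]        # field 2: metalanguage symbol, no quotes
morphology = fields[4]            # field 5: the morphology rule
print(headword, part_of_speech, morphology)
```

Counting fields from 1, as the text does, field 1 is fields[0] in the sketch and field 5 is fields[4].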

We have now seen three files. In order to generate output, we need to make them available to the generator itself, a process we call installing the files. There exist separate commands to install each file. In the next section, we will see how they work.

Installing files

One of the special features of ivi is that diagnostic messages are shown in Core 7. As a result, it is sometimes useful to test generation files from within Core 7. To do this, enter the command CO 7, where CO is short for COre. After the command has been entered, the screen should look like this:

The Setting random seed message is there because when ivi is started, a random seed is set which will control choices made in generation. We will see later that this may be used to repeat precisely the same output in subsequent generations. For the moment, it is safe to ignore the message.
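For readers unfamiliar with random seeds, the following Python sketch shows the general idea using Python's own random module. It says nothing about how ivi stores or sets its seed: the function generate, the word list and the seed value 42 are all hypothetical.

```python
import random


def generate(seed: int, count: int) -> list[str]:
    """Make `count` random choices from a fixed word list, starting from `seed`."""
    rng = random.Random(seed)        # a private generator initialised with the seed
    words = ["alpha", "beta"]        # placeholder words, not taken from any vinci file
    return [rng.choice(words) for _ in range(count)]


first = generate(seed=42, count=5)
second = generate(seed=42, count=5)
print(first == second)  # the same seed gives the same sequence of choices
```

Starting from the same seed always produces the same sequence of choices, which is what makes a generation run repeatable.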

It is now time to install the files needed for generation. Files must be installed in order, since some make use of others. Begin by installing the terminals file by means of the TMnls command. To do this, type:

TM <Return>

If the command is successful, a message should appear in the text area of Core 7 which looks like this:

On the other hand, if you have made a typing mistake, or the file is not available, you will see an error message on the status line, just above the command line, which reads:

File not found (or no read permission)

In that case, just ensure that the file exists and retype the command correctly.

Once the terminals file has been installed, you can install the syntax file using the SYntax command by typing:

SY <Return>

If all goes well, you should see the message:

Finished reading Syntax Input 
from file '' [6 lines.]

Finally, install the lexicon file by means of the LExicon command, by typing:

LE lex1.le <Return>

You should see the message:

Finished reading lexicon input 
from file 'lex1.le' [3 lines.]
Total Words: 3. Number of Errors: 0

Generating output

You are now ready to begin generation. To do this, move to text mode (by hitting the enter key while in command mode). The cursor should move to the text area. Now generate an utterance by typing <esc> <g> (depress and release the escape key and then depress and release the g key).

You should see something like this:

Congratulations! You have just generated your first utterance.

Now, generate another utterance by typing esc-g again. You may see either "cat" or "dog" (two of the words from the lexicon file). Generate several more utterances and note that the two words are chosen at random. However, the third item from the lexicon file ("the") is never chosen because its part of speech (DET) is not specified in the syntax file.
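The behaviour just described can be pictured with a short Python sketch. This is only a model of what we have seen so far, not vinci's own code: the names LEXICON and generate are hypothetical, and we assume that each terminal in the root is filled by a uniformly random lexical entry with the same part of speech.

```python
import random

# The three lexical entries described above, reduced to (headword, part of speech).
LEXICON = [("cat", "N"), ("dog", "N"), ("the", "DET")]


def generate(root: list[str], lexicon: list[tuple[str, str]], rng: random.Random) -> str:
    """Fill each terminal of the root with a random entry of that part of speech."""
    words = []
    for terminal in root:
        candidates = [word for word, pos in lexicon if pos == terminal]
        words.append(rng.choice(candidates))
    return " ".join(words)


rng = random.Random()
for _ in range(5):
    print(generate(["N"], LEXICON, rng))   # ROOT = N, as in the first syntax file
```

Because the root lists only N, the DET entry is never among the candidates.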

We will now remedy that.

Revising language description files

First, return to Core 1. To do this, type <Control c> to return to Command mode, then type CO 1 <Return>. You should now be in Core 1. Now, call the initial syntax file back in for editing by typing:

FE <Return>

You now want to change the file. First, press <Return> to enter text mode and then move the cursor on top of the N. Then, hit the Insert key or type Ctrl w to enter Insert Mode. (The command line should now read Expecting Insert.) Now type DET so that ROOT is equal to the sequence DET N.

Save the revised file under a new name by typing:

SA <Return>

To ensure that the new syntax file exists, you can use the FEtch command by typing:

FE <Return>

The new file should appear on the screen. (Typing the command FE should recall the old syntax file.)

Generating new output from revised files

Let us now return to Core 7 and install the new syntax file, thereby replacing the old one. To do this, enter the command:

SY <Return>

A confirmation of the new file should appear in the text area. If it does, you are ready to generate a new set of utterances. First return to text mode (by hitting Return) and then type <esc> <g> several times. You should see something like this:

Generating output in other corefiles

So far, all output has appeared in Core 7, interspersed with diagnostic and error messages. It would be nice simply to see the output alone. To do this, move to an empty corefile (in this case, Core 2) by entering the command CO 2. In Core 2, enter text mode by typing Return and enter the command:

esc m 0 <Return>

You should see in the text area either

the cat

or

the dog

Now, type esc m 0 <Return> three more times. You should see three more occurrences of the same string you saw the first time. This is because esc m 0 <Return> simply inserts the currently generated string into the current corefile. To get a new string, you need to type esc g and then esc m 0 <Return>. If you do this enough times, you will see a different string in the corefile.

Saving output to a file

To save the results of this output, go to command mode (<ctrl c>) and enter the command SA fred <Return> (feel free to choose something else in place of fred). To exit ivi, type the command GO and hit <Return>. You should find yourself back in the terminal window. Next time you return to ivi, typing FE fred will call the output file back.

You have now called in some already existing files, used them for generation, edited one of the files, generated again, saved your output and exited ivi. Everything which follows will be a variation on this.

Using attributes

One of the problems with generated output so far is that there is no way of inflecting words to show number (cat - cats), tense (run - ran) and so on, or selecting words according to their meaning. In vinci, one of the ways this may be accomplished is by means of attributes.

In their simplest form, attributes are sets of values allocated among distinct classes. Users may define any names they choose for classes and values, as long as names begin with a letter or underscore and include no spaces. In what follows, and elsewhere in our research, we have adopted the convention whereby attribute classes begin with a capital letter, while attribute values are all in lowercase.

To be used, attributes must be specified in a file which is installed before any other file which uses attributes. To illustrate this, we will begin by examining a simple attribute file. Make sure you are in the MyVinci directory and then start ivi. Inside ivi, type:

FE <Return>

You should see a file which looks like this:

Examination of this file shows that it defines a class Number with values sing and plur and a class Things with values animal and plant.

Once a set of attributes has been defined, it may be used in other files. To show how this can be done, we will examine a variant lexicon file by typing:

FE lex2.le <Return>

This will bring up a file which looks like this:

Note that in a lexicon file, attributes appear in the third field. In the example shown here, each lexical entry contains one of the values for the class Number and one for Things. The distinction between singular and plural nouns is captured by having two entries for each noun. Within the third field, attributes are separated by commas.

Similarly, a syntax file may refer to attribute values or classes in order to specify in more detail the nodes of a tree. To illustrate this, let us begin by calling up the file, by typing:

FE <Return>

We see this:

In a syntax file, attributes attached to a node are placed within square brackets which immediately follow the node. In this case, the attribute specifies that the N chosen from the lexicon by the syntax rule must carry the value sing. In other words, only singular nouns will be chosen.
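The effect of an attribute on a node can be sketched in Python as a filter over the lexicon. This is an illustration under our own assumptions rather than vinci's implementation: the function candidates is hypothetical, and the sample entries are modelled on the description of the lexicon above (two entries per noun), not copied from the file.

```python
# Illustrative entries: (headword, part of speech, set of attribute values).
LEXICON = [
    ("cat", "N", {"sing", "animal"}),
    ("cats", "N", {"plur", "animal"}),
    ("tree", "N", {"sing", "plant"}),
    ("trees", "N", {"plur", "plant"}),
]


def candidates(pos: str, required: set[str], lexicon) -> list[str]:
    """Return headwords with the given part of speech that carry every required value."""
    return [word for word, p, attrs in lexicon if p == pos and required <= attrs]


# A node written N[sing] asks for nouns carrying the value sing.
print(candidates("N", {"sing"}, LEXICON))
```

An entry qualifies only if it carries every value named on the node, so adding values to the node narrows the choice further.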

In order to make attribute classes and values available to other files, the ATtribute command is used.

In order to generate utterances using the files we have just seen, once ivi has been started, the following commands must be entered:

CO 7 <Return>
AT <Return>
TM <Return>
LE lex2.le <Return>
SY <Return>

These move the focus to Core 7 and then install the various files. On the basis of this, we would expect to generate a series of singular nouns, and this is in fact what we see when we move to text mode and type a series of esc g. Output should look something like this:

As an experiment, edit the file to replace sing by plur, SAve it under a new name, install the new syntax file using the SYntax command, and generate output. You should see plural nouns.

As a further experiment, change the syntax file again to obtain only plural animal names. (Hint: attributes in syntax rules are also separated by commas.)

Morphology rules: the basics

In the lexicons we have seen so far, inflected forms of words appeared as separate lexical entries. In languages with simple morphology like English, this is perhaps not an insurmountable obstacle as the lexicon increases in size, but in others with richer morphologies the result would be an unreasonable expansion. To deal with this, vinci includes mechanisms for inflecting lexical entries based, among other things, on the attributes present on the syntax nodes. In what follows, we will show how this may be done in a few simple cases.

Consider first the case of cat and cats. The only difference between the two is the addition of s to the plural form. We can capture this by means of a simple rule like this. Note that anything in brace brackets is a comment and is ignored by the generation system:

  {First, we name the rule}
rule 1                  

  {Now we specify the subrules}
  {If the attribute 'sing' is present,
    use field 1}
sing : #1;           
  {If the attribute 'plur' is present, 
   add s to field 1}
plur : #1 + "s";     

  {Now we end the rule}

This rule is defined in a morphology file. (The file has been included with the tutorial materials.) For it to be used to inflect words, it must be referred to in each lexical entry to which it will apply. This is done by editing the contents of fields 3 and 5 of each lexical entry. If we modify our old lex2.le file, the result would look like this:

"cat"|N|Number, animal||$1|
"dog"|N|Number, animal||$1|
"bush"|N|sing, plant||#1|
"bushes"|N|plur, plant||#1|
"tree"|N|Number, plant||$1|

Note how in the case of cat, dog and tree the #1 in field 5 has become $1. The symbol $ followed by a string of digits or letters points to a rule name in the morphology file. (The revised lexicon file has also been included in the tutorial package as lex3.le.) Note also that in these same lexical entries, the attribute class Number appears. This means that all values in the class are now possible (in this case, sing and plur).
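To see how field 5 and rule 1 work together, consider the following Python sketch. It is a simplified model, not vinci's morphology engine: the names RULES and inflect are hypothetical, only the two subrules shown above are represented, and any field 5 which does not begin with $ is treated as #1.

```python
# Rule 1 from the text: sing uses field 1 as is, plur adds "s" to field 1.
RULES = {
    "1": {
        "sing": lambda headword: headword,
        "plur": lambda headword: headword + "s",
    },
}


def inflect(record: str, number: str) -> str:
    """Return the form of a lexical record for the given Number value."""
    fields = record.split("|")
    headword = fields[0].strip('"')   # field 1 without its quotes
    morph = fields[4]                 # field 5: "#1" or "$" plus a rule name
    if morph.startswith("$"):
        return RULES[morph[1:]][number](headword)
    # Simplification: anything else is treated as "#1" (use field 1 unchanged).
    return headword


print(inflect('"cat"|N|Number, animal||$1|', "plur"))
print(inflect('"bushes"|N|plur, plant||#1|', "plur"))
```

The first call goes through rule 1 and adds s, while the second simply returns the headword because its entry still uses #1.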

To test the new morphology file and lexicon, enter the following commands in ivi. Note the addition of a new command MO which installs the morphology file. Note also the syntax file which calls for the production of a determiner followed by a plural noun.

CO 7 <Return>
AT <Return>
TM <Return>
MO <Return>
LE lex3.le <Return>
SY <Return>

Type <Return> to enter text mode and generate some utterances. If you have installed the files correctly, you should see properly formed plural nouns.

The perspicacious reader will have noted that there are still two entries in the lexicon for bush whose plural requires addition of es to the base form. As an exercise, revise the morphology file to add a second rule which captures this fact, and the lexicon to call this new rule, and generate some utterances using these revised files. You are now on your way to producing your own morphological descriptions. Of course, vinci's morphological mechanisms include a rich set of operations which are beyond the scope of this simple tutorial. See the Overview and the Manual for details.


You should now have a good sense of how to use ivi/vinci to generate simple utterances, and how to extend language descriptions to capture richer sets of possible structures. Over the past twenty years, the authors of ivi/vinci have used it to generate a wide range of output in several languages. Discussion of this appears in our various publications.