The STore command
In this section, we will illustrate and discuss the several textual representations produced by the STore command. To do this, we make use of examples produced for the utterance:
the fairy godmother passed a magic sword to the prince
The utterance is, of course, a variation on those associated with our much-loved generous professor. In this case, it is a good supernatural being who presents a magic artifact to the hero of a fairy tale, to assist him in rescuing the victim/heroine.
The sentence, taken from a generated fairy story, was created using a global preselection (a dramatis personae):
PRESELECT =
  twit: PN/"Midas";
  hero: PN[male, brave, handsome];
  victim: PN/_pre_ twit/@14: daughter;
  villain: PN/"Merlin";
  goodfairy: PN[female, good, supernatural];
  magicobj: N[physobj, magic]
%
a local preselection for the current utterance:
PRESELECT =
  vtdi;
  action: V[give];
  agent: PN/_pre_ goodfairy;
  beneficiary: PN/_pre_ hero;
  theme: N/_pre_ magicobj
%
and syntax for vtdi sentences:
ROOT = ...
  < _pre_ vtdi:
      NPP[agent, def, nomin] V[p3, sing, past]/_pre_ action
      ( {this "dative shift" applies only to some vtdi verbs}
        NPP[beneficiary, def, accus] NPT[theme, indef, accus]
      | NPT[theme, indef, accus] NPP[beneficiary, def, dative]
      )
  ...
%
where PN is a word category for proper nouns, NPP is a noun phrase for proper nouns (which can yield "Midas", "the king", "a king" or a pronoun, depending on the attributes), NPPX is a subsidiary used by NPP, and NPT is a common noun phrase (yielding "the sword" or "a sword"; not currently a pronoun). The attribute dative, and certain others, cause both NPP and NPT to prefix their phrase with an appropriate preposition.
Note that the syntax allows for the so-called English dative shift: "she gave the prince a sword", as an alternative to: "she gave a sword to the prince".
Single Tree
In the first instance, let us assume that the cluster of trees produced by the generation consists of a single ROOT tree.
The full tree representation in path-signature style, produced by the command STore 1 | 0, is:
$u
$t 999999
N|G|ROOT||0|""||
S|G|NPP|agent, def, nomin|0|""||
SS|G|NPPX|agent, def, nomin|0|""||
SSS|G|DET|def|0|""|LEX_7|
SSSR|G|PN||1|""|LEX_4|
SR|G|V|p3, sing, past|0|""|LEX_6|
SRR|G|NPT|theme, indef, accus|0|""||
SRRS|G|DET|indef|0|""|LEX_8|
SRRSR|G|N||0|""|LEX_5|
SRRR|G|NPP|beneficiary, def, dative|0|""||
SRRRS|G|PREP|dative|0|""|LEX_9|
SRRRSR|G|NPPX|beneficiary, def, accus|0|""||
SRRRSRS|G|DET|def|0|""|LEX_10|
SRRRSRSR|G|PN||1|""|LEX_1|
$t 0
N|G|ROOT||0|""||
S|G|NPP|agent, def, nomin|0|""||
SS|G|NPPX|agent, def, nomin|0|""||
SSS|G|DET|def|0|"the"|LEX_7|
SSSR|G|PN||1|"fairy godmother"|LEX_11|
SR|G|V|p3, sing, past|0|"passed"|LEX_6|
SRR|G|NPT|theme, indef, accus|0|""||
SRRS|G|DET|indef|0|"a"|LEX_8|
SRRSR|G|N||0|"magic sword"|LEX_5|
SRRR|G|NPP|beneficiary, def, dative|0|""||
SRRRS|G|PREP|dative|0|"to"|LEX_9|
SRRRSR|G|NPPX|beneficiary, def, accus|0|""||
SRRRSRS|G|DET|def|0|"the"|LEX_10|
SRRRSRSR|G|PN||1|"prince"|LEX_12|
$L
{LEX_0} "Midas"|PN|male, rich, vain, weak||#1||||||||type:"king"/N|daughter:"Marie"|home:"castle"/N|
{LEX_1} "Braveheart"|PN|male, kill.monster, handsome, brave, strong||#1||||||||type:"prince"/N||
{LEX_2} "Marie"|PN|female, beautiful, kind||#1||||||||type:"princess"/N||
{LEX_3} "Merlin"|PN|male, bad, supernatural||#1||||||||type:"sorcerer"/N||
{LEX_4} "Wanda"|PN|female, good, supernatural||#1||||||||type:"fairy godmother"/N||
{LEX_5} "magic sword"|N|neuter, magic, physobj||#1||
{LEX_6} "pass"|V|Number, Personne, Tense, give||$v7||"passed"|
{LEX_7} "the"|DET|def||#1||
{LEX_8} "a"|DET|indef||#1||
{LEX_9} "to"|PREP|dative||#1||
{LEX_10} "the"|DET|def||#1||
{LEX_11} "fairy godmother"|N|female||#1||
{LEX_12} "prince"|N|male||#1||
%
The textual representation begins with $u and ends with %. (In the intended design, the identifier for an individual cluster of trees was to appear on the $u line.) For a ROOT tree only, it has three parts, heralded by:
$t 999999
$t 0
$L
The first part describes the ROOT tree before it has undergone any syntax transformations. $t 0 describes the transformed ROOT tree, the final tree for the utterance. In this case the tree-shape is unchanged from the first tree, though its leaf nodes now include the words created by the morphology. The third part contains copies of all lexicon entries used in the generated sentences.
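The three-part layout can be illustrated with a short Python sketch. It is not part of vinci or ivi; the function name split_sections and the abbreviated sample are hypothetical, and the sketch assumes that the stored text has one record per line, with each section introduced by a $t, $l or $L header.

```python
def split_sections(dump):
    """Split a stored cluster into (header, lines) pairs.

    Assumption: one record per line, "$u" at the start, "%" at the end,
    and every section introduced by a "$t", "$l" or "$L" header line.
    """
    sections = []
    current = None
    for raw in dump.splitlines():
        line = raw.strip()
        if not line or line.startswith("$u"):
            continue  # blank line or the opening "$u" line
        if line == "%":
            break  # end of the stored cluster
        if line.startswith(("$t", "$l", "$L")):
            current = (line, [])
            sections.append(current)
        elif current is not None:
            current[1].append(line)
    return sections


# Abbreviated, illustrative sample (not a complete STore output).
sample = '''$u
$t 999999
N|G|ROOT||0|""||
S|G|V|p3, sing, past|0|""|LEX_6|
$t 0
N|G|ROOT||0|""||
S|G|V|p3, sing, past|0|"passed"|LEX_6|
$L
{LEX_6} "pass"|V|Number, Personne, Tense, give||$v7||"passed"|
%'''

for header, lines in split_sections(sample):
    print(header, len(lines))
```

The headers are matched case-sensitively, so the lexicon header $L is kept distinct from the leaf-list header $l described later.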
In the tree sections, each line represents a single tree node, and has seven fields, containing its:
- path signature
- colour
- metavariable
- attribute list
- number of attached indirections
- generated word
- reference (pointer) to a lexicon entry
The last five fields are self-explanatory.
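As a minimal sketch of the seven-field layout, the following Python fragment splits one node line into named fields. The names TreeNodeRecord and parse_node_line are our own, chosen for illustration; the sketch assumes that the character | never occurs inside a field and that every line ends with a trailing |.

```python
from typing import List, NamedTuple


class TreeNodeRecord(NamedTuple):
    path: str              # path signature, e.g. "SSSR"
    colour: str            # display colour, e.g. "G"
    metavariable: str      # e.g. "NPP", "DET", "V"
    attributes: List[str]  # e.g. ["p3", "sing", "past"]
    indirections: int      # number of attached indirections
    word: str              # generated word, "" if none
    lex_ref: str           # e.g. "LEX_7", "" if none


def parse_node_line(line):
    """Split one node line of a tree section into its seven fields."""
    fields = line.strip().split("|")
    if fields[-1] == "":
        fields.pop()  # the trailing "|" leaves an empty last element
    if len(fields) != 7:
        raise ValueError("expected 7 fields, got %d" % len(fields))
    path, colour, meta, attrs, indirections, word, lex_ref = fields
    return TreeNodeRecord(
        path,
        colour,
        meta,
        [a.strip() for a in attrs.split(",") if a.strip()],
        int(indirections),
        word.strip('"'),
        lex_ref,
    )


node = parse_node_line('SR|G|V|p3, sing, past|0|"passed"|LEX_6|')
print(node.metavariable, node.attributes, node.word, node.lex_ref)
```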
The path signatures define the shape of the tree. The root node has path N (which actually denotes an empty path). In all other cases, the path describes how to get to the node from the root. Reading it sequentially, S tells us to go to the leftmost child of the current position, and R tells us to go to the next sibling on the right.
Thus, in the present example,
- NPP (path S) is the leftmost child of ROOT
- NPPX (path SS) is the leftmost child of this NPP
- DET (path SSS) is the leftmost child of this NPPX
- PN (path SSSR) is the second child of NPPX
- V (path SR) is the second child of ROOT
- NPT (path SRR) is the third child of ROOT
- etc.
We will not discuss the properties of the path notation here. Suffice it to say that the nodes are in an order (parent before child, left sibling before right) which allows the tree to be easily rebuilt node by node.
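The node-by-node rebuilding can be sketched in Python as follows. This is an illustration only, not the algorithm used by REcover; the class TreeNode and the functions build_tree and outline are hypothetical names. The sketch relies on the ordering just described: when a node arrives, the node reached by its path minus the last step has already been built.

```python
class TreeNode:
    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = []


def build_tree(records):
    """Rebuild a tree from (path, label) pairs given in stored order."""
    by_path = {}
    root = None
    for path, label in records:
        if path == "N":  # the root: N denotes the empty path
            root = TreeNode(label)
            by_path[""] = root
            continue
        previous = by_path[path[:-1]]  # node reached before the last step
        if path[-1] == "S":
            parent = previous          # S: leftmost child of that node
        elif path[-1] == "R":
            parent = previous.parent   # R: next right sibling of that node
        else:
            raise ValueError("unexpected step in path " + path)
        if parent is None:
            raise ValueError("the root cannot have a sibling: " + path)
        node = TreeNode(label, parent)
        parent.children.append(node)
        by_path[path] = node
    return root


def outline(node):
    """Return the tree as nested (label, [children]) tuples."""
    return (node.label, [outline(child) for child in node.children])


records = [
    ("N", "ROOT"), ("S", "NPP"), ("SS", "NPPX"), ("SSS", "DET"),
    ("SSSR", "PN"), ("SR", "V"), ("SRR", "NPT"), ("SRRS", "DET"),
    ("SRRSR", "N"), ("SRRR", "NPP"), ("SRRRS", "PREP"),
    ("SRRRSR", "NPPX"), ("SRRRSRS", "DET"), ("SRRRSRSR", "PN"),
]
tree = build_tree(records)
print([child.label for child in tree.children])
```

Because a parent always precedes its children and a left sibling always precedes its right sibling, a single pass with one dictionary lookup per node is enough.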
In our environment, the trees are sometimes transmitted to, or imported by, a program called disptree, which displays them graphically. The colour field tells disptree what colour to use for the node, in every case here G (green). Nodes in other colours were sometimes added later.
disptree was oriented to an earlier environment (Solaris 8 and OpenWindows), and has not been compiled for any other. In preference to making it available, we may, at some future time, add a further style to STore to produce data for a commonly available download such as Graphviz. The "commands" ($u, $t, $L, ...) are part of a much larger set, which can serve to direct a more comprehensive user interface.
The lexicon section contains all the lexicon entries used in the tree cluster, making the representation independent of any changes to the lexicon between storage and recovery. It also avoids the need for REcover (see below) to re-scan the lexicon or to re-perform indirections in order to carry out student error-checking.
The order of the LEX entries is dependent on the order in which tree nodes were developed. In this example, LEX_0 through LEX_5 are the global preselections, LEX_6 the local one. (Only one lexicon entry, action, is preselected locally; the other local preselections involve references to global ones.) LEX_7 through LEX_10 are the various determiners and prepositions requested by the syntax. LEX_11 and LEX_12 arise from LEX_4 and LEX_1 respectively, by way of indirections.
LEX entries may be marked unused, implying that vinci did not search the lexicon for some terminal node. This arises in the example of The Generous Professor if his gift, say, is pronominalized. Since the syntax itself specifies its gender and the noun node is discarded by the pronominal transformation, there is no need for vinci to select the noun entry itself.
Note that in the example above, LEX_0, LEX_2 and LEX_3 are not used in the utterance. They are not marked unused, however, because they have corresponding lexicon entries.
LEX entries may also display an error message if the corresponding search was unsuccessful.
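A lexicon line can be taken apart in the same spirit. The sketch below is hypothetical throughout: the function name parse_lex_entry is ours, and reading the first three fields as headword, category and attribute list is an assumption based on the examples shown here. The remaining fields are kept unparsed. The layout of entries marked unused or carrying an error message is not shown in this section, so the sketch makes no attempt to interpret them.

```python
import re

# Matches the "{LEX_n}" label and captures the rest of the line.
LEX_LINE = re.compile(r"^\{(LEX_\d+)\}\s+(.*)$")


def parse_lex_entry(line):
    """Parse '{LEX_n} "word"|CAT|attributes|...' into a dictionary.

    Returns None for lines that do not fit this assumed layout.
    """
    match = LEX_LINE.match(line.strip())
    if match is None:
        return None
    lex_id, body = match.groups()
    fields = body.split("|")
    if len(fields) < 3:
        return None
    return {
        "id": lex_id,
        "word": fields[0].strip('"'),
        "category": fields[1],
        "attributes": [a.strip() for a in fields[2].split(",") if a.strip()],
        "rest": fields[3:],  # later fields left uninterpreted
    }


entry = parse_lex_entry('{LEX_5} "magic sword"|N|neuter, magic, physobj||#1||')
print(entry["id"], entry["word"], entry["category"], entry["attributes"])
```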
Indentation Style
The full tree in indentation style, produced by the command STore 2 | 0, corresponding to $t 0, is:
$t 0
ROOT||0|""||
  NPP|agent, def, nomin|0|""||
    NPPX|agent, def, nomin|0|""||
      DET|def|0|"the"|LEX_7|
      PN||1|"fairy godmother"|LEX_11|
  V|p3, sing, past|0|"passed"|LEX_6|
  NPT|theme, indef, accus|0|""||
    DET|indef|0|"a"|LEX_8|
    N||0|"magic sword"|LEX_5|
  NPP|beneficiary, def, dative|0|""||
    PREP|dative|0|"to"|LEX_9|
    NPPX|beneficiary, def, accus|0|""||
      DET|def|0|"the"|LEX_10|
      PN||1|"prince"|LEX_12|
This is visually more friendly to a human reader. We can clearly see the four children of ROOT: NPP, V, NPT and NPP, along with their children and grandchildren. It is a little less convenient for a subsequent computer algorithm.
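The two styles are closely related: every S step in a path signature descends one level, while R stays on the same level, so the depth of a node is the number of S steps in its path. The following Python sketch uses this to derive an indented listing from path-signature lines. The function names and the two-space indent are assumptions made for illustration; they are not taken from vinci.

```python
def depth(path):
    """Depth of a node: the root (path N) is 0, each S step adds one level."""
    return 0 if path == "N" else path.count("S")


def to_indented(node_lines, indent="  "):
    """Drop the path and colour fields and indent each line by its depth."""
    out = []
    for line in node_lines:
        path, _colour, rest = line.strip().split("|", 2)
        out.append(indent * depth(path) + rest)
    return out


lines = [
    'N|G|ROOT||0|""||',
    'S|G|NPP|agent, def, nomin|0|""||',
    'SS|G|NPPX|agent, def, nomin|0|""||',
    'SSS|G|DET|def|0|"the"|LEX_7|',
    'SSSR|G|PN||1|"fairy godmother"|LEX_11|',
    'SR|G|V|p3, sing, past|0|"passed"|LEX_6|',
]
print("\n".join(to_indented(lines)))
```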
List of Leaf Nodes
The abbreviated output, with leaf nodes only, produced by STore 0 | 0 for the same utterance is:
$u
$l 999999
DET|def|0|""|LEX_7|
PN||1|""|LEX_4|
V|p3, sing, past|0|""|LEX_6|
DET|indef|0|""|LEX_8|
N||0|""|LEX_5|
PREP|dative|0|""|LEX_9|
DET|def|0|""|LEX_10|
PN||1|""|LEX_1|
$l 0
DET|def|0|"the"|LEX_7|
PN||1|"fairy godmother"|LEX_11|
V|p3, sing, past|0|"passed"|LEX_6|
DET|indef|0|"a"|LEX_8|
N||0|"magic sword"|LEX_5|
PREP|dative|0|"to"|LEX_9|
DET|def|0|"the"|LEX_10|
PN||1|"prince"|LEX_12|
$L
{LEX_0} "Midas"|PN|male, rich, vain, weak||#1||||||||type:"king"/N|daughter:"Marie"|home:"castle"/N|
{LEX_1} "Braveheart"|PN|male, kill.monster, handsome, brave, strong||#1||||||||type:"prince"/N||
{LEX_2} "Marie"|PN|female, beautiful, kind||#1||||||||type:"princess"/N||
{LEX_3} "Merlin"|PN|male, bad, supernatural||#1||||||||type:"sorcerer"/N||
{LEX_4} "Wanda"|PN|female, good, supernatural||#1||||||||type:"fairy godmother"/N||
{LEX_5} "magic sword"|N|neuter, magic, physobj||#1||
{LEX_6} "pass"|V|Number, Personne, Tense, give||$v7||"passed"|
{LEX_7} "the"|DET|def||#1||
{LEX_8} "a"|DET|indef||#1||
{LEX_9} "to"|PREP|dative||#1||
{LEX_10} "the"|DET|def||#1||
{LEX_11} "fairy godmother"|N|female||#1||
{LEX_12} "prince"|N|male||#1||
%
The trees are replaced by lists of their leaf nodes; the list headers are marked by $l (lowercase L) instead of $t; and path and colour fields are omitted. Otherwise the representation is the same.
REcover creates a tree corresponding to $l 0 with ROOT as root and the eight lines as children.
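Since the leaf list preserves the left-to-right order of the words, the utterance itself can be read off it directly. The Python sketch below shows this for the $l 0 list; it is an illustration, not the method used by REcover, and the function name utterance_from_leaves is hypothetical. It assumes the five-field leaf layout shown above, with the generated word in the fourth field.

```python
def utterance_from_leaves(leaf_lines):
    """Join the generated words of a leaf list, skipping empty ones."""
    words = []
    for line in leaf_lines:
        fields = line.strip().split("|")
        word = fields[3].strip('"')  # fourth field: generated word
        if word:
            words.append(word)
    return " ".join(words)


leaves = [
    'DET|def|0|"the"|LEX_7|',
    'PN||1|"fairy godmother"|LEX_11|',
    'V|p3, sing, past|0|"passed"|LEX_6|',
    'DET|indef|0|"a"|LEX_8|',
    'N||0|"magic sword"|LEX_5|',
    'PREP|dative|0|"to"|LEX_9|',
    'DET|def|0|"the"|LEX_10|',
    'PN||1|"prince"|LEX_12|',
]
print(utterance_from_leaves(leaves))
```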
Multiple Trees
The output for multiple trees is very similar. To illustrate this, context-free rules were added for QUESTION and for R_6, both simply being copies of ROOT. vinci selected a different 'give' verb:
Question : the fairy godmother handed a magic sword to the prince
R_6 : the fairy godmother handed a magic sword to the prince
The output for STore 0 | 0 is:
$u
$l 1
DET|def|0|"the"|LEX_7|
PN||1|"fairy godmother"|LEX_11|
V|p3, sing, past|0|"handed"|LEX_6|
DET|indef|0|"a"|LEX_8|
N||0|"magic sword"|LEX_5|
PREP|dative|0|"to"|LEX_9|
DET|def|0|"the"|LEX_10|
PN||1|"prince"|LEX_12|
$l 6
DET|def|0|"the"|LEX_7|
PN||1|"fairy godmother"|LEX_13|
V|p3, sing, past|0|"handed"|LEX_6|
DET|indef|0|"a"|LEX_8|
N||0|"magic sword"|LEX_5|
PREP|dative|0|"to"|LEX_9|
DET|def|0|"the"|LEX_10|
PN||1|"prince"|LEX_14|
$L
{LEX_0} "Midas"|PN|male, rich, vain, weak||#1||||||||type:"king"/N|daughter:"Marie"|home:"castle"/N|
{LEX_1} "Braveheart"|PN|male, kill.monster, handsome, brave, strong||#1||||||||type:"prince"/N||
{LEX_2} "Marie"|PN|female, beautiful, kind||#1||||||||type:"princess"/N||
{LEX_3} "Merlin"|PN|male, bad, supernatural||#1||||||||type:"sorcerer"/N||
{LEX_4} "Wanda"|PN|female, good, supernatural||#1||||||||type:"fairy godmother"/N||
{LEX_5} "magic sword"|N|neuter, magic, physobj||#1||
{LEX_6} "hand"|V|Number, Personne, Tense, give||$v7||"handed"|
{LEX_7} "the"|DET|def||#1||
{LEX_8} "a"|DET|indef||#1||
{LEX_9} "to"|PREP|dative||#1||
{LEX_10} "the"|DET|def||#1||
{LEX_11} "fairy godmother"|N|female||#1||
{LEX_12} "prince"|N|male||#1||
{LEX_13} "fairy godmother"|N|female||#1||
{LEX_14} "prince"|N|male||#1||
%
The list for the untransformed ROOT is now omitted, but lists are shown for QUESTION ($l 1) and R_6 ($l 6). Since syntax transformations and indirections are now applied independently for each tree, we see separate LEX entries for "fairy godmother" and "prince".
The corresponding outputs for STore 1 | 0 and STore 2 | 0 are self-evident and need no further comment.
Interplay of STore, SAve and REcover
The combination of STore and SAve allows ivi/vinci to create a 'replayable' grammar. For example, consider the following steps:
1. Load a vinci grammar
2. Generate an utterance with <Esc g>
3. Go to a blank corefile and run the command STore
4. SAve the data in the corefile
5. Quit ivi
6. Sometime later, reopen ivi
7. Reload the grammar from step (1)
8. Run the command REcover <filename>, where <filename> is the one to which the previously STored file has been SAved
9. Go to a corefile and type one or more of <Esc m> <digit> <return>
The string generated originally will appear in the corefile. Alternatively, the REcovered grammar output may be used as the basis for error analysis. So, for example, if a string appears in some corefile and the command <Esc K> is typed at the beginning of the string, vinci will compare the string against the model produced by the grammar.