The House that Jack Built

© Michael Levison

This began as an assignment in two courses. In one, humanities students were given some basic vinci syntax and asked to generate a few sentences by hand. The syntax yielded (meaningless) Jack-like verses. Subsequently I expanded the lexicon to cover the complete vocabulary, and tried to impose restrictions to bring this closer to the original poem. The other involved computing students who were studying non-imperative programming languages. It called for a Haskell program to generate the complete poem, given lists of the components.

This analysis was written in 2000-1, and is shown here for entertainment (of the author, at least, if not the reader!) and because it illustrates very well some of the features and the limitations of vinci. Changes to vinci since then affect some of the remarks, but apart from adding a couple of notes and altering this introduction, I have not attempted to revise the original.

The Poem

The poem is of the "iterative expanding" class, which begins with a simple verse and builds up by adding a new line in each successive verse. The final one is:

    This is the farmer sowing the corn,
    That kept the cock that crowed in the morn,
    That waked the priest all shaven and shorn,
    That married the man all tattered and torn,
    That kissed the maiden all forlorn,
    That milked the cow with the crumpled horn,
    That tossed the dog,
    That worried the cat,
    That killed the rat,
    That ate the malt
    That lay in the house that Jack built.

In effect each verse introduces a new character with three elements: a name, a description and an action. Thus:

Name Description Action
house that Jack built <none>
malt <none> lay in
rat <none> ate
cat <none> killed
dog <none> worried
cow with a crumpled horn tossed
maiden all forlorn milked
man all tattered and torn kissed
priest all shaven and shorn married
cock that crowed in the morn waked
farmer sowing the corn kept

Labelling these elements N[1], D[1], A[1], N[2], D[2], A[2], ... each verse begins:

    This is the N[i] D[i]
    That V[i] the N[i-1] D[i-1]
    That V[i-1] the N[i-2] D[i-2]
    That V[2] the N[1] D[1]

The Basic Syntax

I have mentioned a Haskell program to generate the complete poem from lists of the components. In contrast, we will consider generating Jack-like poems using vinci syntax.

A simple syntax which does this is:

    ROOT     = START (MIDDLE | ) END %
    START    = TI N (ADJ | ) %
    RELATIVE = RP V N (ADJ | ) %
    END      = RP N V %

where part of the lexicon is:

    "lay in"|V|||#1||

    "the house"|N|||#1||
    "the farmer"|N|||#1||

    "with the crumpled horn"|ADJ|||#1||
    "all forlorn"|ADJ|||#1||
    "that crowed in the morn"|ADJ|||#1||
    "sowing the corn"|ADJ|||#1||

    "This is"|TI|||#1||

Here, START is the first element ("This is the maiden all forlorn"), and MIDDLE represents a sequence of RELATIVEs ("that milked the cow with the crumpled horn"). The order RELATIVE MIDDLE causes the leftmost RELATIVE to be high on the tree, while the rightmost is deeper. In departure from the schema shown earlier, we have treated END ("that Jack built") as a relative clause in its own right, rather than an adjectival phrase describing the previous noun. In contrast to the RELATIVEs, however, the relative pronoun is the object of the verb not the subject.

Some typical sentences generated by the syntax follow. Long ones are rare, but this could be altered by inserting several copies of the second alternative for MIDDLE. (The output has been reformatted for ease of reading.)

    This is the cow all forlorn
    that ate the cow
    that ate the farmer all tattered and torn
    that the cock milked

    This is the cow
    that tossed the rat
    that married the cow
    that tossed the cow
    that the cock milked

    This is the rat
    that tossed the dog all shaven and shorn
    that the man worried

    This is the man
    that kissed the rat
    that tossed the cow
    that the dog worried

As we see, nothing in this example prevents the same noun being chosen several times in a sentence. One solution to this is to apply the frequency variation N/$0 on the various nouns and verbs to ensure that each is chosen only once.

Adding Restrictions

Suppose that we wish to constrain the poem so that each character can only be described by the correct adjectival phrase ("maiden all forlorn") and each performs only their proper action ("priest that married"). We assign each of the characters a value in the attribute type Actor, placing this in the attribute field of the noun. We also place it in the attribute fields of the appropriate adjectival phrase and the verb for which it is the proper subject.

Typical lexical entries become:

    "lay in"|V|malt||#1||

    "the house"|N|house||#1||
    "the farmer"|N|farmer||#1||

    "with the crumpled horn"|ADJ|cow||#1||
    "all forlorn"|ADJ|maiden||#1||

    ""|ADJ|jack, house, malt, rat, cat, dog||#1||

We have added a verb "shelter" as suitable action for "house", since this can now occur anywhere in the sequence of relative clauses. (In the poem itself, it is the only noun which takes no action.) We have also added the empty string as an adjectival phrase for those nouns which don't have one. An alternative would be to use guards in the syntax rules so that the ADJ alternative is not followed for specific Actor-values.

The revised syntax is:

    ROOT =
           choose Ac: Actor;
           START[Ac] (MIDDLE[Ac] | ) END %

    START =
           inherit Ac: Actor;
           TI N[Ac] (ADJ[Ac] | ) %

    MIDDLE =
           inherit Ac1: Actor;
           choose Ac2: Actor;
           (RELATIVE[Ac1.v, Ac2] | RELATIVE[Ac1.v, Ac2] MIDDLE[Ac2]) %

           inherit Ac3: Actor/v;
           inherit Ac4: Actor;
           RP V[Ac3] N[Ac4] (ADJ[Ac4] | ) %

    END =
           choose Ac: Actor;
           RP N[Ac] V[Ac] %

There is a small complication here. Commonly, the nodes of a syntax tree which are to be in agreement are children of a single parent. In this case, one of the agreeing nodes, V, is the one on the next level down. We could get around the problem by making MIDDLE a sequence of phrases N RP V ("the priest that married"), which places all the agreeing items on one tree level, but this is unnecessary.

Each time MIDDLE is expanded, a new Actor-value is chosen and given to RELATIVE for its N and ADJ. It is also passed to the MIDDLE child. The value inherited by MIDDLE from above is given to RELATIVE for its V node. Since MIDDLE is passing two Actor-values to RELATIVE, it marks one of them by compounding it with attribute v, which RELATIVE removes.

Two long sample sentences:

    This is the maiden all forlorn
    that milked the cat
    that killed Jack
    that built the cow with the crumpled horn
    that tossed the man all tattered and torn
    that kissed the cow
    that tossed the maiden
    that the house sheltered

    This is the farmer sowing the corn
    that kept the cow with the crumpled horn
    that tossed the cat
    that killed the dog
    that worried the cat
    that killed the maiden
    that milked the rat
    that the man kissed

Suppose we change END to RP V N ("that built Jack"). Can we ensure agreement between the verb in END and the noun in the final RELATIVE, even though this is deep down the tree and remote from it?

Yes. We simply choose an Actor-value for END when ROOT is expanded, and pass it both to END and MIDDLE. (The copy passed to MIDDLE, of course, must be compounded, say with e, to distinguish it from the local Actor-value). Each MIDDLE passes this to its MIDDLE child. When MIDDLE is expanded for the last time (i.e. when its left alternative is selected), it hands RELATIVE this value (with e removed), rather than a newly chosen one.

A Diversion

What if we try to generate a similar poem in a gendered language in which either the relative pronoun or the verb must agree in gender with the referent? Using only the basic features of vinci, we find we can't do it!

The basic method in vinci for ensuring agreement between nodes is to choose an attribute-value higher up the tree and pass it down to each of them. The words selected for the nodes are then restricted to (or morphologically converted to) the value. The implication in the present case is that nouns receive not only an Actor-value, but a gender-value as well. But then, only by chance is a word selection possible, since the sole noun for a particular Actor-value may have the wrong gender.

We can circumvent this by augmenting the lexicon ("la chienne", "la chatte" and "Jill" as well as "le chien", "le chat" and "Jack"), but we haven't overcome the underlying problem, only concealed it. What we really want is to choose a noun, and pass its gender to the agreeing nodes. We can achieve this in vinci using preselection, but there are two resultant issues, one technical, the other philosophical.

The technical difficulty stems from the need to preselect several nouns, assigning them tags such as actor.n1, actor.n2, ... If we try this, we find that we would like a variable to store an n-value, and a successor function to compute the next one, features which vinci does not have. We might get by if we use actor, actor.n, actor.n.n, ..., but I haven't considered the details.

The philosophical issue is whether this really constitutes a solution to the problem. In writing the preselection rule, we directly specify the resulting poem, leaving nothing to the generation. vinci is reduced to the role of Haskell in the coursework problem, filling the blanks in a predefined template.

We can actually restore random generational capacity to this by way of a two-level grammar. The upper level is a vinci grammar which randomly selects a sequence of nouns and generates a preselection rule. The lower level, the grammar referred to above, uses this rule to generate the poem.

Further Restrictions

Returning to the original problem, we have previously constrained the poem so that characters perform only their proper actions. It would have been just as easy to ensure instead that actions are applied only to their proper objects ("milk the cow", "eat the malt", "lay in the house"). Doing both simultaneously provides the ultimate realization of the original goal. Only the poem itself, or some subsequence of it, fits both sets of constraints.

What happens if we try this?

In practice, vinci will rarely generate a sentence at all. Verbs after the first will be handed two attributes, one to specify the subject, the other the object. But only one in 12 times will there be a verb which fits. So short sentences will show up occasionally; those near maximum length, only once in billions of tries!

This time, vinci preselection doesn't help. What we need is to preselect a sequence of verbs. The first can be chosen at random. Once chosen, its object type must be extracted, and this fixes the subject type of the next verb, and so on. In other words, we must preselect the lexical entry for each verb before we can specify features for the preselection of the next. Unfortunately, the current version of preselection requires us to specify all preselected words before any is looked up in the lexicon.

Multiple levels of grammar might be used. The first preselects a verb, inspects its lexical entry, and creates a preselection rule for the second, and so on. (Each level also passes on all preselections from earlier levels.) Hardly an elegant solution! We need a generalization of the vinci preselection feature. The desired generalization in which later preselections can depend on features found in earlier ones has now been implemented.

Prolog (and Epilogue!)

The Haskell program referred to earlier trivially generates verses of the poem. But it does so by substituting into templates strings taken in order from fixed lists. The lists themselves constitute the solution to the problem, and have been generated by the programmer.

A complete vinci grammar, given the present version of vinci, will generate the desired verses at random, but only rarely. We may not see a complete long one in the lifetime of the computer.

An enhanced preselection mechanism, in which words were preselected one by one with their features available for the next preselection, would permit vinci to generate the verses instantly.

A final thought: what if we try the program in Prolog? I haven't attempted this, but we can presumably express the generation problem and the constraints in form of a Prolog program. Prolog will then search for a verse of the poem which satisfies the desired relations. To do this, it follows a tree of possibilities, backtracking where necessary, until a solution is found. When it finds one, it may be asked to "resatisfy" the constraints to locate a further solution. In the present context, it may find some verses quickly, but may take many billions of attempts to locate others. In fact, this is quite similar to the present vinci, differing from it in two regards. Firstly, the search is systematic rather than random; secondly, the order of search (and thus the time taken for any solution to appear) is determined by the order which the programmer chooses for the Prolog facts and rules. The programmer can therefore arrange for a complete verse to arise early in the search.

We end with an addendum from one of the authors to the other in recognition of his recent spell in high office.

    This is the sun which glows in the dawn
    That wakened the dean not shaven or shorn
    That looked at the budget carefully drawn
    That paid the professor weary and worn
    That lectured the student stifling a yawn
    That read from the textbook tattered and torn
    That lay on the shelf
    That held up the wall
    That stood in the Hall that jerry built