VinciLingua: how it works

The VinciLingua project has two intertwined facets: the use of natural language generation to produce exercises on the one hand, and the use of web technologies to allow learners to explore data and arrive at concepts, in textbooks and other web environments, on the other. Each relies on specific pieces of software, as we will see below, but they are complementary and offer the potential for an overarching synthesis. We begin with natural language generation. An overview is presented in the What section; here we look at how the system works at a more detailed level. Those wanting even more detail can consult the Vinci Project page.

As the following figure shows, the generative part of VinciLingua may be thought of as a choreography involving four elements, as well as two people.

the architecture of the VinciLingua system

The starting point here is the ivi/Vinci environment, a natural language generation system embedded in an editing environment. Ivi/Vinci is written in C and runs on a variety of platforms. It provides a linguist with a friendly environment for writing grammars. A linguist can use the ivi/Vinci environment, and specifically its various metalanguages (for semantics, lexicon, syntax, morphology, etc.), to define a grammar. The grammar is then used to generate utterances, and examining these allows the linguist to debug the grammar until the output is satisfactory. Once defined, a grammar may be reused in a variety of exercises.
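To make the idea of rule-driven generation concrete, here is a minimal sketch in Python. It is not the ivi/Vinci metalanguage: the toy grammar, the vocabulary and the function names are all assumptions chosen for illustration, and a real Vinci grammar also covers semantics and morphology.

```python
import random

# Hypothetical toy grammar (NOT ivi/Vinci syntax): each nonterminal maps to
# a list of alternative expansions; symbols without a rule are terminal words.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DET", "N"]],
    "VP": [["V", "NP"]],
    "DET": [["the"], ["a"]],
    "N": [["cat"], ["dog"]],
    "V": [["sees"], ["chases"]],
}


def generate(symbol, rng):
    """Recursively expand a symbol into a list of words."""
    if symbol not in GRAMMAR:
        return [symbol]
    expansion = rng.choice(GRAMMAR[symbol])
    words = []
    for child in expansion:
        words.extend(generate(child, rng))
    return words


def generate_utterance(seed=None):
    """Generate one utterance; a fixed seed makes the result repeatable."""
    rng = random.Random(seed)
    return " ".join(generate("S", rng))


if __name__ == "__main__":
    for seed in range(3):
        print(generate_utterance(seed))
```

Running the script prints a few random sentences. Inspecting such output and then adjusting the rules mirrors, in miniature, the debugging loop described above.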

At the same time, the linguist and the computer scientist define the instr module (currently written in C) which acts as a proxy for a human instructor. In its simplest form, it decides which grammar to run, noting this in a database, along with details on which student is doing which exercise and what results were obtained. In a more complex form, it can provide for adaptive generation, where previous results are used to direct subsequent generation. There is thus a dialogue between ivi/Vinci and instr.
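The adaptive behaviour can be sketched as follows. This is only an illustration under stated assumptions: the real instr module is written in C and uses a database, whereas the `Instructor` class, the in-memory log, the mastery threshold and the grammar names below are all hypothetical.

```python
class Instructor:
    """Hypothetical stand-in for instr: chooses a grammar and logs results."""

    def __init__(self, grammars, threshold=0.8):
        self.grammars = list(grammars)  # assumed to be ordered easiest first
        self.threshold = threshold      # assumed mastery cut-off
        self.log = []                   # in-memory stand-in for the database

    def record(self, student, grammar, correct):
        """Note which student did which exercise and how it went."""
        self.log.append((student, grammar, bool(correct)))

    def success_rate(self, student, grammar):
        """Fraction of correct answers, or None if there are no attempts yet."""
        results = [c for s, g, c in self.log if s == student and g == grammar]
        return sum(results) / len(results) if results else None

    def next_grammar(self, student):
        """Return the first grammar the student has not yet mastered."""
        for grammar in self.grammars:
            rate = self.success_rate(student, grammar)
            if rate is None or rate < self.threshold:
                return grammar
        return self.grammars[-1]  # everything mastered: stay on the hardest


if __name__ == "__main__":
    instr = Instructor(["articles", "agreement"])
    instr.record("s1", "articles", True)
    print(instr.next_grammar("s1"))
```

In this sketch a student moves on to the next grammar only once the logged success rate on the current one reaches the threshold, which is one simple way previous results could direct subsequent generation.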

Once instr has decided on a grammar to run, it asks ivi/Vinci to produce an utterance. This is handed to stdnt, essentially an interface module, which uses PHP, JavaScript, CSS and HTML to build a web page for a specific kind of exercise (fill-in-the-blanks, multiple choice, audio, visual, etc.). stdnt also collects student responses and passes them back to ivi/Vinci for analysis by instr. The learner sees only the web interface created by stdnt, not the underlying architecture. At the same time, a record is kept of what the student has done and what analysis has been made. Examination of this log provides clues about a student's progress.
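As a rough illustration of the step from utterance to exercise, the sketch below turns a generated sentence into a fill-in-the-blanks item and checks a response. The function names and the simple case-insensitive comparison are assumptions; the actual stdnt module is web-based, and the analysis it hands off to instr is richer than an exact match.

```python
def make_cloze(utterance, blank_index):
    """Replace the word at blank_index with a blank; return (prompt, answer)."""
    words = utterance.split()
    answer = words[blank_index]
    words[blank_index] = "____"
    return " ".join(words), answer


def check_response(response, answer):
    """Compare a learner's response to the expected word, ignoring case
    and surrounding whitespace (a deliberately simple assumed policy)."""
    return response.strip().lower() == answer.lower()


if __name__ == "__main__":
    prompt, answer = make_cloze("the cat sees a dog", 2)
    print(prompt)
    print(check_response(" Sees ", answer))
```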

This, in a nutshell, is how the system works. We have ignored a variety of issues, including different classes of users. For example, teachers can create exercises and tests, while students can only take them. When the system is used in a for-marks context (as is currently the case with the version run by Continuing and Distance Studies at Queen's University), individual users are identified and logs kept of their interactions. However, in the freely available experimental version, users are anonymous. Anonymized logs are kept of interactions, but there is no 'memory' of individual users.

We turn now to the interactive environment used in textbooks and pedagogical experiments. The following image shows its primary components.

components of the VinciLingua system

The starting point is some HTML code that provides the contents of a page. It is important to note that this HTML includes semantic information about the nature of the elements on the page. For example, for an exploratory activity on parts of speech, part of the background HTML looks like this:

a lexicon with part of speech markup

Using this information, it is possible to combine CSS (Cascading Style Sheets) and jQuery to allow a learner to select a text, choose a part of speech, and click on words in the text to determine if they are members of this part of speech. If the learner is correct, the word turns green. If not, it turns red. The result looks like this:
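The decision behind the colour change is simple enough to sketch. In the real system it runs in the browser with jQuery and CSS; the Python below only models the logic, and the tagged text, tag names and function name are hypothetical.

```python
# Hypothetical tagged text: (word, part of speech) pairs standing in for
# the semantic markup carried by the page.
TAGGED_TEXT = [
    ("the", "determiner"),
    ("cat", "noun"),
    ("sleeps", "verb"),
]


def colour_for_click(tagged_text, word_index, chosen_pos):
    """Return 'green' if the clicked word belongs to the chosen part of
    speech, and 'red' otherwise."""
    _, actual_pos = tagged_text[word_index]
    return "green" if actual_pos == chosen_pos else "red"


if __name__ == "__main__":
    print(colour_for_click(TAGGED_TEXT, 1, "noun"))
    print(colour_for_click(TAGGED_TEXT, 2, "noun"))
```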

an exercise for finding parts of speech in a text

Using the results, a learner can begin to form a conceptualization of the different parts of speech. Of course, multiple texts can be made available. More interestingly, since the semantic HTML itself is just a textual document, it is possible for ivi/Vinci to generate the HTML and to adapt it in light of a learner's progress. Exploring this interaction between generation and exploration is a key part of our ongoing research.
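Because the page is plain text, producing it from tagged words is straightforward. The sketch below shows one way this could look; the `span` elements, the `word` class and the `data-pos` attribute are assumptions for illustration and are not taken from the project's actual markup.

```python
from html import escape


def to_semantic_html(tagged_words):
    """Wrap each (word, part of speech) pair in a span that carries the
    part of speech as an attribute, and join the spans into a paragraph."""
    spans = [
        '<span class="word" data-pos="{}">{}</span>'.format(
            escape(pos, quote=True), escape(word)
        )
        for word, pos in tagged_words
    ]
    return "<p>" + " ".join(spans) + "</p>"


if __name__ == "__main__":
    print(to_semantic_html([("the", "determiner"), ("cat", "noun")]))
```

A generator that emits tagged words could feed this function directly, so the text offered to a learner could change from one session to the next.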