ivi and Vinci - A History
Michael Levison, December 2016
Prelude by Greg Lessard
Sadly, Michael Levison passed away in 2019. The following history he wrote three years before is now almost ten years old. Nevertheless, it provides in my opinion a remarkable insight into the history of an academic project in general and the Vinci project in particular. Some of the work he describes is still used and the publications to which it gave rise and the research teams over the years appear elsewhere on this site. However, the work on VinciLingua has advanced, with some changes and additions:
- new languages have been added, with a particular focus on indigenous languages
The ivi/Vinci project and program, which in recent years has become known as VinciLingua (VL for short), is the amalgamation of two components: ivi and Vinci. The former, originally the "ivi text-editor", became the text-editor of choice for a large section of Queen's University's administration at the end of the 1970s and the early years of the 1980s. It was known to them as Q'Text, its trademarked name.
In recent years, I have sometimes seen misleading information about its "history", which has led me, as its progenitor, to clarify its story from its origins in 1974-5 to the present.
In 1974-5, students in Computer Science learned the concepts of programming at many levels: from the initial description of an algorithm in a natural language, such as English; to its representation in some "high-level" programming language, say, Fortran or Pascal; to its rendering in the basic "machine-language" of a particular computer; down to the implementation of specific machine instructions by logic circuitry; and perhaps, for a student versed in electronics, to the implementation of the underlying logic elements by physical components: thermionic valves, transistors, etc.
At this time, I was teaching a second-year undergraduate course in Computer Science which dealt with programming at the machine-language level. A hampering aspect of this is that, at this level, while the instructions for each make of machine bear a resemblance to one another, they differ considerably in detail. Thus, in an era when new machines were being introduced every year or two, it was difficult to maintain a consistent environment for student practice.
My solution was to make use, for learning purposes, of an idealized "machine", which could be simulated ("interpreted") on any actual computer. The interpreter could also allow students to execute program instructions one at a time, while looking at the contents of memory cells on a display screen -- a feature useful for learners but not available on an actual computer. It would also be feasible to have variant designs for the simulated machine, specified by individual instructors, with the interpreter being initialized accordingly. Thus was born the Interactive Visual Interpreter or "ivi".
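To illustrate the idea of stepping through a simulated machine one instruction at a time, here is a minimal sketch in Python. The machine itself (a single accumulator, eight memory cells, and the instructions LOAD, ADD, STORE and HALT) is invented for this illustration and is not the design of the original ivi machine, which was written in PDP-11 assembly language.

```python
# Hypothetical idealized machine: one accumulator, a small memory, and
# four made-up instructions. None of this reflects the actual ivi design.
from dataclasses import dataclass, field


@dataclass
class Machine:
    memory: list = field(default_factory=lambda: [0] * 8)
    acc: int = 0          # accumulator
    pc: int = 0           # program counter
    halted: bool = False


def step(m, program):
    """Execute exactly one instruction and return it for display."""
    op, arg = program[m.pc]
    m.pc += 1
    if op == "LOAD":
        m.acc = m.memory[arg]
    elif op == "ADD":
        m.acc += m.memory[arg]
    elif op == "STORE":
        m.memory[arg] = m.acc
    elif op == "HALT":
        m.halted = True
    else:
        raise ValueError(f"unknown instruction: {op}")
    return op, arg


def run_stepwise(m, program):
    """Single-step the program, recording the state after each instruction."""
    trace = []
    while not m.halted:
        op, arg = step(m, program)
        trace.append((op, arg, m.acc, list(m.memory)))
    return trace


# Add the contents of cells 0 and 1 and store the sum in cell 2.
program = [("LOAD", 0), ("ADD", 1), ("STORE", 2), ("HALT", 0)]
machine = Machine(memory=[2, 3, 0, 0, 0, 0, 0, 0])
trace = run_stepwise(machine, program)
for op, arg, acc, mem in trace:
    print(f"{op:5} {arg}  acc={acc}  memory={mem}")
```

A variant machine of the kind an individual instructor might specify would, in this sketch, amount to a different set of branches in `step`.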
I discussed this with an MSc student, R.L. Stevens, who indicated that he would like to develop such a concept under my supervision for his MSc thesis.
Towards the end of this project, I noted a deficiency in the plans which we had discussed -- it would be useful to provide an editor, allowing students to type their programs and enter them in the memory of the simulated computer.
I sketched out a design for a so-called wysiwyg ("what you see is what you get") editor, and this was added to the overall system.
In fact, the system was never used for its intended purpose. In the academic year following its construction, courses were shuffled, and I was needed elsewhere. The relevant course was assigned to another professor, who had different thoughts about its shape and contents.
And there, "ivi", along with the "ivi text-editor", would have died, except for a coincidence.
In 1975, three Queen's professors: John Matthews, D.M. Schurman and J.A.W. Gunn, created the Disraeli Project, whose purpose was to publish the collected letters of Benjamin Disraeli. Many of these had been previously unknown, but had been tracked down by two of these professors as a sabbatical project. As it happens, one of the two was the next-door neighbour of a colleague in my Department. My colleague was aware of the ivi editor, and he advised the Disraelites to consider using a computer as an editing tool for building and annotating their manuscripts. He put his neighbour and colleagues in contact with me.
Recall that, 40 years ago, such an idea was revolutionary, not routine, and there was some reluctance to proceed with it. [Note 1] In our discussions, however, I pointed out to the authors that they would wish the text of each letter to be typed, annotations to be added, and these to be revised and expanded as additional details were observed. The use of a computer would greatly simplify the work involved. And, of course, I had an easy-to-use basic text-editor which could be extracted from ivi, and would be ideal for their purposes. In the event, the project purchased a PDP 11/03, an early, relatively cheap almost-desktop personal computer, in whose machine language (or technically speaking, assembly language [Note 2]) ivi was written. On my advice, they also paid Stevens as a part-time programmer to maintain the text-editor and to make additions or improvements which were deemed necessary. I also became computer advisor to the project.
The arrangement with the Disraeli project worked very well for several years, and as mentioned earlier, a number of other units or individual researchers bought similar machines, and adopted the ivi text-editor to produce reports and papers. Their varied applications brought up requirements which often called for additional editor features. These were discussed with me on a frequent basis -- every few days, in fact -- and I commonly suggested designs which Stevens then implemented. Thus, I was the designer, and Stevens the implementer, of the program.
At some point around the start of the 1980s, an arrangement was made with a software company (ECAP, at Casselman, Ontario) for them to market and further develop the text-editor, presumably with Stevens's help. This was the point at which Queen's registered the trademark Q'Text, which became the name of the program.
Some time after this, I lost contact with Stevens, as did ECAP later.
Two events -- one technical, one "soft" -- bypassed everyone.
At the point that we had reached, the PDP-11 was out-of-date, and new personal computers -- faster, smaller and cheaper -- were taking over, and bringing computers to a much wider spread of the population. At the same time, new text-editors -- WordPerfect, Word, and the like -- were being developed by far larger teams, and new features being added at a much faster rate, than academia could match. Were these new editors easier for non-experts to use? This is not for me to say, but many ivi users later told me of their preference for ivi.
There was a significant change in software availability too. Why, one might ask, was ivi written in the assembly language of a specific computer, rather than the widely available Fortran (or, heaven forbid, COBOL!)? Here the answer is simple: Fortran was designed for mathematical use, and COBOL for business. Neither has features particularly suited to manipulating text; and, at the time in question, there were few alternatives spread widely across different machines. By the early 1980s, however, the language C was becoming available on most models of computers, including the newer model of the PDP-11 to which I now had access.
As it happens, I could kill two birds with one stone, because the version of the ivi source code I had been left was far from recent. So Summer students created a new implementation in the C language under my supervision, which is our version to this day. Later, I was contacted by ECAP, who said they could not cope with the assembly language form, and asked me if I could pay someone to make some additions. The senior technician in our Department made these changes in the C version, and I provided them with the complete new source. [Note 3]
Over the intervening years, ivi has been developed with the addition of many powerful features. These included a "driver" feature, which allows any program to emit characters to control ivi, carrying out operations on text and executing ivi commands, so that more complicated tasks can be carried out, and performed on multiple texts; a generalization which permitted ivi to process "records" and thus create simple databases; and later in the 1980s, the natural language generation features of Vinci, discussed below.
As might be expected with a program written by several students with different levels of competence, the quality of the C version of ivi varies considerably. Some of the files are well commented and easy to read; others are obscure. Some later, and better, of our programmers have improved it considerably, and I too have rewritten a number of the files.
Around 2010, one student, Craig Thomas, updated the program to conform to a standardization of C sanctified over the years. We now have little problem in "compiling" it for a wide range of machine/operating system environments.
And this brings us to the world of natural language generation.
In the late 1970s, two professors in the Spanish and Italian department: James McDonald (Spanish) and Diego Bastianutti (Italian), arranged with a programmer to create simple programs to assist in teaching their respective languages. I have no detailed recollection of the content, but I believe that the programs generated sentences of just a few words based on fixed syntax and small vocabularies, calling upon the student to provide the correct conjugation of the verb and declension of the nouns and adjectives. These programs and the registered trademark were named Q'Vinci.
In 1985/6, the Dean of Arts and Science at Queen's arranged some meetings of representatives from each language department to consider the wider use of computers in language learning. As he was aware that my research area related to the literary and linguistic applications of computers, he asked me to be present to offer technical advice. At some point in the weeks which followed, I sketched out a broad generalization of the earlier programs, involving wider syntax, substantial lexicons, rules to capture morphology, and so on. It quickly became clear that, with the exception of McDonald, Bastianutti and the representative from French Studies, Greg Lessard, there was little interest in the project. (In fact, there was active discouragement in some Departments where literature was considered paramount and linguistics an unfortunate intrusion.)
For a few years, the four of us met, worked together, and wrote several joint papers. McDonald, however, then retired, and Bastianutti left Queen's. Greg and I have now been research partners over a period spanning about thirty years.
I will not attempt to provide a detailed history, but will note a few points in our progress. In 1987 and every year since 1989, I have compiled a year directory, along with an overall index, of files discussing all aspects of the design, etc. of ivi and its companion program, Vinci. These can be found in my home directory on the computer system of the School of Computing.
We first elaborated the design of a natural language generator, which I had sketched earlier. Over the next few years, with the help of graduate students and Summer student employees, we
- developed a substantial French lexicon, based on Le Micro Robert -- the latter used only as a source of lemmas, and as a reference for part-of-speech and basic grammatical information like verb type. We also created morphology rules for conjugation and inflection, added semantic information, equivalents in English, and pointers to synonyms and antonyms, as well as including phonological data;
- specified a representation for syntax rules (context-free plus rules for syntax transformation) and morphology rules; and
- implemented a natural language generation program, based on these features.
Our first implementation involved a "supervisory" program, which employed subsidiary programs to read lexicon, syntax and morphology, and to generate utterances based on extra user-specific rules of syntax and random lexical choices. In recognition of the earlier work of McDonald and Bastianutti, we named this Vinci.
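The general technique of generating utterances from syntax rules with random lexical choices can be sketched in a few lines of Python. Everything here is a toy assumption made for illustration: the grammar, the lexicon, the single morphology rule and the names `SYNTAX`, `LEXICON`, `inflect` and `generate` are hypothetical, and the sketch uses neither Vinci's rule notation nor its syntax transformations.

```python
import random

# Hypothetical toy grammar: each nonterminal maps to alternative expansions.
SYNTAX = {
    "S": [["NP", "VP"]],
    "NP": [["DET", "N"]],
    "VP": [["V", "NP"], ["V"]],
}

# Hypothetical lexicon: lemmas grouped by part of speech.
LEXICON = {
    "DET": ["the", "a"],
    "N": ["cat", "dog", "child"],
    "V": ["see", "follow", "like"],
}


def inflect(lemma, category):
    """Toy morphology rule: third-person singular present for verbs."""
    if category == "V":
        return lemma + "s"
    return lemma


def generate(symbol, rng):
    """Expand a symbol into a list of words, choosing at random."""
    if symbol in LEXICON:
        lemma = rng.choice(LEXICON[symbol])
        return [inflect(lemma, symbol)]
    expansion = rng.choice(SYNTAX[symbol])
    words = []
    for child in expansion:
        words.extend(generate(child, rng))
    return words


rng = random.Random(2016)   # fixed seed so that a run can be repeated
sentence = generate("S", rng)
print(" ".join(sentence))
```

Because the lexicon stores only lemmas, the inflected verb form is produced by the morphology rule at generation time rather than being listed in the lexicon.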
Subsequently, we made use of the command-execution feature of ivi to introduce commands to read the individual components of the language description and to generate utterances. We referred to the combined system as ivi/Vinci.
Over the intervening period of time, we have:
- built a comprehensive English lexicon. For this purpose, we sought permission of its originator (who happened to be a faculty member of my former university department at Birkbeck College London) to convert his CUVOALD: "The Computer Users Version of the Oxford Advanced Learners' Dictionary", to the Vinci lexicon format. In addition, we lemmatized the Vinci version. (A lemmatized lexicon subsumes all conjugated/declined forms of verbs, nouns and adjectives under a single entry. In the case of French, a single verb entry covers some 60+ forms, which are created from the base by rules of morphology.) It has been enriched algorithmically by drawing some content from WordNet.
- added substantial semantic information to both of our lexicons; and
- implemented the concepts of question, answer and error analysis. In essence, Vinci can generate a "question" -- an exercise instance -- for a student, as well as the expected answer, typically a transformation of the question. It should be emphasized that the questions are not canned, but generated "on the fly" for each student. If the student response doesn't match the expected answer, an error analysis procedure is invoked. Variations of word order, of attributes such as gender and number, of the orthography corresponding to the phonology of the answer, and so on, are tried in an attempt to construct what the student has produced. If this is successful, it suggests the problem/s which may underlie the student's error/s.
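The error analysis described in the last item can be illustrated with a small Python sketch: starting from the expected answer, it tries variations of gender, number and word order, and reports any variation that reproduces the student's response. The tiny French form table and the names `FORMS`, `realize` and `diagnose` are assumptions made for this illustration, not Vinci's actual data or procedure, and variations of orthography based on phonology are not covered.

```python
# Hypothetical mini-morphology: surface forms indexed by (gender, number).
FORMS = {
    "le": {("m", "sg"): "le", ("f", "sg"): "la",
           ("m", "pl"): "les", ("f", "pl"): "les"},
    "petit": {("m", "sg"): "petit", ("f", "sg"): "petite",
              ("m", "pl"): "petits", ("f", "pl"): "petites"},
    "maison": {("f", "sg"): "maison", ("f", "pl"): "maisons"},
}


def realize(tokens):
    """Turn (lemma, gender, number) triples into surface words."""
    return [FORMS[lemma][(g, n)] for lemma, g, n in tokens]


def diagnose(expected, response):
    """Return hypotheses about how a wrong response differs from the answer.

    An empty list means that no tried variation reproduced the response.
    """
    answer = response.split()
    hypotheses = []
    # Try changing the gender/number attributes of one word at a time.
    for i, (lemma, g, n) in enumerate(expected):
        for g2, n2 in FORMS[lemma]:
            if (g2, n2) == (g, n):
                continue
            variant = list(expected)
            variant[i] = (lemma, g2, n2)
            if realize(variant) == answer:
                changed = []
                if g2 != g:
                    changed.append("gender")
                if n2 != n:
                    changed.append("number")
                hypotheses.append(f"{' and '.join(changed)} error on '{lemma}'")
    # Try swapping each pair of adjacent words.
    words = realize(expected)
    for i in range(len(words) - 1):
        swapped = words[:i] + [words[i + 1], words[i]] + words[i + 2:]
        if swapped == answer:
            hypotheses.append(
                f"word order: '{words[i]}' and '{words[i + 1]}' swapped")
    return hypotheses


# Expected answer: "la petite maison"
expected = [("le", "f", "sg"), ("petit", "f", "sg"), ("maison", "f", "sg")]
print(diagnose(expected, "la petit maison"))
print(diagnose(expected, "la maison petite"))
```

In this sketch each hypothesis changes only one thing at a time; a response containing several independent errors would need combinations of variations to be tried.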
In 2011, we began the task of putting the ivi/Vinci combination onto a webpage. A choreography was designed, proposing the interaction between ivi/Vinci and two subsidiary programs: "stdnt" and "instr", whose names suggest their roles.
"stdnt" is the interface between ivi/Vinci and the webpage. In effect, ivi/Vinci regards it as the student, sending it the output which would normally adjust the display/window. stdnt modifies the webpage and writes information on it accordingly. Conversely, the webpage regards stdnt as ivi/Vinci, sending it text streams, mouse-clicks, and so on. stdnt transforms these to the "keystrokes" which ivi/Vinci is expecting.
"instr" is an ivi "driver" program of the kind mentioned above, and plays the role of the course instructor. It "knows" about the various natural languages and exercises, and also about the progress achieved by (registered) students. It causes ivi/Vinci to emit questions, receives student responses, triggers error analysis, logs the result, and where necessary, simplifies it. This is very much a work-in-progress. Ideally, instr should analyze the analyses and advise the student on misunderstandings as a human instructor might do; in fact, if these analyses were also carried out across student boundaries, instr might also warn the human instructor about common difficulties.
The webpage version of the program and the website are referred to as "VinciLingua". For the record, the website -- found at vincilingua.ca -- is "open-access", providing temporary anonymous guest accounts or the opportunity to sign in to see more detailed feedback from exercises.
The site forms part of some full online courses offered by Arts and Science Online at Queen's University and, together with other materials [Note 4], provides the opportunity to gain academic credit.
Two courses created by Dr Lessard have now been taught twice each, with enrollments:
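| Course | Term | Enrollment |
|---|---|---|
| FRST 105 Reading French | Summer 2015 | 87 |
| FRST 105 Reading French | Summer 2016 | 100 |
| FRST 125 Basic Business French | Winter (Jan-April) 2016 | 64 |
| FRST 125 Basic Business French | Fall (Sept-Dec) 2016 | 47 |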
So both have apparently been successful, and other courses have been prepared or are currently in preparation.
Students taking the courses are drawn both from Queen's and from the outside world, including students in Canada from both sides of the country.
ivi/Vinci can also function as an experimental tool in a terminal environment and has been used to produce a variety of studies, from limericks, to fairy tales, to various grammar models, including word formation. Results appear in a variety of our papers.
Indeed, the program might be used for learning applications far outside the field of language learning, and "toy" examples have been built for several areas.