vltaglist manual

What it is

vltaglist is a simple web-based program that reads in a wordlist and allows the user to add up to three tags per word. Tags may be edited, added to or removed. vltaglist forms part of a pipeline of tools, with vlmakelist upstream, to create a wordlist from a text, and vltagtext downstream, to apply tags to a text. vltaglist is aimed at researchers who are looking for a lightweight tool for producing tagging files or basic lexicons or dictionaries.

How to use it

The starting point is a csv file encoded in UTF-8, located on a user's device. This may be prepared on a spreadsheet editor or by the vlmakelist software. In all cases, it must be saved in .csv format. vltaglist works happily with wordlists up to the size of a novel; larger files can take too long to process.

vltaglist includes two views:

Initial page

The initial page looks like this:

[Figure: vltaglist initial page]

Before loading a file, the user should set the desired number of tag columns. The default is one, although up to three may be defined. (NB: if the number of tag columns chosen here is less than the number of columns in the file loaded, the user will be warned that this will lead to data loss. Caveat usor!) It is also possible to set the maximum length of tags. The default is 5 characters, but anything down to one is permitted. Note that this applies to all tag columns.
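The data-loss warning can be pictured with a short sketch. This is not vltaglist's own code: the helper name fitRow is hypothetical, and the sketch assumes that each loaded row is an array holding the word followed by its tags.

```javascript
// Hypothetical helper, not taken from vltaglist itself.
// Assumes row[0] is the word and the remaining fields are its tags.
function fitRow(row, tagColumns) {
  const [word, ...tags] = row;
  const kept = tags.slice(0, tagColumns);
  // Pad missing tags with "?", the placeholder the editor shows for empty tags.
  while (kept.length < tagColumns) kept.push("?");
  return { row: [word, ...kept], lostTags: tags.length > tagColumns };
}

// Illustrative rows: the second tag on "access" is made up for the example.
const loaded = [["access", "N\\V", "sg"], ["bees", "N"]];
const fitted = loaded.map((r) => fitRow(r, 1));

// True when at least one row had more tags than the chosen number of columns.
const dataLoss = fitted.some((f) => f.lostTags);
console.log(dataLoss, fitted.map((f) => f.row));
```

With one tag column chosen, the first row would lose its second tag, which is the situation the warning is meant to flag.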

A csv file may then be loaded by clicking on the Browse... button. This will open a dialogue on your device which you can use to select the file you wish to load. Use your device's dialogue to choose a file, then click on the Import file button to load it.

Files may also be downloaded from an online page, saved to the user's device, and then loaded into vltaglist as explained above. As a convenience, the online vltaglist page includes a Samples directory (at https://vincilingua.ca/Tools/Samples) which includes several csv files that may be downloaded in this way, including proust.csv, from a long novel in French taken from Project Gutenberg, and bees.csv, based on a short text in English about bumblebees taken from the Canadian Broadcasting Corporation with HTML codes removed. It is the latter that we will use in subsequent examples.

Tag editing view

Once a wordlist file has been loaded, you will see a tag editing page, the top part of which looks like this (based on the bees.csv file, to which simple part-of-speech tags have been added). There are several rows of commands.

[Figure: vltaglist edit view]

Reset and removal commands

The first row contains two buttons: a Restart button which, if clicked, returns the user to the initial page, and a Delete checked lines button which removes any lines that have been checked.

An important note: when the Restart button is clicked, no work is saved. It is the user's responsibility to save the tag file at regular intervals. To help remind the user of this, when the Restart button is clicked, a warning appears on the screen reminding him or her to save the file beforehand. This warning may be ignored if the user wishes to discard the working file.

View commands

The next line allows the user to select a subset of the wordlist by entering a regular expression in the input box and then clicking on the Show button. For example, entering ^a.*ing$ will find all words beginning with a and ending with ing (here, according, acting, amazing, astonishing). (A slightly longer presentation of regular expressions appears at the end of this manual and many tutorials may be found on the web.) Clicking on the Show all button shows all words again.
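The effect of the pattern above can be tried out in a few lines of JavaScript. This is only a sketch: the helper name showMatching and the short wordlist are hypothetical (the four matching words are the ones cited above, the others are added for contrast), and vltaglist's own filtering code may differ.

```javascript
// Illustrative wordlist; "bee", "flying" and "and" are added for contrast.
const words = ["according", "acting", "amazing", "astonishing", "bee", "flying", "and"];

// Hypothetical helper: keep only the words matching a user-supplied pattern.
function showMatching(list, pattern) {
  const re = new RegExp(pattern);
  return list.filter((w) => re.test(w));
}

const shown = showMatching(words, "^a.*ing$");
console.log(shown); // [ 'according', 'acting', 'amazing', 'astonishing' ]
```

Note that "flying" ends in ing but does not begin with a, and "and" begins with a but does not end in ing, so neither is shown.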

Export commands

The export commands line contains an input box where the user can specify the name of the file where lines checked by the user will be saved. By default, this file is placed in the Downloads directory on the user's device. The input box is followed by an Export checked lines button which performs the save operation.

In regular usage of these two commands, the user will begin by checking the items to be exported (if need be, by clicking the Check all button), then specify the name of the file where the data will be saved, and finally click on the Export checked lines button to perform the operation. The data is saved in csv format, with each field separated from the next by a tab character.

Note that if the user chooses the same export filename for subsequent exports, the system will append an increasing digit to each subsequent export, as in fred(1).csv, allowing for multiple versions of work to be kept.
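The shape of an exported file can be sketched as follows. The row structure and the helper name exportChecked are assumptions made for illustration, as are the ADJ and N tags; only the tab-separated layout and the N\V tag for access come from this manual.

```javascript
// Hypothetical in-memory rows: a checked flag, a word and its tags.
const entries = [
  { checked: true, word: "access", tags: ["N\\V"] },
  { checked: false, word: "amazing", tags: ["ADJ"] },
  { checked: true, word: "bees", tags: ["N"] },
];

// Build the text of the export: one line per checked row,
// with the word and its tags separated by tab characters.
function exportChecked(list) {
  return list
    .filter((e) => e.checked)
    .map((e) => [e.word, ...e.tags].join("\t"))
    .join("\n");
}

const output = exportChecked(entries);
console.log(output);
```

In a browser, text like this would then be handed to the download mechanism; that step is left out here so that the sketch runs anywhere.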

Tab commands

The next line on the screen sets parameters for how the Tab key works. By default, pressing Tab moves the cursor to the next input field. However, when entering materials, it is sometimes useful to move only within a single column. If the user clicks on the checkboxes button, the cursor moves to the first checkbox and subsequent Tab and Shift-Tab keystrokes move down and up the checkboxes, while the space key checks and unchecks the focused checkbox. In this way, the combination of two keystrokes allows quick selection of many items in a relatively short time.

In a similar fashion, clicking on the tag col1 button focuses on the first tag column, and Tab and Shift-Tab then move down and up that column.

As more tag columns are added (see below), more buttons appear for focusing on specific columns. Finally, clicking on the across (default) button causes the environment to revert to the state where tabs move across entries, from checkbox to tag columns, then down to subsequent lines.

Change commands

It is sometimes useful to perform global changes to tags. For example, if no tags have been defined, the editor will place a question mark in each tag field. If the user notices that most items in the list are verbs, for example, he or she could place ? and V in the two input boxes and click on the in column 1 button. Well-chosen defaults like this can save significant typing time.

As in the case of tab specifications, if more tag columns are added, additional buttons will appear.
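A global change of this kind amounts to a simple loop over the rows. The sketch below is an illustration only: the function name changeInColumn and the row structure are hypothetical, and it assumes that a tag is changed only when the whole tag equals the first input box, which this manual does not specify.

```javascript
// Hypothetical rows with one tag column; "?" marks an undefined tag.
const rows = [
  { word: "acting", tags: ["?"] },
  { word: "bees", tags: ["N"] },
  { word: "flying", tags: ["?"] },
];

// Replace every tag equal to `from` with `to` in the given column (0-based).
// Assumption: the whole tag must match, not just part of it.
function changeInColumn(list, col, from, to) {
  for (const row of list) {
    if (row.tags[col] === from) row.tags[col] = to;
  }
  return list;
}

changeInColumn(rows, 0, "?", "V");
console.log(rows.map((r) => r.word + " " + r.tags[0]));
```

Tags that were already defined, such as N for bees, are left untouched.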

Word addition

The following line provides the option of adding additional entries to the list, including a new word and up to three tags. If the word exists already, a warning is given and the change is ignored. However, if the word doesn't exist, it will be placed at the bottom of the list. It can then be put back in its proper place by clicking on the appropriate Sort up or Sort down button.
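The logic of word addition can be sketched like this. The function name addWord and the row structure are hypothetical; the sketch only mirrors the behaviour described above (ignore duplicates, otherwise append at the bottom).

```javascript
// Hypothetical wordlist rows.
const list = [
  { word: "acting", tags: ["V"] },
  { word: "bees", tags: ["N"] },
];

// Add a word unless it already exists. Returns false for a duplicate,
// in which case the caller would show a warning and change nothing.
function addWord(rows, word, tags) {
  if (rows.some((r) => r.word === word)) return false;
  rows.push({ word, tags }); // new words go to the bottom of the list
  return true;
}

console.log(addWord(list, "bees", ["N"])); // false: already present
console.log(addWord(list, "hive", ["N"])); // true: appended at the end
```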

Tag lists

The most basic taglist contains three columns: a list of checkboxes, a list of words and a list of boxes where tags may be entered. If a wordlist with no tags is imported, the list of tag boxes will be filled with ? symbols and the user may enter material up to the tag length limit specified at the outset.

Each column is preceded by buttons which apply to the column. In the case of checkboxes, the Check all and Uncheck all buttons simplify data entry in cases where all boxes must be checked or unchecked. For example, to save the list of all words ending in ing, the user will start by defining the pattern by a show words command. The list of words will then be shrunk down to only those words. Then, clicking on the Check all button checks only the words that are visible. This list may then be exported or deleted by clicking on the Export checked lines or Delete checked lines button at the top of the screen.
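The interaction between the view and the Check all button can be sketched as follows. The visible and checked fields and the function names show and checkAll are assumptions made for illustration, not vltaglist's actual data model.

```javascript
// Hypothetical rows carrying a visibility flag and a checked flag.
const table = [
  { word: "acting", visible: true, checked: false },
  { word: "bees", visible: true, checked: false },
  { word: "flying", visible: true, checked: false },
];

// Restrict the view to the words matching a pattern.
function show(rows, pattern) {
  const re = new RegExp(pattern);
  for (const r of rows) r.visible = re.test(r.word);
}

// Check every row that is currently visible; hidden rows are left alone.
function checkAll(rows) {
  for (const r of rows) if (r.visible) r.checked = true;
}

show(table, "ing$");
checkAll(table);
const checkedWords = table.filter((r) => r.checked).map((r) => r.word);
console.log(checkedWords); // [ 'acting', 'flying' ]
```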

In the case of words and tags, the Sort up and Sort down buttons perform the expected sorts. In the case of tags, this is useful to check the consistency of one's tagging by having all tags of the same value fall together.
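Sorting on a tag column brings identical tags together, which makes stray values easy to spot. The sketch below is an assumption about how such a sort might be written: the function name sortByTag is hypothetical, and the tags shown are illustrative.

```javascript
// Hypothetical rows with one tag column.
const tagged = [
  { word: "fly", tags: ["V"] },
  { word: "bees", tags: ["N"] },
  { word: "amazing", tags: ["ADJ"] },
  { word: "hive", tags: ["N"] },
];

// Return a copy of the rows sorted on one tag column (0-based).
// A plain code-unit comparison is used so the result does not depend on locale.
function sortByTag(rows, col, descending = false) {
  const sorted = [...rows].sort((a, b) => {
    if (a.tags[col] < b.tags[col]) return -1;
    if (a.tags[col] > b.tags[col]) return 1;
    return 0;
  });
  return descending ? sorted.reverse() : sorted;
}

const ascending = sortByTag(tagged, 0);
console.log(ascending.map((r) => r.tags[0] + " " + r.word));
```

After the sort, the two N entries sit next to each other, so a mistyped tag in the same column would stand out.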

Multiple columns

As noted earlier, it is possible to specify up to three tag columns. If that is done, the result looks like this:

[Figure: vltaglist multiple column mode]

All commands work as in single column mode with the exception that more buttons are available for dealing with the extra columns. The use of multiple columns permits more delicate tagging, as in, for example, using the first column for part of speech, the second for morphological information and the third for semantic information.

The constant which sets the maximum number of columns is defined in the JavaScript file and may be altered if the user wishes.

Tagging conventions

vltaglist imposes few requirements with respect to tags. Any UTF-8 character is permitted in tags, including spaces and punctuation, although both of these risk introducing issues at other stages of the word analysis process.

One convention that the user will see in the bees.csv file is the use of the backslash \ to separate alternative values, as in N\V for access which may be either a noun or a verb.
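A tag written with this convention is easy to split into its alternatives in later processing. The sketch below is illustrative only; the function name alternatives is hypothetical. Note that inside a JavaScript string literal the backslash itself must be written as two backslashes.

```javascript
// Split a tag into its alternative values on the backslash character.
function alternatives(tag) {
  return tag.split("\\");
}

const accessTags = alternatives("N\\V"); // the tag N\V from bees.csv
console.log(accessTags); // [ 'N', 'V' ]
console.log(alternatives("N")); // [ 'N' ]
```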

Regular expressions

As noted above, the Show words command permits the use of regular expressions to represent patterns. A full discussion of regular expressions is beyond the scope of this page, but the following examples will illustrate the essentials.
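A few common patterns can be tried out with the short sketch below. The wordlist and the choice of patterns are illustrative assumptions, not taken from bees.csv; the syntax is that of standard JavaScript regular expressions.

```javascript
// Illustrative wordlist (made up for this sketch).
const sample = ["bee", "bees", "bumblebee", "hive", "queen", "fly"];

// Each entry pairs a pattern with a plain-language description.
const patterns = [
  ["^b", "words beginning with b"],
  ["s$", "words ending with s"],
  ["^.{3}$", "words of exactly three characters"],
  ["[aeiou]{2}", "words containing two vowels in a row"],
  ["^(bee|hive)$", "either of the words bee or hive"],
];

// Collect the matches for each pattern.
const results = patterns.map(([pattern, meaning]) => {
  const re = new RegExp(pattern);
  return { pattern, meaning, matches: sample.filter((w) => re.test(w)) };
});

for (const r of results) {
  console.log(r.pattern + "  (" + r.meaning + "): " + r.matches.join(", "));
}
```

Here ^ anchors a pattern to the start of a word and $ to the end, a dot stands for any single character, {n} repeats the preceding item n times, square brackets list permitted characters, and the vertical bar separates alternatives.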

It is useful to play with these patterns to make them more familiar and to better understand the underlying structures within a wordlist. There exist also various discussions of regular expressions on the web.

Basic architecture

vltaglist follows the old Unix philosophy of doing one thing (and with luck doing it well). As a result, it has only a small number of basic features. It is written in pure JavaScript so users can see and, if they wish, modify the code. It uses no cookies.

Once downloaded, vltaglist can be used offline, making it suitable for contexts where internet access is not available. It is usable on most devices including desktops, tablets and phones. By doing most of the heavy lifting of analysis on arrays rather than on the DOM, it allows work on relatively large texts, up to about the size of a novel.

Sharing

Like all the software on the VinciLingua site, vltaglist is freely available for personal use. Users should feel free to share the software but should not remove mention of the original author.

Users are encouraged to share stories of use, to signal any problems or to suggest any improvements by writing to info@vincilingua.ca. We are particularly interested in cases of use involving teaching and learning of underresourced languages. Note though that this is still a work in progress. Suggestions for improvement are welcome, but we cannot commit to making any requested changes.

Institutions that charge for teaching and learning and that wish to incorporate vltaglist in courses or teaching materials should contact info@vincilingua.ca to discuss terms of access.