Implementing the Glossary

Early in the project, we decided that a glossary would be extremely useful, since the records contain so many obsolete words and words specific to the Sheffield industries, which few readers would be familiar with.

To make glossary expansions easy to consult, we used pop-up texts provided by the title attribute of the HTML <acronym> tag, which displays a short text at the cursor position when the mouse is ‘hovered’ over the word.

Taking an example from claim 4528, if the text of a ‘particular’ is
24 Besoms spoilt
then the resulting HTML needs to be
24 <acronym title='besom: broom made of twigs'>Besoms</acronym> spoilt

We wanted words with glossary expansions to be visible to readers, but without intruding on the reading of the text, so we placed a pale grey background behind them (using CSS).

To make the glossary easy to create and maintain, it is implemented as a table within the SFCA database, with one field for the glossary term, and another field for the expansion. This same table is used as the source for the glossary page.

Implementation

Warning: geek territory! The following explanation will be of interest to programmers only.

We tried a variety of approaches for the glossary. We ended up using a server-side JavaScript script to replace glossary terms with the expanded glossary entry shown above. The glossary entries needed to be pre-processed for performance reasons, and the Visual Basic programming language available within MS-Access has inadequate regular expression handling, hence the use of a JavaScript ASP script.

Using, for example, the expansion of the term ‘besom’, the following code fragment would surround the word besom with the <acronym> tag:

re = /([^a-zA-Z]|^)(besom[s]?)([^a-zA-Z]|$)/gi;

particular = particular.replace(re, "$1<acronym title='besom: broom made of twigs'>$2</acronym>$3");

The regular expression identifies full words, in order to avoid false expansions such as coalskips for ‘kip’, the hide of a young beast. To find full words, it is not sufficient to look for the search term surrounded by spaces: sometimes the word might be followed by punctuation, or might be at the start of the field. In this case it looks for besom or besoms, preceded by a non-alpha character (or the start of the field), and followed by a non-alpha character (or the end of the field). It does this for multiple occurrences, ignoring case.
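The whole-word behaviour can be checked with a small sketch. It reuses the same pattern shape for the term ‘kip’; the function name containsTerm is made up for this illustration and is not part of the project's script.

```javascript
// Same pattern shape as the besom example, applied to 'kip' (illustrative only).
// No 'g' flag here, so repeated calls to test() carry no state between them.
var kipRe = /([^a-zA-Z]|^)(kips?)([^a-zA-Z]|$)/i;

// Hypothetical helper: true if the text contains 'kip' or 'kips' as a full word.
function containsTerm(text) {
  return kipRe.test(text);
}

containsTerm('2 coalskips');      // false: 'kips' is preceded by a letter
containsTerm('3 kips, damaged');  // true: preceded by a space, followed by a comma
containsTerm('Kip');              // true: start and end of the field both count
```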

The search string is divided into three subexpressions by surrounding them with parentheses; this enables us to refer back to the subexpressions in the replacement operation. The replacement operation substitutes for the matched text the opening non-alpha character ($1), followed by the opening <acronym> tag, followed by the matched word ($2), followed by the closing </acronym> tag, followed by the terminating non-alpha character ($3).
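To see what each subexpression captures, the pattern can be run with exec, as in this sketch. The helper name captureParts and the property names it returns are invented for the illustration.

```javascript
// The besom pattern from above, without the 'g' flag so exec() always
// starts from the beginning of the text.
var besomRe = /([^a-zA-Z]|^)(besom[s]?)([^a-zA-Z]|$)/i;

// Hypothetical helper: return the three captured pieces of the first match,
// or null if the term does not occur as a full word.
function captureParts(text) {
  var m = besomRe.exec(text);
  if (m === null) return null;
  return { before: m[1], word: m[2], after: m[3] };
}

captureParts('24 Besoms spoilt');  // { before: ' ', word: 'Besoms', after: ' ' }
captureParts('Besom');             // { before: '', word: 'Besom', after: '' }
```

Note that the original capitalisation of the matched word is preserved, because the replacement reuses $2 rather than the glossary term itself.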

In practice, of course, the regular expression, and the replacement operation, are constructed dynamically from the terms and expansions in the database.
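A minimal sketch of how that dynamic construction might look is given below. It is not the project's actual script: the glossary array stands in for rows read from the database table, and the names addGlossary, escapeRegExp and escapeAttr are assumptions made for this example. It also uses a replacement function rather than a "$1...$2...$3" string, so that a ‘$’ in an expansion cannot be misread as a back-reference.

```javascript
// Hypothetical glossary rows; the real ones would be read from the database.
var glossary = [
  { term: 'besom', expansion: 'broom made of twigs' },
  { term: 'kip', expansion: 'the hide of a young beast' }
];

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Escape characters that would break a single-quoted HTML attribute value.
function escapeAttr(s) {
  return s.replace(/&/g, '&amp;').replace(/'/g, '&#39;').replace(/</g, '&lt;');
}

// Wrap every full-word occurrence of each glossary term in an <acronym> tag.
function addGlossary(particular, glossary) {
  for (var i = 0; i < glossary.length; i++) {
    var term = glossary[i].term;
    var title = escapeAttr(term + ': ' + glossary[i].expansion);
    // Same shape as the besom example: term with optional plural 's',
    // bounded by a non-alpha character or the start/end of the field.
    var re = new RegExp('([^a-zA-Z]|^)(' + escapeRegExp(term) + 's?)([^a-zA-Z]|$)', 'gi');
    particular = particular.replace(re, function (match, before, word, after) {
      return before + "<acronym title='" + title + "'>" + word + "</acronym>" + after;
    });
  }
  return particular;
}

addGlossary('24 Besoms spoilt', glossary);
// "24 <acronym title='besom: broom made of twigs'>Besoms</acronym> spoilt"
```

One caveat with this simple loop: a later term could match a word inside the title text inserted for an earlier term, so a fuller version would need to guard against that.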

One of the difficulties of this project has been that it becomes too interesting, and it is difficult to avoid distractions. In this case, I ended up researching some of the glossary expansions myself. Why was Edward Frith Foster reimbursed 2/- for 24 Bed Ticks? Clue: bed ticks are unrelated to sheep ticks...

Chris Veness, Movable Type
