Users of multimedia language data currently face two problems. First, they cannot share data, because different tools use different data formats. Second, they cannot mark up data with annotations that have complex structures, or combine two simply structured annotations on the same data, because the data representations the tools employ allow for trees at best.
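
To make the tree limitation concrete, here is a minimal sketch in Python. It is not NXT's data model; the `Span` class, the layer names, and the example utterance are all hypothetical, chosen only to show why two annotations that are each simple can fail to fit into one tree when placed over the same words.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Span:
    """One annotation over a run of words (hypothetical, for illustration)."""
    layer: str   # which annotation this belongs to, e.g. "syntax"
    label: str   # the tag assigned by the annotator
    start: int   # index of the first word covered
    end: int     # index one past the last word covered


def crosses(a: Span, b: Span) -> bool:
    """True if the spans overlap but neither contains the other."""
    overlap = a.start < b.end and b.start < a.end
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return overlap and not (a_in_b or b_in_a)


def fits_one_tree(spans) -> bool:
    """A set of spans can share a single tree only if no two of them cross."""
    return not any(crosses(a, b) for a, b in combinations(spans, 2))


# Shared base data that both annotations point at.
words = ["put", "it", "over", "there", "please"]

annotations = [
    Span("syntax", "VP", 0, 4),      # "put it over there"
    Span("syntax", "PP", 2, 4),      # "over there"
    Span("gesture", "point", 3, 5),  # pointing while saying "there please"
]

syntax = [s for s in annotations if s.layer == "syntax"]
gesture = [s for s in annotations if s.layer == "gesture"]

print(fits_one_tree(syntax))       # each layer alone nests cleanly
print(fits_one_tree(gesture))
print(fits_one_tree(annotations))  # combined, the gesture cuts across the syntax
```

In this sketch each layer on its own is tree-shaped, but the pointing gesture starts inside the verb phrase and ends outside it, so the combined annotation cannot be stored as a single tree over the words.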

This picture (or this high quality version) gives a simple, constructed example of the richness of structure needed for multimedia annotation. The HCRC Map Task Corpus and the Switchboard Corpus, available from TalkBank, are examples of non-multimedia corpora that have run into the analogous difficulty in representing structure, but for speech and language annotation without video. Such heavily annotated corpora do not yet exist for multimedia data because of the lack of tools, but they are needed for work in areas such as animation and human-computer interfaces.

End users of a data set want tools that they can just start up and run for displaying, annotating, and analysing data. This creates a basic tension. Where the annotations are simple in structure, it is possible to write general-purpose tools that have reasonable data displays and interfaces (such as TASX, Anvil or The Observer). These tools are fine when the structure of the annotation needed fits the model the tools were built around. However, the more exotic the structure needed, or the more annotations given on the same data, the less likely it is that a pre-defined general interface will fit user needs. This is why, in speech and language annotation, most corpus projects keep a developer on hand who has the technical skills and the time to build one-off tools for each task that needs to be done with their specific data set.

The NITE XML Toolkit (NXT) is aimed at the developer and will allow him or her to build the more specialized displays, interfaces, and analyses that end users require when working with highly structured or cross-annotated data. We are not alone in taking this approach; this web page discusses the differences between NXT and the most comparable efforts elsewhere.

Last modified 04/13/06