This web page contains documentation of the facility for loading multiply-annotated data that forms the core of NXT's support for reliability tests, plus a worked example from the AMI project, kindly supplied by Vasilis Karaiskos. For more information, see the JavaDoc corresponding to the NOM loading routine for multiply-annotated data, for CountQueryMulti, and for MultiAnnotatorDisplay.

The facilities described on this page are new for NXT v 1.3.3.

Generic Documentation

Many projects wish to know how well multiple human annotators agree on how to apply their coding manuals, and so they have different human annotators read the same manual and code the same data. They then need to calculate some kind of agreement statistic for the result. The appropriate measure can depend on the structure of the annotation (agreement on straight categorization of existing segments is simpler to measure than agreement on annotations that require the human to segment the data as well) and on the field, since statistical development for this form of measurement is still in progress and agreed practice varies from community to community.

NXT 1.3.3 and higher provides some help for this statistical measurement, in the form of a facility that can load the data from multiple annotators into the same NOM (NXT's object model, or internal data representation, which can be used as the basis for Java applications that traverse the NOM counting things or for query execution).

This facility works as follows. The metadata specifies a relative path from itself to the directories in which the coding files containing data can be found. (The data can either be all together, in which case the path is usually given on the <codings> tag, or it can be in separate directories by type, in which case the path is specified on the individual <coding-file> tags.) NXT assumes that if annotation is available from multiple annotators, it will be found not in the specified directory itself, but in subdirectories of it, where the subdirectories are named after the annotators (or given some other unique designators). Annotation schemes often require more than one layer in the NOM representation. The loading routine takes as arguments the name of the highest layer containing multiple annotations; the name of a layer reached from that layer by child links that is common between the annotators, or null if the annotation grounds out at signal instead; and a string to use as an attribute name in the NOM to designate the annotator for some data. Note that the use of a top layer and a common layer below it allows the program to know exactly where the multiply-annotated data is: it is in the top layer plus all the layers between the two layers, but not in the common layer. (It is possible to arrange annotation schemes so that they do not fit this structure, in which case NXT will not support reliability studies on them.) The routine loads all of the versions of these multiply-annotated layers into the NOM, differentiating them by using the subdirectory name as the value for the additional attribute representing the annotator.
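
As a concrete sketch of the expected layout (all file and directory names here are invented for illustration), a corpus with two annotators' codings of the same observation might look like this, with one subdirectory per annotator under the directory the metadata names:

```shell
# Hypothetical layout: the metadata points at corpus/namedEntities, and
# each annotator's coding files live in a subdirectory named after them.
mkdir -p corpus/namedEntities/Coder1 corpus/namedEntities/Coder2
touch corpus/namedEntities/Coder1/EN2001a.named-entities.xml
touch corpus/namedEntities/Coder2/EN2001a.named-entities.xml
find corpus/namedEntities -type f | sort
```

The loading routine then attaches the subdirectory name (Coder1 or Coder2) to each loaded element as the value of the annotator attribute.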

NXT is agnostic as to which statistical measures are appropriate. It does not currently (June 05) implement any, but leaves users to write Java applications or sets of NXT queries that allow their chosen measures to be calculated. (Being an open source project, of course, anyone who writes such applications can add them to NXT for the benefit of others who make the same choices.) Version 1.3.3 provides two end user facilities that will be helpful for these studies, which are essentially multiple annotator versions of the GenericDisplay GUI and of CountQueryResults.
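A user-written measure can be as simple as percentage agreement over segments that both coders labelled. The following stand-in (the label data is invented, and this is not an NXT facility) shows the shape of such a calculation:

```shell
# Hypothetical input: one line per segment, holding Coder1's and Coder2's
# category labels; compute simple percentage agreement with awk.
printf 'PER PER\nLOC ORG\nORG ORG\nPER PER\n' > labels.txt
awk '{ n++; if ($1 == $2) agree++ } END { printf "%.2f\n", agree / n }' labels.txt
# prints 0.75
```

A real study would usually substitute a chance-corrected measure such as kappa, but the pipeline shape (extract paired labels, then compute) is the same.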

MultiAnnotatorDisplay

This is a version of the GenericDisplay that takes the additional command line arguments required by the loading routine for multiply-annotated data, and renders separate windows for each annotation for each annotator. As usual, the advantage of using the GUI is for debugging queries, since queries can be executed with the results highlighted on the data display.

To call the GUI,

java net.sourceforge.nite.gui.util.MultiAnnotatorDisplay -c METADATAFILE -o OBSERVATION -tl TOPLAYER [-cl COMMONLAYER] [-a ANNOTATOR]

CountQueryMulti

To call:

java CountQueryMulti -corpus METADATAFILE -query QUERY -toplayer TOPLAYER -commonlayer COMMONLAYER [-attribute ANNOTATOR] [-observation OBSERVATION][-allatonce]

where the arguments are as for MultiAnnotatorDisplay; the remaining arguments are as for CountQueryResults.


Example reliability study

The remainder of this web page demonstrates an annotation scheme reliability test in NITE. The example queries below come from the agreement test on the named-entity annotation of the AMI corpus. Six recorded meetings were annotated by two coders, whose markings were subsequently compared. The categories and attributes that come into play are the following:

Loading the data into the GUI

The tests are carried out by loading the annotated data into the NXT display MultiAnnotatorDisplay (included in nxt_1.3.3 and above). The call can be incorporated in a shell script along with the appropriate classpaths. For example, the following is included in our multi.sh script, run from the root of the NXT install (% sh multi.sh). Each CLASSPATH should be a single line in the actual script.


#!/bin/bash
# A Java runtime should be on the path. The current directory should be
# the root of the NXT install, unless you edit this variable to contain
# the path to your install, in which case you can run from anywhere.
NXT="."

# Adjust the classpath separator for running under cygwin.
if [ "$OSTYPE" = 'cygwin' ]; then

export CLASSPATH=".;$NXT;$NXT/lib;$NXT/lib/nxt.jar;$NXT/lib/jdom.jar;$NXT/lib/JMF/lib/jmf.jar;$NXT/lib/pnuts.jar;$NXT/lib/resolver.jar;$NXT/lib/xalan.jar;$NXT/lib/xercesImpl.jar;$NXT/lib/xml-apis.jar;$NXT/lib/jmanual.jar;$NXT/lib/jh.jar;$NXT/lib/helpset.jar;$NXT/lib/poi.jar;$NXT/lib/eclipseicons.jar;$NXT/lib/icons.jar;$NXT/lib/forms-1.0.4.jar;$NXT/lib/looks-1.2.2.jar;$NXT/lib/necoderHelp.jar;$NXT/lib/videolabelerHelp.jar;$NXT/lib/dacoderHelp.jar;$NXT/lib/testcoderHelp.jar"

else

export CLASSPATH=".:$NXT:$NXT/lib:$NXT/lib/nxt.jar:$NXT/lib/jdom.jar:$NXT/lib/JMF/lib/jmf.jar:$NXT/lib/pnuts.jar:$NXT/lib/resolver.jar:$NXT/lib/xalan.jar:$NXT/lib/xercesImpl.jar:$NXT/lib/xml-apis.jar:$NXT/lib/jmanual.jar:$NXT/lib/jh.jar:$NXT/lib/helpset.jar:$NXT/lib/poi.jar:$NXT/lib/eclipseicons.jar:$NXT/lib/icons.jar:$NXT/lib/forms-1.0.4.jar:$NXT/lib/looks-1.2.2.jar:$NXT/lib/necoderHelp.jar:$NXT/lib/videolabelerHelp.jar:$NXT/lib/dacoderHelp.jar:$NXT/lib/testcoderHelp.jar"

# echo "CLASSPATH=$CLASSPATH"
fi

java net.sourceforge.nite.gui.util.MultiAnnotatorDisplay -c Data/AMI/AMI-metadata.xml -tl ne-layer -cl words-layer



A GUI with many windows will load (each window contains the data of one layer of data and annotations), allowing comparisons between the two coders' choices. In our examples below the annotators are named Coder1 and Coder2.

Selecting Search off the menu bar will bring up a small GUI where queries such as the ones below can be written. Clicking on any of the query results highlights the corresponding data in the rest of the windows (words, named entities, coders' markings, etc.). Simultaneously, underneath the list of matches, the query GUI expands whichever n-tuple is selected. For the low-down on the NITE query language (NiteQL), look at the query language documentation or the Help menu in the query GUI.


Querying data related to a single annotator

($a named-entity): $a@coder=="Coder1"
Gives a list of all the named entities marked by Coder1.

($w w)(exists $a named-entity): $a@coder=="Coder1" && $a ^ $w
Gives a list of all the words marked as named entities by Coder1.

($a named-entity): $a@coder=="Coder1" ::
($w w): $a ^ $w

Gives all the named entities marked by Coder1 showing the words included in each entity.

($a named-entity)($t ne-type): ($a >"type"^ $t) && ($t@name == "EntityType") && ($a@coder == "Coder1")
Gives the named entities of type EntityType annotated by Coder1. The entity types (and their names) to choose from can be seen in the respective window in the GUI (titled "Ontology: ne-types" in this case).

($a named-entity)($t ne-type): ($a >"type"^ $t) && ($t@name == "EntityType") && ($a@coder == "Coder1")::
($w w): $a ^ $w

Like the previous query, only each match also includes the words forming the entity.

($t ne-type)::
($a named-entity): $a@coder=="Coder1" && $a >"type"^ $t

Gives a list of all the named entity types (including "root"), and for each type, the entities of that type annotated by Coder1. By writing the last term of the query as $a >"type" $t, the query will match only the bottom-level entity types (the ones used as actual tags); that is, it will display MEASURE entities, but not NUMEX ones (assuming here that MEASURE is a sub-type of NUMEX).

($a named-entity)($t ne-type): $a@coder=="Coder1" && $a >"type"^ $t::
($w w): $a ^ $w
Like the previous query, only each match (n-tuple) also includes the words forming the entity.




Querying data related to two annotators

Checking for co-extensiveness

The following examples check for agreement between the two annotators as to whether some text should be marked as a named entity.

($a named-entity)($b named-entity): $a@coder=="Coder1" && $b@coder=="Coder2" ::
($w1 w) (forall $w w) : ($a ^ $w1) && ($b ^ $w1) &&(($a ^ $w) -> ($b ^ $w)) && (($b ^ $w) -> ($a ^ $w))

Gives a list of all the co-extensive named entities between Coder1 and Coder2, along with the words forming the entities (the entities do not have to be of the same type, but they have to span exactly the same text).

($a named-entity)($b named-entity): $a@coder=="Coder1" && $b@coder=="Coder2" ::
($w1 w) (exists $w w) : ($a ^ $w1) && ($b ^ $w1) &&(($a ^ $w) -> ($b ^ $w)) && (($b ^ $w) -> ($a ^ $w))

Like the previous query, but includes named entities that are only partially co-extensive. The words showing in the query results are only the ones where the entities actually overlap.
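
Outside NXT, the set logic behind these co-extensiveness tests can be sketched with sorted lists of word ids (the ids below are invented): two entities are fully co-extensive when their word sets are identical, and partially co-extensive when the sets merely intersect.

```shell
# Hypothetical word ids spanned by one entity from each coder.
printf 'w1\nw2\nw3\n' > coder1.ids
printf 'w2\nw3\nw4\n' > coder2.ids
cmp -s coder1.ids coder2.ids && echo "fully co-extensive" || echo "only partial overlap"
comm -12 coder1.ids coder2.ids   # words in both entities (the overlap)
comm -23 coder1.ids coder2.ids   # words covered only by Coder1's entity
```

The forall variant of the query corresponds to the identity check, the exists variant to a non-empty intersection, and the "only Coder1 has marked" queries below to a non-empty one-sided difference.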

($a named-entity)(forall $b named-entity)(forall $w w): $a@coder=="Coder1" && (($b@coder=="Coder2" && ($a ^ $w))->!($b ^ $w))
Gives the list of entities that only Coder1 has marked, i.e. there is no corresponding entity in Coder2. Switching Coder1 and Coder2 in the query gives the respective set of entities for Coder2.

($a named-entity)(forall $b named-entity)(forall $w w): $a@coder=="Coder2" && (($b@coder=="Coder1" && ($a ^ $w))->!($b ^ $w)) ||
$a@coder=="Coder1" && (($b@coder=="Coder2" && ($a ^ $w))->!($b ^ $w))

Like the previous query, only this time both sets of non-corresponding entities are given in one go.

Checking for categorisation agreement

The following examples check how the two annotators agree on the categorisation of co-extensive entities.

($a named-entity)($b named-entity) ($t ne-type): $a@coder=="Coder1" && $b@coder=="Coder2" && ($a >"type" $t) && ($b >"type" $t) ::
($w1 w) (forall $w w) : ($a ^ $w1) && ($b ^ $w1) &&(($a ^ $w) -> ($b ^ $w)) && (($b ^ $w) -> ($a ^ $w))

Gives all the common named entities between Coder1 and Coder2 along with the entity type and text; the entities have to be co-extensive (fully overlapping) and of the same type.

($a named-entity)($b named-entity) ($t ne-type): $a@coder=="Coder1" && $b@coder=="Coder2" && ($a >"type" $t) && ($b >"type" $t) ::
($w1 w) (exists $w w) : ($a ^ $w1) && ($b ^ $w1) &&(($a ^ $w) -> ($b ^ $w)) && (($b ^ $w) -> ($a ^ $w))

Like the previous query, but includes partially co-extensive entities. The words showing in the query results are only the ones that actually do overlap.

($a named-entity)($b named-entity) ($t ne-type): $a@coder=="Coder1" && $b@coder=="Coder2" && ($a >"type" $t) && ($b >"type" $t)::
($w2 w):($a ^ $w2) && ($b ^ $w2)::
($w w):(($b ^ $w) && !($a ^ $w)) || (($a ^ $w) && !($b ^ $w))

Gives the list of entities which are of the same type, but only partially co-extensive. The results include the entire set of words from both codings.

($a named-entity)($b named-entity) ($t ne-type)($t1 ne-type): $a@coder=="Coder1" && $b@coder=="Coder2" && ($a >"type" $t) && ($b >"type" $t1) && ($t != $t1)::
($w1 w) (exists $w w) : ($a ^ $w1) && ($b ^ $w1) &&(($a ^ $w) -> ($b ^ $w)) && (($b ^ $w) -> ($a ^ $w)) ::
($w2 w): ($b ^ $w2)

Gives the list of entities which are partially or fully co-extensive, but for which the two coders disagree as to the type.

($a named-entity)($b named-entity)($c ne-type)($d ne-type):
$a@coder=="Coder1" && $b@coder=="Coder2" && $c@name=="EntityType1" && $d@name=="EntityType2" && $a >"type"^ $c && $b >"type"^ $d::
($w2 w):($a ^ $w2) && ($b ^ $w2)

Gives the list of entities which are partially or fully co-extensive, and which Coder1 has marked as EntityType1 (or one of its sub-types) and Coder2 has marked as EntityType2 (or one of its sub-types). This checks for type-specific disagreements between the two coders.

($t ne-type): !($t@name=="ne-root") ::
($a named-entity)($b named-entity): $a@coder=="Coder1" && $b@coder=="Coder2" && (($a >"type"^ $t) && ($b >"type"^ $t))::
($w1 w) (forall $w w) : ($a ^ $w1) && ($b ^ $w1) &&(($a ^ $w) -> ($b ^ $w)) && (($b ^ $w) -> ($a ^ $w))

The query creates a list of all the entity types, and slots into each entry all the (fully) co-extensive entities as marked by the two coders. The actual text forming each entity is also included in the results.

($t1 ne-type): !($t1@name=="ne-root") ::
($a named-entity)($b named-entity): $a@coder=="Coder1" && $b@coder=="Coder2" && (($a >"type"^ $t1) && ($b >"type"^ $t1))::
($w1 w) (exists $w w) : ($a ^ $w1) && ($b ^ $w1) &&(($a ^ $w) -> ($b ^ $w)) && (($b ^ $w) -> ($a ^ $w))

Like the previous query, but includes partially co-extensive entities. The words showing in the query results are only the ones that actually do overlap.


Last modified 04/19/06