Documentation of how to use these default tools for your own corpus.



The dacoder package contains a customizable dialogue act coder developed for the AMI Project (Augmented Multiparty Interaction).

The necoder package contains a 'sparse text tagger' developed for the AMI Project as a named entity coder.

How to use the tools provided in this package on any corpus

The tools in these packages are generic and should be usable for many purposes on many different corpora. For example, the NECoder can be used to tag named entities in speech transcriptions, or to apply many other simple labels to a transcription, such as pronouns, vocatives, or NP and VP chunks. To adapt the tools to a particular use on a particular corpus, however, a few things have to be taken care of. Most importantly, the corpus metadata file has to be extended to refer to the appropriate tool in the CallableTools section, and the tool configuration has to be changed to contain the correct information for the corpus, such as the names of the relevant layers and ontologies.

Extending the metadata file for the corpus

First, the metadata file should declare the tools as callable for this corpus. The following XML fragment gives an example of how to do this.
    <callable-program description="Dialogue Act Annotation" name="">
        <required-argument name="" type="corpus"/>
        <required-argument name="" type="observation"/>
    </callable-program>
    <callable-program description="VideoLabeller" name="">
        <required-argument name="" type="corpus"/>
        <required-argument name="" type="observation"/>
    </callable-program>
    <callable-program description="Named Entity tagger" name="">
        <required-argument name="" type="corpus"/>
        <required-argument name="" type="observation"/>
    </callable-program>
Note: these fragments will change slightly once support for multiple coders has been implemented.

Second, the metadata file should of course describe all layers relevant to the tool one wants to use. It is not very useful to run a dialogue act coder on a corpus that has no transcriptions or no dialogue act layer. Again, a fragment from an existing metadata file is included as an example.

    <!-- ontologies-section omitted -->
    <codings path="./codings">
            <!-- the words -->
            <coding-file name="word-transcription">
                <time-aligned-layer name="word-layer">
                    <code name="word" text-content="false">
                        <attribute name="word" value-type="string"/>
                    </code>
                </time-aligned-layer>
            </coding-file>
            <!-- a basic segmentation to use for display, when there are no dialogue acts yet -->
            <coding-file name="basic-segmentation">
                <structural-layer name="basic-segment-layer" points-to="word-layer">
                    <code name="basic-segment" text-content="false"/>
                </structural-layer>
            </coding-file>
            <!-- Dialogue acts provide a non-overlapping, complete segmentation of the word layer,
                 augmented with pointers to certain dialogue act types. -->
            <coding-file name="dialog-act">
                <structural-layer name="da-layer" points-to="word-layer">
                    <code name="dact" text-content="false">
                        <pointer number="+" role="type" target="da-types"/>
                        <attribute name="addressee" value-type="string"/>
                    </code>
                </structural-layer>
            </coding-file>
            <coding-file name="nees">
                <structural-layer name="ne-layer" points-to="word-layer">
                    <code name="named-entity" text-content="false">
                        <pointer number="0" role="type" target="ne-types"/>
                    </code>
                </structural-layer>
            </coding-file>
            <coding-file name="dialogue-act-relations">
                <featural-layer name="dialogue-act-relations-layer">
                    <code name="dialogue-act-relation" text-content="false">
                        <pointer number="0" role="source" target="da-layer"/>
                        <pointer number="1" role="target" target="da-layer"/>
                        <pointer number="3" role="type" target="da-rel-types"/>
                    </code>
                </featural-layer>
            </coding-file>
    </codings>
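
The fragment above omits the ontologies section, although its pointers refer to three ontologies by name (da-types, ne-types and da-rel-types). As a rough sketch only, a matching declaration might look like the one below. The attribute names reflect the NXT metadata format as best recalled and should be checked against the NXT metadata documentation. The path, descriptions, file names, element names and attribute names are hypothetical placeholders, not values from the AMI metadata file.

```xml
<!-- Hypothetical sketch: only the three ontology names come from the fragment above;
     every other value is an assumption made for illustration. -->
<ontologies path="./ontologies">
    <ontology name="da-types" description="dialogue act types"
              filename="da-types" element-name="da-type" attribute-name="name"/>
    <ontology name="ne-types" description="named entity types"
              filename="ne-types" element-name="ne-type" attribute-name="name"/>
    <ontology name="da-rel-types" description="dialogue act relation types"
              filename="da-rel-types" element-name="da-rel-type" attribute-name="name"/>
</ontologies>
```

The ontology names declared here are the ones that the tool configuration refers to through settings such as daontology and neontology.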

Specifying the configuration of the coding tools

Not every corpus uses the same names for the same layers. To make the tools usable, it must therefore be possible to specify, in a configuration file, which layers are relevant for each tool. The file nxtConfig.xml serves this purpose. It should be present in the /lib/ directory of the NXT distribution. For each tool in the tools packages it specifies the relevant settings. The settings are divided into corpus-specific settings, such as the names of the ontologies and layers, and GUI-specific settings, such as whether the log window should be visible. For each tool there should be an entry for your corpus that specifies the name of the metadata file and the ids of the GUI settings and corpus settings elements. In the example below, the AMI corpus metadata file is at "d:/java/data/project/ami/m4corpus/m4-metadata.xml", and the corpus settings for the dialogue act coder are defined in the "dac-cs-ami" element.

    <!-- This section contains the configuration settings for the AMI DACoder and NECoder tools developed in Twente
         for the AMI project. For documentation purposes it contains one corpussettings entry that is not connected
         to an actual corpus but rather explains all possible corpus-dependent settings. Furthermore it contains 
         appropriate default settings for at least the AMI pilot corpus and the ICSI corpus (Edinburgh CVS).
         If you want to use the DACoder or NECoder on your own corpus, duplicate a corpussettings entry, fill in 
         the appropriate attributes with the correct settings for your corpus and add an entry directly below this 
         comment to connect the metadatafile to those new settings. If you want to use those tools on the ICSI corpus
         or the AMI pilot corpus, change the metadatafile entries below so they point to the correct metadata files.
         For the ICSI corpus you should also add the appropriate layer definitions and the ontology files (see README 
         in Contributions/Anno22L).
         If all works as it should, the tool should be able to automatically find this config file (it should reside
         on the CLASSPATH), and will use the correct corpussettings and guisettings for any corpus for which an entry
         'metadatafile' exists.
         If things don't work that way, please drop a note to dennisr at hmi dot utwente dot nl -->

    <metadatafile file="d:/java/data/project/ami/m4corpus/m4-metadata.xml"      corpussettings="dac-cs-ami"   guisettings="dac-gs-default"/>

    <corpussettings
        id                      = "dac-cs-example"
        gloss                   = "Example element containing short explanation of all possible settings"
        segmentationelementname = "Element name of the segmentation elements that pre-segments the transcription layer. Used for the initial display of the text."
        transcriptionlayername  = "LAYER name of the transcription layer"
        transcriptionattribute  = "Name of the attribute in which text of transcription is stored. Leave out if text not stored in attribute."
        transcriptiondelegateclassname = "full class name of TranscriptionToTextDelegate. Leave out if no delegate is used"

        daelementname           = "element name of dialogue act instances"
        daontology              = "ontology name of dialogue acts"
        daroot                  = "nite-id of dialogue act root"
        datyperole              = "role name of the pointer from a dialogue act to its type"

        apelementname           = "element name of adjacency pair instances"
        apontology              = "ontology name of adjacency pairs"
        aproot                  = "nite-id of adjacency pair root"
        defaultaptype           = "nite-id of default adjacency pair type"

        neelementname           = "element name of named entity instances"
        neontology              = "ontology name of named entities"
        neroot                  = "nite-id of named entities root"
        nenameattribute         = "attribute name of the attribute that contains the name of the named entity"
        netyperole              = "role name of the pointer from a named entity to its type"
        abbrevattribute         = "name of the attribute which contains an abbreviated code for the named entity for in-text display"
    />
    <!-- Corpus settings for the AMI Pilot corpus -->
    <corpussettings
        id                      = "dac-cs-ami"
        gloss                   = "The corpus settings for the dialogue act coder for the official AMI corpus"
        segmentationelementname = "trans-segment"
        transcriptionlayername  = "word-layer"
        transcriptionattribute  = "word"
        daelementname           = "dact"
        daontology              = "da-types"
        daroot                  = "cmrda"
        datyperole              = "da-aspect"
        dagloss                 = "gloss"
        apelementname           = "adjacency-pair"
        apontology              = "ap-types"
        aproot                  = "apt_0"
        defaultaptype           = "apt_1"

        neelementname           = "named-entity"
        neontology              = "ne-types"
        neroot                  = "ne_0"
        nenameattribute         = "name"
        netyperole              = "type"
        abbrevattribute         = "abbrev"
    />
    <guisettings
        id                      = "dac-gs-example"
        gloss                   = "Example element containing short explanation of all possible settings"
        showapwindow            = "If true, the Adjacency Pair window is shown."
        showlogwindow           = "If true, the log feedback window is shown."
        applicationtitle        = "The title that you want to see in the main frame... (no reason to make this a setting, except that it's funny :-)"
    />
    <guisettings
        id                      = "dac-gs-default"
        gloss                   = "Default settings"
        showapwindow            = "true"
        showlogwindow           = "true"
        applicationtitle        = "AMI Dialogue act coder"
    />
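
To connect the pieces, the sketch below shows what the entries for a new corpus might look like if its metadata file used the layer and code names from the metadata fragment shown earlier (word-layer, basic-segment, dact, named-entity). The element names corpussettings and guisettings are assumed from the attribute names of the metadatafile entry and from the comment in the configuration file. The file path, the id "dac-cs-mycorpus" and the root ids are hypothetical placeholders to be replaced with values from your own corpus and ontology files. The adjacency pair settings, nenameattribute and abbrevattribute are left out for brevity; the source does not say whether they are optional.

```xml
<!-- Hypothetical entry connecting a metadata file to its settings.
     The path and the corpussettings id are placeholders. -->
<metadatafile file="/path/to/mycorpus/mycorpus-metadata.xml"
              corpussettings="dac-cs-mycorpus"
              guisettings="dac-gs-default"/>

<!-- Layer, element and ontology names follow the example metadata fragment.
     The daroot and neroot values are placeholders for nite-ids in your ontology files. -->
<corpussettings
    id                      = "dac-cs-mycorpus"
    gloss                   = "Corpus settings for a hypothetical corpus"
    segmentationelementname = "basic-segment"
    transcriptionlayername  = "word-layer"
    transcriptionattribute  = "word"

    daelementname           = "dact"
    daontology              = "da-types"
    daroot                  = "REPLACE-WITH-DA-ROOT-NITE-ID"
    datyperole              = "type"

    neelementname           = "named-entity"
    neontology              = "ne-types"
    neroot                  = "REPLACE-WITH-NE-ROOT-NITE-ID"
    netyperole              = "type"
/>
```

According to the comment in the configuration file, a tool opened on that metadata file should then find the metadatafile entry and use the corpus and GUI settings it refers to, here reusing the existing "dac-gs-default" GUI settings.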


Starting the tools

To start one of these tools, start the basic NXT GUI program (net.sourceforge.nite.nxt.GUI), open the metadata file, and select the appropriate program from the list.