NITE XML Toolkit - How to use Metadata

The main investment involved in making your own data usable with the NITE XML Toolkit is producing a metadata file and providing your data in a conformant form (especially with regard to file naming). You will need to understand the format of metadata files if you wish to import your data, though we provide several example metadata files to help. Once you have a metadata file that describes your data, you will be able to use all the NITE tools to validate, analyse and edit your data.

What metadata files do

Metadata files describe all aspects of a corpus including:

where on disk the parts of the corpus reside;
the codings that can validly be made on the data;
the observations that have been made already along with their status;
the NITE editors and viewers that can be used on the corpus;
much more (see below).

There's a full discussion of the elements and attributes that make up a metadata file here.

What metadata files look like

Metadata files are XML and conform to a DTD. There is one metadata DTD for simple (single-file) corpora and one for standoff corpora. The two have much in common, so both import the same basic DTD. The set of DTDs (zipped) can be downloaded here. If you are more familiar with XML Schema and have a schema validator installed, you may prefer this set of zipped schemas.
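Because a metadata file is ordinary XML that conforms to a DTD, any validating XML parser can check it before you hand it to the NITE tools. The following is a minimal sketch using the standard JAXP API shipped with Java. The class name `MetadataDtdCheck` is hypothetical, and the sketch assumes your metadata file carries a DOCTYPE declaration pointing at the unzipped DTDs; relative DTD paths are resolved against the metadata file's own location.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical helper: checks a metadata file against the DTD named in its DOCTYPE. */
public class MetadataDtdCheck {

    /** Returns DTD validity errors; an empty list means the file is valid. */
    public static List<String> validate(File metadata)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // DTD validation (not XML Schema)
        DocumentBuilder builder = factory.newDocumentBuilder();

        List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* ignored */ }
            @Override public void error(SAXParseException e) {
                // validity errors are collected so that all of them get reported
                errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
            }
            @Override public void fatalError(SAXParseException e) throws SAXException {
                throw e; // not well-formed XML: stop immediately
            }
        });

        // Parsing from a File lets a relative DTD path resolve against the file's directory.
        builder.parse(metadata);
        return errors;
    }

    public static void main(String[] args) throws Exception {
        // e.g. java MetadataDtdCheck my-metadata.xml
        List<String> errors = validate(new File(args[0]));
        errors.forEach(System.out::println);
        System.out.println(errors.isEmpty() ? "valid" : errors.size() + " error(s)");
    }
}
```

Collecting errors rather than stopping at the first one is a design choice: a new metadata file often has several problems, and seeing them together is quicker to fix.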

Metadata examples

Save these to disk and have a look at them in your favourite XML or text editor.

Metadata for NITE's simple example (you may also want to see the data it describes - 5K zip)
Metadata for the Maptask corpus (here is a single maptask observation - 165K zip)
Metadata for the Smartkom corpus (simple corpus case) (here's a single Smartkom interaction file - 15K XML)

Using Metadata to validate data

Since metadata describes the format of the data and where to find it on disk, it is used by the NITE software to validate the data as it is loaded and edited. This sort of direct validation is useful, but we also provide schema validation of data using a schema derived automatically from the metadata (via a stylesheet).

If you have already downloaded and installed NOM, you also have the schema-generating stylesheet, generate-schema.xsl (it is in the lib directory). With this and a stylesheet processor (Xalan is also in the NOM distribution), you can run this command on your metadata file:

java org.apache.xalan.xslt.Process -in <your-metadata> -xsl generate-schema.xsl -out extension.xsd
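If you would rather generate the schema from Java than from the command line, the standard JAXP transformation API can run the same stylesheet. This is a sketch, not part of NOM: `GenerateSchema` is a hypothetical class name, and it assumes generate-schema.xsl needs only standard XSLT 1.0, since JAXP uses whichever XSLT processor it finds (the JDK's built-in one by default).

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical Java counterpart of: java org.apache.xalan.xslt.Process -in ... -xsl ... -out ... */
public class GenerateSchema {

    public static void run(File metadata, File stylesheet, File output)
            throws TransformerException, IOException {
        TransformerFactory factory = TransformerFactory.newInstance();
        // Compile the stylesheet; a File source keeps relative xsl:include/import paths working
        Transformer transformer = factory.newTransformer(new StreamSource(stylesheet));
        // try-with-resources guarantees the output file is flushed and closed
        try (OutputStream out = new FileOutputStream(output)) {
            transformer.transform(new StreamSource(metadata), new StreamResult(out));
        }
    }

    public static void main(String[] args) throws Exception {
        // e.g. java GenerateSchema my-metadata.xml lib/generate-schema.xsl extension.xsd
        run(new File(args[0]), new File(args[1]), new File(args[2]));
    }
}
```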

This creates a schema file called extension.xsd, which imports two other static schema files, typelib.xsd and xlink.xsd (also in the lib directory of your NOM distribution). Put these static schema files in the same directory as the newly generated extension.xsd.

If you have a schema validator (I use xsv), you are now ready to validate some data files. Try putting these declarations in the root element of your data file:

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="extension.xsd"

and then execute:

xsv <your-file>
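xsv is one choice of validator; the JAXP validation API in the standard Java library is another. The sketch below (hypothetical class `SchemaCheck`) loads extension.xsd from disk, so its imports of typelib.xsd and xlink.xsd are resolved relative to that file, which is why the three schemas need to share a directory. Because the schema is passed in explicitly, this sketch does not depend on the xsi:noNamespaceSchemaLocation declaration in the data file.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical helper: validates one data file against a generated extension.xsd. */
public class SchemaCheck {

    /** Returns schema-validity errors; an empty list means the data file is valid. */
    public static List<String> validate(File schemaFile, File dataFile)
            throws SAXException, IOException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // Loading from a File means xs:import locations resolve against its directory
        Schema schema = factory.newSchema(schemaFile);
        Validator validator = schema.newValidator();

        List<String> errors = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* ignored */ }
            @Override public void error(SAXParseException e) {
                errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
            }
            @Override public void fatalError(SAXParseException e) throws SAXException {
                throw e;
            }
        });
        validator.validate(new StreamSource(dataFile));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        // e.g. java SchemaCheck extension.xsd my-data-file.xml
        List<String> errors = validate(new File(args[0]), new File(args[1]));
        errors.forEach(System.out::println);
        System.out.println(errors.isEmpty() ? "valid" : errors.size() + " error(s)");
    }
}
```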

A major reason for this approach to schema validation is that it can validate either a single file as serialized by NITE, or files that have been transformed so that their nite:child elements are recursively replaced by the elements they point to, and their pointers are likewise replaced by the actual elements. Validating transformed files checks which types of element can be children of a given element and which types it can point to. In this way an entire corpus could be schema-validated. A stylesheet that performs this transformation is in the lib directory of your NOM distribution.
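The idea behind the inlining transformation can be sketched in Java with DOM. This is not the stylesheet from the lib directory, only an illustration under assumptions you should check against your own data: the `nite` prefix is bound to `http://nite.sourceforge.net/`, each child link is a `nite:child` element whose unprefixed `href` attribute ends in `#id(X)`, and every target carries a `nite:id` and has already been parsed (namespace-aware) into one of the documents given. Range links, pointers and cyclic links are not handled, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical sketch of inlining nite:child links; not the stylesheet shipped with NOM. */
public class InlineChildren {

    // ASSUMPTION: namespace URI bound to the nite prefix in your data
    static final String NITE_NS = "http://nite.sourceforge.net/";
    // ASSUMPTION: single-element links such as href="o1.words.xml#id(w_1)"
    static final Pattern ID_REF = Pattern.compile("#id\\(([^)]+)\\)$");

    /** Maps each nite:id value to its element, across all given documents. */
    public static Map<String, Element> indexById(Document... docs) {
        Map<String, Element> index = new HashMap<>();
        for (Document doc : docs) {
            NodeList all = doc.getElementsByTagName("*");
            for (int i = 0; i < all.getLength(); i++) {
                Element e = (Element) all.item(i);
                if (e.hasAttributeNS(NITE_NS, "id")) {
                    index.put(e.getAttributeNS(NITE_NS, "id"), e);
                }
            }
        }
        return index;
    }

    /** Replaces every nite:child below parent with a deep copy of its target, recursively. */
    public static void inline(Element parent, Map<String, Element> index) {
        Node node = parent.getFirstChild();
        while (node != null) {
            Node next = node.getNextSibling(); // remember before any replacement
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                Element el = (Element) node;
                if (NITE_NS.equals(el.getNamespaceURI()) && "child".equals(el.getLocalName())) {
                    String href = el.getAttribute("href"); // ASSUMPTION: unprefixed href
                    Matcher m = ID_REF.matcher(href);
                    Element target = m.find() ? index.get(m.group(1)) : null;
                    if (target == null) {
                        throw new IllegalArgumentException("cannot resolve nite:child href=" + href);
                    }
                    Element copy = (Element) parent.getOwnerDocument().importNode(target, true);
                    parent.replaceChild(copy, el);
                    inline(copy, index); // the copy may contain nite:child links of its own
                } else {
                    inline(el, index);
                }
            }
            node = next;
        }
    }
}
```

Calling `inline(doc.getDocumentElement(), indexById(doc, otherDocs...))` leaves a single tree in which each element's children appear in place, which is the shape a schema can then check.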

If this all seems rather involved and your data already loads into the NOM, the program PrepareSchemaValidation.java will create a new directory for you that is fully ready for schema validation.

Validation limitations:

all stream elements must be named nite:root;
all ID, start time and end time attributes must use the NITE default names: nite:id, nite:start and nite:end;
all children and pointers must use XLink / XPointer style links;
stream elements will be permitted to contain inadvisably mixed elements, so long as each of those elements is itself defined and valid.
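Before generating a schema, it may save time to check a data file against the first two of these limitations. A minimal sketch follows; the class `LimitationCheck` is hypothetical and the `nite` namespace URI is an assumption. It checks only the stream element name and the use of unprefixed `id`, `start` and `end` attributes (which may give false alarms if you genuinely have such attributes), not link style or mixed content.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical pre-flight check for some of the schema-validation limitations. */
public class LimitationCheck {

    // ASSUMPTION: namespace URI bound to the nite prefix in your data
    static final String NITE_NS = "http://nite.sourceforge.net/";

    /** Expects a namespace-aware DOM; returns one message per problem found. */
    public static List<String> check(Document doc) {
        List<String> problems = new ArrayList<>();
        Element root = doc.getDocumentElement();
        if (!NITE_NS.equals(root.getNamespaceURI()) || !"root".equals(root.getLocalName())) {
            problems.add("stream element is <" + root.getTagName() + ">, not nite:root");
        }
        NodeList all = doc.getElementsByTagName("*"); // every element, root included
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            for (String name : new String[] {"id", "start", "end"}) {
                // hasAttribute matches the qualified name, so "id" does not match "nite:id"
                if (e.hasAttribute(name)) {
                    problems.add("<" + e.getTagName() + "> uses '" + name + "' instead of nite:" + name);
                }
            }
        }
        return problems;
    }
}
```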

Last modified 04/17/06
