net.sourceforge.nite.nom.nomwrite
Interface NOMCorpus

All Superinterfaces:
NOMControl
All Known Implementing Classes:
NOMReadCorpus, NOMWriteCorpus

public interface NOMCorpus
extends NOMControl

NOMCorpus is similar to the nomread version. The additions are methods that change the way the NOM is serialized.

Author:
jonathan

Field Summary
static double UNTIMED
           
 
Method Summary
 void clearData()
          Removes all data in the NOM
 void clearDataForObservation(NObservation ob)
          Removes any currently loaded data relating to the given observation
 void clearDataForObservation(java.lang.String ob)
          Removes any currently loaded data relating to the named observation
 void completeLoad()
          finish loading *all* files we know about from the corpus: this only makes sense if lazy loading is switched on, otherwise it will do nothing.
 boolean edited()
          returns true if the corpus has unsaved edits
 void forceAnnotatorCoding(java.lang.String annotator, java.lang.String coding)
          Force one coding to be loaded for a specific annotator when loadData is called.
 java.lang.String generateID(java.lang.String element_name)
          generates an Identifier that's globally unique - used when creating elements
 boolean getBatchMode()
          Used by NOM program - not for client program use
 java.lang.String getCodingFilename(NObservation no, NCoding co, NAgent ag)
          Return the actual file to which this data should be serialized (including any annotator-specific subdirectory).
 double getCorpusDuration()
          returns the duration of the corpus (last end time - earliest start time) (or UNTIMED if there are no timed elements)
 double getCorpusEndTime()
          returns the latest end time of any element in the corpus (or UNTIMED if there is no timed element)
 double getCorpusStartTime()
          returns the earliest start time of any element in the corpus (or UNTIMED if there is no timed element)
 NOMElement getElementByID(java.lang.String id)
          Return a NOMWriteElement which has the given element ID: you can either pass an unadorned ID, in which case NXT searches for the element in all already-loaded files, or you can specify the 'full' ID like this: colour#id.
 NOMElement getElementByID(java.lang.String colour, java.lang.String id)
           
 java.util.List getElementsByName(java.lang.String name)
          Return a list of NOMWriteElements which have the given element name.
 java.lang.String getHrefAttr()
          Link syntax information: get the name of the 'href' attribute
 java.lang.String getLinkAfterID()
          Link syntax information: get the String that appears after an ID
 java.lang.String getLinkBeforeID()
          Link syntax information: get the String that appears before an ID
 java.lang.String getLinkFileSeparator()
          Link syntax information: get the String that separates a filename from an ID
 java.util.List getLoadedObservations()
          returns a List of NObservation elements, one for each observation that has been asked to be loaded (how much of the observation data, if any, is actually loaded depends on lazy loading).
 NOMMaker getMaker()
          This is used by internal corpus-building routines to make sure we always use the right constructors.
 int getMaxDepth(NLayer layer)
          Return the deepest nesting of elements in this recursive layer (if the layer is not recursive, returns 1 or 0)
 NMetaData getMetaData()
          returns the metadata associated with this NOM
 java.util.List getPointersTo(NOMElement to_element)
          Return the reverse index of pointers to the given element
 java.lang.String getRangeSeparator()
          Link syntax information: get the String that appears between IDs in a range
 java.util.List getRootElements()
          returns a List of NOMElements: the top level "stream" elements
 NOMElement getRootWithColour(java.lang.String colour)
          returns the root NOMElement which has the given colour
 boolean isEditSafe()
          Return true if the corpus can be edited safely - for internal use.
 boolean isLazyLoading()
          Returns true (the default) if future calls to load data will be lazy-loaded; false means everything in future load calls is loaded up-front.
 boolean isLoadingFromFile()
          Returns true if data is currently being loaded from file.
 boolean isValidating()
          returns true if the corpus is validating (i.e. if it is checking against the metadata whether changes are valid).
 void loadData()
          Load all data for the corpus into the NOMCorpus.
 void loadData(java.util.List observations, java.util.List codings)
          Load data for a specific set of observations into the NOMCorpus.
 void loadData(NObservation observation)
          Load data for a single observation into the NOMCorpus.
 void loadReliability(NLayer top, NLayer top_common, java.lang.String coder_attribute_name, java.lang.String path, java.util.List observations)
          Load data for the purpose of comparing different coders' data.
 void loadReliability(NLayer top, NLayer top_common, java.lang.String coder_attribute_name, java.lang.String path, java.util.List observations, java.util.List other_layers)
          Load data for the purpose of comparing different coders' data.
 boolean lock(NOMView view)
          lock the corpus for edits - returns false if another view has locked the corpus.
 java.util.Iterator NOMWalker()
          Provides an iterator which visits each element in the NOM exactly once.
 void preferAnnotatorCoding(java.lang.String annotator, java.lang.String coding)
          Prefer one coding to be loaded for a specific annotator when loadData is called.
 void registerID(java.lang.String colour, java.lang.String id)
          registers an Identifier as having been used and if necessary, notes an Integer in the ID hash for quick generation of IDs
 void removePointerIndex(NOMPointer point)
           
 NOMElement resolveLink(java.lang.String xlink)
          Resolve an individual xlink expression which points to exactly one NOM element.
 NOMElement resolveLink(java.lang.String xlink, int linktype)
          Resolve an individual xlink expression which points to exactly one NOM element - the second argument explicitly names the link type involved.
 void serializeCorpus()
          Serialize the entire loaded corpus
 void serializeCorpus(java.util.List observations)
          Serialize all loaded files for the given list of observations
 void serializeCorpusChanged()
          Serialize all files which have been changed.
 boolean serializeInheritedTimes()
          True if we should allow inherited times to be serialized
 boolean serializeMaximalRanges()
          True if we should serialize ranges
 void setDefaultAnnotator(java.lang.String annotator)
          Set the preferred annotator for *all* codings that is used on subsequent loadData calls.
 void setForceStreamElementNames(boolean bool)
          Set to true to make future serialization calls serialize with stream element names conforming to meta.getStreamElementName().
 void setLazyLoading(boolean bool)
          Set to true (default) to lazy-load any future calls to load data; false means everything in future load calls is loaded up-front.
 void setSchemaLocation(java.lang.String location)
          If this method is used with a non-null argument, we make sure the schema instance namespace is output on every stream-like element on serialization along with this as the noNamespaceSchemaLocation
 void setSerializeInheritedTimes(boolean bool)
          Set to true to make future serialization calls serialize with inherited times on structural elements.
 void setSerializeMaximalRanges(boolean bool)
          Set to true (default) to make future serialization calls serialize with ranges where possible.
 void setValidation(boolean validate)
          Set validation for the corpus.
 boolean unlock(NOMView view)
          unlock the corpus - returns false if the view isn't the one that has the lock.
 
Methods inherited from interface net.sourceforge.nite.nom.link.NOMControl
deregisterViewer, notifyChange, notifyChange, notifyChange, registerViewer
 

Field Detail

UNTIMED

static final double UNTIMED
See Also:
Constant Field Values
Method Detail

loadData

void loadData()
              throws NOMException
Load all data for the corpus into the NOMCorpus. Incremental loading of data is the default, so a new call to loadData will not zero-out the data loaded in a previous call.

Throws:
NOMException

loadData

void loadData(java.util.List observations,
              java.util.List codings)
              throws NOMException
Load data for a specific set of observations into the NOMCorpus. Incremental loading of data is the default, so a new call to loadData will not zero-out the data loaded in a previous call. If the list of codings is non-null, it will be expected to be a list of NCodings that is the maximal set to be loaded whether lazy loading is on or off.

Throws:
NOMException

loadData

void loadData(NObservation observation)
              throws NOMException
Load data for a single observation into the NOMCorpus. Incremental loading of data is the default, so a new call to loadData will not zero-out the data loaded in a previous call.

Throws:
NOMException

loadReliability

void loadReliability(NLayer top,
                     NLayer top_common,
                     java.lang.String coder_attribute_name,
                     java.lang.String path,
                     java.util.List observations)
                     throws NOMException
Load data for the purpose of comparing different coders' data. The layer 'top' is where we start the per-coder information and 'top_common' is the layer at which we expect everything to be common between coders. 'coder_attribute_name' is used as the name of the attribute that gets the name of the coder and 'path' is where the coder data is. We assume the data will use standard NXT filenames but be held in a directory per coder (the name of the coder is assumed to be the name of the directory) under 'path'. If 'observations' is null we attempt to load all the data; otherwise we only attempt to load the observations it lists. loadReliability is incompatible with lazy loading.

Throws:
NOMException

loadReliability

void loadReliability(NLayer top,
                     NLayer top_common,
                     java.lang.String coder_attribute_name,
                     java.lang.String path,
                     java.util.List observations,
                     java.util.List other_layers)
                     throws NOMException
Load data for the purpose of comparing different coders' data. This is the same as the other call except for the final argument, which is a list of Strings: the names of layers that should be loaded in 'gold-standard' mode.

Throws:
NOMException

setDefaultAnnotator

void setDefaultAnnotator(java.lang.String annotator)
Set the preferred annotator for *all* codings that is used on subsequent loadData calls. This will be overridden by any codings that are forced to a specific annotator using 'forceAnnotatorCoding'. Note that this is the preferred annotator only, and if there is no annotator data for any coding but gold-standard data is present, that will be loaded instead.


forceAnnotatorCoding

void forceAnnotatorCoding(java.lang.String annotator,
                          java.lang.String coding)
                          throws NOMException
Force one coding to be loaded for a specific annotator when loadData is called. This loads from the annotator's directory even if that directory is empty and gold-standard data is available.

Throws:
NOMException

preferAnnotatorCoding

void preferAnnotatorCoding(java.lang.String annotator,
                           java.lang.String coding)
                           throws NOMException
Prefer one coding to be loaded for a specific annotator when loadData is called. This means if there's no annotator data for the coding we take any 'gold-standard' data instead.

Throws:
NOMException
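
The interplay between setDefaultAnnotator, forceAnnotatorCoding and preferAnnotatorCoding can be sketched as a small decision function. This is a toy model, not NXT code: the class, the enum and the "annotator/coding" keys are hypothetical, and the rule that a per-coding preference takes priority over the default annotator is an assumption that the text above does not state.

```java
import java.util.Map;
import java.util.Set;

// Toy model only: none of these names come from NXT.
class AnnotatorChoiceSketch {
    /** Where a coding would be read from under the rules described above. */
    enum Source { ANNOTATOR, GOLD_STANDARD, NONE }

    /**
     * forced and preferred map a coding name to an annotator name;
     * annotatorData holds "annotator/coding" keys for which data exists;
     * goldData holds coding names for which gold-standard data exists.
     */
    static Source choose(String coding,
                         Map<String, String> forced,
                         Map<String, String> preferred,
                         String defaultAnnotator,
                         Set<String> annotatorData,
                         Set<String> goldData) {
        // A forced coding always reads the annotator's directory, even if empty.
        if (forced.containsKey(coding)) {
            return Source.ANNOTATOR;
        }
        // Assumption: a per-coding preference wins over the default annotator.
        String annotator = preferred.getOrDefault(coding, defaultAnnotator);
        if (annotator != null && annotatorData.contains(annotator + "/" + coding)) {
            return Source.ANNOTATOR;
        }
        // Preference only: fall back to gold-standard data when it is present.
        return goldData.contains(coding) ? Source.GOLD_STANDARD : Source.NONE;
    }

    public static void main(String[] args) {
        Set<String> annotatorData = Set.of("amy/moves");
        Set<String> gold = Set.of("moves", "games");
        // Forced: annotator directory is used although it holds no "games" data.
        System.out.println(choose("games", Map.of("games", "amy"), Map.of(), null, annotatorData, gold));
        // Preferred: no annotator data for "games", so gold-standard is used.
        System.out.println(choose("games", Map.of(), Map.of("games", "amy"), null, annotatorData, gold));
        // Default annotator with data present for "moves".
        System.out.println(choose("moves", Map.of(), Map.of(), "amy", annotatorData, gold));
    }
}
```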

clearData

void clearData()
Removes all data in the NOM


clearDataForObservation

void clearDataForObservation(NObservation ob)
Removes any currently loaded data relating to the given observation


clearDataForObservation

void clearDataForObservation(java.lang.String ob)
Removes any currently loaded data relating to the named observation


getMetaData

NMetaData getMetaData()
returns the metadata associated with this NOM


getLoadedObservations

java.util.List getLoadedObservations()
returns a List of NObservation elements, one for each observation that has been asked to be loaded (how much of the observation data, if any, is actually loaded depends on lazy loading).


NOMWalker

java.util.Iterator NOMWalker()
Provides an iterator which visits each element in the NOM exactly once. We guarantee to traverse each "document" in document order, where "document" refers to a file that is read in or a pseudo-file that is created internally when data is loaded for a particular purpose. These "documents" are not considered to be ordered.
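
The traversal guarantee can be illustrated with a plain pre-order iterator over a list of document roots. The sketch below is not the NXT implementation: `Node` is a hypothetical stand-in for an element, and it assumes each document is an ordinary tree so that a pre-order walk reaches every element exactly once.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class WalkerSketch {
    /** Hypothetical stand-in for an element; not the NXT NOMElement type. */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
        Node add(Node child) { children.add(child); return this; }
    }

    /** Visits every node once; each root's tree is walked in document (pre-) order. */
    static Iterator<Node> walker(List<Node> roots) {
        final Deque<Node> stack = new ArrayDeque<>();
        // Push roots in reverse so the first root in the list is visited first.
        for (int i = roots.size() - 1; i >= 0; i--) {
            stack.push(roots.get(i));
        }
        return new Iterator<Node>() {
            @Override public boolean hasNext() { return !stack.isEmpty(); }
            @Override public Node next() {
                if (stack.isEmpty()) throw new NoSuchElementException();
                Node n = stack.pop();
                // Push children in reverse so the leftmost child comes out next.
                for (int i = n.children.size() - 1; i >= 0; i--) {
                    stack.push(n.children.get(i));
                }
                return n;
            }
        };
    }

    /** Collects the names seen by the walker, in visiting order. */
    static List<String> names(List<Node> roots) {
        List<String> out = new ArrayList<>();
        for (Iterator<Node> it = walker(roots); it.hasNext(); ) {
            out.add(it.next().name);
        }
        return out;
    }

    public static void main(String[] args) {
        Node doc1 = new Node("stream").add(new Node("a").add(new Node("b"))).add(new Node("c"));
        Node doc2 = new Node("stream2");
        System.out.println(names(List.of(doc1, doc2)));
    }
}
```

The order in which the roots themselves are visited is an artefact of this sketch; the text above makes no promise about the relative order of documents.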


isValidating

boolean isValidating()
returns true if the corpus is validating (i.e. if it is checking against the metadata whether changes are valid). The default value for validation is true.


setValidation

void setValidation(boolean validate)
Set validation for the corpus. The default value for validation is true.


getBatchMode

boolean getBatchMode()
Used by NOM program - not for client program use


isLoadingFromFile

boolean isLoadingFromFile()
Returns true if data is currently being loaded from file.


getMaxDepth

int getMaxDepth(NLayer layer)
Return the deepest nesting of elements in this recursive layer (if the layer is not recursive, returns 1 or 0)


getElementsByName

java.util.List getElementsByName(java.lang.String name)
Return a list of NOMWriteElements which have the given element name.


getElementByID

NOMElement getElementByID(java.lang.String colour,
                          java.lang.String id)

getElementByID

NOMElement getElementByID(java.lang.String id)
Return a NOMWriteElement which has the given element ID: you can either pass an unadorned ID in which case NXT searches for the element in all already-loaded files, or you can specify the 'full' ID like this: colour#id (e.g. q4nc4.f.moves#move.3 would refer to element 'move.3' in the file q4nc4.f.moves.xml)
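
As an illustration of the two ID forms, the following sketch splits a reference into its colour and ID parts. It is plain string handling rather than NXT code: the class and method names are hypothetical, and the '#' separator is taken from the example above.

```java
class FullIdSketch {
    /** Splits "colour#id" into {colour, id}; an unadorned ID yields {null, id}. */
    static String[] split(String ref) {
        int hash = ref.indexOf('#');
        if (hash < 0) {
            return new String[] { null, ref };
        }
        return new String[] { ref.substring(0, hash), ref.substring(hash + 1) };
    }

    public static void main(String[] args) {
        String[] full = split("q4nc4.f.moves#move.3");
        System.out.println(full[0] + " / " + full[1]); // prints "q4nc4.f.moves / move.3"
        String[] bare = split("move.3");
        System.out.println(bare[0] + " / " + bare[1]); // prints "null / move.3"
    }
}
```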


getRootElements

java.util.List getRootElements()
returns a List of NOMElements: the top level "stream" elements


getRootWithColour

NOMElement getRootWithColour(java.lang.String colour)
returns the root NOMElement which has the given colour


getCorpusStartTime

double getCorpusStartTime()
returns the earliest start time of any element in the corpus (or UNTIMED if there is no timed element)


getCorpusEndTime

double getCorpusEndTime()
returns the latest end time of any element in the corpus (or UNTIMED if there is no timed element)


getCorpusDuration

double getCorpusDuration()
returns the duration of the corpus (last end time - earliest start time) (or UNTIMED if there are no timed elements)
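
The relationship between the three timing methods can be sketched as follows. This is an illustrative calculation only: the class is hypothetical, elements are reduced to {start, end} pairs, and `UNTIMED_PLACEHOLDER` is a stand-in because the actual value of UNTIMED is not given on this page.

```java
import java.util.List;

class CorpusTimesSketch {
    // Placeholder only: the real value of NOMCorpus.UNTIMED is not stated here.
    static final double UNTIMED_PLACEHOLDER = Double.NaN;

    /** Each span is {start, end}; untimed elements carry the placeholder. */
    static double duration(List<double[]> spans) {
        double earliest = Double.POSITIVE_INFINITY;
        double latest = Double.NEGATIVE_INFINITY;
        boolean anyTimed = false;
        for (double[] span : spans) {
            if (Double.isNaN(span[0]) || Double.isNaN(span[1])) {
                continue; // untimed element: contributes nothing
            }
            earliest = Math.min(earliest, span[0]);
            latest = Math.max(latest, span[1]);
            anyTimed = true;
        }
        // Last end time minus earliest start time, or the placeholder if nothing is timed.
        return anyTimed ? latest - earliest : UNTIMED_PLACEHOLDER;
    }

    public static void main(String[] args) {
        List<double[]> spans = List.of(
                new double[] { 1.5, 2.0 },
                new double[] { 0.5, 4.0 },
                new double[] { UNTIMED_PLACEHOLDER, UNTIMED_PLACEHOLDER });
        System.out.println(duration(spans)); // prints 3.5
    }
}
```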


getLinkFileSeparator

java.lang.String getLinkFileSeparator()
Link syntax information: get the String that separates a filename from an ID


getLinkBeforeID

java.lang.String getLinkBeforeID()
Link syntax information: get the String that appears before an ID


getLinkAfterID

java.lang.String getLinkAfterID()
Link syntax information: get the String that appears after an ID


getRangeSeparator

java.lang.String getRangeSeparator()
Link syntax information: get the String that appears between IDs in a range


getHrefAttr

java.lang.String getHrefAttr()
Link syntax information: get the name of the 'href' attribute
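
One way to picture how the link-syntax strings fit together is to assemble a link from them. The sketch below is an assumption about the layout (file, file separator, text before the ID, the ID, text after the ID, and a range separator between two such IDs); the class is hypothetical, and the sample values passed in `main` are illustrative rather than values guaranteed by any metadata setting.

```java
class LinkSyntaxSketch {
    private final String fileSeparator;
    private final String beforeId;
    private final String afterId;
    private final String rangeSeparator;

    LinkSyntaxSketch(String fileSeparator, String beforeId, String afterId, String rangeSeparator) {
        this.fileSeparator = fileSeparator;
        this.beforeId = beforeId;
        this.afterId = afterId;
        this.rangeSeparator = rangeSeparator;
    }

    /** A link to one element: file, separator, then the wrapped ID. */
    String single(String file, String id) {
        return file + fileSeparator + beforeId + id + afterId;
    }

    /** A link to a range: the first link, the range separator, then the wrapped last ID. */
    String range(String file, String firstId, String lastId) {
        return single(file, firstId) + rangeSeparator + beforeId + lastId + afterId;
    }

    public static void main(String[] args) {
        // Illustrative values only; a real corpus would take them from the four getters.
        LinkSyntaxSketch syntax = new LinkSyntaxSketch("#", "id(", ")", "..");
        System.out.println(syntax.single("q4nc4.f.moves.xml", "move.3"));
        System.out.println(syntax.range("q4nc4.f.moves.xml", "move.3", "move.5"));
    }
}
```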


resolveLink

NOMElement resolveLink(java.lang.String xlink,
                       int linktype)
Resolve an individual xlink expression which points to exactly one NOM element - the second argument explicitly names the link type involved. It can be one of XPOINTER_LINKS or LTXML1_LINKS (defined in the NMetaData class)


getPointersTo

java.util.List getPointersTo(NOMElement to_element)
Return the reverse index of pointers to the given element
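
A reverse index of this kind maps each target element to the elements that point at it. The following is a minimal sketch of that data structure, with elements reduced to ID strings; the class and its methods are hypothetical and do not reflect how NXT stores its index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PointerIndexSketch {
    // target ID -> IDs of the elements holding a pointer to that target
    private final Map<String, List<String>> pointersTo = new HashMap<>();

    /** Records that fromId holds a pointer to toId. */
    void addPointer(String fromId, String toId) {
        pointersTo.computeIfAbsent(toId, key -> new ArrayList<>()).add(fromId);
    }

    /** Drops one recorded pointer, removing the entry when it becomes empty. */
    void removePointer(String fromId, String toId) {
        List<String> sources = pointersTo.get(toId);
        if (sources != null) {
            sources.remove(fromId);
            if (sources.isEmpty()) {
                pointersTo.remove(toId);
            }
        }
    }

    /** Returns a read-only copy of the sources pointing at toId (empty if none). */
    List<String> getPointersTo(String toId) {
        return List.copyOf(pointersTo.getOrDefault(toId, List.of()));
    }

    public static void main(String[] args) {
        PointerIndexSketch index = new PointerIndexSketch();
        index.addPointer("game.1", "move.3");
        index.addPointer("game.2", "move.3");
        System.out.println(index.getPointersTo("move.3")); // prints [game.1, game.2]
        index.removePointer("game.1", "move.3");
        System.out.println(index.getPointersTo("move.3")); // prints [game.2]
    }
}
```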


setLazyLoading

void setLazyLoading(boolean bool)
Set to true (default) to lazy-load any future calls to load data; false means everything in future load calls is loaded up-front.


isLazyLoading

boolean isLazyLoading()
Returns true (the default) if future calls to load data will be lazy-loaded; false means everything in future load calls is loaded up-front.


completeLoad

void completeLoad()
finish loading *all* files we know about from the corpus: this only makes sense if lazy loading is switched on, otherwise it will do nothing.
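
The effect of lazy loading on loadData and completeLoad can be pictured with a toy model in which files are merely registered until something needs them. Everything here is hypothetical (the class, the `touch` method and the use of file-name strings); it only mirrors the behaviour described above and says nothing about how NXT decides when to read a file.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class LazyLoadSketch {
    private final Set<String> known = new LinkedHashSet<>();
    private final Set<String> loaded = new LinkedHashSet<>();
    private boolean lazy = true; // true is the documented default

    void setLazyLoading(boolean lazy) { this.lazy = lazy; }
    boolean isLazyLoading() { return lazy; }

    /** Registers files; reads them immediately only when lazy loading is off. */
    void loadData(List<String> files) {
        known.addAll(files);
        if (!lazy) {
            loaded.addAll(files);
        }
    }

    /** Hypothetical stand-in for an access that needs one file's content. */
    void touch(String file) {
        if (known.contains(file)) {
            loaded.add(file);
        }
    }

    /** Reads everything still pending; has no effect when nothing is pending. */
    void completeLoad() {
        loaded.addAll(known);
    }

    int loadedCount() { return loaded.size(); }

    public static void main(String[] args) {
        LazyLoadSketch corpus = new LazyLoadSketch();
        corpus.loadData(List.of("a.xml", "b.xml", "c.xml"));
        System.out.println(corpus.loadedCount()); // prints 0
        corpus.touch("b.xml");
        System.out.println(corpus.loadedCount()); // prints 1
        corpus.completeLoad();
        System.out.println(corpus.loadedCount()); // prints 3
    }
}
```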


edited

boolean edited()
returns true if the corpus has unsaved edits


setSerializeInheritedTimes

void setSerializeInheritedTimes(boolean bool)
Set to true to make future serialization calls serialize with inherited times on structural elements. Set to false (default) to only serialize start and end times on timed elements.


setForceStreamElementNames

void setForceStreamElementNames(boolean bool)
Set to true to make future serialization calls serialize with stream element names conforming to meta.getStreamElementName(). Default is that stream elements will be output as they are input.


setSchemaLocation

void setSchemaLocation(java.lang.String location)
If this method is used with a non-null argument, we make sure the schema instance namespace is output on every stream-like element on serialization along with this as the noNamespaceSchemaLocation


serializeInheritedTimes

boolean serializeInheritedTimes()
True if we should allow inherited times to be serialized


setSerializeMaximalRanges

void setSerializeMaximalRanges(boolean bool)
Set to true (default) to make future serialization calls serialize with ranges where possible. Set to false to explicitly list all nite children.


serializeMaximalRanges

boolean serializeMaximalRanges()
True if we should serialize ranges


serializeCorpusChanged

void serializeCorpusChanged()
                            throws NOMException
Serialize all files which have been changed.

Throws:
NOMException

serializeCorpus

void serializeCorpus()
                     throws NOMException
Serialize the entire loaded corpus

Throws:
NOMException

serializeCorpus

void serializeCorpus(java.util.List observations)
                     throws NOMException
Serialize all loaded files for the given list of observations

Throws:
NOMException

generateID

java.lang.String generateID(java.lang.String element_name)
generates an Identifier that's globally unique - used when creating elements


registerID

void registerID(java.lang.String colour,
                java.lang.String id)
registers an Identifier as having been used and if necessary, notes an Integer in the ID hash for quick generation of IDs
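
The idea of noting an integer per ID so that fresh IDs can be generated quickly might look like the sketch below. It rests on assumptions that this page does not confirm: IDs are taken to have the form "name.N" (as in the 'move.3' example elsewhere on this page), the colour argument is left out, and the class is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class IdRegistrySketch {
    // element name -> highest number seen so far for IDs of the form "name.N"
    private final Map<String, Integer> highest = new HashMap<>();

    /** Notes an ID as used; IDs of the assumed form "name.N" raise the counter for "name". */
    void registerID(String id) {
        int dot = id.lastIndexOf('.');
        if (dot <= 0 || dot == id.length() - 1) {
            return; // not of the assumed form
        }
        try {
            int number = Integer.parseInt(id.substring(dot + 1));
            highest.merge(id.substring(0, dot), number, Integer::max);
        } catch (NumberFormatException e) {
            // Suffix is not a number; nothing to record for fast generation.
        }
    }

    /** Returns the next unused "name.N" ID and records it as used. */
    String generateID(String elementName) {
        int next = highest.merge(elementName, 1, (old, one) -> old + 1);
        return elementName + "." + next;
    }

    public static void main(String[] args) {
        IdRegistrySketch ids = new IdRegistrySketch();
        ids.registerID("move.3");
        System.out.println(ids.generateID("move")); // prints move.4
        System.out.println(ids.generateID("game")); // prints game.1
    }
}
```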


isEditSafe

boolean isEditSafe()
Return true if the corpus can be edited safely - for internal use. The corpus is always safe to edit if it is not shared; if the corpus is shared, edits are permitted only if a process has locked the corpus.


lock

boolean lock(NOMView view)
lock the corpus for edits - returns false if another view has locked the corpus.


unlock

boolean unlock(NOMView view)
unlock the corpus - returns false if the view isn't the one that has the lock.
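
The return values of lock and unlock, together with the isEditSafe rule, can be modelled in a few lines. This is a sketch under stated assumptions rather than the NXT implementation: views are plain objects, the class is hypothetical, and letting the current holder lock again successfully is an assumption the text does not address.

```java
class CorpusLockSketch {
    private final boolean shared;
    private Object holder; // the view holding the lock, or null

    CorpusLockSketch(boolean shared) {
        this.shared = shared;
    }

    /** Returns false if a different view already holds the lock. */
    synchronized boolean lock(Object view) {
        if (holder != null && holder != view) {
            return false;
        }
        holder = view; // assumption: re-locking by the holder succeeds
        return true;
    }

    /** Returns false if the given view is not the one holding the lock. */
    synchronized boolean unlock(Object view) {
        if (holder == null || holder != view) {
            return false;
        }
        holder = null;
        return true;
    }

    /** Unshared corpora are always safe; shared ones only while locked. */
    synchronized boolean isEditSafe() {
        return !shared || holder != null;
    }

    public static void main(String[] args) {
        CorpusLockSketch corpus = new CorpusLockSketch(true);
        Object viewA = new Object();
        Object viewB = new Object();
        System.out.println(corpus.lock(viewA));   // prints true
        System.out.println(corpus.lock(viewB));   // prints false
        System.out.println(corpus.unlock(viewB)); // prints false
        System.out.println(corpus.unlock(viewA)); // prints true
    }
}
```

A caller following this model would typically check the result of lock before editing and release the lock in a finally block so that a failed edit does not leave the corpus locked.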


resolveLink

NOMElement resolveLink(java.lang.String xlink)
Resolve an individual xlink expression which points to exactly one NOM element. Note that the format of the link depends on the metadata link syntax setting.


removePointerIndex

void removePointerIndex(NOMPointer point)

getMaker

NOMMaker getMaker()
This is used by internal corpus-building routines to make sure we always use the right constructors.


getCodingFilename

java.lang.String getCodingFilename(NObservation no,
                                   NCoding co,
                                   NAgent ag)
Return the actual file to which this data should be serialized (including any annotator-specific subdirectory).