From this page we make available as much of our data and processing
pipelines as possible. The data listed is available via anonymous ftp
from ftp.cogsci.ed.ac.uk/pub/disp, or by clicking on the links below.
A research licence for LT TTT is needed for the pipelines (see below).
Data
The Ohsumed Corpus is available by anonymous ftp from
medir.ohsu.edu/pub/ohsumed
and is divided across five very large files, one for each year between
1987 and 1991. We found it convenient to split the files into smaller
units, and we make the smaller files available both in their raw form
and converted to XML and annotated in various ways. The
experiments which used CASS and the Tag Sequence Grammar are reported
on in Grover, Lapata and Lascarides (submitted) and Grover, Klein,
Lascarides and Lapata (2002) (see DISP papers).
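As an illustration of the kind of splitting described above, here is a minimal Python sketch. It is not the script we used. It assumes that each record in a raw file begins with a line starting with ".I", and that the files can be read as Latin-1; the function names, the chunk size and the output file naming are all hypothetical.

```python
from pathlib import Path
from typing import Iterable, Iterator, List


def split_records(lines: Iterable[str], records_per_chunk: int,
                  record_start: str = ".I") -> Iterator[List[str]]:
    """Yield lists of lines, each holding at most records_per_chunk records.

    Assumption: a new record begins at every line that starts with
    record_start. Lines before the first record stay in the first chunk.
    """
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be at least 1")
    chunk: List[str] = []
    count = 0
    for line in lines:
        if line.startswith(record_start):
            # Close the current chunk once it is full, before starting
            # the next record.
            if count == records_per_chunk:
                yield chunk
                chunk, count = [], 0
            count += 1
        chunk.append(line)
    if chunk:
        yield chunk


def split_file(source: Path, out_dir: Path,
               records_per_chunk: int = 1000) -> List[Path]:
    """Write source out as numbered smaller files and return their paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    # The encoding is an assumption; adjust it to match the actual files.
    with source.open(encoding="latin-1") as handle:
        for index, chunk in enumerate(split_records(handle, records_per_chunk)):
            target = out_dir / f"{source.stem}.{index:04d}.txt"
            target.write_text("".join(chunk), encoding="latin-1")
            written.append(target)
    return written
```

Splitting on record boundaries rather than on a fixed number of lines keeps every abstract whole within a single smaller file.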
TSG.tar.gz
Sentences from the abstracts in Ohsumed files converted
into the format needed by the Tag Sequence Grammar (Carroll and
Briscoe 2001). Created using tsgpipe,
with the TOK files as input. This tar file also includes the output of
the TSG for the entire Ohsumed corpus, though we do not distribute the
TSG from this website. A sample of the TSG output is here.
The TSG will shortly be available from http://www.cogs.susx.ac.uk/lab/nlp/rasp/.
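Since the tar file is large, it may be useful to look at its contents before unpacking it. The following is a minimal sketch using Python's standard tarfile module; the function name is hypothetical, and the archive path passed to it is whatever location you downloaded the file to.

```python
import tarfile
from typing import List


def list_archive(path: str) -> List[str]:
    """Return the names of the regular files in a gzip-compressed tar file."""
    with tarfile.open(path, "r:gz") as archive:
        # Skip directories and links so that only data files are listed.
        return [member.name for member in archive.getmembers() if member.isfile()]
```

For example, list_archive("TSG.tar.gz") returns the file names in a local copy of the archive without extracting anything.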
Pipelines
Following the links below will provide you with the pipelines we have
used to process the data. However, in order to download the
executables and resource files needed to run the pipelines, you must
first fill out a research licence agreement for LT TTT. When you have
done this, please email Claire
Grover, who will tell you the location from which the
files can be downloaded. Existing licence holders will also need to
contact Claire since the DISP executables and some resource files are
upgrades of those contained in the current release of LT TTT.
Please note that the executables are Solaris only. We hope to make
a Linux version available in the near future as part of the next
release of LT TTT.