DISP
Data Intensive Semantics and Pragmatics



This page makes available as much of our data and as many of our processing pipelines as possible. The data listed can be downloaded by anonymous FTP from ftp.cogsci.ed.ac.uk/pub/disp or by following the links below. Running the pipelines requires a research licence for LT TTT; see the Pipelines section below.

Data

The Ohsumed Corpus is available by anonymous FTP from medir.ohsu.edu/pub/ohsumed and is divided across five very large files, one for each year from 1987 to 1991. We found it convenient to split these files into smaller units, and we make the smaller files available both in raw form and converted to XML and annotated in various ways. The experiments that used CASS and the Tag Sequence Grammar are reported in Grover, Lapata and Lascarides (submitted) and Grover, Klein, Lascarides and Lapata (2002) (see DISP papers).
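
For readers who want to split the original yearly files themselves, the following is a minimal sketch of one way to do it. It is not the procedure we used to produce the files offered here. It assumes that each record in a raw file begins with a line starting with `.I`; the function name `split_records` and the parameter `records_per_chunk` are hypothetical names chosen for illustration.

```python
from typing import Iterable, Iterator, List


def split_records(lines: Iterable[str], records_per_chunk: int,
                  record_start: str = ".I") -> Iterator[List[str]]:
    """Yield lists of lines, each holding at most records_per_chunk records.

    Assumption: a new record begins at any line starting with record_start.
    Lines seen before the first record marker are kept in the first chunk.
    """
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be at least 1")
    chunk: List[str] = []
    count = 0  # number of records started in the current chunk
    for line in lines:
        if line.startswith(record_start):
            if count == records_per_chunk:
                # The current chunk is full: emit it and start a new one.
                yield chunk
                chunk, count = [], 0
            count += 1
        chunk.append(line)
    if chunk:
        # Emit the final, possibly shorter, chunk.
        yield chunk
```

Each yielded list can then be written to its own file. Because the function reads its input line by line, an open file object can be passed in directly and a very large yearly file never has to be held in memory as a whole.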

Pipelines

The links below lead to the pipelines we used to process the data. To download the executables and resource files needed to run them, you must first complete a research licence agreement for LT TTT. Once you have done so, please email Claire Grover, who will tell you where the files can be downloaded. Existing licence holders also need to contact Claire, since the DISP executables and some resource files are upgrades of those in the current release of LT TTT. Please note that the executables are for Solaris only. We hope to make a Linux version available in the near future as part of the next release of LT TTT.