Work Package 1: Identification and Description of Test Cases
We will need a number of independently designed document types, application data models, and pairs of the two for use in the other work of the project. Examples will be sought from existing document-oriented DTDs, such as the XHTML DTD (see http://www.w3.org/TR/2000/REC-xhtml1-20000126/), and from existing data-oriented DTDs, such as the e-commerce transaction DTDs of the Open Applications Group (see http://www.openapplications.org/news/990118.htm), in addition to the schema, DTD and data model for XML Schema itself, which we have already worked with in the unmarshalling direction. Starting from the application data end, possible examples include the patient record model of the European Committee for Standardization CEN TC251 (see http://www.centc251.org/) in UML, the Dublin Core bibliographic metadata model (see http://purl.org/dc/) in RDF, and a teachers-courses-students database, developed jointly by the Language Technology Group and Microsoft, in the Entity-Relation model. The deliverables will include not only an inventory of designs but also at least some example documents in each case, as well as schemas or partial schemas, which in many cases will have to be created.
Work Package 2: Schema Annotation
XML Schema provides for arbitrary annotations to be added to the
declarations and definitions (called schema components) which together
define a document type, that is, a set of documents with a common
structure and common purpose. The deliverable from this work package
is a design for a set of annotations which specify the correspondence
between schema components and application data model components. This
breaks down into two tasks:
a) design the syntax of the annotation mechanism, that is, whether to use elements or attributes, which aspect of schema extensibility to exploit, etc.;
b) determine what vocabularies are needed to identify the different kinds of application model components, e.g. entity, relation, attribute etc. for Entity-Relation, or class instance, instance variable, list for an object-oriented programming language.
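To make tasks a) and b) concrete, the following is a minimal sketch of one possible annotation syntax and of how a tool might read it. It assumes the element-based option, placing the annotation inside the schema's own annotation/appinfo extension point. The `map` namespace, the `map:entity` and `map:attribute` elements, and the `read_annotations` helper are all hypothetical names introduced for illustration; the schema namespace URI is that of the final XML Schema Recommendation and may differ from the draft in use.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
MAP = "http://example.org/map"  # hypothetical annotation namespace

# A schema fragment annotated with a hypothetical Entity-Relation vocabulary.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:map="http://example.org/map">
  <xs:element name="course">
    <xs:annotation>
      <xs:appinfo><map:entity name="Course"/></xs:appinfo>
    </xs:annotation>
  </xs:element>
  <xs:element name="title">
    <xs:annotation>
      <xs:appinfo><map:attribute of="Course" name="title"/></xs:appinfo>
    </xs:annotation>
  </xs:element>
</xs:schema>
"""


def read_annotations(schema_text):
    """Return {element name: (kind, attributes)} for annotated declarations."""
    root = ET.fromstring(schema_text)
    result = {}
    for decl in root.iter(f"{{{XS}}}element"):
        appinfo = decl.find(f"{{{XS}}}annotation/{{{XS}}}appinfo")
        if appinfo is None:
            continue  # declaration carries no mapping annotation
        for ann in appinfo:
            if ann.tag.startswith(f"{{{MAP}}}"):
                kind = ann.tag.split("}", 1)[1]  # local name, e.g. "entity"
                result[decl.get("name")] = (kind, dict(ann.attrib))
    return result


ANNOTATIONS = read_annotations(SCHEMA)
```

An attribute-based syntax would move the same information onto the declarations themselves; which of the two is preferable is exactly the question task a) has to settle.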
Work Package 3: Language-specific Unmarshalling: Direct prototype
This Work Package is the proof of concept for the architecture we have in mind, developing a prototype for the simplest part of the overall design, and testing it on real examples.
Task 3.1: Design and Construction
The deliverable of this Work Package is an implementation of a compiler from XML Schemas containing language-specific annotations, as designed in Work Package 2 above, into an XSLT stylesheet for unmarshalling, using the XSLT element extension mechanism. There are three tasks here:
a) design one set of language-specific extension elements;
b) implement the schema → stylesheet compiler, specialised to one language-specific vocabulary and targeting those extension elements;
c) implement the extension classes in the context of an XSLT implementation. The choice of implementation will determine the choice of language: C if we use our own streaming XSLT processor, Java if we use XT, James Clark's public domain XSLT processor.
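The shape of the compiler in task b) can be sketched as follows. This is an illustration under stated assumptions, not the proposed implementation: the input is reduced to a table from XML element names to target class names, and the `ext` namespace and the `ext:make-instance` extension element are hypothetical stand-ins for whatever task a) designs. The sketch only generates the stylesheet; the extension classes of task c) would still have to be supplied by the XSLT processor.

```python
import xml.etree.ElementTree as ET

XSL = "http://www.w3.org/1999/XSL/Transform"
EXT = "http://example.org/unmarshal"  # hypothetical extension namespace


def compile_stylesheet(mapping):
    """Compile {XML element name: target class name} into an XSLT stylesheet.

    Each element gets one template whose body is a (hypothetical)
    ext:make-instance extension element wrapping xsl:apply-templates,
    so that child content is unmarshalled recursively.
    """
    ET.register_namespace("xsl", XSL)
    ET.register_namespace("ext", EXT)
    sheet = ET.Element(
        f"{{{XSL}}}stylesheet",
        {"version": "1.0", "extension-element-prefixes": "ext"},
    )
    for element_name, class_name in sorted(mapping.items()):
        template = ET.SubElement(
            sheet, f"{{{XSL}}}template", {"match": element_name}
        )
        build = ET.SubElement(
            template, f"{{{EXT}}}make-instance", {"class": class_name}
        )
        ET.SubElement(build, f"{{{XSL}}}apply-templates")
    return sheet


SHEET = compile_stylesheet({"course": "Course", "teacher": "Teacher"})
SHEET_TEXT = ET.tostring(SHEET, encoding="unicode")
```

Printing `SHEET_TEXT` shows the generated stylesheet, which could then be handed to an XSLT processor that implements the extension element.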
Task 3.2: Deployment and Testing
This implementation gives us the basis for actually annotating schemas and unmarshalling from real document instances to test the viability of our approach, and explore its appropriateness to the three broad use-cases as outlined above in Work Package 1. The deliverables will be annotated versions of schemas for a number of the test cases delivered by Work Package 1 above, along with improved versions of the deliverables from Task 3.1.
Work Package 4: From Low- to High-level Annotation
To achieve our goal of appropriate modularisation, we need to elaborate the first deliverable from Work Package 3 above (the schema-annotation → stylesheet compiler). Instead of implementation-language-dependent annotation vocabularies which can be compiled directly to application-structure-building stylesheets, we want to annotate a schema in a domain-appropriate high-level modelling language such as Entity-Relation, EXPRESS or UML, and then parameterise the compilation process by a further specification of the correspondence between terms in that high-level model and implementation terms in a particular operating environment. The deliverables are thus parallel to those of Tasks 3.1 and 3.2 above, but based on high-level annotation and parameterised compilation.
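The parameterisation step can be pictured as a lowering pass that sits in front of the Work Package 3 compiler. The sketch below assumes the simplest possible form of the correspondence, a one-to-one table from high-level terms to implementation terms; the term names and the `lower_annotations` helper are hypothetical, and a real correspondence would probably need more than a flat table.

```python
# High-level annotations: XML element name -> (modelling term, model name).
HIGH_LEVEL = {
    "course": ("entity", "Course"),
    "title": ("attribute", "title"),
}

# Two hypothetical bindings of the same Entity-Relation terms to
# implementation terms in different operating environments.
OO_BINDING = {"entity": "class", "attribute": "instance-variable"}
SQL_BINDING = {"entity": "table", "attribute": "column"}


def lower_annotations(annotations, binding):
    """Rewrite high-level terms into the implementation terms of one binding."""
    return {
        name: (binding[kind], target)
        for name, (kind, target) in annotations.items()
    }


OO_LEVEL = lower_annotations(HIGH_LEVEL, OO_BINDING)
SQL_LEVEL = lower_annotations(HIGH_LEVEL, SQL_BINDING)
```

The same annotated schema thus yields different low-level annotation sets, each of which could be compiled as in Work Package 3.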
Work Package 5: Marshalling
All the above has focussed on unmarshalling from XML documents into application structures. This Work Package turns to the question of marshalling application structures into XML documents. There are two different approaches to be explored here. The first is based on compiling the annotations of Work Package 3 into, e.g., methods for the classes involved to achieve marshalling functionality. We are not aware of any language-independent precedent for this in cases where the document structure was designed prior to, or independently of, the application data model. Ambiguity and overloading are the obvious stumbling blocks.
The alternative, although more speculative, also offers more if it is successful. There is some literature in areas of formal language theory applied to layout (e.g. Feng 1993) and to translation (e.g. Yellin 1988) which explores the annotation of one of a pair of grammars, either context-free or finite-state, with information about its relation to the other. This work may at least help to identify the properties of document schemas (which are isomorphic to context-free grammars) and application data models (which are relatable to grammars in most cases) that would be necessary to allow marshalling to be created automatically, understood as the same sort of transformation as in the unmarshalling case.
Work Package 6: Second-order compilation
When an application data model exists, but no document definition work has been done to support XML receipt or delivery, automatic generation of an XML schema complete with annotation would fit well into the overall picture.
Task 6.1: Design Exploration
In principle it would be best to start at the modelling language level, then map from UML or EXPRESS or RDF schema to
1) an XML Schema;
2) the necessary schema annotations.
The major stumbling block that we envisage arises when there are re-entrancies or circularities in the application model: when should we choose foreign keys, ID/IDREF or linking rather than embedding? The work on dataguides introduced above offers a promising place to start, going from data → semi-structured data → data guide → schema. How to generate the appropriate mapping annotations as part of this process is an open question.
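To illustrate the embedding question, here is a sketch of one naive policy; it is an assumption made for the example, not a result of the project. The application model is reduced to a graph of type names. A reference is rendered by ID/IDREF if its target is shared (referred to from more than one place) or lies on the current path from the root (a circularity), and by embedding otherwise. The function name, the model and the policy itself are all hypothetical.

```python
def choose_representation(model, root):
    """Decide, per reference, between embedding and ID/IDREF linking.

    model: {type name: [type names it refers to]}
    Returns {(source, target): "embed" | "idref"}.
    """
    # Count how many references point at each type (re-entrancy).
    inbound = {}
    for targets in model.values():
        for target in targets:
            inbound[target] = inbound.get(target, 0) + 1

    decisions = {}
    visited = set()
    on_path = []  # types between the root and the current type

    def visit(source):
        visited.add(source)
        on_path.append(source)
        for target in model.get(source, []):
            shared = inbound[target] > 1
            cyclic = target in on_path
            decisions[(source, target)] = (
                "idref" if shared or cyclic else "embed"
            )
            if target not in visited:
                visit(target)
        on_path.pop()

    visit(root)
    return decisions


# Hypothetical model: Teacher is shared, and Teacher -> Department is circular.
MODEL = {
    "Department": ["Course", "Teacher"],
    "Course": ["Teacher", "Syllabus"],
    "Teacher": ["Department"],
    "Syllabus": [],
}
DECISIONS = choose_representation(MODEL, "Department")
```

Under this policy a shared type such as Teacher is never embedded, so the generated schema would have to declare it at top level; whether that is acceptable is part of what the design exploration has to determine.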
Task 6.2: Implement and Evaluate
For evaluation, we can take the second-order compiler, apply it to a model for which we actually do have an existing schema, e.g. XML Schema itself, and compare results.