2.1 Overall goals

Although the overall shape of the desired architecture for data publishing and interchange via XML is clear, and many more or less ad hoc efforts are already under way to instantiate it for particular application/programming-language pairs (see e.g. Reinhold 1999, Box 1999), what is really needed is support for a declarative specification of the relation between an application data model and an XML Schema, each defined independently of the other. Concretely, such support should yield implementations of language-independent marshalling and unmarshalling, that is, bi-directional conversion between XML instances and application data. Some aspects of a solution are already clear in outline; others will require exploring how research results from related disciplines can be applied.
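
To make the idea of a declarative specification concrete, the following is a minimal sketch in Python under assumptions of our own: a hypothetical `Person` class stands in for the application data model, and a small mapping table (`PERSON_MAPPING`, an invented format rather than any standard notation) stands in for the declarative specification. One generic `unmarshal`/`marshal` pair is driven entirely by that table, so neither direction is hand-coded per class.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Person:
    """Hypothetical application data model, defined independently of any schema."""
    name: str
    age: int


# Hypothetical declarative mapping: application field -> (XML kind, XML name, converter).
PERSON_MAPPING = {
    "element": "person",
    "fields": {
        "name": ("element", "fullName", str),
        "age": ("attribute", "age", int),
    },
}


def unmarshal(elem, cls, mapping):
    """XML -> application: build an instance of cls from an element."""
    if elem.tag != mapping["element"]:
        raise ValueError(f"expected <{mapping['element']}>, got <{elem.tag}>")
    values = {}
    for field, (kind, xml_name, convert) in mapping["fields"].items():
        if kind == "attribute":
            raw = elem.get(xml_name)
        else:
            # Text of the first matching child ('' if empty), or None if absent.
            raw = elem.findtext(xml_name)
        if raw is None:
            raise ValueError(f"missing {kind} {xml_name!r}")
        values[field] = convert(raw)
    return cls(**values)


def marshal(obj, mapping):
    """Application -> XML: driven by the same mapping table."""
    elem = ET.Element(mapping["element"])
    for field, (kind, xml_name, _convert) in mapping["fields"].items():
        value = str(getattr(obj, field))
        if kind == "attribute":
            elem.set(xml_name, value)
        else:
            ET.SubElement(elem, xml_name).text = value
    return elem


doc = ET.fromstring('<person age="42"><fullName>Ada</fullName></person>')
person = unmarshal(doc, Person, PERSON_MAPPING)
print(person)  # Person(name='Ada', age=42)
print(ET.tostring(marshal(person, PERSON_MAPPING), encoding="unicode"))
# <person age="42"><fullName>Ada</fullName></person>
```

This toy mapping round-trips only because each field corresponds to exactly one attribute or child element and its value survives conversion to and from text; richer mappings would not make reversibility automatic.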

We see the proposed research as necessary preparation for standardisation work in this area. Member companies of the W3C have recently asked it to undertake work on standardising XML protocols (Larry Masinter, personal communication), while making clear that XML-encoded RPC is not what is required: RPC encoding alone would leave unsolved the problem of correspondence between XML structures and application structures.

The following questions each need to be addressed to arrive at the desired architecture:

Is the mapping to be specified by annotations within an individual XML Schema, e.g. by adding mapping information to each element and attribute declaration? Alternatively, should the mapping be specified externally, possibly exploiting XSLT?
The directionality of the mechanism deserves special consideration. Both XML Schema and XSLT are by design good matches for the XML → application (unmarshalling) direction. What conditions must be imposed on either type of solution to guarantee reversibility, that is, support for the application → XML (marshalling) direction? Again, XML Schema and XSLT both imply a control structure from which an implementation of unmarshalling naturally emerges. What control structure would be required for marshalling?
What are the tradeoffs between specifying the application side of the mapping in implementation-level terms (e.g. Java class instance/variable or relational table/row) and specifying it in more abstract terms (e.g. Entity-Relationship, EXPRESS, or UML; see http://www.uml-zone.com/umlfaq.asp)?
Would specifying an abstract mapping in the schema, with concrete language-specific bindings from the abstract model to an implementation specified independently of the schema, give the right modularity?
When only one or the other model is specified in advance (i.e. XML Schema or application data model), can we automatically derive the other? If so, what conventions should be used in doing so?
Does semi-structured data, occupying an intermediate position between XML and traditional databases, offer any leverage for solving this problem?
What constraints, if any, are required to allow an implementation of unmarshalling to work in a streaming fashion, i.e. to build application structures as an XML document is processed, without building a complete internal representation of the document before application structure construction can begin?
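
One possible constraint for the streaming question can be illustrated with a short Python sketch. It assumes, purely for illustration, that every application object is determined by a single contiguous subtree with no references to later parts of the document; under that assumption each object can be built as soon as its end tag is seen, using the standard library's `xml.etree.ElementTree.iterparse`. The `Person` class and the `person`/`fullName` vocabulary are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Person:
    """Hypothetical application structure."""
    name: str
    age: int


def stream_people(source):
    """Yield Person objects one at a time as each <person> element closes.

    Assumes (hypothetically) that each object depends only on its own subtree,
    so it can be constructed without waiting for the rest of the document.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "person":
            yield Person(name=elem.findtext("fullName"), age=int(elem.get("age")))
            # Release this record's children and attributes; only an empty
            # element shell remains attached to its parent.
            elem.clear()


data = io.BytesIO(
    b'<people>'
    b'<person age="42"><fullName>Ada</fullName></person>'
    b'<person age="36"><fullName>Alan</fullName></person>'
    b'</people>'
)
for person in stream_people(data):
    print(person)
```

If an object instead depended on content appearing later in the document (a forward reference, say), construction would have to be deferred or the relevant part of the tree retained, which is exactly the kind of trade-off the question asks about.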

The work proposed here aims to answer these questions, structuring the effort around three broad goals:

Design a declarative approach, based as far as possible on existing public standards, that supports automatic generation of implementation-language-appropriate unmarshalling of XML documents into application data structures;
Extend the above to support marshalling in a congruent way, i.e. the construction of XML documents from application data structures;
Explore approaches to (semi-)automatically generating appropriately annotated XML Schemas from schemas expressed in one or more data modelling languages.
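
As a rough illustration of the third goal, the following Python sketch derives an XML Schema skeleton from a data model. A Python dataclass stands in for a data modelling language such as Entity-Relationship, EXPRESS, or UML, and the derivation conventions (class becomes a global element with an anonymous complex type, each field becomes a child element, `TYPE_MAP` fixes scalar types) are hypothetical choices of our own, not a proposed standard. The namespace used is that of the final W3C XML Schema Recommendation.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)

# Hypothetical convention: each Python scalar type maps to one XSD built-in type.
TYPE_MAP = {str: "xs:string", int: "xs:integer", float: "xs:double", bool: "xs:boolean"}


@dataclass
class Person:
    """Hypothetical data model standing in for an ER/EXPRESS/UML definition."""
    name: str
    age: int


def schema_for(cls):
    """Derive an XML Schema skeleton: class -> global element, field -> child element."""
    schema = ET.Element(f"{{{XS}}}schema")
    elem = ET.SubElement(schema, f"{{{XS}}}element", name=cls.__name__.lower())
    ctype = ET.SubElement(elem, f"{{{XS}}}complexType")
    seq = ET.SubElement(ctype, f"{{{XS}}}sequence")
    for f in fields(cls):
        if f.type not in TYPE_MAP:
            raise TypeError(f"no XSD mapping for field {f.name!r}: {f.type!r}")
        ET.SubElement(seq, f"{{{XS}}}element", name=f.name, type=TYPE_MAP[f.type])
    return schema


print(ET.tostring(schema_for(Person), encoding="unicode"))
```

A fuller version would also record the mapping itself in the generated schema, for instance inside `xs:annotation`/`xs:appinfo`, which XML Schema provides for application-specific information.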

Henry Thompson
2000-09-13