Goals for an XML Processing Model working group

Henry S. Thompson
University of Edinburgh
15 August 2005

1.   Introduction

The draft charter for an XML Processing Model WG does not match our understanding of the range of possible deliverables well. The following three sections identify what we think the deliverables of such a WG should be.

2.   A Scripting Language for XML

We believe this corresponds to the majority of what is already out there, both open source and commercial. The analogy with UNIX is often appealed to: under UNIX there are languages for writing pipelines/scripts connected by 'thin' pipes, i.e. pipes carrying EOL-separated records consisting of space-separated fields, whereas now we want languages for writing scripts/pipelines connected by 'fat' pipes, i.e. pipes carrying infosets. The 'steps' in such XML pipelines are sometimes W3C-spec-based, e.g. "Transform with this stylesheet" or "Schema-validate with this schema document", but are sometimes either lower-level operations (e.g. XPath-controlled deletion/insertion/substitution) or opaque locally-specified operations (e.g. involving database lookup). We have to hand one such pipeline, with 13 XSLT steps, two low-level manipulation steps and one database in/out step.

Most of the existing languages (the Sun Pipeline language is one exception) are not tied to individual documents, but rather are thought of as specifying a mapping from XML inputs to XML outputs.
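As an illustration only, the idea of a script whose steps are connected by 'fat' pipes can be sketched in Python, with an ElementTree tree standing in for an infoset. Everything named here (delete_matching, annotate_from_lookup, run_pipeline, the sample order document and the dictionary standing in for a database) is an assumption made for the sketch and is not taken from any existing pipeline language.

```python
import copy
import xml.etree.ElementTree as ET


def delete_matching(path):
    """Low-level step: delete every element selected by an ElementTree path."""
    def step(root):
        root = copy.deepcopy(root)  # leave the input tree untouched
        # ElementTree keeps no parent pointers, so build a child -> parent map.
        parent_of = {child: parent for parent in root.iter() for child in parent}
        for target in root.findall(path):
            parent_of[target].remove(target)
        return root
    return step


def annotate_from_lookup(path, attribute, table):
    """Opaque local step: a dict stands in for a database lookup keyed on @id."""
    def step(root):
        root = copy.deepcopy(root)
        for element in root.findall(path):
            key = element.get("id")
            if key in table:
                element.set(attribute, table[key])
        return root
    return step


def run_pipeline(steps, root):
    """Feed the tree produced by each step into the next one."""
    for step in steps:
        root = step(root)
    return root


# Hypothetical input document and pipeline.
source = ET.fromstring(
    '<order><item id="a1"/><item id="b2"/><note status="draft">x</note></order>'
)
pipeline = [
    delete_matching(".//note[@status='draft']"),
    annotate_from_lookup(".//item", "price", {"a1": "3.50", "b2": "7.00"}),
]
result = run_pipeline(pipeline, source)
print(ET.tostring(result, encoding="unicode"))
```

Note that the pipeline is defined independently of any particular document: the same list of steps is a mapping that can be applied to any XML input.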

3.   Documents describing their own processing

Flipping to the viewpoint of individual XML documents, we may decide we need a way for XML documents to request specific W3C-spec-based processing whenever they are to be used. We think it is premature to insist that the order of e.g. XInclude processing, (schema/DTD) validation, XSLT/XQuery transformation, signing/verifying, en/decrypting and GRDDL interpretation is either unimportant or determinate. Whether it should be possible to control the interpretation of subordinate/referenced documents (parallel to XInclude's 'parse' attribute) is another open question.
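A small sketch may help show why ordering cannot simply be assumed unimportant. It uses Python's standard xml.etree.ElementInclude module for XInclude expansion and a hypothetical strip_secrets step standing in for a transformation; the documents, the 'secret' element and all function names are invented for the illustration.

```python
import copy
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory documents, standing in for files or URIs.
DOCUMENTS = {"part.xml": "<part><secret>s2</secret><p>included</p></part>"}
MAIN = (
    '<doc xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<secret>s1</secret><xi:include href="part.xml"/></doc>'
)


def load(href, parse, encoding=None):
    """Loader for ElementInclude: parse a fresh tree for each request."""
    return ET.fromstring(DOCUMENTS[href])


def expand_xincludes(root):
    root = copy.deepcopy(root)
    ElementInclude.include(root, loader=load)  # expands xi:include in place
    return root


def strip_secrets(root):
    """Stand-in for a transformation: remove every <secret> element."""
    root = copy.deepcopy(root)
    for parent in list(root.iter()):
        for child in parent.findall("secret"):
            parent.remove(child)
    return root


def count_secrets(root):
    return len(root.findall(".//secret"))


main = ET.fromstring(MAIN)
include_then_strip = strip_secrets(expand_xincludes(main))
strip_then_include = expand_xincludes(strip_secrets(main))
print(count_secrets(include_then_strip), count_secrets(strip_then_include))
```

Under these assumptions the two orders give different results: stripping after inclusion also removes the 'secret' element from the included content, whereas stripping before inclusion leaves it in place.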

There is a range of existing mechanisms for 'signalling' the need for such processing, including the xsi:schemaLocation, xsl:version and data-view:interpreter attributes, the xml-stylesheet processing instruction and the http://www.w3.org/2001/04/xmlenc# and http://www.w3.org/2001/XInclude namespaces. It is another open question whether these are sufficient, or whether something more systematic and explicit is required, e.g. as discussed in our forthcoming f(X) member submission.
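To make the existing signalling mechanisms concrete, the following Python sketch scans a document for several of them using the standard xml.dom.minidom module. The function name find_signals, the labels it returns and the sample document are assumptions made for illustration; the data-view:interpreter attribute is deliberately left out, and the XML Encryption namespace used is the one defined by the XML Encryption specification.

```python
from xml.dom import minidom, Node

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
XSL_NS = "http://www.w3.org/1999/XSL/Transform"
XINCLUDE_NS = "http://www.w3.org/2001/XInclude"
XMLENC_NS = "http://www.w3.org/2001/04/xmlenc#"


def iter_elements(node):
    """Yield every element below node, in document order."""
    for child in node.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            yield child
            yield from iter_elements(child)


def find_signals(xml_text):
    """Return the set of processing 'signals' present in the document."""
    doc = minidom.parseString(xml_text)
    signals = set()
    # The xml-stylesheet processing instruction appears in the prolog,
    # i.e. as a child of the document node.
    for child in doc.childNodes:
        if (child.nodeType == Node.PROCESSING_INSTRUCTION_NODE
                and child.target == "xml-stylesheet"):
            signals.add("xml-stylesheet")
    for element in iter_elements(doc):
        if element.namespaceURI == XINCLUDE_NS:
            signals.add("xinclude")
        if element.namespaceURI == XMLENC_NS:
            signals.add("xmlenc")
        if element.hasAttributeNS(XSI_NS, "schemaLocation"):
            signals.add("xsi:schemaLocation")
        if element.hasAttributeNS(XSL_NS, "version"):
            signals.add("xsl:version")
    return signals


# Hypothetical sample document carrying three signals.
SAMPLE = (
    '<?xml-stylesheet type="text/xsl" href="style.xsl"?>'
    '<doc xmlns:xi="http://www.w3.org/2001/XInclude"'
    ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
    ' xsi:schemaLocation="urn:example doc.xsd">'
    '<xi:include href="part.xml"/></doc>'
)
print(sorted(find_signals(SAMPLE)))
```

Such a scan only reports which signals are present. It says nothing about whether, or in what order, a processor should act on them, which is exactly the question left open above.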

4.   Default processing

In the absence of information which explicitly specifies how a document should be handled, can we say anything at all about what (all|some subset of) XML processors should do if (any|some subset of) the existing signalling mechanisms are present? Clearly this interacts with the previous deliverable. Indeed, on one position with respect to that deliverable, namely that the existing signalling mechanisms are in fact both complete and deterministic, this point reduces to making recommendations on when to 'do it' and when not to.

5.   Conclusions

We suggest that the charter say either less (because it is tricky to get this all right) or more (along the above lines). In the latter case we also call attention to our (member-only) posting to the XML Core WG as input to its subsequent suggested Processing Model requirements document.