Towards λ-XML

Henry S. Thompson
$Id: notes.xml,v 1.2 2005/04/01 14:00:09 ht Exp $

Based on an idea of Tim Berners-Lee as elaborated by Richard Tobin

1.   Introduction

XML processing models are slowly getting more attention, particularly in the form of pipeline languages, e.g. Sun's original W3C Note, Markup Technology's MT Pipeline variant thereof, Sean McGrath's XPipe, Norm Walsh's SXPipe, Orbeon's ???, 1060 Research's NetKernel, . . . All of these consist of declarative descriptions of a configuration of processing steps which can be applied to XML documents to produce other XML documents.

λ-XML approaches the XML processing problem in a rather different way, in terms of XML documents which are interpreted as specifying their own processing. It can be understood as the logical extension of the treatment of the xml-stylesheet processing instruction by browsers.

The basic idea is that although normally the proximate semantics, as we might call it, of an XML document is its own XML infoset, it is possible for some classes of XML documents to have a derived or second-order semantics, typically (always?) a different infoset.

2.   Pure form

One alternative is to stay as close to the lambda calculus as possible. Elements in the λ-XML namespace are function applications, with the function identified by the element name and the element's attributes and children as its arguments. Anything not in the λ-XML namespace is implicitly backquoted, with xi:include functioning as comma. The functions all map from infosets to infosets, and the derived semantics of a λ-XML document is just the value of its outermost function.

I think that making XInclude happen by default is the right thing, but we obviously need some way to block it.

The obvious functions are lx:PSVI and lx:resultSet:

<lx:PSVI schemaDocuments="po.xsd">
 <purchaseOrder xmlns="http://www.example.com/PurchaseOrder"
                orderDate="1999-10-20">
  . . .
 </purchaseOrder>
</lx:PSVI>

<lx:resultSet xmlns:xi="http://www.w3.org/2001/XInclude"
              stylesheet="po.xsl">
 <xi:include href="po.xml"/>
</lx:resultSet>

The obvious and fundamental point is that these can compose:

<lx:resultSet xmlns:xi="http://www.w3.org/2001/XInclude"
              stylesheet="po.xsl">
 <lx:PSVI schemaDocuments="po.xsd">
  <xi:include href="po.xml"/>
 </lx:PSVI>
</lx:resultSet>
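The composition above can be sketched as a small innermost-first evaluator. This is only an illustration under stated assumptions: the bodies of psvi and result_set are stand-ins that wrap their argument rather than performing real schema validation or XSLT, the names reduce_lx and FUNCTIONS are hypothetical, and xi:include (the "comma") is not expanded here.

```python
import copy
import xml.etree.ElementTree as ET

LX = "http://www.ltg.ed.ac.uk/~ht/lambdaXML"  # namespace URI used later in these notes

def psvi(attrs, args):
    # Stand-in for schema validation: wrap the argument, record the schema named.
    out = ET.Element("validated", {"schema": attrs["schemaDocuments"]})
    out.extend(args)
    return out

def result_set(attrs, args):
    # Stand-in for an XSLT transform: wrap the argument, record the stylesheet named.
    out = ET.Element("transformed", {"stylesheet": attrs["stylesheet"]})
    out.extend(args)
    return out

FUNCTIONS = {"PSVI": psvi, "resultSet": result_set}  # hypothetical function table

def reduce_lx(elem):
    """lx:* elements are function applications; everything else is quoted."""
    if not elem.tag.startswith("{" + LX + "}"):
        return copy.deepcopy(elem)  # quoted: returned unchanged
    args = [reduce_lx(child) for child in elem]  # evaluate arguments first
    name = elem.tag.split("}", 1)[1]
    return FUNCTIONS[name](dict(elem.attrib), args)

doc = ET.fromstring(
    '<lx:resultSet xmlns:lx="%s" stylesheet="po.xsl">'
    '<lx:PSVI schemaDocuments="po.xsd"><purchaseOrder/></lx:PSVI>'
    '</lx:resultSet>' % LX
)
value = reduce_lx(doc)
print(ET.tostring(value, encoding="unicode"))
```

The value of the document is the value of its outermost function, here the stand-in transform applied to the stand-in validation result.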

3.   Issues

Strictly speaking we should have only one element in the λ-XML namespace, namely lx:infoset (equivalent to apply in LISP), with a function argument which points to a REC/namespace/language. Then we would view lx:PSVI and lx:resultSet as convenient shorthands, e.g. for:

<lx:infoset xmlns:xi="http://www.w3.org/2001/XInclude"
            function="http://www.w3.org/TR/1999/REC-xslt-19991116">
 <xi:include href="po.xml"/>
 <xi:include href="po.xsl"/>
</lx:infoset>
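Reading lx:resultSet as shorthand suggests a purely mechanical rewrite into the lx:infoset form. A minimal sketch, assuming the stylesheet attribute simply becomes a trailing xi:include argument as in the example above; desugar_result_set is a hypothetical name.

```python
import copy
import xml.etree.ElementTree as ET

LX = "http://www.ltg.ed.ac.uk/~ht/lambdaXML"
XI = "http://www.w3.org/2001/XInclude"
XSLT_REC = "http://www.w3.org/TR/1999/REC-xslt-19991116"

def desugar_result_set(elem):
    """Rewrite lx:resultSet with a stylesheet attribute as an lx:infoset application."""
    if elem.tag != "{%s}resultSet" % LX:
        raise ValueError("expected lx:resultSet")
    out = ET.Element("{%s}infoset" % LX, {"function": XSLT_REC})
    for child in elem:  # the subject document(s) come first
        out.append(copy.deepcopy(child))
    # The stylesheet becomes the final argument.
    ET.SubElement(out, "{%s}include" % XI, {"href": elem.get("stylesheet")})
    return out

short = ET.fromstring(
    '<lx:resultSet xmlns:lx="%s" xmlns:xi="%s" stylesheet="po.xsl">'
    '<xi:include href="po.xml"/>'
    '</lx:resultSet>' % (LX, XI)
)
full = desugar_result_set(short)
print(ET.tostring(full, encoding="unicode"))
```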

Can we have a systematic polymorphism, to allow embedding of schema documents/stylesheets as first n-1 arguments, or arguments > 1? If so, which?

Moving beyond straight-line, do we e.g. design explicit lx:cond and lx:mapcar (== viewport), or just use lx:resultSet and a Literal Result Element stylesheet? LRE is close, but not quite right: for conditional stuff, you want the condition outside the document, and for viewporting you want the document itself outside the results, not some constant.

Richard: something more like lxreplace (:-)

For viewporting, seems like we want something which produces a new infoset by replacing specified infoitems with a transformed version -- we could use a single XSLT template rule:

<lx:itemsReplaced xmlns:lx="http://www.ltg.ed.ac.uk/~ht/lambdaXML"
            xmlns:xi="http://www.w3.org/2001/XInclude"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xi:include href="po.xml"/>
 <xsl:template match="price">
  <amount currency="USD">
   <xsl:apply-templates/>
  </amount>
 </xsl:template>
</lx:itemsReplaced>
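The intended effect of lx:itemsReplaced can be sketched without XSLT: build a new tree in which each matched item is swapped for its transformed version and everything else is copied. This is an illustration only. Matching here is by element name rather than by XSLT pattern, and items_replaced and price_to_amount are hypothetical names.

```python
import xml.etree.ElementTree as ET

def items_replaced(root, match, transform):
    """Return a new tree in which every element named `match` is replaced
    by transform(element); the input tree is left untouched."""
    def rebuild(elem):
        new = ET.Element(elem.tag, dict(elem.attrib))
        new.text, new.tail = elem.text, elem.tail
        for child in elem:
            new.append(rebuild(child))
        if elem.tag == match:
            replacement = transform(new)
            replacement.tail = elem.tail  # keep surrounding text in place
            return replacement
        return new
    return rebuild(root)

def price_to_amount(price):
    # Mirrors the template rule above: wrap the content in <amount currency="USD">.
    amount = ET.Element("amount", {"currency": "USD"})
    amount.text = price.text
    amount.extend(list(price))
    return amount

po = ET.fromstring(
    "<order><item><price>9.99</price></item><item><price>5</price></item></order>"
)
converted = items_replaced(po, "price", price_to_amount)
print(ET.tostring(converted, encoding="unicode"))
```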

For conditional processing, the nesting isn't working for me, yet:

<lx:cond xmlns:lx="http://www.ltg.ed.ac.uk/~ht/lambdaXML"
         xmlns:xi="http://www.w3.org/2001/XInclude">
 <lx:case test="/root/@version > 3.0">
  <lx:resultSet stylesheet="current.xsl">
   <lx:PSVI schemaDocuments="current.xsd">
    <xi:include href="#subject"/>
   </lx:PSVI>
  </lx:resultSet>
 </lx:case>
 <lx:otherwise>
  <lx:resultSet stylesheet="stale.xsl">
   <lx:PSVI schemaDocuments="stale.xsd">
    <xi:include href="#subject"/>
   </lx:PSVI>
  </lx:resultSet>
 </lx:otherwise>
 <xi:include href="po.xml"/>
</lx:cond>

That's ugly: the #subject is meant to refer to the final argument of the enclosing lx:cond, i.e. the included po.xml, but that's nowhere near obvious . . .

This doesn't seem right -- how can we do conditionals in a pure functional language w/o binding? Maybe we can't . . .

At least let's look at what it would be with binding:

<lx:let xmlns:lx="http://www.ltg.ed.ac.uk/~ht/lambdaXML"
        xmlns:xi="http://www.w3.org/2001/XInclude">
 <lx:bind name="subject">
  <xi:include href="po.xml"/>
 </lx:bind>
 <lx:cond>
  <lx:case test="$subject/root/@version > 3.0">
   <lx:resultSet stylesheet="current.xsl">
    <lx:PSVI schemaDocuments="current.xsd">
     <xi:include href="#subject"/>
    </lx:PSVI>
   </lx:resultSet>
  </lx:case>
  <lx:otherwise>
   <lx:resultSet stylesheet="stale.xsl">
    <lx:PSVI schemaDocuments="stale.xsd">
     <xi:include href="#subject"/>
    </lx:PSVI>
   </lx:resultSet>
  </lx:otherwise>
 </lx:cond> 
</lx:let>
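How the binding version might be evaluated can be sketched with an explicit environment. Everything here is an assumption made for illustration: the test language is cut down to the single form "$name/root-element/@attribute > number" matched by a regular expression (not real XPath), a "#name" href is resolved against the bindings, and the branch bodies are plain marker elements returned as quoted data.

```python
import re
import xml.etree.ElementTree as ET

LX = "http://www.ltg.ed.ac.uk/~ht/lambdaXML"
XI = "http://www.w3.org/2001/XInclude"

def q(ns, local):
    return "{%s}%s" % (ns, local)

# Deliberately tiny test language: "$name/root/@attr > number" only.
TEST = re.compile(r"^\$(\w+)/(\w+)/@(\w+)\s*>\s*([\d.]+)$")

def holds(test, env):
    name, root_tag, attr, limit = TEST.match(test).groups()
    root = env[name]
    value = root.get(attr) if root.tag == root_tag else None
    return value is not None and float(value) > float(limit)

def evaluate(elem, env, documents):
    if elem.tag == q(LX, "let"):
        inner, body = dict(env), None
        for child in elem:
            if child.tag == q(LX, "bind"):
                inner[child.get("name")] = evaluate(child[0], inner, documents)
            else:
                body = child
        return evaluate(body, inner, documents)
    if elem.tag == q(LX, "cond"):
        for branch in elem:
            if branch.tag == q(LX, "otherwise") or holds(branch.get("test"), env):
                return evaluate(branch[0], env, documents)
        raise ValueError("no branch of lx:cond applied")
    if elem.tag == q(XI, "include"):
        href = elem.get("href")
        # "#name" refers to a binding; anything else to a (hypothetical) document store.
        return env[href[1:]] if href.startswith("#") else documents[href]
    return elem  # anything else is quoted

program = ET.fromstring(
    '<lx:let xmlns:lx="%s" xmlns:xi="%s">'
    '<lx:bind name="subject"><xi:include href="po.xml"/></lx:bind>'
    '<lx:cond>'
    '<lx:case test="$subject/root/@version > 3.0"><current/></lx:case>'
    '<lx:otherwise><stale/></lx:otherwise>'
    '</lx:cond>'
    '</lx:let>' % (LX, XI)
)
documents = {"po.xml": ET.fromstring('<root version="3.1"/>')}
chosen = evaluate(program, {}, documents)
print(chosen.tag)
```

The point of the sketch is only that the test is evaluated against the bound value rather than against whatever happens to be the last child of lx:cond.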

Well, that's the best I can do for the time being. . .

We need an lx:try/lx:catch or somesuch for failure management, plus lx:fail, presumably. We need to add an lx:validPSVI to check validity of a PSVI, or fail.
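One way to picture failure management is as ordinary exception handling around function application. A minimal sketch under assumptions: LxFail, valid_psvi and lx_try are hypothetical names, and the validity check is a caller-supplied predicate rather than real schema validation.

```python
import xml.etree.ElementTree as ET

class LxFail(Exception):
    """Raised by a hypothetical lx:fail, or by a function that cannot yield an infoset."""

def valid_psvi(doc, is_valid):
    # Stand-in for lx:validPSVI: pass the infoset through if valid, else fail.
    if not is_valid(doc):
        raise LxFail("document is not valid")
    return doc

def lx_try(attempt, fallback):
    # Stand-in for lx:try/lx:catch: the value of `attempt`, or of `fallback` on failure.
    try:
        return attempt()
    except LxFail:
        return fallback()

def has_date(doc):
    return doc.get("orderDate") is not None

dated = ET.fromstring('<purchaseOrder orderDate="1999-10-20"/>')
undated = ET.fromstring("<purchaseOrder/>")

ok = lx_try(lambda: valid_psvi(dated, has_date), lambda: ET.Element("error"))
bad = lx_try(lambda: valid_psvi(undated, has_date), lambda: ET.Element("error"))
print(ok.tag, bad.tag)
```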

The good news is we don't need lx:lambda or lx:bind for the equivalent of straight-line pipes, or even the common kinds of viewporting which reduce to lx:itemsReplaced.

We need an explicit lx:xincluded function in any case, to deal with cases where e.g. a stylesheet introduces xi:include elements into its output. Getting auto-xinclude right is going to be tricky . . . .
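An explicit lx:xincluded step could be as simple as running XInclude over an already-computed infoset. The sketch below uses Python's standard xml.etree.ElementInclude with a custom loader. The DOCUMENTS table and the xincluded name are assumptions for illustration, and only parse="xml" inclusions are handled.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI = "http://www.w3.org/2001/XInclude"

DOCUMENTS = {  # hypothetical in-memory documents standing in for URI targets
    "po.xml": '<purchaseOrder orderDate="1999-10-20"/>',
}

def loader(href, parse, encoding=None):
    # ElementInclude calls this for each xi:include element it meets.
    if parse != "xml":
        raise ValueError("only parse='xml' is handled in this sketch")
    return ET.fromstring(DOCUMENTS[href])

def xincluded(root):
    """Expand xi:include descendants of root in place and return root."""
    ElementInclude.include(root, loader=loader)
    return root

# Pretend a stylesheet produced this output, xi:include and all.
output = ET.fromstring(
    '<report xmlns:xi="%s"><xi:include href="po.xml"/></report>' % XI
)
expanded = xincluded(output)
print(ET.tostring(expanded, encoding="unicode"))
```

Making the step explicit sidesteps the question of when automatic inclusion should fire, at the cost of having to ask for it.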

Where/how does this all happen? We have defined the semantics of the λ-XML namespace as a kind of β-reduction which maps from its argument infoset(s) to the infoset resulting from the relevant function application . . . One could imagine a λ-XML proxy, which would deliver the result.

By default, all URI targets are themselves subjected to λ-XML β-reduction (only an issue if we support the ad-hoc form?).

What about the approach to naming exemplified here? I've tried to use terms descriptive of the result infoset (infoset, PSVI, resultSet, itemsReplaced), although I started with the more procedural names apply, validate, transform and replace, respectively.

4.   Ad-hoc form

My initial thought had been to use existing indicators as alternatives to explicit elements, e.g. xml-stylesheet, xsi:schemaLocation, the presence of a namespace binding for http://www.w3.org/2001/XInclude. I'm in two minds about this. On the proxy semantics view, one could imagine two versions of the proxy, one of which treated such things as implicit function applications and one of which didn't.

5.   Really crazy idea

Now suppose we added lx:let -- we'd be within epsilon of a full-blown XML programming language:

<lx:let xmlns:lx="http://www.ltg.ed.ac.uk/~ht/lambdaXML"
        xmlns:xi="http://www.w3.org/2001/XInclude"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <lx:bind name="ss">
  <lx:itemsReplaced>
   <xi:include href="fortunes.xml"/>
   <xsl:template match="/">
    <xsl:apply-templates select=".//fortune[day='Tuesday']/xsl:stylesheet"/>
   </xsl:template>
  </lx:itemsReplaced>
 </lx:bind>
 <lx:resultSet stylesheet="#ss">
  <xi:include href="#stdin"/>
 </lx:resultSet>
</lx:let>

Not so crazy after all. Compare the following to the lx:itemsReplaced example above:

<lx:map xmlns:lx="http://www.ltg.ed.ac.uk/~ht/lambdaXML"
        xmlns:xi="http://www.w3.org/2001/XInclude" match="price">
 <lx:lambda args="item">
  <amount currency="USD">
   <xi:include href="#item"/>
  </amount>
 </lx:lambda> 
 <xi:include href="po.xml"/>
</lx:map>

Much cleaner, and more general. So that's true viewporting. What was described above as viewporting wasn't, because it didn't embed an arbitrary λ-XML namespace expr == infoset. We still want lx:itemsReplaced, because it can do finer-grained things, e.g.:

<lx:itemsReplaced xmlns:lx="http://www.ltg.ed.ac.uk/~ht/lambdaXML"
                  xmlns:xi="http://www.w3.org/2001/XInclude">
 <xi:include href="po.xml"/>
 <xsl:template xmlns:xsl="http://www.w3.org/1999/XSL/Transform" match="price">
  <xsl:copy>
   <xsl:attribute name="currency">USD</xsl:attribute>
   <xsl:apply-templates/>
  </xsl:copy>
 </xsl:template>
</lx:itemsReplaced>