Based on an idea of Tim Berners-Lee as elaborated by Richard Tobin
XML processing models are slowly getting more attention, particularly in the form of pipeline languages, e.g. Sun's original W3C Note, Markup Technology's MT Pipeline variant thereof, Sean McGrath's XPipe, Norm Walsh's SXPipe, Orbeon's ???, 1060 Research's NetKernel, . . . All of these consist of declarative descriptions of a configuration of processing steps which can be applied to XML documents to produce other XML documents.
λ-XML approaches the XML processing problem in a rather different way, in terms of XML documents which are interpreted as specifying their own processing. It can be understood as the logical extension of the treatment of the xml-stylesheet processing instruction by browsers.
The basic idea is that although normally the proximate semantics, as we might call it, of an XML document is its own XML infoset, it is possible for some classes of XML documents to have a derived or second-order semantics, typically (always?) a different infoset.
One alternative is as close to lambda calculus as possible: elements in the λ-XML namespace are function applications, with the function identified by the element name; their attributes and children are their arguments; anything not in the λ-XML namespace is implicitly backquoted, with xi:include functioning as comma. The functions all map from infosets to infosets, and the derived semantics of a λ-XML document is just the value of its outermost function.
I think that making XInclude happen by default is the right thing, but we obviously need some way to block it.
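The quasi-quote reading above can be sketched as a small recursive evaluator. This is only an illustration, assuming Python's standard xml.etree.ElementTree as a stand-in for infosets; the function table `FUNCTIONS`, the `childCount` function and the in-memory `DOCS` store are hypothetical, and XInclude details (fragment identifiers, fallback, text inclusion) and whitespace handling are not modelled.

```python
import xml.etree.ElementTree as ET

LX = "http://www.ltg.ed.ac.uk/~ht/lambdaXML"  # namespace URI used in this note's examples
XI = "http://www.w3.org/2001/XInclude"

# Hypothetical in-memory store standing in for URI dereferencing.
DOCS = {"po.xml": "<purchaseOrder><item>widget</item></purchaseOrder>"}

def child_count(attrs, args):
    """Illustrative function only: report how many children its first argument has."""
    out = ET.Element("count")
    out.text = str(len(args[0]))
    return out

# Hypothetical function table: local name in the lx namespace -> callable.
FUNCTIONS = {"childCount": child_count}

def split_tag(tag):
    """Split ElementTree's '{ns}local' form into (ns, local)."""
    if tag.startswith("{"):
        ns, _, local = tag[1:].partition("}")
        return ns, local
    return "", tag

def reduce_infoset(elem):
    """Return the derived infoset of elem without modifying elem."""
    ns, local = split_tag(elem.tag)
    if ns == XI and local == "include":
        # "Comma": splice in the (itself reduced) target of the reference.
        return reduce_infoset(ET.fromstring(DOCS[elem.get("href")]))
    args = [reduce_infoset(child) for child in elem]
    if ns == LX:
        # Function application: attributes and reduced children are the arguments.
        return FUNCTIONS[local](dict(elem.attrib), args)
    # "Backquote": copy the element, keeping whatever its children reduced to.
    quoted = ET.Element(elem.tag, dict(elem.attrib))
    quoted.text, quoted.tail = elem.text, elem.tail
    quoted.extend(args)
    return quoted
```

In this sketch XInclude always happens, and there is no way to block it; a blocking mechanism would need an extra case in `reduce_infoset`.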
The obvious functions are lx:PSVI and lx:resultSet:
<lx:PSVI xmlns:lx="http://www.ltg.ed.ac.uk/~ht/lambdaXML" schemaDocuments="po.xsd"> <purchaseOrder xmlns="http://www.example.com/PurchaseOrder" orderDate="1999-10-20"> . . . </purchaseOrder> </lx:PSVI>
<lx:resultSet xmlns:lx="http://www.ltg.ed.ac.uk/~ht/lambdaXML" xmlns:xi="http://www.w3.org/2001/XInclude" stylesheet="po.xsl"> <xi:include href="po.xml"/> </lx:resultSet>
The obvious and fundamental point is that these can compose:
<lx:resultSet xmlns:lx="http://www.ltg.ed.ac.uk/~ht/lambdaXML" xmlns:xi="http://www.w3.org/2001/XInclude" stylesheet="po.xsl"> <lx:PSVI schemaDocuments="po.xsd"> <xi:include href="po.xml"/> </lx:PSVI> </lx:resultSet>
Strictly speaking we should have only one element in the λ-XML namespace, namely lx:infoset (equivalent to apply in LISP), with a function argument which points to a REC/namespace/language. Then we would view lx:PSVI and lx:resultSet as convenient shorthands, e.g. for:
<lx:infoset xmlns:lx="http://www.ltg.ed.ac.uk/~ht/lambdaXML" xmlns:xi="http://www.w3.org/2001/XInclude" function="http://www.w3.org/TR/1999/REC-xslt-19991116"> <xi:include href="po.xml"/> <xi:include href="po.xsl"/> </lx:infoset>
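The shorthand reading can be made concrete with a small rewriting pass that turns lx:PSVI and lx:resultSet into lx:infoset. This is a sketch under assumptions, not a definition: the `SHORTHANDS` table is hypothetical, the function URI chosen for XML Schema is an assumption (only the XSLT one appears in this note), and placing the schema documents or stylesheet after the subject document simply follows the lx:infoset example above.

```python
import xml.etree.ElementTree as ET

LX = "http://www.ltg.ed.ac.uk/~ht/lambdaXML"
XI = "http://www.w3.org/2001/XInclude"

# Shorthand local name -> (function URI, attribute naming the extra arguments).
# The XSLT URI is the one used in the lx:infoset example; the XML Schema URI
# is an assumption made for this sketch.
SHORTHANDS = {
    "resultSet": ("http://www.w3.org/TR/1999/REC-xslt-19991116", "stylesheet"),
    "PSVI": ("http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/", "schemaDocuments"),
}

def desugar(elem):
    """Return a copy of elem with lx shorthands rewritten to lx:infoset."""
    children = [desugar(child) for child in elem]
    prefix = "{%s}" % LX
    local = elem.tag[len(prefix):] if elem.tag.startswith(prefix) else None
    if local in SHORTHANDS:
        function_uri, arg_attr = SHORTHANDS[local]
        out = ET.Element(prefix + "infoset", {"function": function_uri})
        out.extend(children)
        # Each whitespace-separated URI becomes a trailing xi:include argument.
        for href in elem.get(arg_attr, "").split():
            ET.SubElement(out, "{%s}include" % XI, {"href": href})
        return out
    # Anything else is copied unchanged, with desugared children.
    out = ET.Element(elem.tag, dict(elem.attrib))
    out.text, out.tail = elem.text, elem.tail
    out.extend(children)
    return out
```

Applied to the composed lx:resultSet/lx:PSVI example, this yields two nested lx:infoset applications, each with the subject as its first argument.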
Can we have a systematic polymorphism, to allow embedding of schema documents/stylesheets as first n-1 arguments, or arguments > 1? If so, which?
Moving beyond straight-line, do we e.g. design explicit lx:cond and lx:mapcar (== viewport), or just use lx:transform and a Literal Result Element stylesheet? LRE is close, but not quite right: for conditional stuff, you want the condition outside the document, and for viewporting you want the document itself outside the results, not some constant.
Richard: something more like lxreplace (:-)
For viewporting, seems like we want something which produces a new infoset by replacing specified infoitems with a transformed version -- we could use a single XSLT template rule:
<lx:itemsReplaced xmlns:lx="http://www.ltg.ed.ac.uk/~ht/lambdaXML" xmlns:xi="http://www.w3.org/2001/XInclude" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xi:include href="po.xml"/> <xsl:template match="price"> <amount currency="USD"> <xsl:apply-templates/> </amount> </xsl:template> </lx:itemsReplaced>
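The intended effect of lx:itemsReplaced, a new infoset in which only the matched items differ, can be sketched without an XSLT processor. The following assumes ElementTree as the infoset model and matches on element name only, where the example above uses an XSLT match pattern; `items_replaced` and `price_to_amount` are hypothetical names.

```python
import xml.etree.ElementTree as ET

def items_replaced(elem, match, transform):
    """Copy elem, replacing every descendant-or-self element named `match`
    with transform(copy_of_it); everything else is copied unchanged."""
    out = ET.Element(elem.tag, dict(elem.attrib))
    out.text, out.tail = elem.text, elem.tail
    out.extend([items_replaced(child, match, transform) for child in elem])
    if elem.tag == match:
        replaced = transform(out)
        replaced.tail = elem.tail  # keep the surrounding text in place
        return replaced
    return out

def price_to_amount(price):
    """Mirror the template rule above: wrap the content in <amount currency="USD">."""
    amount = ET.Element("amount", {"currency": "USD"})
    amount.text = price.text
    amount.extend(list(price))
    return amount

po = ET.fromstring("<po><item><price>9.99</price></item><price>1</price></po>")
converted = items_replaced(po, "price", price_to_amount)
```

The input tree is left untouched, which matches the view of every function as a map from infosets to infosets.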
For conditional processing, the nesting isn't working for me, yet:
<lx:cond xmlns:lx="http://www.ltg.ed.ac.uk/~ht/lambdaXML" xmlns:xi="http://www.w3.org/2001/XInclude"> <lx:case test="/root/@version > 3.0"> <lx:resultSet stylesheet="current.xsl"> <lx:PSVI schemaDocuments="current.xsd"> <xi:include href="#subject"/> </lx:PSVI> </lx:resultSet> </lx:case> <lx:otherwise> <lx:resultSet stylesheet="stale.xsl"> <lx:PSVI schemaDocuments="stale.xsd"> <xi:include href="#subject"/> </lx:PSVI> </lx:resultSet> </lx:otherwise> <xi:include href="po.xml"/> </lx:cond>
That's ugly: the #subject is meant to refer to the final argument of the enclosing lx:cond, i.e. the included po.xml, but that's nowhere near obvious . . .
This doesn't seem right -- how can we do conditionals in a pure functional language w/o binding? Maybe we can't . . .
At least let's look at what it would be with binding:
<lx:let xmlns:lx="http://www.ltg.ed.ac.uk/~ht/lambdaXML" xmlns:xi="http://www.w3.org/2001/XInclude"> <lx:bind name="subject"> <xi:include href="po.xml"/> </lx:bind> <lx:cond> <lx:case test="$subject/root/@version > 3.0"> <lx:resultSet stylesheet="current.xsl"> <lx:PSVI schemaDocuments="current.xsd"> <xi:include href="#subject"/> </lx:PSVI> </lx:resultSet> </lx:case> <lx:otherwise> <lx:resultSet stylesheet="stale.xsl"> <lx:PSVI schemaDocuments="stale.xsd"> <xi:include href="#subject"/> </lx:PSVI> </lx:resultSet> </lx:otherwise> </lx:cond> </lx:let>
Well, that's the best I can do for the time being. . .
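For comparison, here is the binding version transcribed into ordinary function calls. It is a sketch only: `let`, `cond`, `tagged` and `process` are hypothetical names, the tests are written as Python callables because ElementTree cannot evaluate the XPath comparison used above, and `tagged` merely records which schema or stylesheet would have been applied instead of validating or transforming anything.

```python
import copy
import xml.etree.ElementTree as ET

def let(bindings, body):
    """lx:let: evaluate each binding once, then evaluate body in that environment."""
    env = {name: thunk() for name, thunk in bindings.items()}
    return body(env)

def cond(env, cases, otherwise):
    """lx:cond: return the first branch whose test holds, else the otherwise branch."""
    for test, branch in cases:
        if test(env):
            return branch(env)
    return otherwise(env)

def tagged(label):
    """Stand-in for lx:PSVI / lx:resultSet: record which resource was applied."""
    def apply(infoset):
        out = copy.deepcopy(infoset)
        out.set("processedBy", (infoset.get("processedBy", "") + " " + label).strip())
        return out
    return apply

def process(po_xml):
    """The let/cond example, with the subject bound once and used in both branches."""
    return let(
        {"subject": lambda: ET.fromstring(po_xml)},
        lambda env: cond(
            env,
            [(lambda e: float(e["subject"].get("version", "0")) > 3.0,
              lambda e: tagged("current.xsl")(tagged("current.xsd")(e["subject"])))],
            lambda e: tagged("stale.xsl")(tagged("stale.xsd")(e["subject"])),
        ),
    )
```

The binding is what lets the test and both branches see the same subject; without it, each branch has to reach for the subject by some out-of-band convention such as #subject.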
We need an lx:try/lx:catch or somesuch for failure management, plus lx:fail, presumably. We need to add an lx:validPSVI to check validity of a PSVI, or fail.
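One way to picture the failure functions is with ordinary exceptions. This is a sketch under assumptions: the names `LxFail`, `fail`, `valid_psvi` and `try_catch` are hypothetical, and the `validity` attribute is a stand-in for the PSVI [validity] property, which ElementTree does not model.

```python
import xml.etree.ElementTree as ET

class LxFail(Exception):
    """Raised by lx:fail, or by any function that cannot produce an infoset."""

def fail(reason):
    """lx:fail: abandon the current function application."""
    raise LxFail(reason)

def valid_psvi(psvi):
    """lx:validPSVI: pass the infoset through if it is marked valid, else fail.
    The `validity` attribute is a hypothetical stand-in for the PSVI
    [validity] property."""
    if psvi.get("validity") != "valid":
        fail("root element is not valid")
    return psvi

def try_catch(attempt, handler):
    """lx:try/lx:catch: evaluate attempt(); on failure evaluate handler(reason)."""
    try:
        return attempt()
    except LxFail as err:
        return handler(str(err))

def error_infoset(reason):
    """Hypothetical fallback infoset describing the failure."""
    return ET.Element("error", {"reason": reason})
```

A pipeline step would then read `try_catch(lambda: valid_psvi(doc), error_infoset)`: the valid document is passed through unchanged, and an invalid one is replaced by the fallback infoset.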
The good news is we don't need lx:lambda or lx:bind for the equivalent of straight-line pipes, or even the common kinds of viewporting which reduce to lx:itemsReplaced.
We need an explicit lx:xincluded function in any case, to deal with cases where e.g. a stylesheet introduces xi:include elements into its output. Getting auto-xinclude right is going to be tricky . . .
Where/how does this all happen? We have defined the semantics of the λ-XML namespace as a kind of β-reduction which maps from its argument infoset(s) to the infoset resulting from the relevant function application . . . One could imagine a λ-XML proxy, which would deliver the result.
Default is that all URI targets are subjected to λ-XML β-reduction themselves (only an issue if we support the ad-hoc form?)
What about the approach to naming exemplified here? I've tried to use terms descriptive of the result infoset (infoset, PSVI, resultSet, itemsReplaced), but I started with more procedural names (apply, validate, transform, replace) respectively.
My initial thought had been to use existing indicators as alternatives to explicit elements, e.g. xml-stylesheet, xsi:schemaLocation, the presence of a namespace binding for http://www.w3.org/2001/XInclude. I'm in two minds about this. On the proxy semantics view, one could imagine two versions of the proxy, one of which treated such things as implicit function applications and one of which didn't.
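The proxy which does treat indicators as implicit function applications could begin by rewriting them into the explicit form. The sketch below covers only the xml-stylesheet processing instruction, assuming Python's xml.dom.minidom; `implicit_stylesheet` and `explicit_form` are hypothetical names, and the href pseudo-attribute is pulled out with a simple regular expression, without resolving character or entity references.

```python
import re
from xml.dom import minidom
from xml.sax.saxutils import quoteattr

LX = "http://www.ltg.ed.ac.uk/~ht/lambdaXML"
XI = "http://www.w3.org/2001/XInclude"

def implicit_stylesheet(xml_text):
    """Return the href of the first xml-stylesheet PI in the prolog, or None."""
    doc = minidom.parseString(xml_text)
    for node in doc.childNodes:
        if node.nodeType == node.ELEMENT_NODE:
            break  # only PIs before the document element count
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            m = re.search(r'(?:^|\s)href\s*=\s*(["\'])(.*?)\1', node.data)
            if m:
                return m.group(2)
    return None

def explicit_form(xml_text, source_uri):
    """Rewrite an implicit xml-stylesheet indicator as an explicit lx:resultSet
    application over source_uri; return None if there is no indicator."""
    href = implicit_stylesheet(xml_text)
    if href is None:
        return None
    return ("<lx:resultSet xmlns:lx=%s xmlns:xi=%s stylesheet=%s>"
            "<xi:include href=%s/></lx:resultSet>"
            % (quoteattr(LX), quoteattr(XI), quoteattr(href), quoteattr(source_uri)))
```

The other version of the proxy would skip this step and deliver the document as it stands.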
Now suppose we added lx:let -- we'd be within epsilon of a full-blown XML programming language:
<lx:let xmlns:lx="http://www.ltg.ed.ac.uk/~ht/lambdaXML" xmlns:xi="http://www.w3.org/2001/XInclude" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <lx:bind name="ss"> <lx:itemsReplaced> <xi:include href="fortunes.xml"/> <xsl:template match="/"> <xsl:apply-templates select=".//fortune[day='Tuesday']/xsl:stylesheet"/> </xsl:template> </lx:itemsReplaced> </lx:bind> <lx:resultSet stylesheet="#ss"> <xi:include href="#stdin"/> </lx:resultSet> </lx:let>
Not so crazy after all. Compare the following to the lx:itemsReplaced example above:
<lx:map xmlns:lx="http://www.ltg.ed.ac.uk/~ht/lambdaXML" xmlns:xi="http://www.w3.org/2001/XInclude" match="item"> <lx:lambda args="price"> <amount currency="USD"> <xi:include href="#item"/> </amount> </lx:lambda> <xi:include href="po.xml"/> </lx:map>
Much cleaner, and more general. So that's true viewporting. What was described above as viewporting wasn't, because it didn't embed an arbitrary λ-XML namespace expr == infoset. We still want lx:itemsReplaced, because it can do finer-grained things, e.g.:
<lx:itemsReplaced xmlns:lx="http://www.ltg.ed.ac.uk/~ht/lambdaXML" xmlns:xi="http://www.w3.org/2001/XInclude"> <xi:include href="po.xml"/> <xsl:template xmlns:xsl="http://www.w3.org/1999/XSL/Transform" match="price"> <xsl:copy> <xsl:attribute name="currency">USD</xsl:attribute> </xsl:copy> </xsl:template> </lx:itemsReplaced>