In the world of the languages we're interested in, there are three levels. On this account we have (1) strings, (2) (abstract) data models, and (3) reality.
I think it's fair to say that the most straightforward mapping from the traditional model-theoretical view to the three-level story is to say that the sentential forms and models of model theory correspond to levels (2) and (3) of the three-level view. The relationship between (1) and (2) is typically held to be trivial, or at least uninteresting.
But I think that it's actually levels (1) and (2) that (most of) the TAG have had in their minds, at least some of the time. And I include myself in that generalisation, at least in part.
There are certainly plenty of cases where (1) and (2) are uninterestingly different (consider e.g. LISP -- the mapping from the string "(+ 3 4)" to the corresponding s-expression composed of three cons pairs, one atom, two numbers and nil is just not the locus of what's interesting about LISP).
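To make the LISP case concrete, here is a minimal sketch of such a string-to-data-model mapping in Python. The names `Cons`, `tokenize`, `read` and `count_cons` are hypothetical, `None` stands in for nil, and this is not meant as a faithful LISP reader.

```python
from typing import NamedTuple


class Cons(NamedTuple):
    car: object
    cdr: object  # another Cons, or None standing in for nil


def tokenize(text):
    """Split the surface string into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def read(tokens):
    """Read one expression from the front of the token list (consumes tokens)."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(read(tokens))
        tokens.pop(0)  # drop the closing ")"
        result = None  # nil terminates the list
        for item in reversed(items):
            result = Cons(item, result)
        return result
    try:
        return int(token)  # a number
    except ValueError:
        return token  # any other atom, kept as a plain string


def count_cons(expr):
    """Count the cons pairs reachable from expr."""
    if not isinstance(expr, Cons):
        return 0
    return 1 + count_cons(expr.car) + count_cons(expr.cdr)


expr = read(tokenize("(+ 3 4)"))
```

Under these assumptions `expr` is a chain of three cons pairs holding the atom `+` and the numbers 3 and 4, ending in nil.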
For the XML languages, the situation is further complicated. Call the mapping from surface string to data model Θ, and the relationship between data model and the appropriate model/world/domain of discourse Φ. For an XML language such as SVG we then have
XML: document --Θ--> infoset
SVG: infoset --Θ--> data structure --Φ--> bitmaps
or for RDF:
XML: document --Θ--> infoset
RDF: infoset --Θ--> relational graph --Φ--> world
or for some XML-based business language
XML: document --Θ--> infoset
POL: infoset --Θ--> java class instances --Φ--> ???
The two Θs compose in each case, of course, but they are quite different in character, or rather, where they get you is quite different. It's hard to see what concrete domain arbitrary XML infosets can be interpreted as making claims about, whereas SVG data structures certainly can be interpreted as making claims on bitmaps, etc.
[For a very interesting attempt to talk about the general matter of markup semantics, see Sperberg-McQueen, Huitfeldt and Renear, Meaning and interpretation of markup, _Markup Languages: Theory and Practice_ v2 n3, pp 215--234, MIT Press, 2001 and Sperberg-McQueen, Dubin, Huitfeldt and Renear, Drawing inferences on the basis of markup, in T. Usdin, ed., Proceedings of Extreme Markup Languages, 2002, IDE Alliance. The latter is probably the better introduction.]
So, the net-net of all this? I don't think we can ignore any of the three levels, but I wonder if we can get most of what we need from the two which are fully under our control, as it were, namely (1) and (2). The main reason for this is that in too many cases which we need to cover, we can't ignore (2) because it's where the semantics of the language are normatively stated (see e.g. SVG, RDF, XML Schema), and we can't ignore (1) because it's not even close to being 1-to-1 with (2), and in any case it's what people are used to actually seeing and manipulating.
Consider XML Schema, SVG and POX (purchase-order XML). These have a very different feel to their semantics, if you look carefully. The contrast is the one marked in the philosophy of language by the words 'declarative' and 'performative' -- contrast "Snow is white" with "I pronounce you man and wife". Declarative sentences have what John Searle calls the "word-to-world" direction of fit -- on hearing such sentences, we ask, of the words, do they fit the world. Performative sentences change the world to fit the words, as long as they are uttered felicitously, that is, with the necessary pre-conditions satisfied. Note that not all performatives are so socially bound -- something as simple as "I promise to pay you five pounds" is the same sort of thing.
In between 'declarative' and 'performative' we have 'imperative', for such sentences as "Open the door" and "Please pass me the salt" and even "Can you slow down a bit". These have a sort of conditional world-to-word direction of fit -- either the world, courtesy of the addressee, changes as commanded/requested, or it doesn't.
Not only declarative, but also performative and imperative can be understood in terms of claims on the world: performatives are understood as a pair of sets of claims, pre-conditions and post-conditions. If the pre-conditions are all satisfied, then the post-conditions become satisfied as a result of the utterance. Similarly for imperatives, but the post-conditions are contingent on cooperation from the addressee.
Let's try working with a very simple XML language to see if we can illustrate some of this -- the RGY language. Here's its syntax (a DTD):
<!ELEMENT rgy (l*)>
<!ELEMENT l EMPTY>
<!ATTLIST l x NMTOKEN #REQUIRED
            y NMTOKEN #REQUIRED
            c (r|g|y) #REQUIRED>
And here's its data model (a UML diagram):
The mapping Θ[RGY] from syntax to data model is obvious.
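Obvious or not, it may help to see it written down. What follows is a minimal sketch in Python, not a normative definition. The property names `light`, `x`, `y` and `colour` are the ones used in the text, while `Light`, `RGY`, `theta_rgy` and `COLOURS` are hypothetical names chosen for illustration. I assume that `x` and `y` are integers and that `RGY.light` is a set. The parser used here does not validate against the DTD.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Assumed reading of the c attribute's enumerated values.
COLOURS = {"r": "red", "g": "green", "y": "yellow"}


@dataclass(frozen=True)
class Light:
    x: int
    y: int
    colour: str


@dataclass(frozen=True)
class RGY:
    light: frozenset  # of Light, compared by property equality


def theta_rgy(document: str) -> RGY:
    """Map an RGY document (a string) to its data model instance."""
    root = ET.fromstring(document)
    if root.tag != "rgy":
        raise ValueError("not an rgy document")
    lights = frozenset(
        Light(int(el.get("x")), int(el.get("y")), COLOURS[el.get("c")])
        for el in root.findall("l")
    )
    return RGY(lights)


doc1 = "<rgy><l x='1' y='1' c='r'/><l x='2' y='1' c='g'/></rgy>"
doc2 = """<rgy>
  <l c="g" y="1" x="2"/>
  <l x="1" y="1" c="r"/>
  <l x="1" y="1" c="r"/>
</rgy>"""
```

On these assumptions `doc1` and `doc2` are different strings that map to the same data model instance, which is one small way in which level (1) fails to be 1-to-1 with level (2).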
What's perhaps more interesting is that we can easily give both declarative and performative semantics to this data model.
The domain of the model is 8-bit-per-colour RGB bitmaps. We say that a Light (call it l) is satisfied by a bitmap iff one of the following is true:
- the pixel at l.x, l.y in the bitmap is #ff0000 and l.colour is red;
- the pixel at l.x, l.y in the bitmap is #00ff00 and l.colour is green;
- the pixel at l.x, l.y in the bitmap is #ffff00 and l.colour is yellow.
The domain of interpretation is a set of traffic lights in a city with a rectangular, NS/EW orientated set of streets and avenues, both numbered, each intersection governed by a traffic light, all of which are controlled by a computer which implements a simple web service, which performs actions under the control of messages as follows:
- The pre-conditions for a Light (call it l) are determined as follows:
  - if l.colour is red, the light at l.x Avenue and l.y Street must be green in the NS direction;
  - if l.colour is green, the light at l.x Avenue and l.y Street must be red in the NS direction.
- The actions for a Light (call it l) are determined as follows:
  - if l.colour is red, set the light at l.x Avenue and l.y Street to yellow in the NS direction, wait 4 seconds, then set it to red in the NS direction and green in the EW direction;
  - if l.colour is green, set the light at l.x Avenue and l.y Street to yellow in the EW direction, wait 4 seconds, then set it to red in the EW direction and green in the NS direction;
  - if l.colour is yellow, set the light at l.x Avenue and l.y Street to blinking yellow in both directions.
We can define entailment for either approach to the semantics of RGY. An
instance of RGY (call it A) entails another instance of RGY (call it B) with respect to the declarative semantics of RGY iff all bitmaps which satisfy A also satisfy B, or, to put it another way, the set of bitmaps satisfying A is a subset of the set satisfying B. I think it's obvious that this will be true just in case B.light is a subset of A.light, where equality for Lights is property equality.
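The declarative definitions can be tried out on a toy domain. This is a minimal sketch in Python, under assumptions that are mine rather than part of RGY: an instance is represented just by its set of Lights, an instance is satisfied by a bitmap iff every one of its Lights is, a bitmap is a dict from (x, y) to an (r, g, b) triple, and entailment is checked by brute force over a tiny set of coordinates and a four-colour palette, not over all 8-bit-per-colour bitmaps. The names `RGB`, `satisfies` and `entails` are hypothetical.

```python
from dataclasses import dataclass
from itertools import product

RGB = {"red": (0xFF, 0, 0), "green": (0, 0xFF, 0), "yellow": (0xFF, 0xFF, 0)}


@dataclass(frozen=True)
class Light:
    x: int
    y: int
    colour: str


def satisfies(bitmap, lights):
    """True iff every Light in lights is satisfied by the bitmap."""
    return all(bitmap.get((l.x, l.y)) == RGB[l.colour] for l in lights)


def entails(a, b, coords, palette):
    """Brute force: every bitmap over coords and palette satisfying a also satisfies b."""
    for pixels in product(palette, repeat=len(coords)):
        bitmap = dict(zip(coords, pixels))
        if satisfies(bitmap, a) and not satisfies(bitmap, b):
            return False
    return True


coords = [(1, 1), (2, 1)]
palette = list(RGB.values()) + [(0, 0, 0)]  # the three colours plus black

a = {Light(1, 1, "red"), Light(2, 1, "green")}
b = {Light(1, 1, "red")}

a_entails_b = entails(a, b, coords, palette)  # b is a subset of a
b_entails_a = entails(b, a, coords, palette)  # a is not a subset of b
```

On this toy domain the brute-force check agrees with the subset test for `a` and `b`: `a_entails_b` holds and `b_entails_a` does not.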
With respect to the performative semantics, things turn out differently. We say that message A entails message B iff for all possible initial states of the traffic lights, the response to A performs at least all the actions involved in the response to B.
Somewhat surprisingly the performative semantics also guarantees that A entails B whenever B.light is a subset of A.light. Suppose B corresponds to
<rgy>
  <l x='1' y='1' c='r'/>
</rgy>
and A corresponds to
<rgy>
  <l x='1' y='1' c='r'/>
  <l x='1' y='1' c='g'/>
</rgy>
With respect to the declarative interpretation, A (defectively) entails B, because no bitmaps satisfy A.
And with respect to the performative interpretation, A entails B, because either in the initial state the light at the corner of 1st and 1st is green NS, in which case both A and B turn it yellow then red, or that light is red NS, in which case A turns it green and B does nothing, which does satisfy the definition of entailment above.
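The performative case can be sketched the same way. This is a minimal Python illustration, not a specification of the web service. I assume that a message is a set of (x, y, c) triples, that each light is in one of three states, that pre-conditions are checked against the initial state only, that a yellow Light has no pre-condition, and that the 4 second wait need not be modelled. The names `STATES`, `actions` and `entails` and the action labels are hypothetical.

```python
from itertools import product

# Assumed state space for a single traffic light.
STATES = ("green_ns", "red_ns", "blinking")


def actions(message, state):
    """Actions performed in response to a message, given the initial state.

    message: set of (x, y, c) triples; state: dict (x, y) -> one of STATES.
    """
    fired = set()
    for (x, y, c) in message:
        current = state[(x, y)]
        if c == "r" and current == "green_ns":
            fired.add((x, y, "yellow_then_red_ns"))
        elif c == "g" and current == "red_ns":
            fired.add((x, y, "yellow_then_green_ns"))
        elif c == "y":
            fired.add((x, y, "blink_yellow"))
    return fired


def entails(a, b, corners):
    """For every initial state, the response to a includes all actions of b."""
    for values in product(STATES, repeat=len(corners)):
        state = dict(zip(corners, values))
        if not actions(b, state) <= actions(a, state):
            return False
    return True


corners = [(1, 1)]
B = {(1, 1, "r")}
A = {(1, 1, "r"), (1, 1, "g")}

a_entails_b = entails(A, B, corners)
b_entails_a = entails(B, A, corners)
```

Under these assumptions `a_entails_b` holds and `b_entails_a` does not: when the light starts red NS, A performs an action that B does not.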
The consistency here depends on defining entailment in terms of changes, not final states. . .
What changes for our language diagram?
Can we appeal to entailment w/o appealing to the model except insofar as entailment presupposes it?
Is the functional assumption OK for Θ?
Can we enumerate the kinds of relations between markup changes and data model changes, starting within versions and then looking across versions?
The above notion of data model is too simple -- any language with keys in it may be mapped directly into updates at the model level, violating the implicit appeal to some kind of context-free abstract-syntax story. For example an Address might be interpreted immediately/directly as an update to a database row keyed by name.
Consider the impact of adding <xs:attribute name="foo" use="prohibited"/> to a type defined by restriction, with the effect of removing something from the data model.
All this gives us the opportunity to state two levels of relationship: