HTML is a language—no, it's a media type—no, it's a namespace. . .
Henry S. Thompson
17 Mar 2009
1. Lighthearted introduction
[Cue audience participation from aging SNL fans:]
GR New Shimmer is a floor wax!
DA No, new Shimmer is a dessert topping!
GR It's a floor wax!
DA It's a dessert topping!
GR It's a floor wax, I'm telling you!
DA It's a dessert topping, you cow!
CC Hey, hey, hey, calm down, you two. New Shimmer is both a floor wax and a dessert topping!
2. You can't tell the players without a program
One of the things the TAG tries to do is tease apart different aspects of
a problem
In particular, technical issues
Versus social/management issues
Not in order to dismiss the non-technical ones
But to make sure they are addressed in the right way
So I'm going to briefly (two more slides) set out the
technical background to the issue at hand
3. Media types, applications and languages
Media types such as image/png, text/plain and
application/pdf are essentially a dispatching mechanism
RFC 2046 says "the top-level media type is used to declare the general
type of data, while the subtype specifies a specific format for that
type of data"
It's tempting to understand media types as signalling what
language a particular message segment or document uses
Where by language I mean what SGML used to call an
application:
A syntax, which answers the question "What do documents in this
language look like"
A semantics, which answers the question "What does a document in
this language mean?" or "What should I do with documents in this language?"
And of course there's a pun here: messages/documents get dispatched to
applications for processing
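A minimal sketch of the "dispatching mechanism" reading of media types, in Python. The handler names and the fallback table are hypothetical, chosen only for illustration; real user agents are considerably more involved. The subtype picks the specific handler and the top-level type supplies a generic fallback, following the RFC 2046 split quoted above.

```python
# Hypothetical handler names: stand-ins for the applications
# that a message or document would be dispatched to.
SUBTYPE_HANDLERS = {
    "image/png": "png decoder",
    "text/plain": "text viewer",
    "application/pdf": "pdf viewer",
}

# Hypothetical fallbacks keyed on the top-level type alone.
TOP_LEVEL_FALLBACKS = {
    "image": "generic image viewer",
    "text": "text viewer",
}


def parse_media_type(value):
    """Split 'type/subtype; params' into (type, subtype), lower-cased."""
    essence = value.split(";", 1)[0].strip().lower()
    top, _, sub = essence.partition("/")
    return top, sub


def dispatch(content_type):
    """Pick a handler by full media type, then by top-level type."""
    top, sub = parse_media_type(content_type)
    handler = SUBTYPE_HANDLERS.get(f"{top}/{sub}")
    if handler is None:
        handler = TOP_LEVEL_FALLBACKS.get(top, "save to disk")
    return handler
```

Note that nothing in the table says what language the body is written in; it only says where to send it.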
4. A digression about extensibility
I used XHTML+SVG+MathML exclusively for the slides for my most recent
lecture course
That's one case of media-type-based language detection (image/svg+xml)
And one case of namespace-based extensibility (MathML embedded in SVG)
And a browser which dispatches on both
Everybody wins!
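A rough sketch of the namespace half of that dispatch, in Python with the standard library's ElementTree. The three namespace URIs are the real XHTML, MathML and SVG ones; the renderer names and the sample document are assumptions made up for this illustration, not a description of how any particular browser works.

```python
import xml.etree.ElementTree as ET

# Real namespace URIs mapped to hypothetical renderer names.
NAMESPACE_RENDERERS = {
    "http://www.w3.org/1999/xhtml": "xhtml renderer",
    "http://www.w3.org/1998/Math/MathML": "mathml renderer",
    "http://www.w3.org/2000/svg": "svg renderer",
}

# Illustrative compound document (an assumption, not the lecture slides).
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>Area of a circle:</p>
    <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>A</mi></math>
    <svg xmlns="http://www.w3.org/2000/svg"><circle r="10"/></svg>
  </body>
</html>"""


def namespace_of(element):
    """ElementTree stores qualified names as '{namespace}local'."""
    tag = element.tag
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return ""


def renderers_used(xml_text):
    """List, in document order, each renderer the document calls on."""
    seen = []
    for element in ET.fromstring(xml_text).iter():
        renderer = NAMESPACE_RENDERERS.get(namespace_of(element), "ignore")
        if renderer not in seen:
            seen.append(renderer)
    return seen
```

Calling `renderers_used(SAMPLE)` walks the tree once and reports one renderer per namespace encountered, which is the second dispatch step after the media type has already selected an XML-aware processor.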
5. Languages, namespaces and versions
But 'language' is too fuzzy a term
And the SGML definition is silent on the question of how much has to
change before you have a 'new' language
What distinguishes a new language from a new version?
"A language is just a dialect with an army and a navy"
For XML languages, namespaces can sometimes discriminate between
different dialects/versions/...
But namespaces and languages are not one-to-one
Some languages use multiple namespaces
Some namespaces don't change across language versions
So language or version identifiers are often included
within the specification of a language to make the final discrimination
Alternatively, language evolution is strictly controlled, so that the 'must
ignore' strategy always works, and no identifier as such is supplied
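The last two options can be sketched side by side in Python. Everything here is hypothetical: the `http://example.org/ns/doc` namespace, the `version` attribute and the element names are invented for illustration. Both document versions share one namespace, so either an in-document identifier makes the final discrimination, or a version 1 processor simply skips what it does not recognise.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace that does not change across language versions.
NS = "http://example.org/ns/doc"

V1_DOC = f'<doc xmlns="{NS}" version="1.0"><title>t</title></doc>'
V2_DOC = (
    f'<doc xmlns="{NS}" version="2.0">'
    "<title>t</title><sidebar>s</sidebar></doc>"
)

# Qualified names a hypothetical version 1 processor understands.
KNOWN_TO_V1 = {f"{{{NS}}}title"}


def language_version(xml_text):
    """Option 1: an explicit identifier inside the document."""
    return ET.fromstring(xml_text).get("version")


def must_ignore(xml_text, known=KNOWN_TO_V1):
    """Option 2: process known children, silently skip the rest."""
    root = ET.fromstring(xml_text)
    return [child.tag.split("}", 1)[1] for child in root if child.tag in known]
```

With these definitions the version 1 processor handles `V2_DOC` without reading the identifier at all: it keeps `title` and ignores `sidebar`. That only works as long as later versions never change the meaning of what earlier ones already understand.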
6. Conclusion
There is no technical reason why XHTML1.0, XHTML Basic,
XHTML1.1, HTML5 and XHTML2 should not share the same media type and namespace