19 Nov 2001
There is no winner. The purpose of this panel is not to decide which schema language is universally best or right. There is no more a best schema language than there is a best programming language or a best make and model of automobile. Different tools are suited to different tasks.
The purpose of this panel is to survey the popular schema languages, identify some of the distinguishing features that one might use to decide which language is best suited to a particular task, and give users and schema authors a chance to ask schema language designers and experts questions about their favorite schema technologies.
Norman Walsh participates actively in a number of standards efforts worldwide, including the XML Core, XSL and XML Schema Working Groups of the World Wide Web Consortium, the OASIS XSLT Conformance and RELAX NG Committees, the OASIS Entity Resolution Committee, for which he is the editor, and the OASIS DocBook Technical Committee, which he chairs.
John Cowan ...
Schema languages can be evaluated on many criteria, including:
Datatype library richness.
And many others …
To obtain a representative selection of schemas for analysis, we asked each team to provide two sets of schemas, one of their own design and another to satisfy a specific set of user requirements. Our goal was to allow the schema authors the freedom to demonstrate the unique strengths of their language on the one hand and to obtain some basis for an “apples-to-apples” comparison on the other; see Appendix A.
The analysts examined four schema languages in detail: [XML 1.0] DTDs, [W3C XML Schema], [RELAX NG], and [Schematron]. These are by no means the only schema languages available, but we felt that these languages cover the vast majority of the current landscape. Your favorite schema language, even if it isn't represented in this sample, is no more wrong or right than any of these.
FIXME: general observations
FIXME: discuss the team's schemas
7 months [W3C XML Schema]
SOX, XDR, DDML, …
ad hoc. A formalism is being developed after the fact.
FIXME: general observations
FIXME: discuss the team's schemas
4 months [RELAX NG]
RELAX and TREX
FIXME: general observations
FIXME: discuss the team's schemas
The common schemas were designed to probe the following language features:
The following table provides an “at a glance” summary of how the languages compared, as demonstrated by the teams' submissions of the common schemas.
|Language Feature||DTDs||W3C XML Schemas||RELAX NG||Schematron|
|Multiple top-level elements||Yes||Yes||Yes||Yes|
|Content model extensibility||Some[h]||Yes||Yes||?|
|Any namespace (XSL)||No||No||Yes||?|
|Any namespace (nesting)||No||Yes||Yes||Yes|
|ID-type element content||No||Yes||No||Yes|
|Sibling attribute values||No||No||No||?|
|Element type from attribute presence||No||No||?||Some[i]|
|Element type from attribute content||No||No||?||Some[i]|
|Attribute type from element content||No||No||?||Some[i]|
|Attribute value exclusion||No||No||No||Yes|
[a] Namespace prefixes are fixed on a per-document basis at best.
[b] SGML had the "&" connector, but it's not in XML.
[c] At the top-level of an element declaration
[d] There is a tiny set of data types.
[e] W3C XML Schema Part 2 datatypes
[f] Datatype library identified by URI
[g] Some simple types could be checked by assertion.
[h] Dependent on the DTD author using parameter entities appropriately.
[i] By writing appropriate assertions.
[j] On attribute values only.
[l] Using W3C XML Schema Part 2 datatypes
[m] SGML had exclusions, but XML does not.
[n] By content model manipulation
[p] Requires explicit hierarchy
This appendix summarizes the three schemas that each team was required to implement (the full text of the original requirements is also available). All of these schemas are contrived. Schema authors were encouraged to balance readability with absolute adherence to the requirements.
Cross-referencing mechanisms are intentionally vague. Some schema languages, like DTDs, will have to use ID/IDREF. Other teams may choose to use different language facilities for this purpose. All teams were free to add attributes as necessary to accomplish the required linking.
The first schema is for a technical memorandum. It does not have a namespace. It begins with either a memo or a techmemo element; the two are synonymous. It consists of a head and a body.
The head contains exactly one each of the following: date, author, and title. It may also include any number of meta elements from the XHTML namespace. These elements may appear in any order.
The date must be a valid date; the other head fields contain only text. What constitutes a valid date is intentionally vague. If you're using a datatype library that supports something that might reasonably be called a “date type”, you can use that. If you prefer to use a regular expression, that's fine too. As long as you describe how you interpreted “valid date” and how you achieved validation of that date, it's up to you.
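Because “valid date” is left open, any check has to state its interpretation. The following is a minimal sketch under one assumed reading: a date is an ISO 8601 calendar date of the form YYYY-MM-DD, with surrounding whitespace ignored. The function name `is_valid_date` is hypothetical and is not part of the requirements.

```python
import re
from datetime import date

# Assumption: "valid date" means an ISO 8601 calendar date, YYYY-MM-DD.
_DATE_SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_valid_date(text):
    """Return True if text is a real calendar date in YYYY-MM-DD form."""
    text = text.strip()
    if not _DATE_SHAPE.fullmatch(text):
        return False
    year, month, day = (int(part) for part in text.split("-"))
    try:
        # The constructor rejects impossible dates such as 2001-02-30.
        date(year, month, day)
    except ValueError:
        return False
    return True
```

The regular expression checks only the lexical shape; the `date` constructor then rejects values that have the right shape but do not exist on the calendar.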
The body contains a mixture of zero or more para and list elements. The emph, footnote, footnoteref, and link (a simple XLink) elements may appear inside para, along with text.
The footnote element contains one or more para elements. However, footnotes may not nest; a footnote may not contain a footnote as a descendant. The footnoteref element is empty; it has a required ref attribute which must point to a footnote. The emph and link elements contain only text.
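The two footnote constraints (no nesting, and every footnoteref resolving to a footnote) can be illustrated as a check over a parsed document. This is a sketch only: it assumes each footnote carries an `id` attribute for footnoteref's `ref` to target, which is one of the linking choices the requirements leave to each team, and `footnote_errors` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def footnote_errors(root):
    """Return a list of messages for violated footnote constraints."""
    errors = []
    # Assumption: footnotes are identified by an "id" attribute.
    ids = {fn.get("id") for fn in root.iter("footnote") if fn.get("id")}

    for fn in root.iter("footnote"):
        # iter() also yields fn itself, so exclude it when looking for descendants.
        if any(inner is not fn for inner in fn.iter("footnote")):
            errors.append("footnote contains a nested footnote")

    for ref in root.iter("footnoteref"):
        target = ref.get("ref")
        if target is None:
            errors.append("footnoteref is missing its ref attribute")
        elif target not in ids:
            errors.append(f"footnoteref points to unknown footnote {target!r}")
    return errors
```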
A list consists of an optional title followed by two or more item elements. An item may contain text and inlines (emph, footnote, footnoteref, and link), or one or more para elements, but not both. All of the items in a list must have the same kind of content (all text and inlines or all paragraphs).
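The rule that all items in a list share one kind of content is a co-occurrence constraint across siblings. The sketch below shows one way to test it procedurally; it checks only the item count and item consistency, not the optional title, and it assumes that an empty item counts as text-and-inline content. The names `item_kind` and `list_is_valid` are hypothetical.

```python
import xml.etree.ElementTree as ET

INLINES = {"emph", "footnote", "footnoteref", "link"}


def item_kind(item):
    """Classify an item as 'paras', 'inline', or None if it mixes the two."""
    tags = [child.tag for child in item]
    has_text = bool((item.text or "").strip()) or any(
        (child.tail or "").strip() for child in item
    )
    if tags and all(tag == "para" for tag in tags) and not has_text:
        return "paras"
    if all(tag in INLINES for tag in tags):
        return "inline"
    return None


def list_is_valid(lst):
    """True if the list has two or more items, all of the same valid kind."""
    items = lst.findall("item")
    kinds = {item_kind(item) for item in items}
    return len(items) >= 2 and None not in kinds and len(kinds) == 1
```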
The second schema is for a white paper; it also has no namespace. It intentionally shares many of the same structures as the technical memorandum schema. Teams are invited to factor the common bits, write one schema as a customization of the other, or otherwise take advantage of as much reuse as is practical.
A whitepaper consists of a required head followed by a mixture of para and list elements (which may be absent), followed by zero or more section elements and an optional glossary.
The head must contain exactly one date and one title. It must contain at least one author. It may contain at most one titleabbrev element. It may contain zero or more copyright, keywords and legalnotice elements. It may also contain any number of meta elements from the XHTML namespace. The order of elements in the head is irrelevant.
The keywords element has an optional vocabulary attribute. If multiple keywords are provided, they must come from different vocabularies.
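The vocabulary rule amounts to a uniqueness constraint on an optional attribute. A possible check is sketched here, under the assumption that keywords elements without a vocabulary attribute all belong to a single unnamed vocabulary (represented by `None`); the requirements do not say how that case should be handled. `duplicate_vocabularies` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_vocabularies(head):
    """Return the vocabularies used by more than one keywords element."""
    # Assumption: a missing vocabulary attribute counts as one shared
    # unnamed vocabulary, keyed by None.
    counts = Counter(kw.get("vocabulary") for kw in head.findall("keywords"))
    return [vocab for vocab, n in counts.items() if n > 1]
```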
The content of date, author and title elements is as before. The titleabbrev element contains text. The keywords element contains a whitespace-delimited list of one or more tokens (there are no restrictions on the characters in the tokens). The legalnotice contains an optional title followed by one or more para elements. Finally, copyright contains one or more year elements and one or more holder elements, in that order.
Copyright years should be valid years; holders are simply text.
Sections must have a head, but only the title element is required in section heads. The body of a section consists only of paragraphs, lists, and optionally trailing sections.
The whitepaper schema adds a new inline to the content of para: glossterm. A glossterm must point to a glossdef. If the glossterm has a ref attribute, that attribute points to the definition, otherwise the body of the glossterm is to be used for the cross reference.
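The two-way lookup for glossterm (by attribute if present, otherwise by content) might be checked as follows. This sketch assumes that a glossdef is identified by an `id` attribute and that the content-based fallback compares the glossterm's text with the text of a glossdef's term, after trimming whitespace; both are interpretations, not requirements. `unresolved_glossterms` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def unresolved_glossterms(root):
    """Return the refs or term texts that match no glossdef."""
    # Assumption: glossdef/@id is the target of glossterm/@ref.
    ids = {g.get("id") for g in root.iter("glossdef") if g.get("id")}
    terms = {(g.findtext("term") or "").strip() for g in root.iter("glossdef")}

    missing = []
    for gt in root.iter("glossterm"):
        ref = gt.get("ref")
        if ref is not None:
            if ref not in ids:
                missing.append(ref)
        else:
            # No ref attribute: fall back to the body of the glossterm.
            text = "".join(gt.itertext()).strip()
            if text not in terms:
                missing.append(text)
    return missing
```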
A glossary consists of an optional head (of the same form as section) followed by one or more glossdef elements. Each glossdef consists of a term followed by one or more paragraphs or lists. The terms contain only text.
The order form schema uses addresses for both billing and shipping information. For our purposes, there are two kinds of addresses in the world: US addresses and international addresses. A US address consists of the following fields: one or more street elements followed by city, state (which must be one of the 50 US state postal abbreviations), zip (which must be either a five digit zip code or a nine digit “zip+4” code), and an optional country. If country is specified, it must be “US”.
An international address consists of: one or more street elements followed by city, an optional stateOrProvince, an optional postalcode, and a country.
Either of these forms may be used for the address fields of the order form schema.
The namespace name for elements in the order form schema is “urn:x-xmlns:example:orderForm”. An orderForm contains exactly one of each of the following elements, in this order: billToAddress, order, shippingInfo, and paymentMethod. If the billToAddress is a US address and the state is not one of the following: AK, DE, HI, MT, NO, OR, or WY, then the orderForm must also include a salesTax element immediately after the order.
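The salesTax rule makes the content model of orderForm depend on a value nested inside billToAddress. The following sketch illustrates the check under several assumptions: a US address is recognized by the presence of a state child, foreign-namespace elements are ignored when deciding what comes “immediately after” the order, and a salesTax element is tolerated (but not required) when the state is exempt. The exempt codes are copied verbatim from the requirement text, and `sales_tax_ok` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

NS = "urn:x-xmlns:example:orderForm"
PREFIX = "{" + NS + "}"
# Copied as written in the requirements.
EXEMPT_STATES = {"AK", "DE", "HI", "MT", "NO", "OR", "WY"}


def sales_tax_ok(order_form):
    """Check the order and presence of order-form children, including salesTax."""
    # Assumption: only US addresses have a state child.
    state = order_form.findtext(f"{PREFIX}billToAddress/{PREFIX}state")
    needs_tax = state is not None and state.strip() not in EXEMPT_STATES

    # Local names of children in the order form namespace, in document order.
    own = [c.tag[len(PREFIX):] for c in order_form if c.tag.startswith(PREFIX)]

    base = ["billToAddress", "order", "shippingInfo", "paymentMethod"]
    with_tax = ["billToAddress", "order", "salesTax", "shippingInfo", "paymentMethod"]
    return own == with_tax or (not needs_tax and own == base)
```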
The orderForm may additionally contain any element not from the order form namespace, provided that the expanded-name of the element has a non-null namespace URI. Elements not from the order form namespace may not contain elements or attributes from the order form namespace.
An order consists of one or more item elements.
Each item begins with an itemNumber. If the item number has the form “CL-” followed by a four digit number, it is a clothing item. If it has the form “NC-” followed by a four digit number, it is a non-clothing item. If it matches neither pattern, it is invalid.
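Here the content of one element selects the type of its parent. As an illustration, the classification step alone can be written as two pattern matches; this sketch assumes “four digit number” means exactly four ASCII digits with no surrounding whitespace, and `classify_item_number` is a hypothetical name.

```python
import re

_CLOTHING = re.compile(r"CL-[0-9]{4}")
_NON_CLOTHING = re.compile(r"NC-[0-9]{4}")


def classify_item_number(value):
    """Return 'clothing' or 'non-clothing'; raise ValueError otherwise."""
    if _CLOTHING.fullmatch(value):
        return "clothing"
    if _NON_CLOTHING.fullmatch(value):
        return "non-clothing"
    raise ValueError(f"invalid item number: {value!r}")
```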
Non-clothing items have the following additional fields: description, quantity, and unitPrice, in that order. The description may contain text and elements from any namespace other than the order form namespace (including elements whose expanded-name has a null namespace URI and without any restriction on their content). The quantity must be a positive integer. The unitPrice must be a positive decimal number with two digits after the decimal point. The quantity element is optional; if it is not specified, it must default to “1”. The description and unitPrice elements are required. The description must not be empty and may not contain only whitespace.
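The quantity default and the unitPrice format combine a lexical constraint with a value constraint. A minimal sketch follows, assuming that no sign or exponent is allowed, that surrounding whitespace is trimmed, and that an absent quantity element is represented by `None`. Both function names are hypothetical.

```python
import re
from decimal import Decimal

_PRICE = re.compile(r"[0-9]+\.[0-9]{2}")
_INTEGER = re.compile(r"[0-9]+")


def parse_unit_price(text):
    """Parse a positive decimal with exactly two fraction digits."""
    text = text.strip()
    if not _PRICE.fullmatch(text):
        raise ValueError(f"unitPrice must look like 12.34, got {text!r}")
    price = Decimal(text)
    if price <= 0:
        raise ValueError("unitPrice must be positive")
    return price


def parse_quantity(text):
    """Parse a positive integer; None (element absent) defaults to 1."""
    if text is None:
        return 1
    text = text.strip()
    if not _INTEGER.fullmatch(text) or int(text) < 1:
        raise ValueError(f"quantity must be a positive integer, got {text!r}")
    return int(text)
```

`Decimal` is used rather than `float` so that a price such as 19.99 is held exactly.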
Clothing items must have all of the fields of a non-clothing item, plus the following additional fields (in this order): size (“S”, “M”, “L”, “XL”, “LT”, or “XLT”), color, alternateColor (color and alternate color may not be the same), and an optional monogram which must consist of 1-3 upper-case letters (“A”-“Z”).
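The clothing-specific rules mix an enumeration, a pattern, and a comparison between sibling values. The sketch below checks the three together; it assumes colors are compared as case-sensitive strings after trimming whitespace, which the requirements do not specify, and `clothing_errors` is a hypothetical name.

```python
import re

SIZES = {"S", "M", "L", "XL", "LT", "XLT"}
_MONOGRAM = re.compile(r"[A-Z]{1,3}")


def clothing_errors(size, color, alternate_color, monogram=None):
    """Return messages for violated clothing-item constraints."""
    errors = []
    if size.strip() not in SIZES:
        errors.append(f"unknown size {size!r}")
    # Assumption: colors are equal when their trimmed strings are identical.
    if color.strip() == alternate_color.strip():
        errors.append("color and alternateColor may not be the same")
    if monogram is not None and not _MONOGRAM.fullmatch(monogram.strip()):
        errors.append("monogram must be 1-3 upper-case letters A-Z")
    return errors
```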
The shippingInfo contains a shipToAddress and a shipBy element in that order.
The shipBy is either “USPS”, “FedEx”, “UPS”, or “DHL” (tokenized element content, like attribute values, may have leading and trailing whitespace). The shipBy must have a shippingCost attribute and may optionally have a rush attribute containing “none”, “3day”, “2day”, or “overnight”. If unspecified, rush defaults to “none”. Overnight shipping is not available to international addresses.
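The shipBy rules include an attribute default and a restriction that depends on the kind of shipping address. As a sketch, the check below takes the address kind as a boolean supplied by the caller, since how to tell the two address forms apart is left to the schema author; `ship_by_errors` is a hypothetical name, and the shippingCost value itself is not validated because the requirements give no format for it.

```python
import xml.etree.ElementTree as ET

CARRIERS = {"USPS", "FedEx", "UPS", "DHL"}
RUSH_LEVELS = {"none", "3day", "2day", "overnight"}


def ship_by_errors(ship_by, international):
    """Return messages for violated shipBy constraints."""
    errors = []
    # Leading and trailing whitespace in the content is allowed.
    if (ship_by.text or "").strip() not in CARRIERS:
        errors.append("unknown carrier")
    if ship_by.get("shippingCost") is None:
        errors.append("shippingCost attribute is required")
    # An absent rush attribute defaults to "none".
    rush = ship_by.get("rush", "none").strip()
    if rush not in RUSH_LEVELS:
        errors.append(f"unknown rush value {rush!r}")
    elif rush == "overnight" and international:
        errors.append("overnight shipping is not available internationally")
    return errors
```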
The paymentMethod consists of either a creditCard or a checkOrMoneyOrder. The amount of the payment is recorded in the amount attribute on the paymentMethod.
The creditCard element must have either a type attribute or a type child (it is an error to have neither or both). In either case, the content must be one of the following: “Amex”, “Visa”, or “Mastercard”. The creditCard must also have a number and an expiration. For “Amex” payments, the number must be 15 digits long; for “Mastercard” it must be 16; for “Visa” it must be either 13 or 16 digits long. The expiration must match the pattern “99/99”.
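The creditCard rules combine an attribute-or-element choice with a length constraint that depends on the chosen value. The sketch below makes several assumptions that the requirements leave open: number and expiration are child elements in the order form namespace, “99/99” means two digits, a slash, and two digits, and values are trimmed before comparison. `credit_card_errors` is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET

PREFIX = "{urn:x-xmlns:example:orderForm}"
NUMBER_LENGTHS = {"Amex": {15}, "Mastercard": {16}, "Visa": {13, 16}}
_EXPIRATION = re.compile(r"[0-9]{2}/[0-9]{2}")


def credit_card_errors(card):
    """Return messages for violated creditCard constraints."""
    type_attr = card.get("type")
    type_child = card.findtext(PREFIX + "type")  # None only if the child is absent
    if (type_attr is None) == (type_child is None):
        return ["exactly one of the type attribute or type child is required"]

    card_type = (type_attr if type_attr is not None else type_child).strip()
    if card_type not in NUMBER_LENGTHS:
        return [f"unknown card type {card_type!r}"]

    errors = []
    # Assumption: number and expiration are child elements.
    number = (card.findtext(PREFIX + "number") or "").strip()
    if not re.fullmatch(r"[0-9]+", number) or len(number) not in NUMBER_LENGTHS[card_type]:
        errors.append(f"number has the wrong length for {card_type}")
    expiration = (card.findtext(PREFIX + "expiration") or "").strip()
    if not _EXPIRATION.fullmatch(expiration):
        errors.append("expiration must match 99/99")
    return errors
```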
The checkOrMoneyOrder element is empty.
Finally, salesTax must be a positive decimal number with two digits after the decimal point.
[ISO 8879:1986] JTC 1, SC 34. ISO 8879:1986 Information processing -- Text and office systems -- Standard Generalized Markup Language (SGML). 1986.
[XML 1.0] Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, and Eve Maler, editors. Extensible Markup Language (XML) 1.0 Second Edition. World Wide Web Consortium, 2000.
[W3C XML Schema] Henry S. Thompson, David Beech, Murray Maloney, et al., editors. XML Schema Part 1: Structures. World Wide Web Consortium, 2000.
[W3C XML Datatypes] Paul V. Biron and Ashok Malhotra, editors. XML Schema Part 2: Datatypes. World Wide Web Consortium, 2000.
[RELAX NG] James Clark, editor. RELAX NG Specification (Committee Specification). OASIS. 2001.
[RNG DTD] James Clark, editor. RELAX NG DTD Compatibility (Committee Specification). OASIS. 2001.
[Schematron] Rick Jelliffe, editor. The Schematron Assertion Language 1.5. Rick Jelliffe and Academia Sinica Computing Centre. 2001, 2001.