The issue of union types, often discussed in the past by this group, was raised again as a Last Call issue (LC-2: Conjunction types? and others). In its simplest form, the requirement at the core of this and other comments is for a simple type definition which disjunctively combines two or more other simple type definitions, with a result whose lexical and value spaces are the unions of those of its input types. A simple and indeed prototypical example is the maxOccurs attribute on the element element in XML Schema itself: it should be constrained as a union of non-negative-integer and an enumeration with the single member unbounded, but the current mechanisms for defining simple types do not allow for this.
In response to this issue, the above-named examined a number of approaches, and present the following design for consideration by the group. We believe it is simple both to explain and to implement, and will address the stated requirements well. The opportunity to reconsider certain aspects of the existing design has allowed for some overall simplification and cleanup, as well, with an increase in parallelism at the level of XML representation with the definition of complex types.
From the current unitary form of simple type definitions, with six properties: name, base type definition, facets, fundamental facets, variety and target namespace, we move to a core+variants structure, as follows:
The core includes the variety property, one of atomic, list or union, which in turn determines which of the subsequent variants are filled in.
The semantics are straightforward: a string is schema-valid per a union type definition iff it satisfies the specified facets and is valid per at least one of the constituent type definitions. Processors must not check beyond the first constituent they find which the string satisfies. The type definition outcome property in the PSV infoset reports both the union type definition and the constituent type definition which matched.
We considered a range of other strategies and constraints, particularly on whether to allow overlapping lexical spaces or not, and in the end concluded that attempting to rule out overlapping was a bad idea, as it would rule out e.g. float and double as members of a union. We also considered allowing nested spaces to be disambiguated in favour of the most specific, but concluded that, given that user-order would have to play a role in the case of non-nested overlap, it was better to use it for everything. A note will be needed that having e.g. float listed after double will be pointless given our choice.
The constraint on the constituents of union types to be atomic or list does not rule out unions of unions at the XML representation level; it simply requires them to be unfolded at schema construction time.
We also considered making binary a fourth possible value of variety, to reflect both its parameterisation by an encoding type and its (very) limited inventory of allowed facets, but didn't reach consensus on this point.
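To illustrate the consequences of first-match ordering and of unfolding, here is a minimal sketch in the XML representation proposed in this note. The type names number and numberOrKeyword and the keyword unknown are hypothetical, and the sketch assumes that every string valid as a float is also valid as a double.

```xml
<schema>
  <!-- Hypothetical example: member order is significant. Assuming every
       string valid as float is also valid as double, the float member
       listed second here can never be the constituent that matches. -->
  <simpleType name='number'>
    <union types='double float'/>
  </simpleType>

  <!-- A union that names another union as a member. This is allowed in
       the XML representation; at schema construction time it would be
       unfolded so that its constituents are only atomic or list types:
       the members of number plus the anonymous NMTOKEN restriction. -->
  <simpleType name='numberOrKeyword'>
    <union types='number'>
      <simpleType>
        <restriction base='NMTOKEN'>
          <enumeration value='unknown'/>
        </restriction>
      </simpleType>
    </union>
  </simpleType>
</schema>
```

Under these assumptions a string such as 1.5 would be reported as matching double, never float, which is why listing both in this order gains nothing.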
We thought we would take this opportunity to move <simpleType> more in line with the (itself newly simplified) <complexType>. Accordingly the three basic ways of producing new simple type definitions from old all have a common shape: a simpleType element with an optional name and a choice between restriction, list and union as the single required child (after optional annotation):
The restriction option has either a base attribute (a QName) or a simpleType child, and allows the facets appropriate to that base type as children. Also, if the base is a list, then a list child whose type restricts that of the base is alternatively allowed. Similarly, if the base is a union, a union child whose types restrict the base's types is a possibility.
The list option has either a type attribute (a QName) or a simpleType child.
The union option has a types attribute (a list of QNames) and any number of simpleType children.
This design tightens the content models and matches them better to their use, without completely eliminating semantic dependencies. So although we can now do much better at allowing only the appropriate facets for lists, the allowed facets for the restriction case are still a function of the base type, which cannot be expressed in the schema for schema documents.
An example taken from XHTML (I gather) of an attribute definition would be as follows on this account:
<xs:attribute name="size">
  <xs:simpleType>
    <xs:union>
      <xs:simpleType>
        <xs:restriction base="xs:positive-integer">
          <xs:maxInclusive value="10"/>
        </xs:restriction>
      </xs:simpleType>
      <xs:simpleType>
        <xs:restriction base="xs:NMTOKEN">
          <xs:enumeration value="small"/>
          <xs:enumeration value="medium"/>
          <xs:enumeration value="large"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:union>
  </xs:simpleType>
</xs:attribute>
This example uses only embedded anonymous simple types, but a list of QName-references can be used for named constituents, or the two combined as required. The parallelism of the cases means that you are never forced to name a type definition just in order to use it as part of another definition, so for instance to constrain the length of a list of constrained strings without exposing the string type itself, the following will work:
<simpleType name='fourTuple'>
  <restriction>
    <simpleType>
      <list>
        <simpleType>
          <restriction base='string'>
            <enumeration value='1'/>
            <enumeration value='one'/>
          </restriction>
        </simpleType>
      </list>
    </simpleType>
    <length value='4'/>
  </restriction>
</simpleType>
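Returning to the maxOccurs case from the introduction, the following sketch shows how it might be written under this design, combining a QName reference with an embedded anonymous type. The type name occurrenceLimit is hypothetical, as is the choice of NMTOKEN as the base of the enumeration.

```xml
<schema>
  <!-- Hypothetical name: a union of non-negative-integer (by QName
       reference) and an anonymous one-member enumeration. -->
  <simpleType name='occurrenceLimit'>
    <union types='non-negative-integer'>
      <simpleType>
        <restriction base='NMTOKEN'>
          <enumeration value='unbounded'/>
        </restriction>
      </simpleType>
    </union>
  </simpleType>

  <!-- The attribute then simply refers to the named union type. -->
  <attribute name='maxOccurs' type='occurrenceLimit'/>
</schema>
```

Only the union itself needs a name here; the enumeration stays anonymous, in keeping with the point that a type never has to be named merely to take part in another definition.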
We strongly recommend that a further change to the content model of <complexType> should be made to bring the two completely into line: eliminate the derivedBy attribute here as well, in favour of a required child, either <restriction> or <extension>, with a choice between a base attribute or nested type definition, as above. Only <restriction> would be allowed to have neither base nor a nested type definition, in which case the actual base would default to the appropriate flavour of urType, as it does now.
This does represent a backward-incompatible change: existing schema documents will become invalid. To reduce the practical impact of this, Martin Gudgin has produced XSLT stylesheets which do the necessary forward conversions, and we'll make these available if these changes are agreed.