The issue of union types, often discussed in the past by this group, was raised again as a Last Call issue (LC-2: Conjunction types? and others). In its simplest form, the requirement at the core of this and other comments is for a simple type definition which disjunctively combines two or more other simple type definitions, with a result whose lexical and value spaces are the unions of those of its input types. A simple and indeed prototypical example is the maxOccurs attribute on the <element> element in XML Schema itself: it should be constrained as a union of non-negative-integer and an enumeration with the single member unbounded, but the current mechanisms for defining simple types do not allow for this.
In response to this issue, the above-named examined a number of approaches, and present the following design for consideration by the group. We believe it is simple both to explain and to implement, and will address the stated requirements well. The opportunity to reconsider certain aspects of the existing design has allowed for some overall simplification and cleanup, as well, with an increase in parallelism at the level of XML representation with the definition of complex types.
From the current unitary form of simple type definitions, with six properties: name, base type definition, facets, fundamental facets, variety and target namespace, we move to a core+variants structure, as follows:
The core includes the variety, which is one of atomic, list or union and which in turn determines which of the subsequent variants are filled in. The list variant admits only a limited set of facets (enumeration, length, min/maxLength and pattern), and the union variant fewer still (enumeration and pattern).
The semantics are straightforward: a string is schema-valid per a union type definition iff it satisfies the specified facets and is valid per at least one of the constituent type definitions. Processors must not check beyond the first constituent, taken in the order given, that the string satisfies. The type definition outcome property in the PSV infoset reports both the union type definition and the constituent type definition which matched.
We considered a range of other strategies and constraints, particularly on whether to allow overlapping lexical spaces or not, and in the end concluded that attempting to rule out overlapping was a bad idea, as it would rule out e.g. float and double as members of a union. We also considered allowing nested spaces to be disambiguated in favour of the most specific, but in the end concluded that given that user-order would have to play a role in the case of non-nested overlap, it was better to use it for everything. A note will be needed that having e.g. double before float will be pointless given our choice, since any string valid per float is also valid per double and so would never reach the float constituent.
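To illustrate why constituent order matters, here is a minimal sketch in the proposed syntax. The type names floatFirst and doubleFirst are hypothetical, and a union element carrying only a types attribute (no simpleType children) is assumed to be permitted.

```xml
<!-- Hypothetical names; a fragment, not a complete schema document. -->
<simpleType name='floatFirst'>
  <!-- A literal such as 1.5 is valid per float, so checking stops there
       and float is reported as the matching constituent. -->
  <union types='float double'/>
</simpleType>

<simpleType name='doubleFirst'>
  <!-- Any literal that float would accept is accepted by double first,
       so the float constituent can never be the one that matches. -->
  <union types='double float'/>
</simpleType>
```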
The constraint on the constituents of union types to be atomic or list does not rule out unions of unions at the XML representation level; it simply requires them to be unfolded at schema construction time.
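As a rough sketch of what such unfolding might look like, consider two hypothetical named types, countOrToken and countTokenOrString. The names are invented for illustration, and the assumption that unfolding preserves the written order of members is ours rather than something stated above.

```xml
<!-- Hypothetical names; a fragment, not a complete schema document. -->
<simpleType name='countOrToken'>
  <union types='positive-integer NMTOKEN'/>
</simpleType>

<!-- At the XML representation level, the first member here is itself a union. -->
<simpleType name='countTokenOrString'>
  <union types='countOrToken string'/>
</simpleType>

<!-- After unfolding at schema construction time, the constituents of
     countTokenOrString would be positive-integer, NMTOKEN and string
     (assuming order is preserved). -->
```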
We also considered making binary a fourth possible value of variety, to reflect both its parameterisation by an encoding type and its (very) limited inventory of allowed facets, but didn't reach consensus on this point.
We thought we would take this opportunity to move <simpleType> more in line with the (itself newly simplified) form of <complexType>. Accordingly the three basic ways of producing new simple type definitions from old all have a common shape: a simpleType element with an optional name and a choice between restriction, list or union as the single required child (after optional annotation).
The restriction option has either a base attribute (a QName) or a simpleType child, and allows the facets appropriate to that base type as children. Also, if the base is a list, then a list child whose type restricts the base's type is alternatively allowed. Similarly, if the base is a union, a union child whose types restrict the base's types is a possibility.
The list option has either a type attribute (a QName) or a simpleType child.
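The two forms of the list option might look as follows. This is only a sketch: the type names sizeList and keywordList are hypothetical, and the built-in type names follow the spellings used elsewhere in this proposal.

```xml
<!-- Form 1: the item type is referenced by QName. -->
<simpleType name='sizeList'>
  <list type='positive-integer'/>
</simpleType>

<!-- Form 2: the item type is an anonymous nested definition. -->
<simpleType name='keywordList'>
  <list>
    <simpleType>
      <restriction base='NMTOKEN'>
        <enumeration value='small'/>
        <enumeration value='large'/>
      </restriction>
    </simpleType>
  </list>
</simpleType>
```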
The union option has a types attribute (a list of QNames) and any number of simpleType children.
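Applied to the maxOccurs case from the introduction, the union option might be used along the following lines. This is a sketch under assumptions, not the definition in the schema for schema documents: the choice of NMTOKEN as the base for the enumeration is ours, and the relative order of QName-referenced and nested constituents is not settled by the text above (it does not matter here, because the two lexical spaces do not overlap).

```xml
<attribute name='maxOccurs'>
  <simpleType>
    <!-- One constituent referenced by name, one defined in place. -->
    <union types='non-negative-integer'>
      <simpleType>
        <restriction base='NMTOKEN'>
          <enumeration value='unbounded'/>
        </restriction>
      </simpleType>
    </union>
  </simpleType>
</attribute>
```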
This design tightens the content models and matches them better to their use, without completely eliminating semantic dependencies. So although we can now do much better at allowing only the appropriate facets for lists, the allowed facets for the restriction case are still a function of the base type, which cannot be expressed in the schema for schema documents.
An example taken from XHTML (I gather) of an attribute definition would be as follows on this account:
<xs:attribute name="size">
  <xs:simpleType>
    <xs:union>
      <xs:simpleType>
        <xs:restriction base="xs:positive-integer">
          <xs:maxInclusive value="10"/>
        </xs:restriction>
      </xs:simpleType>
      <xs:simpleType>
        <xs:restriction base="xs:NMTOKEN">
          <xs:enumeration value="small"/>
          <xs:enumeration value="medium"/>
          <xs:enumeration value="large"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:union>
  </xs:simpleType>
</xs:attribute>
This example uses only embedded anonymous simple types, but a list of QName-references can be used for named constituents, or the two combined as required. The parallelism of the cases means that you are never forced to name a type definition just in order to use it as part of another definition, so for instance to constrain the length of a list of constrained strings without exposing the string type itself, the following will work:
<simpleType name='fourTuple'>
  <restriction>
    <simpleType>
      <list>
        <simpleType>
          <restriction base='string'>
            <enumeration value='1'/>
            <enumeration value='one'/>
          </restriction>
        </simpleType>
      </list>
    </simpleType>
    <length value='4'/>
  </restriction>
</simpleType>
We strongly recommend that a further change to the content model of <complexType> should be made to bring the two completely into line: eliminate the derivedBy attribute here as well, in favour of a required child, either <restriction> or <extension>, with a choice between base attribute or nested type definition, as above. Only <restriction> would be allowed to have neither base nor a nested type definition, in which case the actual base would default to the appropriate flavour of urType, as it does now.
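The recommended change might look roughly like this. The type and element names (address, extendedAddress, country) are hypothetical, and the "current" form is shown as we assume it is written today, with the derivation carried by base and derivedBy attributes on <complexType>.

```xml
<!-- Current form, as assumed here: derivation recorded in attributes. -->
<complexType name='extendedAddress' base='address' derivedBy='extension'>
  <element name='country' type='string'/>
</complexType>

<!-- Recommended form: derivation recorded by a required child element. -->
<complexType name='extendedAddress'>
  <extension base='address'>
    <element name='country' type='string'/>
  </extension>
</complexType>
```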
This does represent a backward incompatible change: existing schema documents will become invalid. To reduce the practical impact of this, Martin Gudgin has produced XSLT stylesheets which do the necessary forward conversions, and we'll make these available if these changes are agreed.