[Zielinski, Marek]
> I am trying to define restrictions on string lengths using Schema. The data
> actually comes from databases, and is exchanged between two different
> systems. I encountered a snag: when the string contains one of the reserved
> characters, like "&", the parser automatically translates it into an entity,
> e.g. &amp;. This increases the length of the string, and now the string does
> not fit; the validator (I am using XMLSpy) rejects it as too long.
>
XML input can never actually contain the literal character "&" (unless it is
in CDATA, of course). It must be written as & a m p ; (less the spaces).
Even if it were held in the database as a single ampersand character, when
serialized to xml it would have to be escaped. Once parsed, the string
passed to the application should contain the original "&" character.
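
To make the round trip concrete, here is a minimal sketch using only the Python standard library. The value "AT&T" and the element name "name" are made up for illustration; they are not from the original poster's data.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical database value containing a reserved character.
db_value = "AT&T"

# Serializing: the ampersand must be escaped in the raw XML text.
raw_xml = "<name>" + escape(db_value) + "</name>"

# Parsing: the application gets the original character back.
parsed = ET.fromstring(raw_xml).text

# A CDATA section is the one place a literal "&" may appear unescaped.
from_cdata = ET.fromstring("<name><![CDATA[AT&T]]></name>").text

print(raw_xml)       # escaped form, as written to the file
print(len(db_value), len(escape(db_value)), len(parsed))
print(parsed == from_cdata)
```

The escaped form is longer than the stored value, but the parsed string has the same length as what came out of the database.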
So the real question is, does the length restriction in xml schema apply to
the raw xml or to the string stored in the infoset after parsing?
Section 2.2 of Part 1 of the Schema Rec says
" The concepts and definitions used herein regarding XML are framed at the
abstract level of information items as defined in [XML-Infoset]. By
definition, this use of the infoset provides a priori guarantees of
well-formedness (as defined in [XML 1.0 (Second Edition)]) and namespace
conformance (as defined in [XML-Namespaces]) for all candidates for
·assessment· and for all ·schema documents·. "
So clearly xml schema assessment has to take place after all the entities
and character references have been replaced by the parser.
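
As a rough illustration of that conclusion, the sketch below applies a length limit to the parsed value rather than to the raw markup. It is not a schema validator; fits_max_length is a hypothetical helper that only mimics what a maxLength facet check would see, and the sample element is invented.

```python
import xml.etree.ElementTree as ET

def fits_max_length(element_xml, max_length):
    """Hypothetical stand-in for a maxLength check: measure the parsed value."""
    value = ET.fromstring(element_xml).text or ""
    return len(value) <= max_length

# The same four-character value, written with an entity reference
# and with a numeric character reference.
with_entity = "<name>AT&amp;T</name>"
with_char_ref = "<name>AT&#38;T</name>"

# Length of the raw text between the tags, before any replacement.
raw_length = len("AT&amp;T")

print(fits_max_length(with_entity, 4))
print(fits_max_length(with_char_ref, 4))
print(raw_length <= 4)
```

Both spellings pass a limit of 4 when measured after parsing, while the raw text would not, which is the behaviour the original poster reported from the validator.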
So this looks like a bug in Spy.
Cheers,
Tom P