Re: whitespaces, need a help !
- From: Rick Jelliffe <ricko@allette.com.au>
- To: xml-dev@lists.xml.org
- Date: Wed, 23 May 2001 19:59:17 +0800
From: Vladimir V. Popov <vladimir.popov@arcadia.it>
> I have a xml data with elements which contain only whitespace(s).
> Does anybody know how can I check and constrain these elements for
> using, for example, only such chars (UTF-16): #x0020, "I", "E"?
There are several issues here. For a start, be warned that the regular
expression matching of XML Schema implementations may not be satisfactory
yet: it is too early to have confidence, so you should test the particular
tools you plan to use.
For the XML document, use xml:space="preserve" on the elements that must
preserve whitespace. This probably won't have any effect, but it is good
practice.
(xml:space was developed because the default SGML system (remove the first
leading and trailing newline) was violated by html:pre and was a bit tricky
for people.)
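As a minimal illustration of the attribute in an instance document (the element name "flag" is made up for this example and does not come from the original question):

```xml
<!-- Hypothetical element; xml:space="preserve" signals that the
     single space in its content is significant. -->
<flag xml:space="preserve"> </flag>
```

Like any other attribute, xml:space has to be permitted by whatever DTD or schema type the element is validated against.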
In the XML Schema, use datatypes derived from "string", not from "token".
The token datatype (from which most of the other string types derive) strips
leading and trailing whitespace characters (following the rules in XML
s. 3.3.3). Check whether "string" or "normalizedString" is appropriate:
"normalizedString" is likely to be a bad choice if newlines matter, because
it changes them to spaces.
Now for the particular datatypes. You have three choices, I think.
(I have not tried these: no flames please.)
The first is merely to make a regular expression such as
( ' ' | 'I' | 'E' )
That is the preferred option. If it doesn't work, try
( '\ ' | 'I' | 'E' )
as a workaround. If it still doesn't work, you could settle for using
( '\s' | 'I' | 'E' )
which will allow any single whitespace.
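A sketch of how the first choice might look as an XML Schema pattern facet, untested like the rest of this message. The type name SpaceIOrE is invented for the example. The quotes in the expressions above are read here as notation rather than regex syntax: in the schema pattern language a space is an ordinary character, so it can be written directly, and the whole value must match the pattern.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type name. The base is xs:string so that
       whitespace is preserved before the pattern is checked. -->
  <xs:simpleType name="SpaceIOrE">
    <xs:restriction base="xs:string">
      <!-- Exactly one character: a space, "I" or "E".
           Equivalent to the alternation ( |I|E). -->
      <xs:pattern value="[ IE]"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

Replacing the space inside the brackets with \s would give the looser fallback, which also accepts a tab, newline or carriage return.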
The second choice is to make explicit types for "string containing single
space only", "string containing I only", etc., then define a union of them. I
would be surprised if current schema implementations have been tested for
this, so good luck if you use it!
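A sketch of the union approach, again untested. All four type names are hypothetical, and each member type is pinned to its one value with a single enumeration facet, which is only one of the ways to write a "this string only" type.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical member types, one per permitted value. -->
  <xs:simpleType name="SpaceOnly">
    <xs:restriction base="xs:string">
      <xs:enumeration value=" "/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="IOnly">
    <xs:restriction base="xs:string">
      <xs:enumeration value="I"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="EOnly">
    <xs:restriction base="xs:string">
      <xs:enumeration value="E"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- A value is valid if it is valid against any member type. -->
  <xs:simpleType name="SpaceOrIOrE">
    <xs:union memberTypes="SpaceOnly IOnly EOnly"/>
  </xs:simpleType>
</xs:schema>
```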
The third choice is to derive from string and restrict the values with
enumerations. Again, I bet this has not been tested by implementers.
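A sketch of the enumeration approach, untested, with a made-up type name. Because the base type is xs:string, the lone space in the first enumeration value should be compared as-is rather than collapsed away.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type name; lists the three permitted values. -->
  <xs:simpleType name="SpaceIOrEEnum">
    <xs:restriction base="xs:string">
      <xs:enumeration value=" "/>
      <xs:enumeration value="I"/>
      <xs:enumeration value="E"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```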
Cheers
Rick Jelliffe