OASIS Mailing List Archives

RE: whitespaces, need a help !

I am facing the same problem. I have an element in my XML file, <NAME>, whose
value I later store in the database. The maximum length for this value is
8 (as defined in the schema file).
The problem arises when there is a value like
<NAME>ABCD                   </NAME>
The validator still accepts this file, but the record is rejected when it
reaches the database. Is there any way to trim these leading/trailing
whitespace characters?
I can't add anything (extra code) to my XML files, as they are
autogenerated. Can something be done in the schema file, or in the code
where I validate the file?
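Since the files cannot be changed, one option is to trim the values in the validating program itself, after validation and before the database insert. A minimal sketch using Python's standard xml.etree.ElementTree module; the element name NAME and the limit of 8 come from the message above, while the wrapper element RECORD, the function trimmed_names and the constant NAME_MAX_LENGTH are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical constant mirroring the maxLength of 8 from the schema.
NAME_MAX_LENGTH = 8

# The four characters XML treats as whitespace: space, tab, CR, LF.
XML_WHITESPACE = " \t\r\n"


def trimmed_names(xml_text: str) -> list:
    """Return every <NAME> value with leading/trailing XML whitespace removed."""
    root = ET.fromstring(xml_text)
    values = []
    # iter() visits the root and all descendants with the given tag
    for elem in root.iter("NAME"):
        value = (elem.text or "").strip(XML_WHITESPACE)
        # Re-check the length on the trimmed value before it goes to the database
        if len(value) > NAME_MAX_LENGTH:
            raise ValueError(f"NAME too long after trimming: {value!r}")
        values.append(value)
    return values


doc = "<RECORD><NAME>ABCD                   </NAME></RECORD>"
print(trimmed_names(doc))  # ['ABCD']
```

Stripping only the four XML whitespace characters matches what schema whitespace handling treats as whitespace, rather than everything Python's str.strip() would remove by default.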

-----Original Message-----
From: Rick Jelliffe [mailto:ricko@allette.com.au]
Sent: Wednesday, May 23, 2001 5:29 PM
To: xml-dev@lists.xml.org
Subject: Re: whitespaces, need a help !

From: Vladimir V. Popov <vladimir.popov@arcadia.it>

>  I have XML data with elements that contain only whitespace.
>  Does anybody know how I can check and constrain these elements to use,
>  for example, only these characters (UTF-16): #x0020, "I", "E"?

There are several issues here. For a start, be warned that the regular
expression matching of XML Schema implementations may not be satisfactory
yet: it is too early to have confidence, so you should test the particular
implementation you are using.

For the XML document, use xml:space="preserve" on the elements that must
preserve whitespace. This probably won't have any effect, but it is good
practice. (xml:space was developed because the default SGML behaviour,
which removes the first leading and the last trailing newline, was violated
by html:pre and was a bit tricky for people.)
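As noted above, xml:space is unlikely to change what a schema validator does; it is a signal to the application. The attribute applies to the element it appears on and to everything inside it, unless a descendant overrides it. A minimal Python sketch of how an application might compute the effective value for each element; the function name effective_space and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:space under the reserved XML namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def effective_space(root: ET.Element) -> list:
    """Return (tag, effective xml:space) pairs in document order."""
    result = []

    def walk(elem: ET.Element, inherited: str) -> None:
        # An explicit xml:space overrides the value inherited from the parent.
        current = elem.get(XML_SPACE, inherited)
        result.append((elem.tag, current))
        for child in elem:
            walk(child, current)

    # Elements with no xml:space anywhere above them use "default" handling.
    walk(root, "default")
    return result


doc = ET.fromstring(
    '<doc><flag xml:space="preserve"><code> </code>'
    '<note xml:space="default"> x </note></flag></doc>'
)
print(effective_space(doc))
```

An application would then trim text only for elements whose effective value is "default".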

In the XML Schema, use datatypes derived from "string", not from "token".
The token datatype (from which most of the built-in string-like types are
derived) strips leading and trailing whitespace and collapses internal runs
of whitespace to a single space (following the rules in XML 1.0 section
3.3.3). Check whether "string" or "normalizedString" is appropriate:
"normalizedString" may be unsuitable if it matters that newlines are
changed to spaces.

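One plausible explanation for the problem in the reply above, assuming the NAME type is derived from token: the whiteSpace facet normalizes a value before facets such as maxLength are checked, so the validator sees "ABCD" while the application still reads the raw text with its trailing spaces. The sketch below imitates the three whiteSpace modes (preserve for string, replace for normalizedString, collapse for token) in Python; it is an illustration, not a schema validator, and the function name apply_whitespace_facet is hypothetical.

```python
import re


def apply_whitespace_facet(value: str, mode: str) -> str:
    """Imitate the XML Schema whiteSpace facet: preserve, replace or collapse."""
    if mode == "preserve":  # used by string
        return value
    # replace: every tab, newline and carriage return becomes a space
    replaced = re.sub(r"[\t\n\r]", " ", value)
    if mode == "replace":  # used by normalizedString
        return replaced
    if mode == "collapse":  # used by token and types derived from it
        # squeeze runs of spaces to one, then drop leading/trailing spaces
        return re.sub(r" {2,}", " ", replaced).strip(" ")
    raise ValueError(f"unknown whiteSpace mode: {mode!r}")


raw = "ABCD" + " " * 19  # the value from the reply above
for mode in ("preserve", "replace", "collapse"):
    normalized = apply_whitespace_facet(raw, mode)
    # a maxLength of 8 would be checked against the normalized value
    print(mode, len(normalized), len(normalized) <= 8)
```

Only the collapse mode brings the value under the length limit, which is why a token-based type can pass validation while the raw text is still too long for the database column.
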
Now for the particular datatypes. You have three choices, I think.
(I have not tried these: no flames please.)

The first is merely to make a regular expression such as
  ( '&#x20;' | 'I' | 'E' )
That is the preferred option.  If it doesn't work, try
  ( '\&#x20;' | 'I' | 'E' )
as a workaround. If it still doesn't work, you could settle for using
  ( '\s' | 'I' | 'E' )
which will allow any single whitespace character (space, tab, newline or
carriage return).
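The notation above is informal. Written as an XML Schema pattern facet on a type derived from string, the first option could be "[ IE]" (the space may also be given as the character reference &#x20; inside the value attribute). Schema patterns are implicitly anchored to the whole value, and the schema's \s means exactly space, tab, newline or carriage return. A minimal Python sketch that imitates both points with re.fullmatch and an explicit character class; the names PREFERRED, FALLBACK and is_valid are hypothetical.

```python
import re

# Schema patterns must match the whole value, so use fullmatch, not search.
# Python's \s also matches characters such as form feed and vertical tab,
# which XML does not treat as whitespace, so the schema's \s is spelled out.
PREFERRED = re.compile(r"[ IE]")         # exactly one space, "I" or "E"
FALLBACK = re.compile(r"[ \t\n\r]|I|E")  # any single XML whitespace, "I" or "E"


def is_valid(value: str, pattern: re.Pattern) -> bool:
    return pattern.fullmatch(value) is not None


for value in [" ", "I", "E", "\t", "IE", "X"]:
    print(repr(value), is_valid(value, PREFERRED), is_valid(value, FALLBACK))
```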

The second choice is to make explicit types for "string containing a single
space only", "string containing I only", etc., and then define a union of
them. I would be surprised if current schema implementations have tested
for this, so good luck if you use it!

The third choice is to use a string type restricted with enumerations.
Again, I bet this has not been tested by implementers.
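To compare the second and third choices, a small Python sketch: a union value is valid if it is valid for at least one member type, so a union of three single-value types accepts exactly the same strings as an enumeration of " ", "I" and "E". Both assume a string base type, so the single space is preserved rather than stripped. The names ENUMERATION, MEMBER_TYPES, valid_enumeration and valid_union are hypothetical.

```python
# Third choice: a string type restricted by enumeration values.
ENUMERATION = {" ", "I", "E"}

# Second choice: a union of single-value member types.
MEMBER_TYPES = [
    lambda v: v == " ",  # "string containing a single space only"
    lambda v: v == "I",  # "string containing I only"
    lambda v: v == "E",  # "string containing E only"
]


def valid_enumeration(value: str) -> bool:
    return value in ENUMERATION


def valid_union(value: str) -> bool:
    # valid for the union if valid for at least one member type
    return any(member(value) for member in MEMBER_TYPES)


for value in [" ", "I", "E", "  ", "", "x"]:
    print(repr(value), valid_enumeration(value), valid_union(value))
```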

Rick Jelliffe

The xml-dev list is sponsored by XML.org, an initiative of OASIS

The list archives are at http://lists.xml.org/archives/xml-dev/
