   Re: xml spec 1.0 validity constraint for ID/IDREF

  • From: Peter Murray-Rust <peter@ursus.demon.co.uk>
  • To: "gopi" <gopi@aztecsoft.com>, "Xml-Dev" <xml-dev@xml.org>
  • Date: Wed, 29 Mar 2000 14:29:46 +0100

Gopi,
	I understand your problem, and am happy to try to answer it. I hope I
don't incur the wrath of XML/SGML gurus. It is a perfectly reasonable one
to raise. I hope that the question and answers will be useful for more than
the two  of us...

At 05:27 PM 3/29/00 +0530, gopi wrote:
>Hi all,
>	I had posted this question around 6 months back in 1 or 2 mailing
>lists (like microsoft.public.xml, but not on the xml-dev list), but I didn't
>get even one proper reply.  Hope to get an answer for this mystery at least
>here (Don't laugh @ me if it is a very stupid 1) :-)

It is not very stupid, and anyway I would deprecate anyone laughing at
another person's ignorance. Ignorance is not a crime on XML-DEV.

[... XML Spec snipped ...]
>	So, if I write an xml document as below,
>
>	<?xml version="1.0"?>
>	<!DOCTYPE root [
>		<!ELEMENT root (#PCDATA)>
>		<!ATTLIST root
>				att1 ID #REQUIRED
>				att2 CDATA #IMPLIED>
>	]>
>	<root att1="10" att2="234"/>
>
>		It is not valid according to specification!!!!.

This statement is correct.
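
To make the constraint concrete: a value of type ID must match the Name
production, so it cannot begin with a digit. The following is only a minimal
sketch. It assumes an ASCII-only approximation of Name (the spec also admits
many non-ASCII letters), and the function name is hypothetical.

```python
import re

# ASCII-only approximation of the XML 1.0 Name production (an assumption made
# for brevity): first character is a letter, '_' or ':', the rest may also
# include digits, '.', and '-'.
_ASCII_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")


def is_ascii_xml_name(value: str) -> bool:
    """Return True if value is a Name under the ASCII-only approximation."""
    return _ASCII_NAME.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("10", "a10", "_10", "234"):
        print(candidate, is_ascii_xml_name(candidate))
```

Under this approximation "10" is rejected and "a10" is accepted, which matches
the behaviour you observed.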

> Because, att1 value has
>to start with a letter, not a digit!!!.  Why is this so?
>	If you look at the line, "A name must not appear more than once in an XML
>document as a value of this type; i.e., ID values must uniquely identify the
>elements which bear them".  If I am not wrong, it only wants to make sure
>that the ID value should be unique in the xml document.  But, why should
>there be restriction like it "has" to start from letter or underscore, why
>can't it be one of 0-9 ?

The "metareason" is that that is how it is in SGML. Unlike you, I have had
the experience of living through the XML specification and there were a
number of things that were required to be compatible with SGML. Otherwise
the SGML community would not have used XML, and more importantly the very
dedicated group of SGML experts would not have given their time (very large
amounts) to XML and so it wouldn't exist. *Why* SGML decided on that I have
no idea. 


<aside>
It is illegal in SGML to define enumerated attributes which share a value.
Thus:
<!ELEMENT FOO (#PCDATA)>
<!ATTLIST FOO bar (yes|no) #IMPLIED>
<!ATTLIST FOO plugh (yes|no) #IMPLIED>

is an error in SGML. Why? I believe because documents can be minimised to:
<FOO yes>
(no =, no quotes, no attribute name) and this was to save typing. So maybe
the NAME restrictions stem from a similar reason.
</aside>


>		If I change the value to
>			<root att1="a10" att2="234"/> , it works fine!!!!   It looks stupid for
>me.  XML is not allowing to "represent" my original data as it is, it
>expects me to do some manipulations with my data :-(.

Yes, but att2 is not an SGML ID.

>
>	I will just take two examples where this can crib lot of data.
>	example 1:
>		Suppose I have data in a database table and want to convert it into XML
>document and convert the table schema to equivalent DTD.
>Then, the primary key column (assuming there is only column as primary key)
>in table will be equal to ID in DTD.  If my primary key in database proves
>to be integer or float , what should I do? I can't make corresponding
>attribute as ID or I should prefix every value for that column with some
>character 'x'.  Why should I do this?

The answer is that SGML/XML never intended IDs to be mapped onto
database keys. It is assumed that the "application" has some means of
making SGML IDs unique within a document. All that XML V1.0 guarantees is
that a validating parser, given a document in which two attributes of
type ID share a value, is entitled to report an error of some sort.
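
Where no validating parser is at hand, the uniqueness check itself is easy to
approximate in the application. A rough sketch, assuming Python's standard
xml.etree.ElementTree (which does not validate against a DTD) and a
hypothetical helper that is simply told which attribute name to treat as the
ID:

```python
from collections import Counter
import xml.etree.ElementTree as ET


def duplicate_values(xml_text: str, attr_name: str) -> list:
    """Return the sorted values of attr_name that occur on more than one element.

    This only mimics the uniqueness half of the ID validity constraint; it
    does not read the DTD, so the caller must say which attribute is the ID.
    """
    root = ET.fromstring(xml_text)
    counts = Counter(
        el.get(attr_name) for el in root.iter() if el.get(attr_name) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


if __name__ == "__main__":
    doc = '<rows><row id="a1"/><row id="a2"/><row id="a1"/></rows>'
    print(duplicate_values(doc, "id"))
```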

>		ID should only make sure that value is "unique", but shouldn't put any
>restriction on the "value" of that corresponding attribute.  The one who
>writes the xml document has to make sure that, he is writing unique values
>to that attribute.  Whether he writes an integer, float or string shouldn't
>really be the constraint.  The XML processor "should" not try to put this
>restriction that you got to start the value with letter or underscore or ':'

This is the law. You have to obey.

It is often not realised that XML V1.0 is a specification of document
syntax - NOT a set of instructions on how that document should be processed
(other than by a parser). It is because the spec is so insistent on not
giving any guidance on implementation that I set up XML-DEV and shouted for
things like SAX. Unique IDs are another thing that we really need some
common tools for. I would like to have an editor (yes, an editor) which
among other things filled in unique IDs for me. I shall keep shouting.
>.
>
>	example2:  (extending example 1 scenario)
>	If at all I am getting some data (not in XML form) from other application
>and I have to generate equivalent XML format for it (XMLising some other
>data).  If the DTD for that XML has some attribute of type ID, then while
>creating the XML doc, I should check every time that the value is not
>starting with 0-9  [probably it has list of few more characters].  If so,
>then I should add some prefix character (say 'x') and generate it.  The
>problem will not end there, the one who receives this xml document will have
>problem, when he receives the XML document and attribute of type ID has
>value starting with 'x', he has to remove that.  Again, problem won't end
>there:-) if the original data itself had started with character 'x', then my
>application (which generates XML doc), should do "character stuffing"
>(adding 'xx') and other receiving application has to do "de-stuffing".
>Hoof, this is not as easy as I thought.

No, it isn't, is it! There is still a lot of generic tool development to be
done.
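
For what it is worth, the stuffing and de-stuffing you describe can be
sketched in a few lines. This is only an illustration under stated
assumptions: the prefix 'x' is the one from your example, the function names
are hypothetical, only the first character is handled (a key containing
spaces or other non-Name characters would need more work), and the Name-start
test is an ASCII-only approximation.

```python
import re

# ASCII-only approximation of the characters allowed to start an XML Name.
_NAME_START = re.compile(r"[A-Za-z_:]")


def encode_key(value: str) -> str:
    """Prefix 'x' when the value starts with 'x' or cannot start a Name."""
    if value.startswith("x") or not _NAME_START.match(value):
        return "x" + value
    return value


def decode_key(encoded: str) -> str:
    """Undo encode_key: a leading 'x' is always a stuffed prefix."""
    return encoded[1:] if encoded.startswith("x") else encoded


if __name__ == "__main__":
    for key in ("10", "xray", "abc", "x10"):
        stuffed = encode_key(key)
        print(key, stuffed, decode_key(stuffed))
```

The round trip works because every encoded value that begins with 'x' had the
prefix added, so the receiver can strip it without knowing the original data.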

>  Then I would prefer to throw away
>XML as intermediate format itself and go away with my original data format
>itself :-)

My suggestion - FWIW is that you use a separate attribute or PCDATA for
your foreign key, and write your "application" to take care of uniqueness.
It is a good example of the fact that XML is not a magic bullet which
solves all your problems. But it does help to identify them and XML-DEV
helps to encourage people to solve them.
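
A minimal sketch of that suggestion, assuming Python's standard
xml.etree.ElementTree. The element names "table" and "row", the attribute
name "dbkey", and the generated IDs "r1", "r2", ... are all hypothetical
choices for illustration. The database key is carried unchanged in its own
attribute, and the application, not the parser, checks that keys are unique.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(rows) -> str:
    """Serialise (primary_key, text) pairs, keeping the key out of the ID.

    Each row gets a generated Name-compliant id (r1, r2, ...) and carries the
    original key verbatim in a separate 'dbkey' attribute.
    """
    root = ET.Element("table")
    seen = set()
    for n, (key, text) in enumerate(rows, start=1):
        if key in seen:
            # Uniqueness of the database key is the application's job.
            raise ValueError(f"duplicate key: {key!r}")
        seen.add(key)
        row = ET.SubElement(root, "row", {"id": f"r{n}", "dbkey": str(key)})
        row.text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(rows_to_xml([(10, "first"), (234, "second")]))
```

In a DTD for such a document, "id" could be declared as type ID and "dbkey"
as CDATA, so the key 10 needs no prefix at all.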

>
>	This is my opinion, would anyone give me convincing answer for this?
>		OR
>	Can I say this is a "BUG" in xml specification 1.0?

You might say that if the current XML/W3C/XML-DEV community were developing
XML *now*, they might tackle IDs and NAMEs differently. But IDs were/are
used in SGML and presumably are not seen as bugs.

	P.

