   xml spec 1.0 validity constraint for ID/IDREF

  • From: "gopi" <gopi@aztecsoft.com>
  • To: "Xml-Dev" <xml-dev@xml.org>
  • Date: Wed, 29 Mar 2000 17:27:15 +0530

Hi all,
	I posted this question about six months ago on one or two mailing
lists (such as microsoft.public.xml, but not on the xml-dev list), and I didn't
get even one proper reply.  I hope to get an answer to this mystery here at
least (don't laugh at me if it is a very stupid one) :-)
	Here the mystery starts:

	According to XML 1.0, production [56]:

	TokenizedType    ::=   'ID' | 'IDREF' | 'ENTITY' |    .........
	Validity Constraint: ID
		Values of type ID must match the "Name" production. A name must
		not appear more than once in an XML document as a value of this type;
		i.e., ID values must uniquely identify the elements which bear them.
	The grammar for "Name" says it must start with a letter, '_' or ':':

           [5] Name ::= (Letter | '_' | ':') (NameChar)*

	So, if I write an XML document like the one below,

	<?xml version="1.0"?>
	<!DOCTYPE root [
		<!ELEMENT root (#PCDATA)>
		<!ATTLIST root
				att1 ID #REQUIRED
				att2 CDATA #IMPLIED>
	]>
	<root att1="10" att2="234"/>

	It is not valid according to the specification, because the att1 value has
to start with a letter, not a digit!  Why is this so?
	Look at the sentence "A name must not appear more than once in an XML
document as a value of this type; i.e., ID values must uniquely identify the
elements which bear them".  If I am not wrong, it only wants to make sure
that the ID value is unique in the XML document.  So why should there be a
restriction that it has to start with a letter or underscore?  Why can't it
start with one of 0-9?
	If I change the value to
		<root att1="a10" att2="234"/>
it works fine!  This looks stupid to me.  XML does not allow me to represent
my original data as it is; it expects me to manipulate my data :-(.
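
To make the rule concrete, here is a minimal sketch of the check a validating parser applies to an ID value. It is only an ASCII approximation of production [5]: the real "Name" production also admits many non-ASCII letters, digits, combining characters and extenders. The names NAME_RE and is_ascii_name are made up for this illustration.

```python
import re

# ASCII-only approximation of XML 1.0 production [5] Name:
#   first character: Letter | '_' | ':'
#   then any number of: Letter | Digit | '.' | '-' | '_' | ':'
# Non-ASCII letters, combining characters and extenders are left out.
NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9._:\-]*")


def is_ascii_name(value: str) -> bool:
    """Return True if value matches the ASCII subset of Name."""
    return NAME_RE.fullmatch(value) is not None


# "10" fails because it starts with a digit; "a10" passes.
for candidate in ("10", "a10", "_10", "234"):
    print(candidate, is_ascii_name(candidate))
```

Under this approximation att1="10" is rejected while att1="a10" is accepted, which matches the behaviour described above.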

	I will take just two examples where this can cause trouble for a lot of data.
	Example 1:
		Suppose I have data in a database table and want to convert it into an
XML document, and convert the table schema into an equivalent DTD.
Then the primary key column (assuming only one column is the primary key)
in the table will correspond to an ID attribute in the DTD.  If my primary
key in the database happens to be an integer or a float, what should I do?
Either I can't declare the corresponding attribute as ID, or I have to prefix
every value in that column with some character such as 'x'.  Why should I do
this?
		ID should only make sure that the value is unique; it shouldn't put any
restriction on the value of the attribute itself.  Whoever writes the XML
document has to make sure that unique values are written to that attribute.
Whether the value is an integer, a float or a string shouldn't really be a
constraint.  The XML processor should not impose the restriction that the
value has to start with a letter, underscore or ':'.
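
The prefix workaround from example 1 might look like the sketch below. The element names "table" and "row", the attribute names, and the 'x' prefix are all assumptions made for the illustration; nothing in the spec prescribes them.

```python
import xml.etree.ElementTree as ET

ID_PREFIX = "x"  # hypothetical prefix; any valid Name start character would do


def rows_to_xml(rows):
    """Serialize (primary_key, value) pairs, prefixing each key so that
    the id attribute starts with a letter as the ID type requires."""
    table = ET.Element("table")
    for key, value in rows:
        ET.SubElement(
            table,
            "row",
            {"id": f"{ID_PREFIX}{key}", "value": str(value)},
        )
    return ET.tostring(table, encoding="unicode")


# Integer primary keys 10 and 11 become id="x10" and id="x11".
print(rows_to_xml([(10, "234"), (11, "567")]))
```

The reader of such a document has to know about the prefix and strip it again to recover the original key.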

	Example 2 (extending the example 1 scenario):
	Suppose I get some data (not in XML form) from another application
and have to generate an equivalent XML format for it (XMLising some other
data).  If the DTD for that XML has an attribute of type ID, then while
creating the XML document I have to check every time that the value does not
start with 0-9 (the list probably includes a few more characters).  If it
does, I have to add some prefix character (say 'x') when generating it.  The
problem does not end there: whoever receives this XML document and finds an
attribute of type ID whose value starts with 'x' has to remove that prefix.
And the problem doesn't end there either :-)  If the original data itself
started with the character 'x', then my application (which generates the XML
document) has to do "character stuffing" (adding 'xx') and the receiving
application has to do "de-stuffing".
Phew, this is not as easy as I thought.  In that case I would prefer to throw
away XML as the intermediate format and stay with my original data format
:-)
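
The stuffing and de-stuffing described in example 2 can be sketched as a pair of functions. This is one possible scheme, not the only one: the prefix 'x' and the function names are hypothetical, the start-character test is ASCII-only, and only the first character is handled (a value containing, say, a space would still not be a valid Name).

```python
PREFIX = "x"  # hypothetical escape character


def _is_name_start(ch: str) -> bool:
    # ASCII-only approximation of the characters allowed to start a Name.
    return ch.isascii() and (ch.isalpha() or ch in "_:")


def encode_id(raw: str) -> str:
    """Sender side: add the prefix when the value cannot start a Name,
    and also when it already starts with the prefix (character stuffing)."""
    if raw and _is_name_start(raw[0]) and not raw.startswith(PREFIX):
        return raw
    return PREFIX + raw


def decode_id(encoded: str) -> str:
    """Receiver side: strip exactly one leading prefix (de-stuffing)."""
    if encoded.startswith(PREFIX):
        return encoded[1:]
    return encoded


for raw in ("10", "x10", "abc", "1.5"):
    encoded = encode_id(raw)
    print(raw, "->", encoded, "->", decode_id(encoded))
```

Both sides must agree on the prefix in advance, which is exactly the extra coupling between sender and receiver that the paragraph above complains about.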

	One more thing: I have not read the XML Schema spec in detail.  Can anyone
tell me whether this case is taken care of there?
		Does XML Schema also impose the restriction that an attribute of type ID
cannot have a data type such as "integer" or "float"?
	This restriction is almost like putting a restriction on the data type of
the attribute.  By the way, how about DT4DTD?
	All XML processors blindly follow the XML 1.0 spec, and every processor
gives an error for my XML document :-(.

	This is my opinion; can anyone give me a convincing answer?
		OR
	Can I say this is a "BUG" in the XML 1.0 specification?
Note: Please send a CC to my email address as well.  The xml-dev list is very
slow; I get a message 5 or 6 hours after it has been posted to the list :-(

regards,
Gopinath M.R.
Software Engineer,
Aztec software and technology services (P) Ltd (www.aztec.soft.net )
Bangalore -560078
Ph : 91-080-5522892/93, 5532036,
5533725, 5533649
email : gopi@aztecsoft.com
gopinathmr@bigfoot.com

