xml-dev - Re: [xml-dev] Come On, DTD, Come On! Thoughts on DSDL Part 9

To: xml-dev@lists.xml.org
Subject: Re: [xml-dev] Come On, DTD, Come On! Thoughts on DSDL Part 9
From: Arjun Ray <aray@nyct.net>
Date: Fri, 14 Jun 2002 03:47:15 +0000
In-reply-to: <200206132132.RAA21312@mail.reutershealth.com>
References: <hb0iguks9ppbm18h4bkdf0d1e7pibqd2p8@4ax.com> <200206132132.RAA21312@mail.reutershealth.com>

John Cowan <jcowan@reutershealth.com> wrote:
| Arjun Ray scripsit:

|> Is this definitional, or a means to specify a list of user defined names?

| Definitional.  LIST(integer) would mean that the content/value is a
| whitespace-separated sequence of integers.  It generalizes the ad hoc -S
| ending on the built-in datatypes.

IMHO, expanding the set of declared values (which, pardon my insistence on
a point that gets lost repeatedly in such discussions, are lexical types,
not data types) is a poor idea, so the ad hoc -S is of little consequence.
The basic distinction between CDATA and the rest is "name-ness", calling
for tokenization as a further treatment of the normalized attribute value
literal.  The relevant subsidiary issues are single vs multiple values
(addressed by your LIST() construct) and referential constraint (as with
IDREF, ENTITY and NOTATION).  
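
To make the tokenization point concrete, here is a minimal Python sketch of
what LIST(integer) would amount to as a lexical check.  The function names
are mine, not part of any proposal; the input is assumed to be an already
normalized attribute value, and rejecting an empty list is an assumption.

```python
import re
from typing import List

def tokenize(value: str) -> List[str]:
    """Split a normalized attribute value into whitespace-separated tokens."""
    return value.split()

def is_integer_list(value: str) -> bool:
    """True if the value is one or more optionally signed decimal integers.

    Assumption: an empty list is rejected; a real spec would have to say.
    """
    tokens = tokenize(value)
    return bool(tokens) and all(re.fullmatch(r"[+-]?[0-9]+", t) for t in tokens)
```

Note that the check is purely lexical: nothing here converts the tokens to
numbers or constrains their range.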

Putting this together, we can have a streamlined taxonomy of declared
values according to the following criteria:

  (1) Token or not
  (2) If token, single or multiple
  (3) If token, referential or not
  (4) In any case, subject to data content notation or not
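
As a rough illustration, the four criteria can be written down as a small
record type in Python.  The class, the field names and the mapping of the
XML 1.0 declared values onto it are my own reading, offered as a sketch
rather than a proposal.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class DeclaredValue:
    token: bool                     # (1) tokenized or not
    multiple: bool = False          # (2) if token, single or multiple
    referential: bool = False       # (3) if token, referential or not
    notation: Optional[str] = None  # (4) data content notation, if any

    def __post_init__(self):
        # Criteria (2) and (3) are only meaningful for tokens.
        if not self.token and (self.multiple or self.referential):
            raise ValueError("criteria (2) and (3) apply only to tokens")

# Hypothetical mapping of some existing declared values onto the taxonomy.
XML10 = {
    "CDATA":    DeclaredValue(token=False),
    "NMTOKEN":  DeclaredValue(token=True),
    "NMTOKENS": DeclaredValue(token=True, multiple=True),
    "IDREF":    DeclaredValue(token=True, referential=True),
    "IDREFS":   DeclaredValue(token=True, multiple=True, referential=True),
    "ENTITY":   DeclaredValue(token=True, referential=True),
    "ENTITIES": DeclaredValue(token=True, multiple=True, referential=True),
}
```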

| This could be migrated from ATTLIST and ELEMENT to NOTATION: we could 
| have something like
| 
| <!NOTATION integers #LIST integer>

This would radically change the purpose of notation declarations, which
is to associate a local, document-specific name with an external
identifier, the latter being a formal means of "stating" a concept.
  
|> I'd prefer a whitespace-separated list of tokens within parens.  In fact
|> I'd like this for all name group and nametoken group usages, instead of 
|> an infix separator.
| 
| What is the precedent here?

None.  The justification is simpler grammar: it's equivalent to Lisp-ish
prefix syntax, where the symbol that would occur in the operator position
is already known from context, which makes the separator ('|') redundant.
With infix separators, would you care to deal with weirdnesses like this
(possible in SGML)?

  <!ENTITY % foo  ' A | B | C' >
  <!ENTITY % bar  ', D, E' >
  <!ENTITY % quux  '%foo; %bar;' >

  <!ATTLIST (%quux)  blort CDATA #IMPLIED>

IOW, I would reserve the infix separators ('|', ',' and '&') for model
groups only, where they are indeed convenient.  We *are* whiteboarding
syntax, no?
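
A small Python sketch of the difference, for the sake of argument.  The
expander below is a toy (it assumes no recursive entities and is not an
SGML parser), and all the function names are made up for illustration.

```python
import re

ENTITIES = {
    "foo": " A | B | C",
    "bar": ", D, E",
    "quux": "%foo; %bar;",
}

def expand(text, entities):
    """Replace %name; references until none remain (assumes no cycles)."""
    pattern = re.compile(r"%([A-Za-z][\w.-]*);")
    while pattern.search(text):
        text = pattern.sub(lambda m: entities[m.group(1)], text)
    return text

def connectors(group):
    """Return the set of infix connectors that occur in a group."""
    return set(re.findall(r"[|,&]", group))

def parse_prefix_group(group):
    """With whitespace-only separation, parsing a name group is one split."""
    return group.split()
```

Expanding `%quux;` with the declarations above yields a group containing
both '|' and ',', and a parser has to decide what that mixture means.
With whitespace as the only separator, the same composition of entities
is just a longer list of names.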

| Remember that we are dealing with external validation here: we don't
| want to check whether an incoming document is self-consistent, but 
| rather whether it's consistent with some schema we already have in hand.

Yes, but there's more to it.  Does/will the schema purport to describe
"everything" in the document, or just some determinate part of it? 


| Without some way to notate the root, an XHTML DTD would accept the 
| document:
| 
| <p>foo</p>

And what is wrong with that?  Perhaps the real problem is in the usage
"XHTML DTD", a form of persuasive definition implying that only <html> can
be the "root" with such a set of declarations?

| The issue is whether this specification should be inside the DTD or
| only provided directly to the validator, making it look like "Validate
| document X against DTD Y, requiring that the root element be Z."

Clearly the latter.  There is no harm in a document *claiming* validity
with respect to some DTD, but since we're in a regime where validation is
always an optional value-added activity, a validation API would clearly be
incomplete if it didn't offer this kind of external control/invocation.
Note that the ArchUse PI also covers this (the doc-elem-form attribute),
so let's not reinvent wheels.
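
For what it's worth, the shape of such an interface can be sketched in a
few lines of Python.  This is a toy, not a validator: `validate` and its
parameters are hypothetical, the "declarations" are reduced to a set of
declared element names, and content models are not checked at all.  The
only point is that the required root is an argument of the call, not a
property of the declarations.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Set

def validate(document: str, declared: Set[str],
             root: Optional[str] = None) -> List[str]:
    """Return a list of problems; an empty list means the toy checks passed.

    `declared` stands in for a set of declarations (element names only).
    `root`, if given, is the root element required by the caller.
    """
    problems = []
    doc_root = ET.fromstring(document)
    if root is not None and doc_root.tag != root:
        problems.append(f"root is <{doc_root.tag}>, expected <{root}>")
    for elem in doc_root.iter():
        if elem.tag not in declared:
            problems.append(f"undeclared element <{elem.tag}>")
    return problems
```

Called as `validate("<p>foo</p>", {"html", "body", "p"})`, the document
passes; adding `root="html"` makes the same document fail, with no change
to the declarations.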

|> Validation with respect to a fixed set of declarations is a separate 
|> exercise (as in ArchForms).  The issue would be how to declare that 
|> fixed set.
| 
| That however is what I am discussing here.  

Not sure what "that" refers to.  I think Eliot explained the situation
well:

http://groups.google.com/groups?as_umsgid=34E9CBC9.401B6BB0@isogen.com 

| In external validation, you do not want the document to "declare" the 
| set of declarations it is to be validated against:

I think sometimes you do.

 http://lists.w3.org/Archives/Public/www-html/2000Jan/0217.html

| rather, it is the application that declares it.

I'd say the application invokes it, through some interface that allows
specification of the relevant declarations (and I suppose, the root).  But
I don't see how this is relevant to DTD syntax.
