   Re: Does DTD validation work with namespaces?

  • From: Norman Walsh <ndw@nwalsh.com>
  • To: xml-dev@lists.xml.org
  • Date: Wed, 09 Aug 2000 15:19:08 -0400

/ "Winchel 'Todd' Vincent, III" <winchel@mindspring.com> was heard to say:
| My understanding of what you wrote was that you were saying two different
| things (1) that if you are in control of all the DTDs (or Schema) and they
| all relate to each other, then mixing them is OK and (2) if you are not in
| control of the DTDs and they do not have explicit references to each other,
| then mixing them is not ok.
| 
| So, I agree with (1).  I disagree with (2).  I think Any and I agree on (2).

That's a slightly stronger statement than I intended to make. Let me
attempt to be clear.

My personal, default model of validation is very strict and draconian.
A document is valid if and only if every child of element 'A' is
listed in the content model of element 'A' and the children occur in a
number and sequence that is allowed by the content model in question.

So, in my model, you cannot have

  <foo:a><bar:b/></foo:a>

unless the content model of foo:a explicitly allows bar:b as its first
and only child. (Where "explicitly allows" may include DTD-style ANY
content models or XML Schema style "selective any" content models).

This is irrespective of whether or not you are in control of the
schemas that you're using.
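
A minimal sketch of the strict model described above, in Python with only the standard library. The namespace URIs, the TOKENS table and the content models are hypothetical. Content models are written as regular expressions over an element's child names, which is one simple way to check DTD-style models and is not how any particular validator works. Character data is ignored.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs standing in for the "foo" and "bar" vocabularies.
FOO = "http://example.com/foo"
BAR = "http://example.com/bar"

# Map each declared element (by expanded name) to a short token so that
# content models can be written as regular expressions over child tokens.
TOKENS = {
    f"{{{FOO}}}a": "FOO_A",
    f"{{{BAR}}}b": "BAR_B",
}

# Hypothetical content models: each child token is followed by a space in
# the string being matched; "ANY" plays the role of a DTD ANY content model.
MODELS = {
    "FOO_A": r"(BAR_B )?",  # foo:a may contain at most one bar:b
    "BAR_B": r"",           # bar:b is EMPTY
}


def is_valid(elem, models):
    """Strictly validate elem and all its descendants against models."""
    token = TOKENS.get(elem.tag)
    if token is None or token not in models:
        return False  # undeclared element: never valid
    model = models[token]
    if model != "ANY":
        # Undeclared children become "?" and so can never match a model.
        children = "".join(TOKENS.get(c.tag, "?") + " " for c in elem)
        if re.fullmatch(model, children) is None:
            return False
    # Even under ANY, every child must itself be declared and valid.
    return all(is_valid(child, models) for child in elem)
```

With MODELS as written, `<foo:a><bar:b/></foo:a>` validates. Give FOO_A the model `r""` and the same document is rejected; give it "ANY" and it is accepted only because bar:b is itself declared.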

This is *clearly* not the only validation model that one could imagine.
It is *equally clear* that this model doesn't work in all cases.

| <Norman>
| > If I follow, what you're saying is that there are contexts in which you'd
| > like to make random mixtures of markup. Embedding contract markup, for
| > example, in a web page.
| </Norman>
| 
| Yes, exactly!  :-)
| 
| A "webpage" might not necessarily be HTML, it might be Amazon.com mark-up
| language or anything else that needs contract mark-up but also needs its own
| specific, self-defined mark-up.  The premise/assumption is that
| legal technologists are best qualified to create legal mark-up,
| accountant-technologists are best qualified to create financial mark-up,
| etc., etc.

Sure, but if you're going to allow any random mixture, I think you
leave open the possibility that the end result will be a mixture that
the three communities in question would interpret differently.

| <Norman>
| > I agree that that is perfectly reasonable, but I don't think that it
| > is "valid" in the traditional sense. Maybe we need some new definition
| > of valid for this case, "mixed-namespace-island validity" or
| > something.
| </Norman>
| 
| Today, word processor users routinely "cut and paste" text from one type of
| document into another.  I agree that you could call these meshed-together
| pieces of text "namespace islands" . . . and, yes, what I'm after is to be
| able to validate those islands of elements -- not simply because it is
| important as a *technical* matter, but because I want to fix some meaning to
| those elements and I can only do that in a *technical*, *automated* way if I
| link the elements to a well-understood DTD/Schema.

But if you don't control the order and sequence of the mixtures, I
don't see how you have any semantics in the end. But that's just me.

The model of validation you describe seems to say "I trust the authors
not to do anything truly bizarre, but I can't trust them not to make
typos or leave out attribute values that are required". I understand
that model.
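
The looser model, where authors are trusted on placement but not on typos or required attributes, might be sketched like this. The vocabulary table and the contract namespace URI are hypothetical; http://www.w3.org/1999/xhtml is the real XHTML namespace. Elements from namespaces the checker knows nothing about are skipped, roughly in the spirit of a "lax" wildcard.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabularies keyed by namespace URI. Each maps a local name
# to the set of attributes that element requires. Placement is NOT checked.
VOCABULARIES = {
    "http://example.com/contract": {
        "contract": {"id"},
        "party": {"role"},
    },
    "http://www.w3.org/1999/xhtml": {
        "div": set(),
        "h2": set(),
        "p": set(),
    },
}


def split_name(tag):
    """Split ElementTree's '{uri}local' notation into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


def island_errors(root):
    """List typo-style errors: unknown elements and missing required attributes."""
    errors = []
    for elem in root.iter():  # root itself plus all descendants, document order
        uri, local = split_name(elem.tag)
        vocab = VOCABULARIES.get(uri)
        if vocab is None:
            continue  # unknown namespace: nothing to check it against
        if local not in vocab:
            errors.append(f"unknown element {local!r} in {uri}")
            continue
        for attr in sorted(vocab[local] - set(elem.attrib)):
            errors.append(f"{local!r} is missing required attribute {attr!r}")
    return errors
```

For a c:contract that contains an html:h2, this reports nothing about the h2 at all; only a misspelled element name or a missing required attribute shows up.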

| I also want some way of ensuring that the DTD/Schema does not change over time
| (or that there is proper version control) because if you change the
| DTD/Schema, you change the meaning of the elements (in a whole lot of
| documents).

Version control is a big issue.

| <Norman>
| > If you aren't suggesting that "html:" always mean
| > http://whateverthexhtmluriis/, then I'm not sure I follow.
| </Norman>
| 
| If I understand you correctly, yes, I am suggesting exactly that.

Then you don't really need the URI, right? We've agreed that the
prefix 'html:' will always have exactly one meaning. Going back to my
original argument, then, all you've done is added ':' to the list of
name characters and asserted that 'html:p' always means an HTML "p"
element. I just don't think that works in the general case.

I agree that URIs can be made unique because I am the only one with
the authority to establish URIs under http://nwalsh.com/. (Or with the
URN NID "ndw", if the IETF URN WG blesses my request.)

But if I propose 10, 100, or 1000 vocabularies that all contain a 'p'
element, we'll never get agreement about what the prefixes should be.
The prefixes have to vary.

| Version
| control makes it more difficult because HTML 3.2 would need a different
| prefix than HTML 4.0.  (Or, probably a better, less verbose solution, html:
| would always relate to the HTML family of versions, but there would be some
| date associated with the namespace, so you could determine which HTML
| version related to the elements in the document at any given time.)

I really don't see how what you're proposing is significantly different
than saying that

  html:p

is a shortcut for

  {http://thehtmluri, p}

And a useful shortcut, at that! :-)
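
Namespace-aware parsers already treat html:p as exactly this kind of shortcut. Python's xml.etree.ElementTree, for example, reports element names in "Clark notation", {uri}local. The sketch below uses the XHTML namespace URI in place of the placeholder http://thehtmluri, plus two hypothetical example.com URIs.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

# The same element written three ways: two different prefixes and a
# default namespace declaration.
docs = [
    f'<html:p xmlns:html="{XHTML}">hi</html:p>',
    f'<h:p xmlns:h="{XHTML}">hi</h:p>',
    f'<p xmlns="{XHTML}">hi</p>',
]

# After parsing, only the {uri}local pair remains; the prefix is gone.
tags = {ET.fromstring(d).tag for d in docs}
print(tags)  # {'{http://www.w3.org/1999/xhtml}p'}

# Conversely, one prefix bound to two different (hypothetical) URIs gives
# two different names, so the prefix by itself identifies nothing.
a = ET.fromstring('<x:p xmlns:x="http://example.com/vocab1"/>').tag
b = ET.fromstring('<x:p xmlns:x="http://example.com/vocab2"/>').tag
print(a == b)  # False
```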

| Ok, I understand this concern.  However, what this means is that the
| universe of documents and schema that you will use is very limited.  The
| world wide web is not so limited, so I don't believe your view is realistic
| or desirable.  I don't mean this as an attack, it is simply that what you
| suggest  is not an architecture that, in my mind, works on a large scale.

I don't take it as an attack. The architecture(s) that exist on the
web will need different validation models. Next generation web
browsers will probably understand a mixture of several vocabularies
and they'll likely allow people to mix them together
arbitrarily. That's fine. But if you and I are doing electronic
commerce and you're not strictly validating all of the purchase orders
I send, I'm not going to feel very confident about the transactions.
Worse still, if you send me a bill and it doesn't validate, I'll never
do business with you again.

| I think Tom Passin is on the right track in his email from today, dated
| 8/9/00, time 9:17 a.m. (EST time).
| 
| <TomPassin>
| Presumably, it would mean "If this fragment had been embedded in a valid
| structure according to its own DTD, this fragment would not cause the whole
| structure to be invalid."
| 
| This sounds like a tall order for a processor to understand, and also a tall
| order to describe in a DTD.  It's funny, though, isn't it? All us humans
| know pretty well what it would mean:  e.g., if we put in an <html:h2>
| element, we want a processor to display an h2 heading at that point **as
| if** it were an html document.  It's the formal aspect that's tough.

I don't, in fact, agree that it's easy for humans to understand. Given:

  <bookinfo>
  <author><firstname>Norman</firstname>
    <html:h2>Is a Big Fat Idiot</html:h2>
    <surname>Walsh</surname>
  </author>
  </bookinfo>

I have no idea what that H2 means. It doesn't mean display an H2 heading,
it *can't* mean that because the author isn't displayed at all, it's
just metadata that's associated with the document. Now I have a document
with unintelligible metadata. I'm totally confused.

| We have non-validating parsers.  We have validating parsers.  Why not have
| namespace-aware validating parsers?

We'll surely have namespace-aware validating parsers. XML Schema will
support this.

| I do not, however, believe that I am talking about requirements that are
| specific only to me or to the legal industry (which is the industry in which
| I work).

I'm truly surprised that the legal industry would ever take anything less
than the strictest, most draconian view of validity. If you make contracts
with mixtures of markup that don't validate, um, well, I'm not putting
my digital signature on them :-)

|  I think this is a great big issue that affects the architecture of
| the Internet and should be addressed by the W3C.  It is easy enough to
| simply say, oh, well, that's too hard or those aren't my requirements and
| then simply not do it.  The problem, of course, is that someone will say,
| oh, well, those are my requirements and, on balance, it is not that hard, so
| they will go out and do it.  If that happens, then we have fragmentation,
| not standards.  I'm very wedded to the idea of standards, so that's why I've
| been moaning and bitching on this list. :-)

With respect to namespace aware validation, what requirements do XML
Schemas fail to meet? (Yes, I'm aware that 1.0 doesn't include version
control or co-constraints, but please, small steps. Er, medium-sized
steps, at least! :-))

| But, I still think you need reserved prefixes that map one-to-one with URIs
| (as a matter of practice and policy).  After all, we all know, already,
| implicitly, what "html:" (versions notwithstanding) means . . . and what
| "xsl:" means . . . and what "rdf:" means . . . . it would be nice to know
| what "legal:" and "contract:" and "transcript:" etc. . . . mean as well.

I could not disagree more. I never, ever want to allow prefixes to have
meaning. I support the notion that the processor discards the prefixes
during parsing. They are irrelevant for machine understanding of the
documents.
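
ElementTree is one processor that discards prefixes in exactly this way: the parsed tree keeps only {uri}local names, and on output it generates prefixes such as ns0 for namespaces that have no registered prefix (ET.register_namespace can choose one). Other APIs, such as the DOM, do expose the original prefix, so this is a property of this particular library. The legal namespace URI below is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for a "legal:" vocabulary.
src = ('<legal:contract xmlns:legal="http://example.com/legal">'
       '<legal:party/></legal:contract>')

root = ET.fromstring(src)
print(root.tag)  # {http://example.com/legal}contract

# On output ElementTree invents prefixes (ns0, ns1, ...) for URIs with no
# registered prefix; the original "legal:" prefix was never kept.
out = ET.tostring(root, encoding="unicode")
print(out)
```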

                                        Be seeing you,
                                          norm

-- 
Norman.Walsh@East.Sun.COM | Temptation laughs at the fool who takes it
XML Technology Center     | seriously.--The Chofetz Chaim
Sun Microsystems, Inc.    | 