   Re: Does DTD validation work with namespaces?

  • From: "Winchel 'Todd' Vincent, III" <winchel@mindspring.com>
  • To: Norman Walsh <ndw@nwalsh.com>, xml-dev@lists.xml.org
  • Date: Wed, 09 Aug 2000 13:50:57 -0400

> / "Winchel 'Todd' Vincent, III" <winchel@mindspring.com> was heard to say:
> | Amy's vision is my own.  For example . . .
> |
> | Within a "Legal" set of namespaces, (Court Filing, Contract, Transcript),
> | if the same legal industry consortium defines the namespaces, then they
> | *should* work together as Norman suggests.

> It seems that in some respects my vision and Amy's are different, so
> I'm not sure how to interpret what you've just said.

My understanding of what you wrote was that you were saying two different
things (1) that if you are in control of all the DTDs (or Schema) and they
all relate to each other, then mixing them is OK and (2) if you are not in
control of the DTDs and they do not have explicit references to each other,
then mixing them is not ok.

So, I agree with (1).  I disagree with (2).  I think Amy and I agree on (2).

> | However, I agree with Amy and disagree with Norman in the following
> [...]
> | Contract DTD into their website mark-up, or an Amazon invoice, or
> | prospectus mark-up with an independently created DTD defined by some
> | organization, or anything else that the legal industry consortium does
> | know about (and did not consider when it created Contract DTD), then
> | would be perfectly *appropriate* and, indeed, exactly what the goal
> | be.

> If I follow, what you're saying is that there are contexts in which you'd
> like to make random mixtures of markup. Embedding contract markup, for
> example, in a web page.

Yes, exactly!  :-)

A "webpage" might not necessarily be HTML, it might be Amazon.com mark-up
language or anything else that needs contract mark-up but also needs its own
specific, self-defined mark-up.  The premise/assumption is that
legal-technologists are best qualified to create legal mark-up,
accountant-technologists are best qualified to create financial mark-up,
etc., etc.

> I agree that that is perfectly reasonable, but I don't think that it
> is "valid" in the traditional sense. Maybe we need some new definition
> of valid for this case, "mixed-namespace-island validity" or
> something.

Today, word processor users routinely "cut and paste" text from one type of
document into another.  I agree that you could call these meshed-together
pieces of text "namespace islands" . . . and, yes, what I'm after is to be
able to validate those islands of elements -- not simply because it is
important as a *technical* matter, but because I want to fix some meaning to
those elements and I can only do that in a *technical*, *automated* way if I
link the elements to a well-understood DTD/Schema.

I also want some way to ensure that the DTD/Schema does not change over time
(or that there is proper version control) because if you change the
DTD/Schema, you change the meaning of the elements (in a whole lot of

> | Norman, you missed the point when you replied:
> |
> | / "Winchel 'Todd' Vincent, III" <winchel@mindspring.com> was heard to
> | | It seems to me that URIs would be the right answer if there were a
> | | one-to-one relationship between URI and namespace prefixes, rather than a
> | | one-to-many relationship (i.e., unique prefixes via a fixed
> | with
> | | a URI).
> |
> | <Norman Date="2000.07.25" Subject="Re: Question About Namespaces and
> | If there was a one-to-one relationship, then the URIs would be
> | unnecessary, the prefixes would be enough. But any proposal based on
> | the notion of standard prefixes amounts to little more than adding
> | another name character to XML. And that's just not enough.
> | </Norman>
> |
> | You took the above paragraph out of context.  In the previous email, I
> | wrote:
> |
> | <ToddVincent Date="2000.07.25" Subject="Re: Question About Namespaces
> | DTDs">
> | I have been thinking that it would also be nice if there were some
> | requirement (perhaps an optional feature in parsers) that allowed one to
> | fetch the schema/DTD at the end of the namespace URI.  If there were at
> | least a moral responsibility on the owner to keep the schema/DTD at the end of
> | the URI the same (or tell people when it changed) (or perhaps a
> | responsibility on the user to hash/sign it, so you know if it has
> | then namespaces would be tied to a vocabulary that gave them meaning
> | and in the future), rather than simply being a means of avoiding
> | element collision.  This is what I thought "semantic web" meant and was
> | disappointed to find it didn't.
> | </ToddVincent>
> |
> | My understanding is that URI's can't be parsed, but prefixes can.
> | don't have a well-defined means of pointing to web resources.  So,
> | and URIs are very different.  A one-to-one mapping of them makes a lot
> | sense if you want to use the prefixes on elements you want to parse and
> | use the associated URI to point to a DTD or Schema that gives the
> | some meaning.  (Please correct me if I'm making any technical mistakes
> | here.)
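
The hash/sign idea in the quoted paragraph above can be sketched roughly as follows. This is only an illustration under stated assumptions: the two DTD strings are hypothetical stand-ins for whatever would live at the end of a namespace URI, the function names are invented for this example, and SHA-256 is simply one reasonable choice of digest. Nothing in the namespace specification requires or defines any of this.

```python
import hashlib

def schema_fingerprint(schema_bytes: bytes) -> str:
    """Return a SHA-256 hex digest of the raw schema/DTD bytes."""
    return hashlib.sha256(schema_bytes).hexdigest()

def schema_unchanged(schema_bytes: bytes, pinned_digest: str) -> bool:
    """True if the schema fetched now still matches the digest recorded earlier."""
    return schema_fingerprint(schema_bytes) == pinned_digest

# Hypothetical DTD text; in practice this would be fetched from the namespace URI.
CONTRACT_DTD_V1 = b"<!ELEMENT contract (party+, clause+)>"
CONTRACT_DTD_V2 = b"<!ELEMENT contract (party+, clause*, schedule?)>"

# The document author records the digest when the document is written.
pinned = schema_fingerprint(CONTRACT_DTD_V1)

print(schema_unchanged(CONTRACT_DTD_V1, pinned))  # same bytes, digest matches
print(schema_unchanged(CONTRACT_DTD_V2, pinned))  # changed content model, digest differs
```

A digest only tells you that the DTD/Schema changed, not whether the change altered the meaning of the elements; that still needs version control or human review.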

> If you aren't suggesting that "html:" always mean
> http://whateverthexhtmluriis/, then I'm not sure I follow.

If I understand you correctly, yes, I am suggesting exactly that.  Version
control makes it more difficult because HTML 3.2 would need a different
prefix than HTML 4.0.  (Or, probably a better, less verbose solution, html:
would always relate to the HTML family of versions, but there would be some
date associated with the namespace, so you could determine which HTML
version related to the elements in the document at any given time.)

> | If well-formed Contract XML (defined by X consortium), embedded within
> | Invoice XML (defined by Y consortium) points to a standard, fixed,
> | well-known DTD or Schema, which in turn references a specification (or
> | another Schema perhaps) that explains it, then I think Norman's
> | problem begins to go away and we start to have a "semantic web."

> I understand that there are several possible ways to interpret
> validity.  For my own, personal use, I'd only be happy with the above
> scenario if there is a schema Z that references X and Y so that it
> explicitly indicates in what content models the various mixtures are
> allowed.
> But I am by no means saying that I think my understanding of validity
> is (or should be) universal.

Ok, I understand this concern.  However, what this means is that the
universe of documents and schema that you will use is very limited.  The
world wide web is not so limited, so I don't believe your view is realistic
or desirable.  I don't mean this as an attack, it is simply that what you
suggest is not an architecture that, in my mind, works on a large scale.

> | There are really two different, but related issues here.  (1) Namespaces
> | don't work as a technical matter if you want to validate DTDs
> They work OK as long as you are in a closed system and can always use
> a fixed set of prefixes *and* you have my draconian view of validity.

Ok, I can agree on this . . . but, again, go back to the statement above . .
. the Internet is a big place.

> Extending DTDs to work outside this environment, by means of a PI for
> example that mapped prefixes to URIs in the DTD, would make a parser
> that accepted documents that were not XML 1.0 valid but were XML
> 1.0+Namespaces valid. It'd be an interesting exercise.
> If you want a more liberal view of validity, you have to use ANY
> content models. I really don't think you can get the "these elements
> from my schema, or any element from these three other namespaces, but
> nothing else" semantics from DTDs.

I think Tom Passin is on the right track in his email from today, dated
8/9/00, time 9:17 a.m. (EST time).



Presumably, it would mean "If this fragment had been embedded in a valid
structure according to its own DTD, this fragment would not cause the whole
structure to be invalid."

This sounds like a tall order for a processor to understand, and also a tall
order to describe in a DTD.  It's funny, though, isn't it? All us humans
know pretty well what it would mean:  e.g., if we put in an <html:h2>
element, we want a processor to display an h2 heading at that point **as
if** it were an html document.  It's the formal aspect that's tough.

Maybe this is what should happen when a foreign xml structure is included,
if you are validating.  The processor locates the DTD of the inclusion, but
builds its syntax starting, not at the start of the DTD, but at the included
element(s).  Thus, larger contexts would be thrown away, and only those
declarations in-scope for the elements of interest would be retained in the
final syntax.  While we're doing this, we might as well have the DTD
processor implicitly add on the new prefix(es).

This approach would not have to change the specification for DTD syntax.  It
would change the specification of DTD processing.  However, you still
wouldn't be able to mix and match at will, since an xxx:p element wouldn't
necessarily be able to contain an svg element, for example (because this
wouldn't have been allowed under the original xxx DTD).
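
The processing described above (start at the included element, keep only the declarations in scope, add the prefix) can be sketched as follows. This is a toy model, not how any real DTD processor works: the DTD is reduced to a hypothetical dictionary mapping each element name to the set of child element names it allows, and the names `XXX_DTD` and `in_scope_declarations` are invented for the illustration.

```python
from collections import deque

# Hypothetical, heavily simplified "DTD": element name -> allowed child elements.
XXX_DTD = {
    "doc": {"head", "body"},
    "head": {"title"},
    "title": set(),
    "body": {"p", "h2"},
    "p": {"em"},
    "h2": {"em"},
    "em": set(),
}

def in_scope_declarations(dtd, start, prefix):
    """Keep only declarations reachable from `start`, renamed with `prefix`."""
    if start not in dtd:
        raise KeyError(f"{start!r} is not declared in the DTD")
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first walk over the content models
        name = queue.popleft()
        for child in dtd.get(name, set()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    # Larger contexts (doc, head, body, ...) are dropped; survivors get the prefix.
    return {
        f"{prefix}:{name}": {f"{prefix}:{child}" for child in dtd.get(name, set())}
        for name in seen
    }

subset = in_scope_declarations(XXX_DTD, "p", "xxx")
print(sorted(subset))                # ['xxx:em', 'xxx:p']
print("svg:svg" in subset["xxx:p"])  # False
```

The last line shows the limitation noted above: because the original xxx DTD never allowed an svg element inside p, the reduced syntax does not allow it either.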

We have non-validating parsers.  We have validating parsers.  Why not have
namespace-aware validating parsers?

Note, I don't build parsers myself, although it is on my list of things to
do.  I just read about all this stuff and play with other people's products
and parsers.  So, I don't know how hard or realistic this is.

I do not, however, believe that I am talking about requirements that are
specific only to me or to the legal industry (which is the industry in which
I work).  I think this is a great big issue that affects the architecture of
the Internet and should be addressed by the W3C.  It is easy enough to
simply say, oh, well, that's too hard or those aren't my requirements and
then simply not do it.  The problem, of course, is that someone will say,
oh, well, those are my requirements and, on balance, it is not that hard, so
they will go out and do it.  If that happens, then we have fragmentation,
not standards.  I'm very wedded to the idea of standards, so that's why I've
been moaning and bitching on this list. :-)

> | (2) Namespaces
> | don't work very well conceptually, even if they work technically,
> | you use DTDs or Schemas, because there is no requirement or even
> | of uniqueness (unless you work within your own closed little world), so
> | you start mixing and matching elements from different schemas that you
> | know about, you get a jumbled mess of elements that may *technically*
> | but has no meaning (or at least a meaning that is very uncertain and that is
> | subject to change at any time).

> Uniqueness is guaranteed by URIs. You don't get prefix uniqueness but
> you don't need that if you have the mapping to URIs.
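
To make the quoted point concrete, here is a small sketch using Python's standard `xml.etree.ElementTree`, which reports element names in expanded `{URI}local` form. The namespace URIs and the two sample documents are hypothetical, chosen only for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, used only for this illustration.
CONTRACT_NS = "http://example.org/ns/contract"
OTHER_NS = "http://example.org/ns/other"

# Two documents that use different prefixes for the same URI.
doc_a = f'<c:contract xmlns:c="{CONTRACT_NS}"><c:clause/></c:contract>'
doc_b = f'<legal:contract xmlns:legal="{CONTRACT_NS}"><legal:clause/></legal:contract>'
# A third document that reuses the prefix "c" for a different URI.
doc_c = f'<c:contract xmlns:c="{OTHER_NS}"/>'

def expanded_names(xml_text):
    """Return every element name in document order as '{URI}local'."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]

names_a = expanded_names(doc_a)
names_b = expanded_names(doc_b)
names_c = expanded_names(doc_c)

print(names_a)
print(names_a == names_b)        # different prefixes, same URI: same names
print(names_a[0] == names_c[0])  # same prefix, different URI: different names
```

A namespace-aware parser therefore never compares prefixes, which is exactly why a DTD, which sees only the literal `c:contract` or `legal:contract`, cannot treat the first two documents as equivalent.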


But, I still think you need reserved prefixes that map one-to-one with URI
(as a matter of practice and policy).  After all, we all know, already,
implicitly, what "html:" (versions notwithstanding) means . . . and what
"xsl:" means . . . and what "rdf:" means . . . . it would be nice to know
what "legal:" and "contract:" and "transcript:" etc. . . . mean as well.

<RonBourret Date="2000.07.25" Subject="Re: Question About Namespaces and
>A further thought . . . to be manageable, it seems to me that this would
>require a registry of prefixes for the particular industry.   Assuming a
>registry were possible, would namespaces and DTDs mix?

Yes, although the registry would need to be global, not industry specific,
since you can never guarantee that soap manufacturers might suddenly take an
unexpected interest in the breakfast cereal market.

Ron Bourret's assessment makes perfect sense to me.  However, I have
absolutely no control or influence over what the rest of the world does or
what the W3C does.  I have some influence as to what a relatively small
group of people are trying to do for XML in the legal industry.  Again, the
question is one of standards or the potential for fragmentation.  People
will do non-standard things if there are not standard solutions to
real-world problems. I would like to see *standard* solutions.


