   Re: [xml-dev] Expert's advice needed about XML Schema and defining some


On Fri, 2003-12-05 at 15:50, Robert Koberg wrote:
> Hi,
> 
> Michael Champion wrote:
<snip>
>  > I had the same reaction to the original post.  As useful as XML is for
>  > lots of things, one needs to guard against the temptation to see  nails
>  > that need pounding simply because one has a hammer.  XML per se has no
>  > notion of cross-document referential integrity nor does XPath/XSLT 1.x
>  > have the notion of a join.  Obviously these are two great strengths of
>  > the relational approach.
<snip>
> This is the second post I have seen that pooh-poohs the value of 
> id/idref, XML Schemas and xslt1.0 to manage a project's validity. I am 
> wondering if the context here is unique or if it is generally a bad 
> practice to use these types of things. I use them for a few reasons: 
> provide a UI for a user to manage their project and ensure validity for 
> our cms.

There is a general tendency to disparage features when they are
misapplied, or when we try to apply them without understanding them. I
am _not_ saying anyone in this thread has done this. Rather, it is a
general observation from my work experience. (Of course, me dissing
relational database systems at every opportunity is well thought through
and rational. Another matter entirely!)

I can't help stepping up on the soap box for a bit here (just skip this
if you don't enjoy rants):

<rant>
<![CDATA[
ID/IDREF is a very simple mechanism for creating cross-references in
documents, usually technical documents. For that purpose it works well.
However, ID/IDREF was never intended for use in a Web environment, nor
was it intended to create links between different documents. SGML used
other mechanisms for that purpose (HyTime). For describing relationships
between resources XML has XLink, and a couple of other recommendations.
(For some reason everyone seems intent on inventing their own linking
mechanism, duplicating work that has already been done. I don't quite
understand why.)

XML Schemas are intended to specify document structure. (It is arguable
whether W3C Schema does this well.) The intent is not to validate link
relationships, or entire projects.

XSLT was originally designed as a language for transforming XML
documents to XSL-FO. The idea of using XSLT as a general transformation
language was hit upon quite early in the design process though, so I do
not think XSLT suffers too much from the change in scope and purpose.
Still, XSLT _is_ a transformation language, not a general purpose
programming language. XSLT even specializes in the kinds of
transformations it does, handling some things exceedingly well, and
other things rather poorly.

There is absolutely nothing wrong with finding a new use for a tool. (I
once got a tool for removing the stems from strawberries as a gift. I
use it as a sugartong. Works very well.) However, it is (or should be) a
calculated risk. There is always a risk that the tool is not really
suited for the task. Knowing as much as possible about the tool, what it
was designed for, the circumstances under which it came to be, and what
alternative tools may be available, certainly helps.

The uses of W3C Schema, ID/IDREF and XSLT discussed in this thread are
not the uses these tools were originally intended for. Nothing wrong
with trying to find a new use for them, actually it is a very good
thing, but it is not the fault of the tools if it does not work.

Since I am in rant mode, I might also point out that XML itself is
designed for publishing content on the Web. Originally, it was not
designed for content creation. The original idea was more along the line
of creating content using SGML, and then transforming it to XML for
publication. This idea died very quickly, but there are still traces of
it left, for example in the idea of making SYSTEM identifiers in DOCTYPE
declarations required, and the nonexistent support for remapping SYSTEM
identifiers in some parsers (notably MSXML). To this day things like
these cause considerable problems when designing and building XML
compliant document production systems. (Which does not stop me from
doing it, or suffering for it.)

]]>
</rant>

> 
> Below is a simplified example of some things I do; could you (anyone) 
> comment on it? (tear it to shreds if you like; I have a thick skin)
> 
> This is a config XML that describes a brochure-type website (site.xml):
> <site ...>
>    <folder id="f123" index_page="a123" ...>
>      <page id="a123"  ...more system independent metadata...>
>        <region name="wideColumn">
>          <content ref="c123"/>
>        </region>
>      </page>
>    </folder>
>    <page id="a234" ...>
> ...
>    </page>
> </site>
> 
> This is a config XML that describes a kind of topic mapping or dmoz-type 
> website (topics.xml):
> <topics>
> ...
>   <topic id="t123" label="some_grouping">
>     <topic id="t234" label="some_sub_grouping">
>       <content id="c123" label="blah" ...more system independent 
> metadata.../>
>     </topic>
>   </topic>
> ...
> </topics>
> 
> When validating I bring in config files like so:
> 
> <config>
>    &site;
>    &topics;
> </config>

In other words, you have one document, the config file, and two
well-formed fragment files. There is nothing wrong with this approach,
provided that you don't have so much data that the size of the
normalized file becomes a problem.


> 
> and here is the content piece referenced/identified in the content 
> elements above (c123.xml):
> <article>
>    <p>blah blah <link page_idref="a234">blah</link></p>
> </article>

Here things go a bit weird, in my dochead opinion. If article is the
root element of c123.xml, then c123.xml and the site.xml file must both
be imported into the same file before validating the article.

It looks as if the system has very tight couplings where it shouldn't.
An article can't be validated on its own, and is tied to the web site
where it is published. I realize that for a brochure site that is
completely self-contained, this may not matter much, but as a generic
design model, it does not work very well.

Again, XLink, or another link model with a similar purpose, would have
felt more natural here, and would have enabled a simpler, more flexible
design.
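
As a rough sketch of what I mean (in Python, purely for illustration):
the article could carry an XLink simple link, and any tool could then
find the target without knowing anything about the site config. The
xlink attributes on the link element, the site.xml#a234 target and the
xlink_targets function are my assumptions, not part of your markup.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical rewrite of c123.xml: the link carries XLink attributes
# instead of a site-specific page_idref attribute.
ARTICLE = """\
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <p>blah blah <link xlink:type="simple" xlink:href="site.xml#a234">blah</link></p>
</article>
"""

def xlink_targets(xml_text):
    """Return (document, fragment) pairs for every xlink:href in the text."""
    root = ET.fromstring(xml_text)
    targets = []
    for elem in root.iter():
        href = elem.get(f"{{{XLINK_NS}}}href")
        if href is not None:
            # Split "site.xml#a234" into the document and the fragment part.
            document, fragment = urldefrag(href)
            targets.append((document, fragment))
    return targets

print(xlink_targets(ARTICLE))
```

The article stays a complete document on its own, and what the link
points at is stated in the link itself rather than implied by the
publishing system.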

It is worth noting that in general it is a good thing to keep structural
validation and link validation separate. It is necessary to be able to
validate the structure of a document at the time it is written, at least
for anything even moderately complex. On the other hand, external
resources the document refers to may not be available when the document
is written. Indeed, they may not even exist, because a document may
refer to some other document that has not been written yet. In such
cases, and they are frequent, tying link validation to structure
validation would be a big mistake.
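
A minimal sketch of such a separate link check, in Python and kept
close to your existing page_idref markup. The cut-down documents, the
a999 target and the dangling_links function are made up for the
example; a real system would read the files and would probably use
whatever link model it has settled on.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-ins for site.xml and c123.xml from the example above.
SITE = """\
<site>
  <folder id="f123" index_page="a123">
    <page id="a123"/>
  </folder>
  <page id="a234"/>
</site>
"""

# The second link points at a hypothetical page that does not exist yet.
ARTICLE = """\
<article>
  <p>blah <link page_idref="a234">blah</link>
  and <link page_idref="a999">not written yet</link></p>
</article>
"""

def dangling_links(article_xml, site_xml):
    """Return page_idref values that have no matching page id in the site."""
    page_ids = {page.get("id") for page in ET.fromstring(site_xml).iter("page")}
    links = ET.fromstring(article_xml).iter("link")
    return [link.get("page_idref") for link in links
            if link.get("page_idref") not in page_ids]

# Run as its own pass, independent of structural validation.
# A dangling link is reported, not treated as an invalid document.
for target in dangling_links(ARTICLE, SITE):
    print(f"warning: link target {target!r} not found")
```

The point is only that the article can be structurally valid today
while the link report still lists targets that are missing.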

<snip>
> In addition and among other things, I occasionally validate that the 
> content pieces referenced in the site.xml//page/region/content exists in 
> topics.xml/topics//content (automated).
> 
> site.xml/site//folder has the index_page attribute which is an xs:IDREF 
> while site.xml//page/@id is an xs:ID.
> 
> c123.xml//link/@page_idref is not defined in a schema as an xs:IDREF. 
> Rather, I use XSL to verify that the 'virtual page' (to be rendered as 
> HTML to the file system or sent back to a browser) that the content 
> piece attempts to link to actually exists in the site.xml.
> 
> If the above is understandable (:-o), am I following bad or good practices?

Yes, it is understandable. It is not the way I would have designed it,
but it works and it isn't too complex.

What I have against the design is that it locks out standard tools and
techniques. How does an XML editor validate an article? For that matter,
how does an authoring application validate an IDREF link when the target
is in another document? How do you reuse content that is currently
published on this web site somewhere else, where the publishing
mechanism is entirely different? Again, these considerations may not be
relevant for everyone, but they certainly are to me and the systems I
work with.

Most of the differences in our respective approaches are probably due
to the fact that we work with different things. I am mainly concerned
with content creation, and you (it seems to me) with publishing. Also,
we deal with different kinds of information: a brochure and the
technical documents I work with are very different. It is only natural
that we have different perspectives and come up with widely different
solutions to similar problems.

/Henrik







 
