OASIS Mailing List Archives


Re: Personal reply to Edd Dumbill's XML Hack Article wrt W3C XML Schema



This is an interesting discussion, isn't it?

Eric van der Vlist:
> I think that a great asset of XML is to decouple the documents from the
> technology used to process them and I consider W3C XML Schema as a
> technology amongst other.
>

Simon St.Laurent:
> Even if the message arrives burdened with data schemata, content
> models and canonical semantics which the recipient has pledged to honor,
> in the end he must instantiate the 'true' data, or semantics, or meaning
> of the message as whatever it is that he is specifically capable of
> using, presumably by processing it to some locally meaningful outcome.

William Perry:
> The 'original value in the original document' is the character string
> '<value>45.67</value>'; after parsing it is the text '45.67'. There is no
> real number there.

Matthew Gertner:
> In my view, information architectures based on XML
> will be driven by XML schemas (hence the bean example in my last post). A
> given schema tells you how to process a given class of instances, so you
> have to have a single schema for a given instance.

Here's an extreme position:

Why have any markup at all?  Just use ASN.1 with PER encoding, for example.
This will be compact and unambiguous (we hope, at any rate!).  Make the ASN.1
schema available over the web, use it to compile your data structures, and you
have Matthew's world taken to its ultimate.  Maybe we can even compile the
structures dynamically.

Simon talked about the user needing to "instantiate the 'true' data, or
semantics ...".  I suggest that a working meaning of the term "semantics"
(which we see thrown around in these posts all the time) is something like
this:

"Semantics is that information or knowledge that allows you (you may be a
computer system) to know what to do to process the data correctly for your
job."

In this sense, your semantics may be my syntax.  XML markup lets a processor
know how to build a generic tree, or, SAX-like, send certain events.  So in a
way the markup functions as a kind of semantics for an XML document, at least
at the lowest level of processing.  It appears that there are layers to this
semantics business.
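
As a minimal sketch of that lowest layer (Python's standard library is used
here purely for illustration, and the EventLog class name is hypothetical),
the same markup can drive either a generic tree or a stream of SAX events.
In both cases the parser hands over only text; turning '45.67' into a number
is a decision made one layer up, by the application.

```python
import xml.sax
import xml.etree.ElementTree as ET

# The document from the quoted example above.
DOC = "<value>45.67</value>"

# Tree view: the parser builds a generic tree without knowing what "value" means.
root = ET.fromstring(DOC)
tree_view = (root.tag, root.text)


class EventLog(xml.sax.ContentHandler):
    """Hypothetical handler that just records the events the parser sends."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        # A SAX parser may deliver text in more than one chunk.
        self.events.append(("chars", content))

    def endElement(self, name):
        self.events.append(("end", name))


# Event view: the same markup, reported as a sequence of callbacks.
handler = EventLog()
xml.sax.parseString(DOC.encode("utf-8"), handler)
events = handler.events

# The next layer of semantics belongs to the receiving application:
# only here does the character string become a number.
price = float(root.text)
```

Nothing in the markup itself says that the content of "value" is a number;
the call to float() is the recipient's own choice.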

I ask again, why are we using markup?  From the above viewpoint, the markup
moves a layer of semantics out of the processor and into the document or data
set (doing so in a standard way, of course).  This approach seems to me to fit
in with the way the Web has developed.  It leads me to ask: does XML Schema
continue this pattern?

I have to say that the answer is not clear-cut, at least to me.  On the one
hand, schemas could be seen as moving another layer of semantic information
out of the processor and into - not the data document itself, but at least
into another declarative document.

On the other hand, XML Schema isn't moving that information into the data
document, either.  And from a semantic point of view, once we transform a
document using XSLT, all those carefully crafted complex types may become
completely rearranged in a way that is no longer schema-valid by the original
schema.  The simple types may still be attached to certain elements, but the
complex types may be gone, gone, gone.  Where is your schema now?  How do you
know how to process the transformed data?
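
A rough sketch of the problem, under loud assumptions: Python's standard
library has neither an XSLT engine nor a schema validator, so the
CONTENT_MODEL table below is a hypothetical stand-in for a schema's complex
types, is_valid() is a toy stand-in for validation, and flatten() is a
stand-in for a stylesheet.  The element names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model standing in for a schema's complex types:
# each element name maps to the exact sequence of children it must have.
CONTENT_MODEL = {
    "order": ["customer", "item"],
    "customer": [],
    "item": ["value"],
    "value": [],
}


def is_valid(elem):
    """Toy structural check: every element must match its declared children."""
    expected = CONTENT_MODEL.get(elem.tag)
    if expected is None or [child.tag for child in elem] != expected:
        return False
    return all(is_valid(child) for child in elem)


def flatten(order):
    """Stand-in for an XSLT transformation: reshape an order into a flat row."""
    row = ET.Element("row")
    row.set("customer", order.findtext("customer"))
    ET.SubElement(row, "value").text = order.findtext("item/value")
    return row


source = ET.fromstring(
    "<order><customer>Ann</customer><item><value>45.67</value></item></order>"
)
valid_before = is_valid(source)    # the source matches the content model
result = flatten(source)
valid_after = is_valid(result)     # "row" is unknown to the original model
value_still_valid = is_valid(result.find("value"))  # the leaf survives
```

After the transformation the leaf "value" element is still recognizable, but
the structure that the original model described is gone, and the original
model has nothing to say about "row".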

There is a close analogy here to SQL.  An SQL query returns a set of rows.
Each row has the same set of components - the cells or individual values.
According to C. J. Date, the design of the row constitutes a type.  Thus
the query has created a new type that may not be in the schema of the database
at all.
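
To make the analogy concrete, here is a small sketch using Python's sqlite3
module with an in-memory database.  The customer and invoice tables and
their columns are invented for illustration.  The row shape returned by the
query matches neither declared table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoice (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO invoice VALUES (10, 1, 45.67), (11, 1, 10.00), (12, 2, 5.00);
""")

# The query defines its own row design: (customer, invoices, total).
cur = conn.execute("""
    SELECT c.name AS customer, COUNT(*) AS invoices, SUM(i.amount) AS total
    FROM customer c JOIN invoice i ON i.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.name
""")
result_columns = tuple(d[0] for d in cur.description)
rows = cur.fetchall()

# Collect the row designs that the database schema actually declares.
declared = {}
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
for (table,) in tables:
    info = conn.execute(f"PRAGMA table_info({table})").fetchall()
    declared[table] = tuple(col[1] for col in info)  # col[1] is the column name

# The result's row design appears nowhere in the declared schema.
is_new_row_type = result_columns not in declared.values()
conn.close()
```

The consumer of these rows has to learn their shape from the query result
itself (here, cur.description), not from the database schema.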

Rather than my drawing conclusions here, I invite everyone to pitch in with
respect to the two key questions I have been posing:

1) Why use markup at all?

2) Does XML Schema move us farther along the path started by HTTP and HTML,
or not?  Why?  And is that a GOOD THING?

Cheers,

Tom P