Schemas and Semantics (was: A Personal Reply...)
- From: Matthew Gertner <matthew.gertner@schemantix.com>
- To: "'xml-dev@lists.xml.org'" <xml-dev@lists.xml.org>
- Date: Tue, 13 Mar 2001 16:38:41 +0100
1) Why do we need markup?
I'm coming at this from a data-centric world where XML is used to serialize
object structures for communication purposes. So we definitely need some
sufficiently robust mechanism for representing these structures, and in this
context I don't think the case for XML is hard to make. We can debate
the technical merits and demerits ad nauseam, but the bottom line is that it has
gained amazing traction in the market, has given rise to a healthy
aftermarket of tools, many of them free, and is based on years of experience
from the SGML and HTML worlds that has enabled XML to avoid many (alas, not
all) of the pitfalls that might befall a competing approach. In the end I
can only answer with another question: why not use markup?
2) Why do we need schemas?
Here I guess I have a more controversial view. When an XML document is
authored, the author definitely has a set of baseline semantics in mind.
When she writes "<age>45</age>", she knows what the value is meant to
represent. Anything we can do to capture these inherent semantics in an
automatically processable form is extremely valuable. If I know that the
datatype of this field is a positive integer with a maximum value of 150, I
can validate it, I can generate cool UIs for it (e.g. a slider), etc. If I
know that it is the number of whole years from the date of birth until
today, I can automatically narrow the DOB down to a one-year window. And so on.
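To make the <age> example concrete, here is a minimal Python sketch. It assumes the schema-level facts are only "positive integer, maximum 150". These are hard-coded as the hypothetical constants AGE_MIN and AGE_MAX rather than read from a real schema. The sketch validates the element and, given today's date, derives the candidate birth years. An age in whole years only pins the year of birth down to one of two values.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical stand-ins for the schema facts in the text:
# <age> is a positive integer with a maximum value of 150.
AGE_MIN = 1
AGE_MAX = 150


def parse_age(xml_text: str) -> int:
    """Parse an <age> element and check it against the assumed constraints."""
    elem = ET.fromstring(xml_text)
    if elem.tag != "age":
        raise ValueError(f"expected <age>, got <{elem.tag}>")
    text = (elem.text or "").strip()
    # Accept only ASCII digits, i.e. an unsigned integer literal.
    if not re.fullmatch(r"[0-9]+", text):
        raise ValueError(f"age must be an integer, got {text!r}")
    value = int(text)
    if not AGE_MIN <= value <= AGE_MAX:
        raise ValueError(f"age {value} is outside [{AGE_MIN}, {AGE_MAX}]")
    return value


def birth_year_range(age: int, today: date) -> tuple[int, int]:
    """Return the two candidate birth years (earliest, latest) for an age in whole years."""
    latest = today.year - age  # birthday already reached this year
    earliest = latest - 1      # birthday not yet reached this year
    return earliest, latest
```

The same two constants could drive a UI as well, for instance as the bounds of a slider. With the sketch above, parse_age("<age>45</age>") returns 45, and birth_year_range(45, date(2001, 3, 13)) returns (1955, 1956).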
Two important subpoints:
a) I still don't believe that there is even one non-contrived example where
an element in an instance truly has ambiguous semantics. The examples I have
seen have mainly been along the lines of "I can treat age as a boolean, so
it might be a boolean for some applications". Well, sorry, but if I know the
true underlying semantics of the element type, that is very useful
information for transformations, and any application that doesn't need some
or all of those semantics is free to ignore them. But to argue that
these semantics are not there at all seems bizarre.
b) The underlying schema model should definitely be extremely generic and
independent of any concrete syntax. The lack of such a model in XSD is very
disturbing, and perhaps this is the crux of people's objections to putting vital
semantics into schemas. Equally disturbing is the lack of a decoupled
extensibility mechanism like Tibco's Schema Adjunct Framework. Sure, we have
"appinfo", but embedding this metadata directly in the schema is another big
problem, because the schema has to be modified for every application that
needs its own kind of semantics. The underlying semantics will always be the
same, but a database application clearly needs different metadata from a
Bean generator. The fact that there isn't a built-in mechanism for associating
multiple files containing schema-level metadata with a schema is a big
mistake, and it is responsible for much of the bloat in the spec. Would this kind of
approach alleviate people's objections to using schemas as the underlying
model for XML-based software architectures?
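The decoupled approach argued for above can be sketched in Python. This is a hypothetical illustration of the idea, not the format or API of Tibco's Schema Adjunct Framework. The core schema is modelled as a plain dictionary keyed by made-up, path-style component names such as "person/age". Each application registers its own adjunct of metadata against those names, so the core schema never has to be edited.

```python
from typing import Any

# Core schema: application-independent facts, one entry per element type.
# Keys like "person/age" are hypothetical path-style component identifiers.
CORE_SCHEMA: dict[str, dict[str, Any]] = {
    "person/age": {"type": "positiveInteger", "maxInclusive": 150},
    "person/name": {"type": "string"},
}


class AdjunctRegistry:
    """Associates per-application metadata with a schema without editing it.

    Hypothetical sketch of the schema-adjunct idea; not Tibco's actual format or API.
    """

    def __init__(self, core: dict[str, dict[str, Any]]) -> None:
        self._core = core
        # app name -> component path -> that application's metadata
        self._adjuncts: dict[str, dict[str, dict[str, Any]]] = {}

    def register(self, app: str, adjunct: dict[str, dict[str, Any]]) -> None:
        # Reject metadata for components the core schema does not define,
        # so an adjunct cannot drift away from the schema it annotates.
        unknown = set(adjunct) - set(self._core)
        if unknown:
            raise KeyError(f"unknown schema components: {sorted(unknown)}")
        self._adjuncts[app] = adjunct

    def view(self, app: str, component: str) -> dict[str, Any]:
        # Merge the app's metadata with the core facts; the core facts are
        # applied last, so they win if an adjunct reuses one of their keys.
        app_meta = self._adjuncts.get(app, {}).get(component, {})
        return {**app_meta, **self._core[component]}
```

For example, a database adjunct ({"person/age": {"column": "AGE"}}) and a Bean-generator adjunct ({"person/age": {"javaType": "int"}}) can both annotate person/age without touching CORE_SCHEMA. Letting the core facts win on key collisions is one way of keeping the shared semantics authoritative.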
Matt