> Ronald Bourret wrote:
> While I don't like XSchema particularly well, this argument
> is a bit flawed. XSchema is, among other things, supposed
> to play a major role in XML-DBs and mappings of XML to/from
> other DBs. To the DB people, differences between short, int
> and wider types sometimes matter, not because of storage but
> because of I/O and disk R/W capacity. There are live databases
> where this makes a significant difference in performance.
> Whether it was really a good idea to force this on unsuspecting
> users who wouldn't even think of touching a DB is another
It's really a trade-off between interoperability and functionality. The
more specific your data types are, the more precisely applications can
use them. For example, if you're generating Java classes from your
schema and you really want a certain property to be an int, you'll do
better having an int data type than a genericAnySizeInteger data type,
which you will have no choice but to map to a BigInteger.
All of this unfortunately assumes that the data type is representable in
your local language, which is where interoperability comes into play.
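To make the trade-off concrete, here is a minimal sketch of how a schema-to-Java generator might pick property types. The class `XsdToJava` and its table are hypothetical; the entries follow what I understand to be common schema-to-Java defaults (JAXB's, roughly), and the String fallback for everything else is a simplification. Note that xs:unsignedLong has no Java primitive wide enough to hold it, so it ends up as a BigInteger, which is the "representable in your local language" problem in miniature.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: chooses a Java type for a few XML Schema built-in types.
// Entries mirror common schema-to-Java defaults; they are illustrative only.
class XsdToJava {
    private static final Map<String, Class<?>> TYPES = new HashMap<>();

    static {
        // Specific, fixed-width schema types fit Java primitives directly.
        TYPES.put("xs:short", short.class);
        TYPES.put("xs:int", int.class);
        TYPES.put("xs:long", long.class);
        // An unsigned 32-bit value needs the next wider primitive.
        TYPES.put("xs:unsignedInt", long.class);
        // No Java primitive holds every unsigned 64-bit value.
        TYPES.put("xs:unsignedLong", BigInteger.class);
        // Unbounded types force arbitrary-precision classes.
        TYPES.put("xs:integer", BigInteger.class);
        TYPES.put("xs:decimal", BigDecimal.class);
    }

    static Class<?> javaTypeFor(String xsdType) {
        Class<?> type = TYPES.get(xsdType);
        // Simplification: anything not in the table stays a String.
        return type != null ? type : String.class;
    }

    public static void main(String[] args) {
        for (String t : new String[] {"xs:int", "xs:integer", "xs:unsignedLong", "xs:date"}) {
            System.out.println(t + " -> " + javaTypeFor(t).getSimpleName());
        }
    }
}
```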
Here's a cautionary tale from my own product (XML-DBMS), which transfers
data between XML documents and relational databases.
I store the position of each child element within its parent in a
column in the database. The software that generates a mapping between
an XML document (a DTD, really) and the database gave this column the
type INTEGER, on the assumption that INTEGER would (a) be sufficient for
the purpose and (b) be supported by all databases.
Turns out that Oracle doesn't have a native INTEGER type. Instead, it
maps an INTEGER in a CREATE TABLE statement to its generic NUMBER type,
and the JDBC driver returns values from that column as BigDecimals.
Boom. All my code that assumed I would be dealing with (nice, portable)
ints failed with casting errors.
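The failure is easy to reproduce without a database. In the sketch below, a BigDecimal stands in for what `ResultSet.getObject` hands back for such a column; the class `OrderColumn` and its methods are hypothetical. Casting straight to Integer throws ClassCastException, while going through `Number` works whichever numeric class the driver picks. In real JDBC code, calling `ResultSet.getInt` and letting the driver do the conversion is the usual way around this.

```java
import java.math.BigDecimal;

// Hypothetical sketch: reading the child-order column defensively.
class OrderColumn {
    // Fragile: assumes the driver always hands back an Integer.
    static int fragileOrder(Object columnValue) {
        return (Integer) columnValue; // ClassCastException for a BigDecimal
    }

    // Tolerant: accepts whatever numeric wrapper the driver chooses.
    static int tolerantOrder(Object columnValue) {
        if (columnValue instanceof BigDecimal) {
            // intValueExact throws if the value has a fraction or overflows int.
            return ((BigDecimal) columnValue).intValueExact();
        }
        return ((Number) columnValue).intValue();
    }

    public static void main(String[] args) {
        // Simulates getObject() on a column the driver reports as a BigDecimal.
        Object fromDriver = new BigDecimal("3");
        System.out.println(tolerantOrder(fromDriver));
        try {
            fragileOrder(fromDriver);
        } catch (ClassCastException e) {
            System.out.println("fragile read failed: " + e.getClass().getSimpleName());
        }
    }
}
```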
Although this example doesn't specifically address data types in XML,
the moral is the same: The more specific your data types are, the more
interoperability problems you will have.
Contrast this with the strategy XML-DBMS uses when actually transferring
data. The user sets up a mapping (or modifies the generated one) between
the XML document and the database. Part of this mapping maps
PCDATA-only elements to columns. The software converts each PCDATA value
based on the data type of its destination column, not the type specified
in the post-schema-validation infoset (PSVI).
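A minimal sketch of that strategy: the destination column's JDBC type, not any schema annotation, decides how the PCDATA string is converted. This is not XML-DBMS's actual code; `PcdataConverter` and `convertForColumn` are hypothetical names, and in practice the column type would come from database metadata (for example `ResultSetMetaData.getColumnType`). The `main` method shows the same text landing in two differently typed columns.

```java
import java.math.BigDecimal;
import java.sql.Types;

// Hypothetical sketch: convert element text according to the destination column's type.
class PcdataConverter {
    static Object convertForColumn(String pcdata, int sqlType) {
        String text = pcdata.trim(); // numeric parsers reject surrounding whitespace
        switch (sqlType) {
            case Types.SMALLINT:
                return Short.valueOf(text);
            case Types.INTEGER:
                return Integer.valueOf(text);
            case Types.BIGINT:
                return Long.valueOf(text);
            case Types.NUMERIC:
            case Types.DECIMAL:
                return new BigDecimal(text);
            case Types.FLOAT:
            case Types.DOUBLE:
                return Double.valueOf(text);
            default:
                // Character and unlisted types keep the original text.
                return pcdata;
        }
    }

    public static void main(String[] args) {
        String quantity = "42"; // the same PCDATA from one shared document
        // One system stores it in an INTEGER column, another in a NUMERIC one.
        System.out.println(convertForColumn(quantity, Types.INTEGER).getClass().getSimpleName());
        System.out.println(convertForColumn(quantity, Types.NUMERIC).getClass().getSimpleName());
    }
}
```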
The advantage of this is that you and I can both store data from the
same XML document, but use two different data types to do it, depending
on the capabilities of the system each of us is using. That is, we have
I think what this points out is a fundamental difference in how people
think data types should be used with XML. Many people want complete
control over their data, saying, in effect, "You will use an int (short,
long, etc.) to represent this piece of data."
I think this flies in the face of the vision of XML as data-on-the-Web,
which shouldn't care who uses the data any more than Web sites today
care about who reads, indexes, or caches their Web pages.
XML data types are useful for explaining some things about a given piece
of data -- sort sequence, valid operators, etc. -- but should have no
more to say about how that data is actually used by an application than
an element type or attribute name does. (If I have an element named
Quantity, am I forced to manipulate its value in a variable named
Quantity?) If you want to tell me more about the legal values of a piece
of data (range, set of legal values, etc.), that's fine, but this is
mostly separate from the data type.
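As a small illustration of a data type telling you about sort sequence without dictating representation, the hypothetical `QuantitySort` class below sorts the same Quantity values two ways. Treated as strings they sort lexically, so "10" comes before "9"; treated as integers they sort numerically, and the application is still free to hold them in whatever Java type it likes (BigInteger here, purely as an assumption).

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: the declared type decides the sort sequence,
// not the variable type an application happens to use.
class QuantitySort {
    // Typed as a string: lexical order.
    static List<String> sortAsStrings(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        Collections.sort(copy);
        return copy;
    }

    // Typed as an integer: numeric order, compared via an arbitrary-precision key.
    static List<String> sortAsIntegers(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(Comparator.comparing((String s) -> new BigInteger(s.trim())));
        return copy;
    }

    public static void main(String[] args) {
        List<String> quantities = Arrays.asList("9", "10", "2");
        System.out.println(sortAsStrings(quantities));
        System.out.println(sortAsIntegers(quantities));
    }
}
```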
Whew, sorry about the rant. And I do think you have a valid point. I
guess I'd just prefer to keep my XML as interoperable as possible, even
if it means trading away functionality in some cases. I figure that
applications can regain most of that functionality by choosing how they
want to view a given XML document.