xml-dev - Re: [xml-dev] The privilege of XML parsing

To: "Roger L. Costello" <costello@mitre.org>, xml-dev@lists.xml.org
Subject: Re: [xml-dev] The privilege of XML parsing - Data types,binary XML
From: Alaric Snell <alaric@alaric-snell.com>
Date: Thu, 12 Dec 2002 20:16:44 +0000
In-reply-to: <3DF8E5A2.FA73A28E@mitre.org>
References: <5.1.0.14.0.20021209184210.02e7e7e0@mail.propylon.com> <15863.17382.401191.380719@megginson.com> <3DF8E5A2.FA73A28E@mitre.org>

On Thursday 12 December 2002 7:38 pm, Roger L. Costello wrote:

> (2)  Data models of the raw XML string.  There are many ways to model the
> XML. Some sample data models are:
>
> An XML Schema Data Model of the aircraft:
>
> - the  XML Schema data model
>       . declares that the aircraft element is comprised of an altitude
>         element.
>       . The altitude element is comprised of an integer that is
>         restricted to the range 0-20000.
>
> An RDF Schema Data Model of the aircraft:
>
> - the RDF Schema data model
>       . aliases aircraft and plane,
>       . states that aircraft is a subclass of "FlyingMachine", and
>       . constrains altitude to an integer that is restricted to the range
> 0-20000

I may just be arguing terminology, but I would say that the data model here 
is:

AN AIRCRAFT HAS AN ALTITUDE

...and the RDF Schema and the XSD are just two ways of writing that down, 
rather than different models, while the XML is a bit of data that happens to 
fit that model.

> Another Example: Consider a library.  Each book in the library represents
> "data". The card catalogue "models" all the data (books) in the library. 
> The card catalogue provides data about each book in the library.  Thus, the
> card catalogue "data model" provides metadata.

I don't think that's a good example... the card catalogue is just an index. 
The data model is something along the lines of "Books may have titles, Dewey 
decimal numbers, and ISBNs" - a loose data model, since some books don't have 
these attributes and there are many other attributes a book may have. The 
card catalogue may index this information, and in that sense the design of 
the catalogue depends on the data model of the book, but the catalogue is 
not itself the data model. Again, I may just be arguing fine points of 
terminology here; I'm still unsure whether we are discussing the same thing 
or not...

More formally specified, I would say that my data model for books would be 
that they can be modelled as a series of key:value pairs. Some standard keys 
that may be present, with the required constraints upon their values, are:

Title - Unicode string
Cover - bitmapped image
Back - bitmapped image
Spine - bitmapped image
Content - sequence of bitmapped images (one per page)
Authors - sequence of Unicode strings
...publisher, copyright details, etc...
...physical construction method...
...Dates of publication, date this particular book was printed...
...catalogue of physical blemishes to this copy...
ISBN - sequence of digits (with length constraint)
DeweyNumber - sequence of natural numbers separated by '.'

That needs some normalisation, since there may be more than one copy of the 
same book.
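
A minimal sketch of that key:value model in Python, just to make the idea 
concrete. Everything here is an assumption for illustration: the validator 
names, the use of raw bytes for a bitmapped image, and the 10-or-13-digit 
reading of the ISBN "length constraint". Only the keys with stated constraints 
are covered.

```python
import re

# Hypothetical validators, one per constrained key in the list above.
def is_text(v):
    return isinstance(v, str)

def is_image(v):
    # Raw bytes stand in for a bitmapped image (an assumption).
    return isinstance(v, bytes)

def is_sequence_of(check):
    return lambda v: isinstance(v, (list, tuple)) and all(check(x) for x in v)

def is_isbn(v):
    # "sequence of digits (with length constraint)"; 10 or 13 digits assumed.
    return isinstance(v, str) and re.fullmatch(r"\d{10}|\d{13}", v) is not None

def is_dewey(v):
    # Natural numbers separated by '.', e.g. "005.72".
    return isinstance(v, str) and re.fullmatch(r"\d+(\.\d+)*", v) is not None

BOOK_MODEL = {
    "Title": is_text,
    "Cover": is_image,
    "Back": is_image,
    "Spine": is_image,
    "Content": is_sequence_of(is_image),
    "Authors": is_sequence_of(is_text),
    "ISBN": is_isbn,
    "DeweyNumber": is_dewey,
}

def violations(book):
    """Return the keys whose values break the model's constraints.

    The model is loose: absent keys are fine and unknown keys are allowed.
    """
    return sorted(k for k, v in book.items()
                  if k in BOOK_MODEL and not BOOK_MODEL[k](v))

book = {
    "Title": "An Example Book",
    "Authors": ["A. N. Author"],
    "ISBN": "0123456789",
    "DeweyNumber": "005.72",
    "Shelf": "B3",  # not in the model; the loose model permits it
}
print(violations(book))              # []
print(violations({"ISBN": "12-34"}))  # ['ISBN']
```

Note that nothing in this sketch mentions XML; the same dictionary could be 
filled from any format.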

Anyway. That's what I would call a data model. I think a data model is an 
abstract concept which can be realised in lots of different ways. DOM is a 
data model, as is an Infoset or a PSVI; they are data models of trees, and as 
such a bit more abstract.

> b. Data models may be usefully utilized by applications.
>
> Example.  A Purchase Order schema (data model) specifies the valid format
> for a PO.  An application may use the Purchase Order schema to validate a
> PO XML string.

Yep; and display it neatly, know what values are equivalent, know how to 
normalise values, etc.
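
As a rough illustration of "know what values are equivalent", here is a 
sketch that borrows the altitude constraint (an integer in the range 0-20000) 
from the aircraft example earlier in this thread. The function names are 
hypothetical, and this is hand-written checking, not XML Schema validation.

```python
def normalise_altitude(lexical):
    """Map any accepted lexical form of an altitude to one canonical string."""
    value = int(lexical.strip())  # raises ValueError for non-integers
    if not 0 <= value <= 20000:
        raise ValueError(f"altitude {value} outside 0-20000")
    return str(value)

def equivalent(a, b):
    """Two lexical forms are equivalent if they normalise to the same value."""
    return normalise_altitude(a) == normalise_altitude(b)

print(normalise_altitude("012000"))      # 12000
print(equivalent("12000", " +12000 "))   # True
```

Without the knowledge that altitude is an integer, an application comparing 
the raw strings would treat "12000" and "012000" as different values.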

> Example of an XML string  that is explicitly bound to a data model:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <aircraft xmlns="http://www.FAA.org"
>                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>                xsi:schemaLocation=
>                           "http://www.FAA.org
>                            aircraft.xsd">
>     <altitude>12000</altitude>
> </aircraft>
>
> Here the XML string has been explicitly bound to an XML Schema data model -
> aircraft.xsd.

Typing is just part of data modelling... in my example data model for books 
above, I defined types for various attributes, but the important thing is 
that the attributes have human-readable names. The ISBN of a book is a 
numeric string, sure, but the computer does not know that it's incorrect if 
I put the wrong ISBN in. Likewise, if we use the ISBN field to count the 
number of times the book has been taken out of the library, the value will 
match the type but still violate my data model. I would not define this as 
a data model:

"More formally specified, I would say that my data model for books would be 
that they can be modelled as a series of key:value pairs, where the keys are 
integers. Some standard keys that may be present, with the required 
constraints upon their values, are:

0 - Unicode string
1 - bitmapped image
2 - bitmapped image
3 - bitmapped image"

...I'd call that a set of type rules. Now, to the computer, the XSD is just 
typing rules - it's the comments in the XSD (or, more likely, in a document 
that is not referenced by the XSD but is read by developers before they start 
using the XSD) and the meaningful element/attribute names that make the data 
model.
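
To show how little the bare type rules pin down, here is a small sketch of the 
integer-keyed version. The Python types standing in for "Unicode string" (str) 
and "bitmapped image" (bytes) are assumptions, as are the sample values.

```python
# The integer-keyed "type rules" quoted above.
TYPE_RULES = {0: str, 1: bytes, 2: bytes, 3: bytes}

def type_check(record):
    """True if every present key holds a value of the declared type."""
    return all(isinstance(record[k], t)
               for k, t in TYPE_RULES.items() if k in record)

cover, back, spine = b"<cover pixels>", b"<back pixels>", b"<spine pixels>"

intended = {0: "An Example Book", 1: cover, 2: back, 3: spine}
# A loan count stored where the title belongs, and the images shuffled:
scrambled = {0: "42", 1: spine, 2: cover, 3: back}

print(type_check(intended), type_check(scrambled))  # True True
```

Both records satisfy the type rules; only a reader who knows that key 0 means 
"Title" and key 1 means "Cover" can tell that the second one is wrong.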

> e. Over time the data model may change.
>
> Example.  Consider the library example above.  Over time the card catalogue
> (i.e., the data model) may change:
>   - it may be changed from paper form to electronic form, or
>   - the data provided for each book may change.  For example, we may
>     change the filing system from Dewey Decimal to Library of Congress
>
> LESSON LEARNED: data models are "disposable".

See, now there I disagree. For a start, the catalogue isn't a data model; it 
just depends on one to some degree. My book data model above could be 
extended in a later revision to add LoC codes, but the books would still have 
Dewey codes even if those didn't get put in the catalogue.

> f. Note that when a data model is "applied" to the parse tree from (1) then
> the parse tree is "decorated" with things such as:
>
> - the aircraft node is decorated with an alias "plane"
>
> - the aircraft node is decorated with a "subclass of FlyingMachine"
>
> - the altitude node is decorated with datatype="integer" restriction (0,
> 20000)

I disagree. I think that if a system is given a mapping from some XML 
vocabulary to a data model and a mapping from a Java class to that data 
model, then it can map from instances of that XML to instances of that class.

Mappings for my book data model might be that we have a <book> element 
containing <attribute name="...key...">...value...</attribute> elements, and 
my Java class being java.util.Map. Note that there are other vocabularies in 
XML and classes in Java that fulfill the same data model in different ways - 
that's an important difference between my notion and your notion, I think.
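
A sketch of such mappings, written in Python with a dict standing in for 
java.util.Map. The first function reads the <book>/<attribute name="..."> 
vocabulary described above; the second reads a different, purely hypothetical 
vocabulary with one child element per key. Sample titles and values are made 
up.

```python
import xml.etree.ElementTree as ET

def book_from_attribute_vocab(xml_text):
    """<book><attribute name="Key">value</attribute>...</book> -> dict"""
    root = ET.fromstring(xml_text)
    return {a.get("name"): (a.text or "") for a in root.findall("attribute")}

def book_from_element_vocab(xml_text):
    """<book><Key>value</Key>...</book> -> dict (hypothetical vocabulary)"""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "") for child in root}

doc_a = (
    "<book>"
    '<attribute name="Title">An Example Book</attribute>'
    '<attribute name="ISBN">0123456789</attribute>'
    "</book>"
)
doc_b = "<book><Title>An Example Book</Title><ISBN>0123456789</ISBN></book>"

# Two different XML vocabularies, one and the same instance of the data model.
print(book_from_attribute_vocab(doc_a) == book_from_element_vocab(doc_b))  # True
```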

In my notion the data model outlives the XML since that book information can 
be mapped into other forms without losing the data model. Indeed, in the 
aircraft model, the data model is implemented by an actual aircraft - you can 
measure its distance from the ground - as well as by the XML document so the 
data model definitely lives beyond the XML.

> (3) Applications process the XML.  Applications should be free to use just
> the undecorated parse tree (that is, the parse tree resulting from parsing
> the XML string that has no association with a data model). Several examples
> were presented to demonstrate the value of an application utilizing the
> undecorated parse tree. Alternatively, applications may choose to use the
> decorated parse tree (that is, the parse tree that results from "applying"
> a data model).

XSLT works on the XML rather than the data model; it's a tool that is 
specific to one syntax - XML - but independent of application data models. It 
just works with a low-level DOM/Infoset-like tree data model.

An aircraft's user interface works on the data model, but that data model may 
be represented by a real aircraft or any of a number of data structures 
inside a flight simulator.
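
A sketch of that point, assuming Python 3.8+ for typing.Protocol. The data 
model is reduced to "an aircraft has an altitude"; the class names, the 
simulator's climb-rate arithmetic and the display format are all invented for 
illustration. The namespace URI is the one from the quoted XML example.

```python
import xml.etree.ElementTree as ET
from typing import Protocol

class HasAltitude(Protocol):
    """The data model: AN AIRCRAFT HAS AN ALTITUDE."""
    def altitude(self) -> int: ...

class XmlAircraft:
    """One realisation: the altitude is read from an XML document."""
    NS = {"faa": "http://www.FAA.org"}

    def __init__(self, xml_text: str) -> None:
        self._root = ET.fromstring(xml_text)

    def altitude(self) -> int:
        return int(self._root.findtext("faa:altitude", namespaces=self.NS))

class SimulatedAircraft:
    """Another realisation: a toy flight-simulator state (hypothetical)."""
    def __init__(self, climb_rate: int, seconds: int) -> None:
        self._climb_rate = climb_rate
        self._seconds = seconds

    def altitude(self) -> int:
        return self._climb_rate * self._seconds

def altimeter_display(craft: HasAltitude) -> str:
    # The "user interface" depends only on the data model, not the format.
    return f"ALT {craft.altitude()}"

doc = ('<aircraft xmlns="http://www.FAA.org">'
       "<altitude>12000</altitude></aircraft>")

print(altimeter_display(XmlAircraft(doc)))               # ALT 12000
print(altimeter_display(SimulatedAircraft(25, 60)))      # ALT 1500
```

The display function never sees XML; swapping the format leaves it untouched.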

The real world and human thought are about data models; software tools are 
about data formats. 

LESSON LEARNED: Data formats should be disposable, while data models live on, 
                so use the ASN.1 model :-)

ABS

-- 
Oh, pilot of the storm who leaves no trace, Like thoughts inside a dream
Heed the path that led me to that place, Yellow desert screen
