XML.org
OASIS Mailing List Archives

 


RE: [xml-dev] Lesson Learned: Use namespaces for both markup and data

I have to say that, all in all, I don't understand why working with
namespaces is supposedly so painful, so I've stayed out of this
discussion; however, as an "application developer" who works with quite
a bit of XML data, I have to disagree with you here.  Actually, I
disagree with Roger, too, so maybe I'm just disagreeable. :-)

In processing a large XML document, the cost of keeping track of the
prefix-to-namespace mappings is fairly trivial (mid-stream redefinitions
of existing mappings do throw a wrinkle into things, but it shouldn't be
insurmountable by any means); keeping track of the document context
itself should also be easy, if that is essentially a linked list of
child-to-parent relationships extending all the way back to the document
root.  However, a lookup against a map of prefixes to namespaces can be
done in constant time, whereas if I rely on document context, a lookup
will take time proportional to the depth of my tree.
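
A minimal Python sketch of the two lookup strategies contrasted above. It assumes the prefix bindings live in a dict with an undo log (to handle mid-stream redefinitions) and that the document context is a parent-linked chain. The names NamespaceScope and ContextNode are hypothetical, chosen only for illustration.

```python
class NamespaceScope:
    """Prefix-to-namespace map with an undo log for mid-stream redefinitions."""

    def __init__(self):
        self._bindings = {}   # prefix -> namespace URI currently in force
        self._undo = []       # per open element: list of (prefix, shadowed URI or None)

    def push(self, declarations):
        """Enter an element, applying its xmlns declarations."""
        saved = []
        for prefix, uri in declarations.items():
            saved.append((prefix, self._bindings.get(prefix)))
            self._bindings[prefix] = uri
        self._undo.append(saved)

    def pop(self):
        """Leave an element, restoring whatever bindings it shadowed."""
        for prefix, previous in reversed(self._undo.pop()):
            if previous is None:
                del self._bindings[prefix]
            else:
                self._bindings[prefix] = previous

    def lookup(self, prefix):
        """Single dict access: independent of tree depth."""
        return self._bindings[prefix]


class ContextNode:
    """Document context as a linked list of child-to-parent relationships."""

    def __init__(self, name, parent=None, attributes=None):
        self.name = name
        self.parent = parent
        self.attributes = attributes or {}

    def find_attribute(self, attr):
        """Walk toward the root; the step count grows with the node's depth."""
        node, steps = self, 0
        while node is not None:
            if attr in node.attributes:
                return node.attributes[attr], steps
            node, steps = node.parent, steps + 1
        return None, steps
```

The redefinition case costs a little bookkeeping on push and pop, while the lookup itself stays a single dictionary access; the context walk has to visit one node per level.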

The only time I can think of when the prefix mappings become non-trivial
(and the one you seem to be alluding to when you talk about "copying"
and "pasting") is when doing manual editing of raw XML data, which
_should_ be a fringe case.

Simple example spinning off of Roger's example (I'll assume as you did
that there are schemas available):

<example xmlns:aqr="http://www.aquarium.org"
xmlns:atl="http://www.artillery.gov">
  <aqr:tank>
    <aqr:capacity>55</aqr:capacity>
  </aqr:tank>
  <atl:tank>
    <atl:capacity>300</atl:capacity>
  </atl:tank>
</example>

Due to the use of QNames here, I needn't backtrack at all to figure out
that the first capacity element is measured in gallons while the second
is measured in shells: the prefix tells me the namespace, which in turn
maps to a schema, which specifies all relevant information about the
capacity element in that namespace.  The only context information I need
is a mapping of prefixes to namespaces.
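
As a rough illustration of that point, here is a Python sketch using the standard library's xml.etree.ElementTree, which reports each element's expanded name directly. The UNITS_BY_NAMESPACE table is a hypothetical stand-in for whatever the schemas would actually say; it is an assumption made for the example, not something either schema is known to contain.

```python
import xml.etree.ElementTree as ET

DOC = """\
<example xmlns:aqr="http://www.aquarium.org"
         xmlns:atl="http://www.artillery.gov">
  <aqr:tank>
    <aqr:capacity>55</aqr:capacity>
  </aqr:tank>
  <atl:tank>
    <atl:capacity>300</atl:capacity>
  </atl:tank>
</example>
"""

# Hypothetical stand-in for what each namespace's schema would say
# about its capacity element.
UNITS_BY_NAMESPACE = {
    "http://www.aquarium.org": "gallons",
    "http://www.artillery.gov": "shells",
}

def capacities(xml_text):
    """Return (value, unit) for every capacity element, using only its own name."""
    results = []
    for elem in ET.fromstring(xml_text).iter():
        # ElementTree reports expanded names as "{namespace}local".
        if elem.tag.startswith("{") and elem.tag.endswith("}capacity"):
            namespace = elem.tag[1:].split("}", 1)[0]
            results.append((int(elem.text), UNITS_BY_NAMESPACE[namespace]))
    return results

print(capacities(DOC))
```

No ancestor of either capacity element is consulted; the element's own expanded name is enough to pick the unit.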

On the other hand, consider what happens if I don't use QNames (for
brevity, leaving out the long list of attributes that would need to
exist on the example element):

<example ...>
  <tank xsi:type="Aquarium">
    <capacity>55</capacity>
  </tank>
  <tank xsi:type="Artillery">
    <capacity>300</capacity>
  </tank>
</example>

Now, in order to get the same information, I need to backtrack through
my stored document context information, noting the xsi:type of the tank
element on my way, before I can look up the schema; then I will need to
proceed through the same process as before, except that I may have to
search through several schemas in order to determine which one defines a
capacity element as a child of a tank element with the particular
xsi:type that I'm looking for.  (As an alternative to the document
context linked list, I could store all of the schema information in
memory, but that would be very expensive.)
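
A comparable Python sketch of the backtracking case, again with xml.etree.ElementTree. The original elides the attributes on the example element; the sample below supplies only the xsi declaration so that it parses, and UNITS_BY_TYPE is a hypothetical stand-in for the schema search described above.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Only the xsi declaration is supplied here so that the sample parses;
# the other attributes elided in the original are not reconstructed.
DOC = """\
<example xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <tank xsi:type="Aquarium">
    <capacity>55</capacity>
  </tank>
  <tank xsi:type="Artillery">
    <capacity>300</capacity>
  </tank>
</example>
"""

# Hypothetical stand-in for the schema lookup keyed on the xsi:type value.
UNITS_BY_TYPE = {"Aquarium": "gallons", "Artillery": "shells"}

def capacities(xml_text):
    """Return (value, unit, levels climbed) for every capacity element."""
    root = ET.fromstring(xml_text)
    # ElementTree keeps no parent pointers, so build the child-to-parent links.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    type_attr = f"{{{XSI}}}type"
    results = []
    for elem in root.iter("capacity"):
        node, levels = elem, 0
        # Backtrack until some ancestor carries an xsi:type.
        while node is not None and node.get(type_attr) is None:
            node, levels = parent_of.get(node), levels + 1
        if node is None:
            continue  # no typed ancestor: the unit cannot be determined
        results.append((int(elem.text), UNITS_BY_TYPE[node.get(type_attr)], levels))
    return results

print(capacities(DOC))
```

The third item in each tuple counts the levels climbed before a type was found, which is the part that grows with deeper documents.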

In this simple case, of course, I only had to go up a couple of levels,
but you can extrapolate to larger and more complex documents.

I definitely don't agree with Roger's idea of using QNames in data,
either, though; IMHO that dangerously mixes metadata with data.

Doug Glidden
Software Engineer
The Boeing Company
Douglass.A.Glidden@boeing.com

-----Original Message-----
From: Amelia A Lewis [mailto:amyzing@talsever.com] 
Sent: Tuesday, August 11, 2009 16:35
To: Costello, Roger L.
Cc: xml-dev@lists.xml.org
Subject: RE: [xml-dev] Lesson Learned: Use namespaces for both markup
and data

On Tue, 11 Aug 2009 16:06:05 -0400, Costello, Roger L. wrote:
>    When this QName value is taken out of context, 
>    it does not have less information than a 
>    non-QName value taken out of context.

True enough, if you strip all context.

You're expected to maintain the context of the element or attribute when
interpreting any data in an XML document.  If you aren't doing that, why
use XML?  CSV is much more compact.

So, you have a labelled graph, and each information-bearing node has a
label.  You may want to look a little way up the ancestor axis (for
instance, you may want to know what element an attribute is on before
interpreting the content of the attribute).  Or you may not, but ... if
you're relying on that context, chances are good that you have a schema.
The information content of a particular node is relatively
well-localized; it may need no more than its label (element or
attribute name), or it may need slightly more information extracted from
the ancestor axis, but the structure of XML imposes the rules,
reasonably cleanly.  You can cut a fragment
(<example><object>value</object></example>) and, based on the namespaces
required by the *structure* (example and object, here), paste it into
some other XML safely (FSVO "safely").

Once you have namespaces in content, then you're pretty much forced to
copy *all* in-scope namespaces for the example fragment, because you
can't tell which ones you need.
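
A small Python sketch of that cut-and-paste decision, assuming the fragment has been parsed with xml.etree.ElementTree and that the caller already knows the in-scope prefix bindings. The function names and the qnames_in_content flag are hypothetical, introduced only for illustration.

```python
import xml.etree.ElementTree as ET

def namespaces_used_by_structure(fragment):
    """Namespace URIs that appear in element and attribute names of the fragment."""
    used = set()
    for elem in fragment.iter():
        for name in [elem.tag, *elem.attrib]:
            # ElementTree writes namespaced names as "{namespace}local".
            if name.startswith("{"):
                used.add(name[1:].split("}", 1)[0])
    return used

def bindings_to_copy(fragment, in_scope, qnames_in_content):
    """Pick the prefix bindings that must travel with a cut fragment."""
    if qnames_in_content:
        # Text may hide prefixed values, so nothing can safely be dropped.
        return dict(in_scope)
    needed = namespaces_used_by_structure(fragment)
    return {prefix: uri for prefix, uri in in_scope.items() if uri in needed}
```

When names carry all the namespace information, the set to copy can be computed from the fragment alone; once values may hold prefixes, the only safe answer is everything in scope.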

Your application is also forced to replicate the functionality of an XML
processor.  It must keep track not only of the expanded names of the
elements/attributes that it processes, but also the prefix-to-identifier
mappings that are in force for that document.

"aquarium:tank" may equal "artillery:tank" or "tank", each of these
QNames binding their prefixes to the same namespace identifier.  Stick
QNames in content and you then have to explain, to the application
processors, that in this case, an artillery:tank is *just the same* as
an aquarium:tank, because they're both bound to the
http://www.gaz-guzzlers.com URI.  Application developers will, with
perfect reason, wish to beat you, and the first person who put QNames in
content, and probably the authors of the namespaces in XML spec.
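
To make the bookkeeping concrete, here is a hedged Python sketch in which the application tracks prefix bindings itself in order to resolve QName-valued text. It relies on the start-ns and end-ns events of xml.etree.ElementTree.iterparse; the kind element and the sample document are hypothetical, and treating an unprefixed value as belonging to the default namespace is an assumption that a given vocabulary may not share.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document: two different prefixes bound to the same URI.
DOC = """\
<example xmlns:aquarium="http://www.gaz-guzzlers.com">
  <kind>aquarium:tank</kind>
  <group xmlns:artillery="http://www.gaz-guzzlers.com">
    <kind>artillery:tank</kind>
  </group>
</example>
"""

def resolve_qnames_in_content(xml_text, local_name):
    """Resolve QName-valued text of <local_name> elements to (namespace, local) pairs."""
    bindings = {}   # prefix -> stack of URIs; the application's own bookkeeping
    declared = []   # prefixes in declaration order, popped again on end-ns
    resolved = []
    events = ET.iterparse(io.BytesIO(xml_text.encode("utf-8")),
                          events=("start-ns", "end-ns", "end"))
    for event, payload in events:
        if event == "start-ns":
            prefix, uri = payload
            bindings.setdefault(prefix, []).append(uri)
            declared.append(prefix)
        elif event == "end-ns":
            bindings[declared.pop()].pop()
        elif payload.tag == local_name:
            # "end" event: the element's text is complete and its
            # in-scope declarations have not been closed yet.
            prefix, _, local = (payload.text or "").strip().rpartition(":")
            stack = bindings.get(prefix)
            resolved.append((stack[-1] if stack else None, local))
    return resolved

print(resolve_qnames_in_content(DOC, "kind"))
```

Both values resolve to the same (namespace, local name) pair even though the strings differ, and neither can be resolved at all without the binding stacks the application had to maintain.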

Poor locality of data, information not enforced implicitly by the rules
of the XML schema, and a data representation that misleadingly does not
contain half of the significant information (assuming the local name
carries half the value, a QName does not provide the other half, it
provides a pointer that allows you to look that up, so long as you
maintain the complete state at the point of parsing).

QNames in content are a *terrible* idea.

Amy!
-- 
Amelia A. Lewis                    amyzing {at} talsever.com
Confidence: a feeling peculiar to the stage just before full
comprehension of the problem.

_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS to
support XML implementation and development. To minimize spam in the
archives, you must subscribe before posting.

[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
subscribe: xml-dev-subscribe@lists.xml.org
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php




Copyright 1993-2007 XML.org. This site is hosted by OASIS