   Re: [xml-dev] Attribute Order (Was: Create XML )

elharo@metalab.unc.edu (Elliotte Rusty Harold) writes:
>Source code level editors are a very special case. They really need a 
>special purpose API. SAX, DOM, JDOM, etc. are not adequate for their 
>needs. Attribute order is really the least of the problems here. 
>However, baking the needs of an editor into a generic processing API 
>would vastly complicate the APIs. It would make them much less 
>suitable for all other uses.  I still see no reason for relying on 
>attribute order when it comes to the information content of an XML 
>document.

I'm not talking about source code level editors here.  I'm talking about
dealing with information that comes back to you after you create a
document and feed it into a processor.  (Remember, Common XML took its
foundations from what survived in round-tripping documents, so this
isn't the first time these issues have been visited.)

SAX filters run from the command line are a good example - I use them
for minor transformations all the time.  Unfortunately, what I get back
is different from what I put into it.  Sometimes that's fine - reversing
the order of two attributes is not a big deal for me to work with.
Sometimes it sucks, as when fifteen attributes come back at you with a
different order than you used originally, sometimes in fifteen different
orders.

I don't care much about theories of processes that expect only to feed
XML from one computer to another.  The processes I work on are processes
which tend to benefit from human interactions, even if human interaction
is not something their creators expected.  In my own work, there's a lot
of back and forth between me making edits to documents and using
processors of various kinds.  DocBook (for O'Reilly work) is my
nightmare in this story, but I have similar problems working with other
vocabularies, including data-centric vocabularies.

>>For a non-HTML example, I think we're all familiar with XML documents
>>representing relational tables where the creator of the document
>>represented the values as attributes. If another process makes
>>changes to some values in some rows with a filter and the document is
>>re-serialized, there are often side effects.  Suddenly, the sequence
>>of attributes can vary among the elements.  It may not matter to the
>>computer consuming it, but it does make debugging more interesting
>>than it should be.
>
>I don't see why this is any problem at all. Can you be more specific?

I'm just typing this up with random data, but hopefully you'll get the
idea:

<table>
<row item="0001452A" time="08:08.72PM EST" quantity="3" discount="20%"
tax="7%" base_price="19.99" />
<row item="1356352A" time="08:09.81PM EST" quantity="1" discount="10%"
tax="3.5%" base_price="4.99" />
<row item="BC758333" time="08:12.14PM EST" quantity="1" discount="50%"
tax="7%" base_price="19.99" />
<row item="0001452A" time="08:14.09PM EST" quantity="7" discount="0%"
tax="3.5%" price="1.99" />
</table>

That is the initial data.  I run a transformation on it which only
modifies entries with discounts of 20% and over, and I might get:

<table>
<row base_price="19.99" discount="20%" item="0001452A" quantity="3"
tax="7%" time="08:08.72PM EST" color="pink" />
<row item="1356352A" time="08:09.81PM EST" quantity="1" discount="10%"
tax="3.5%" base_price="4.99" />
<row base_price="19.99" discount="50%" item="BC758333"
time="08:12.14PM EST" quantity="1" tax="7%" color="red" />
<row item="0001452A" time="08:14.09PM EST" quantity="7" discount="0%"
tax="3.5%" price="1.99" />
</table>

That's not even that big a change, but finding things in that just got a
lot harder. (Try it with a thousand rows, twenty attributes, and
multiple choices of attribute sequence.) Instead of using the
pattern-matching abilities that humans come with, I now get to write
code if I want to figure out what's changed.  
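
A minimal sketch of the kind of code this forces, written in Python with the standard xml.etree.ElementTree module. It assumes rows keep their positions between the two documents and compares each pair of rows as name/value sets, so attribute order is ignored. The function name changed_rows and the shortened sample data are illustrative, not taken from any real tool.

```python
import xml.etree.ElementTree as ET


def changed_rows(before_xml, after_xml):
    """Return [(row_index, {name: (old, new)})] for rows whose attributes differ.

    Rows are paired by position (an assumption); attribute order is ignored.
    """
    before = ET.fromstring(before_xml).findall("row")
    after = ET.fromstring(after_xml).findall("row")
    report = []
    for index, (old, new) in enumerate(zip(before, after)):
        names = sorted(set(old.attrib) | set(new.attrib))
        # Element.get returns None for an attribute that is absent.
        diff = {n: (old.get(n), new.get(n)) for n in names
                if old.get(n) != new.get(n)}
        if diff:
            report.append((index, diff))
    return report


BEFORE = """<table>
<row item="0001452A" quantity="3" discount="20%" />
<row item="1356352A" quantity="1" discount="10%" />
</table>"""

AFTER = """<table>
<row discount="20%" item="0001452A" quantity="3" color="pink" />
<row item="1356352A" quantity="1" discount="10%" />
</table>"""

print(changed_rows(BEFORE, AFTER))
# prints [(0, {'color': (None, 'pink')})]
```

The report shows only the added color attribute on the first row; the reshuffled attributes on that row do not register as changes, which is exactly what a human reader has to work out by eye.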

>>I also see order as important to processing which works on XML
>>documents as a stream of characters, processing attributes as they
>>appear in the start tag rather than as a set, though the actual
>>throughput improvements in those cases are pretty minimal.
>
>This is completely possible now. There is nothing in the spec that 
>says a processor cannot stream attributes to the client application 
>in the order they appear. There is nothing that says the processor 
>must shuffle the attributes. It just says that client applications 
>cannot rely on processors reporting attributes in any particular 
>order.

That's ignoring the reality of XML processor practice, which has been a
long story of people reading the rec, seeing that it says attribute
order is insignificant, and using their favorite language-specific
construct for name-value pairs.  Explicit shuffling isn't necessary - it
just happens.  I find it happens regularly in SAX when I create new
AttributesImpl objects from existing Attributes objects, which is
generally necessary should you want to modify any attributes.  It's
annoying, but it's not a bug.
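
As a hedged illustration of the copy-then-modify step, here is a small filter written against Python's xml.sax module rather than the Java SAX classes discussed above. The class name MarkDiscounts, the helper run_filter, and the color="red" rule are all hypothetical. The sketch keeps the parser's reported order only because a Python dict (3.7 and later) remembers insertion order; nothing in the XML Recommendation obliges the parser to report attributes in document order in the first place.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator
from xml.sax.xmlreader import AttributesImpl


class MarkDiscounts(XMLFilterBase):
    """Hypothetical filter: add color="red" to rows discounted 20% or more."""

    def startElement(self, name, attrs):
        # Copy the name/value pairs in the order the parser reported them.
        # Swapping this dict for a sorted or hashed container is where the
        # original sequence would be lost.
        copied = dict(attrs.items())
        discount = float(copied.get("discount", "0%").rstrip("%"))
        if name == "row" and discount >= 20:
            copied["color"] = "red"  # new attribute goes at the end
        super().startElement(name, AttributesImpl(copied))


def run_filter(xml_text):
    """Parse xml_text, pass it through the filter, and return the output."""
    out = io.StringIO()
    filt = MarkDiscounts(xml.sax.make_parser())
    filt.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    filt.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()


print(run_filter(
    '<table><row item="BC758333" time="08:12.14PM EST" discount="50%"/></table>'
))
```

With the standard expat-based parser the modified row should come back with item, time, and discount in their original sequence and color appended last.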

That said, I'm writing tools which do preserve attribute order and other
syntactic bits which have traditionally been discarded.  The folks at
Extreme have seen fit to give me a "Daily Polemic" on the subject, and
I'm hoping to explore these issues (and why they matter) more deeply
there.

>>One specific case where attribute ordering mattering might have been
>>especially useful is in namespace declarations.  It's already the
>>case that the entire start tag has to be processed before you know
>>what namespace the element is in, but the same issues apply to
>>attributes. Stating that the namespace declaration which applies to
>>the element must appear first and that all namespace declarations
>>must appear before other attributes might at least make it
>>easier for both humans and computers to sort out what they're looking
>>for with a lot less hunting around.
>
>You're stretching here. This is just not a problem in practice. In 
>fact, adding a requirement like this would make it harder for humans 
>to author XML documents because there'd be one more unobvious rule 
>they'd need to learn and follow. The status quo is simpler than what 
>you propose.

Stretching?  Sure.  But since the attribute order actually exists in the
document, I don't feel like I'm asking for a lot in hoping that I can
take advantage of it.  This is not a common problem in practice, but it
has affected both my code and my personal readings of various (perverse
but very legal) XML documents. A simpler answer would have been a
different approach to namespace declarations, of course...
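
A rough sketch of what a serializer following such a convention might do, in Python. It covers only the weaker half of the idea (all namespace declarations ahead of other attributes) and does not try to put the declaration that applies to the element itself first. The function name declarations_first and the sample attribute names are hypothetical.

```python
def declarations_first(pairs):
    """Move xmlns and xmlns:* attributes ahead of all other attributes.

    pairs is a list of (name, value) tuples in document order. sorted() is
    stable, so relative order within each group is left untouched.
    """
    def is_namespace_declaration(name):
        return name == "xmlns" or name.startswith("xmlns:")

    # False sorts before True, so declarations (key False) come first.
    return sorted(pairs, key=lambda pair: not is_namespace_declaration(pair[0]))


attributes = [
    ("id", "x"),
    ("xmlns:a", "urn:a"),
    ("class", "c"),
    ("xmlns", "urn:d"),
]
print(declarations_first(attributes))
# prints [('xmlns:a', 'urn:a'), ('xmlns', 'urn:d'), ('id', 'x'), ('class', 'c')]
```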

>I disagree. I think we've learned that due to some poor choices made 
>in the early days with XSLT and XPath and now schemas, we need to 
>keep the prefixes around. However, this is recognized as a flaw in 
>those applications, not as a good thing in and of itself. More modern 
>applications like RELAX NG are designed so that it's no longer 
>necessary to keep the prefixes around.

Much as I like RELAX NG, I don't see it as an invitation to throw away
namespace prefixes and substitute ns0, ns1, etc.  XPath alone will keep
computers stuck with prefixes for years to come, but humans tend to
prefer consistency in any case.
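
For a concrete case of prefix substitution, Python's xml.etree.ElementTree discards the original prefix at parse time and generates one (ns0, ns1, and so on) when serializing, unless a prefix has been registered for the namespace URI. The sketch below uses a made-up namespace URI and prefix; note that register_namespace changes process-wide state.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:inventory"  # hypothetical namespace URI
DOC = '<inv:table xmlns:inv="urn:example:inventory"><inv:row/></inv:table>'

root = ET.fromstring(DOC)

# No prefix registered for NS: the serializer invents one.
generated = ET.tostring(root, encoding="unicode")

# Registering the prefix the human author chose restores it on output.
ET.register_namespace("inv", NS)
restored = ET.tostring(root, encoding="unicode")

print(generated)
print(restored)
```

The first serialization uses a generated prefix in place of inv; the second uses inv again, but only because the caller supplied it, since the parsed tree no longer records which prefix the document used.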

-- 
Simon St.Laurent
Ring around the content, a pocket full of brackets
Errors, errors, all fall down!
http://simonstl.com -- http://monasticxml.org




 


Copyright 2001 XML.org. This site is hosted by OASIS