   genx: canonicalization vs. pretty printing


At 7:21 PM -0800 1/21/04, Tim Bray wrote:

>1. Any output encoding other than UTF-8
>2. Optional escaping of illegal characters
>3. Prettyprinting support
>4. Various kinds of error workarounds, and turning errorchecking off
>5. Writing CDATA sections
>6. Writing XML declaration (good idea, but I want the output to be 
>Canonical XML)

I think you're at best at 50% with these goals, maybe less. This is 
certainly not at 80%.

My experience with JDOM, XOM, and other APIs for doing operations
like XInclude that ultimately result in a serialized XML document is
that users really, really want pretty-printing a lot of the time. If
the API won't do that, they will ignore it. I wouldn't pretty print by
default, but I would definitely include options for setting the
maximum line length and indent string.
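As a rough illustration of pretty printing that is off by default and
driven by an indent-string option, here is a minimal Python sketch. It
is not genx, JDOM, or XOM code; the function name serialize_pretty is
hypothetical, it assumes Python 3.9+ for ElementTree.indent, and it
leaves out the maximum line length option because the standard library
has no equivalent.

```python
import copy
import xml.etree.ElementTree as ET

def serialize_pretty(root, indent=None):
    """Serialize root to a string; indent only when an indent string is given."""
    if indent is not None:
        # ET.indent rewrites whitespace in place, so work on a copy.
        root = copy.deepcopy(root)
        ET.indent(root, space=indent)
    return ET.tostring(root, encoding="unicode")

doc = ET.fromstring("<a><b>text</b><c/></a>")
compact = serialize_pretty(doc)             # default: no added whitespace
pretty = serialize_pretty(doc, indent="\t")  # caller chooses the indent string
```

The caller opts in by passing an indent string; with no argument the
output stays on one line.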

Furthermore, I would make canonicalization an option if it's included
at all. It imposes a significant performance hit since you have to
sort the attributes and namespaces. Worse, it prevents full streaming
since the attributes and namespaces have to be buffered before you
sort them. But more importantly, probably half the time users don't
care about canonicalization. The other half of the time they do care.
To be more specific, they don't want it. It's too damn ugly and the
lines are too long to work with. Plus there's no XML declaration,
which users like.
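The buffering point can be sketched as follows. This is a simplified
Python illustration, not an implementation of Canonical XML: it
ignores namespaces and the specification's exact escaping rules, and
both function names are hypothetical. The streaming writer can emit
each attribute as it arrives, while the sorted writer cannot emit
anything after the element name until it has seen every attribute.

```python
from xml.sax.saxutils import quoteattr

def start_tag_streaming(name, attrs):
    """Yield the start tag piece by piece, in arrival order, holding nothing back."""
    yield "<" + name
    for key, value in attrs:
        yield " " + key + "=" + quoteattr(value)
    yield ">"

def start_tag_sorted(name, attrs):
    """Build the start tag with attributes sorted by name (no namespaces handled)."""
    buffered = sorted(attrs)  # needs the complete attribute list before any output
    parts = [" " + key + "=" + quoteattr(value) for key, value in buffered]
    return "<" + name + "".join(parts) + ">"

attrs = [("z", "1"), ("b", "2")]
streamed = "".join(start_tag_streaming("a", attrs))
canonical_order = start_tag_sorted("a", attrs)
```

Here streamed keeps the order the caller supplied, and
canonical_order places b before z.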

Similarly, users actively desire non-UTF-8 encodings. UTF-16, 
Latin-1, SJIS, etc. are all much easier for some classes of users to 
process with non-XML aware tools than is UTF-8 on today's systems.
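Encoding selection as an output option might look like the following
Python sketch. It uses the standard library's ElementTree rather than
any of the APIs discussed here, the function name serialize_encoded is
hypothetical, and it assumes Python 3.8+ for the xml_declaration
argument.

```python
import xml.etree.ElementTree as ET

def serialize_encoded(root, encoding="UTF-8"):
    """Return the document as bytes in the requested encoding, with an XML declaration."""
    return ET.tostring(root, encoding=encoding, xml_declaration=True)

name = ET.fromstring("<name>caf\u00e9</name>")
utf8_bytes = serialize_encoded(name)                          # e-acute is two bytes
latin1_bytes = serialize_encoded(name, encoding="ISO-8859-1")  # e-acute is one byte
```

The declaration records the chosen encoding, so an XML parser can
still read the Latin-1 output while a Latin-1 text editor displays it
directly.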

I think an XML output library needs to realize that opening files in
a plain text editor is still a very important use case a lot of the
time. Byte-by-byte comparison and digital signatures usually aren't
(and when they are, the digital signature library will canonicalize
first). Requiring canonical XML while offering neither pretty printing
nor encoding selection does not give users the simple library that
does what they need. It is not sufficiently full featured to hit the
80-20 point.

-- 

   Elliotte Rusty Harold
   elharo@metalab.unc.edu
   Effective XML (Addison-Wesley, 2003)
   http://www.cafeconleche.org/books/effectivexml
   http://www.amazon.com/exec/obidos/ISBN%3D0321150406/ref%3Dnosim/cafeaulaitA




 
