RE: [xml-dev] MicroXML


From: Amelia A Lewis <amyzing@talsever.com>
To: David Lee <dlee@calldei.com>
Date: Mon, 13 Dec 2010 13:56:51 -0500

On Mon, 13 Dec 2010 13:26:56 -0500, David Lee wrote:
> Filesystems often use the file extension as a magic number.
> I find this convenient, but it shouldn't be counted on (particularly on
> systems where you can pipe via stdin).
> I'd presume that the app has to take care of using the right processor, just
> as it does today if you have a mix of text, image, HTML, XML and JSON data
> in the same directory.

Not congruent problems.

By JC's design, uXML is XML 1.0.  Namespace handling, in particular, 
could be problematic, if a uXML document is handed to a processor 
expecting XML 1.0 + namespaces.  The same is true in reverse if an XML 
1.0 + namespaces document were handed to a uXML parser (it's possible 
that the parser writer could build in a fallback).  With adoption and 
tool development, the problem would eventually mostly go away, but a 
five-year-old document with extension .xyzzy that starts with <xyzzy 
xmlns="http://great.underground.empire/"> is immediately attributable 
to one of XML 1.0, XML 1.0 + namespaces, or uXML, but it *ought* to be 
possible to tell which more quickly--it's immediately distinguishable 
from <html> (or <!DOCTYPE html) as initial characters, or the magic 
numbers for PNG, JPEG, GIF, etc.  Distinguishing from text is harder, 
but a text file that starts with an SGML/XML/uXML-like 
tag-containing-general-identifier is "reasonably" misidentified.  I've 
no clue what JSON looks like, or if it's detectable.
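
A minimal sketch of the kind of content sniffing described above, in Python. The function name `sniff` and its return labels are made up for illustration; the PNG, JPEG, and GIF signatures are the standard ones. The point it tries to show is that images and HTML can be told apart from the first few bytes, while a document opening with a plain start tag can only be classed as "some member of the XML family".

```python
import re

# Standard leading byte signatures for the image formats named above.
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
]

# A '<' followed by a plausible name-start character (ASCII only, for brevity).
_START_TAG = re.compile(rb"<[A-Za-z_:]")


def sniff(head: bytes) -> str:
    """Guess a content type from the first bytes of a file (hypothetical helper)."""
    for magic, kind in MAGIC:
        if head.startswith(magic):
            return kind
    # Skip a UTF-8 byte order mark and leading whitespace, if present.
    if head.startswith(b"\xef\xbb\xbf"):
        head = head[3:]
    text = head.lstrip()
    lowered = text[:32].lower()
    # Crude prefix test; it does not try to separate HTML from XHTML.
    if lowered.startswith(b"<!doctype html") or lowered.startswith(b"<html"):
        return "html"
    if text.startswith(b"<?xml"):
        # A declaration is present, so per the argument here this is not uXML.
        return "xml-1.0"
    if _START_TAG.match(text):
        # XML 1.0, XML 1.0 + namespaces, or uXML: the first bytes cannot say which.
        return "xml-family"
    return "unknown"
```

Under these assumptions, `sniff(b'<xyzzy xmlns="http://great.underground.empire/">')` returns `"xml-family"`, which is exactly the ambiguity at issue.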

If you don't care that the heavier-weight parser is (always?) going to 
be used, fine ... but I can't see much impetus for adoption; this 
becomes again a best-practices proposal.  Optimizations are possible if 
you know it's uXML.  If you see <?xml version="1.0"?> you know it's XML 
1.0, not uXML.  A lot of XML documents lack the declaration, though, 
and absence of the declaration means that such XML documents are UTF-8 
(like uXML), but these documents may have namespace fun (a problem for 
a uXML parser), may contain PIs, etc.  You could always try lightweight 
and fall back to heavy, but this may be problematic.  It would be 
better, in my opinion, to have something recognizable.
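
The "try lightweight and fall back to heavy" approach could be sketched as below. This is only an illustration under stated assumptions: no real uXML parser is used, `parse_light` is a hypothetical stand-in that refuses the constructs named above (the XML declaration, other PIs, namespace declarations) and otherwise delegates to Python's `xml.etree.ElementTree`, and the names `NotMicroXML`, `parse_light`, `parse_heavy`, and `parse` are invented here.

```python
import re
import xml.etree.ElementTree as ET


class NotMicroXML(ValueError):
    """Raised by the hypothetical lightweight path on a construct it does not handle."""


# "<?" covers the XML declaration and any other PI; the second branch
# covers default and prefixed namespace declarations.
_HEAVY_ONLY = re.compile(r"<\?|\sxmlns(:[\w.-]+)?\s*=")


def parse_light(text: str) -> ET.Element:
    # Stand-in for a real lightweight parser: reject, then reuse ElementTree.
    if _HEAVY_ONLY.search(text):
        raise NotMicroXML("declaration, PI, or namespace declaration found")
    return ET.fromstring(text)


def parse_heavy(text: str) -> ET.Element:
    # Full XML 1.0 + namespaces processing.
    return ET.fromstring(text)


def parse(text: str):
    """Try the light path first; fall back to the heavy one on rejection."""
    try:
        return "light", parse_light(text)
    except NotMicroXML:
        return "heavy", parse_heavy(text)
```

Note that the rejection may come late in a large document, so the light pass can be wasted work before the heavy parser starts over; a recognizable marker at the top of the document would avoid that.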

Amy!
-- 
Amelia A. Lewis                    amyzing {at} talsever.com
    Songs and fame are vain endeavor--
    only two things fail us never,
    only two things last forever--
    sorrow and love, sorrow and love ....
                -- The Last Song of Sirit Byar
