RE: [xml-dev] MicroXML
- From: Amelia A Lewis <amyzing@talsever.com>
- To: David Lee <dlee@calldei.com>
- Date: Mon, 13 Dec 2010 13:56:51 -0500
On Mon, 13 Dec 2010 13:26:56 -0500, David Lee wrote:
> Filesystems often use the file extension as a magic number.
> I find this convenient, but it shouldn't be counted on (particularly on systems
> where you can pipe via stdin).
> I'd presume that the app has to take care of using the right processor, just
> as it does today if you have a mix of text, image, HTML, XML and JSON data
> in the same directory.
Not congruent problems.
By JC's design, uXML is a subset of XML 1.0. Namespace handling, in particular,
could be problematic if a uXML document is handed to a processor
expecting XML 1.0 + namespaces. The same is true in reverse if an XML
1.0 + namespaces document were handed to a uXML parser (it's possible
that the parser writer could build in a fallback). With adoption and
tool development, the problem would eventually mostly go away, but a
five-year-old document with extension .xyzzy that starts with <xyzzy
xmlns="http://great.underground.empire/"> is immediately attributable
only to the family (XML 1.0, XML 1.0 + namespaces, or uXML), not to a
particular member of it. It *ought* to be possible to distinguish among
those just as quickly: the family as a whole is immediately
distinguishable from <html> (or <!DOCTYPE html) as initial characters,
or from the magic numbers for PNG, JPEG, GIF, etc. Distinguishing from text is harder,
but a text file that starts with an SGML/XML/uXML-like
tag-containing-general-identifier is "reasonably" misidentified. I've
no clue what JSON looks like, or if it's detectable.
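The point that the XML family is easy to tell apart from HTML and image formats by its first few bytes can be sketched as a small sniffer. This is a minimal Python illustration, not anything from the MicroXML proposal: the function name `sniff`, the labels it returns, and the JSON check (a guess that JSON text usually starts with `{` or `[`) are assumptions. The PNG, JPEG and GIF signatures are the standard ones.

```python
# Hypothetical first-bytes sniffer; labels and heuristics are illustrative only.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"      # standard 8-byte PNG signature
JPEG_MAGIC = b"\xff\xd8\xff"          # JPEG start-of-image marker plus next marker byte
GIF_MAGICS = (b"GIF87a", b"GIF89a")   # the two GIF header versions
UTF8_BOM = b"\xef\xbb\xbf"


def sniff(head: bytes) -> str:
    """Guess a format from the first few bytes of a file or stdin stream."""
    if head.startswith(PNG_MAGIC):
        return "png"
    if head.startswith(JPEG_MAGIC):
        return "jpeg"
    if head.startswith(GIF_MAGICS):
        return "gif"
    # Ignore a UTF-8 byte-order mark and leading whitespace before looking at markup.
    text = head.removeprefix(UTF8_BOM).lstrip()
    lowered = text.lower()
    if lowered.startswith((b"<!doctype html", b"<html")):
        return "html"
    if text.startswith(b"<?xml"):
        return "xml-1.0"  # an XML declaration rules out uXML
    if text.startswith(b"<") and (text[1:2].isalpha() or text[1:2] == b"_"):
        # XML 1.0, XML 1.0 + namespaces, or uXML: the first bytes can't say which.
        return "xml-family"
    if text.startswith((b"{", b"[")):
        return "json?"  # heuristic guess only
    return "text"
```

The `<xyzzy xmlns="http://great.underground.empire/">` example lands in `"xml-family"`, which is exactly the gap described above. Documents opening with a comment, a non-HTML DOCTYPE, or UTF-16 text would fall through to `"text"` in this sketch.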
If you don't care that the heavier-weight parser is (always?) going to
be used, fine, but I can't see much impetus for adoption; this
becomes, once again, a best-practices proposal. Optimizations are possible if
you know it's uXML. If you see <?xml version="1.0"?>, you know it's XML
1.0, not uXML. A lot of XML documents lack the declaration, though,
and absence of the declaration means that such XML documents are UTF-8
(like uXML) or UTF-16, but these documents may have namespace fun (a problem for
a uXML parser), may contain processing instructions, etc. You could always try
the lightweight parser first and fall back to the heavyweight one, but this may
be problematic. It would be better, in my opinion, to have something recognizable.
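The "try lightweight and fall back to heavy" approach can be sketched in Python as a cheap pre-scan plus a fallback. This is a minimal illustration under stated assumptions, not part of the MicroXML proposal: `NOT_UXML`, `looks_like_uxml` and `parse_with_fallback` are hypothetical names, the scan only looks for the features mentioned above (a declaration or PI, namespace prefixes), and the standard library's `xml.etree.ElementTree` stands in for both parsers because no real uXML parser is assumed.

```python
import re
import xml.etree.ElementTree as ET
from typing import Any, Callable

# Hypothetical pre-scan for features that would trouble a uXML parser.
NOT_UXML = re.compile(
    r"<\?"                   # XML declaration or processing instruction
    r"|<[A-Za-z_][\w.-]*:"   # prefixed element name, e.g. <svg:rect
    r"|\sxmlns:[\w.-]+\s*="  # namespace-prefix declaration, e.g. xmlns:svg=
)


def looks_like_uxml(text: str) -> bool:
    """Cheap textual scan; markup-like text inside comments or CDATA can fool it."""
    return NOT_UXML.search(text) is None


def parse_with_fallback(
    text: str,
    light: Callable[[str], Any],
    heavy: Callable[[str], Any],
) -> tuple[str, Any]:
    """Use the light parser if the scan allows it and it succeeds; else the heavy one."""
    if looks_like_uxml(text):
        try:
            return "light", light(text)
        except (ValueError, ET.ParseError):
            pass  # light parser rejected the input: fall back
    return "heavy", heavy(text)


# Usage: ElementTree stands in for both parsers (hypothetical setup).
kind, root = parse_with_fallback('<?xml version="1.0"?><a/>', ET.fromstring, ET.fromstring)
print(kind, root.tag)  # heavy a
```

The `<xyzzy xmlns="...">` document passes this scan, because a default namespace declaration carries no prefix, so the scan alone cannot settle which member of the family it is. And since a regex can be misled by comments or CDATA, the fallback path may be taken (or skipped) for the wrong reasons.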
Amy!
--
Amelia A. Lewis amyzing {at} talsever.com
Songs and fame are vain endeavor--
only two things fail us never,
only two things last forever--
sorrow and love, sorrow and love ....
-- The Last Song of Sirit Byar