Re: [xml-dev] MicroXML
- From: Amelia A Lewis <amyzing@talsever.com>
- To: xml-dev@lists.xml.org
- Date: Tue, 14 Dec 2010 01:00:28 -0500
On Tue, 14 Dec 2010 11:35:31 +0700, James Clark wrote:
>> How do I tell whether it's safe to use my uXML parser instead of my
>> (heavier) XML 1.0 + Namespace in XML + XML:Base + XML:ID + whatever
>> parser?
>
> Given that MicroXML is designed to be a subset, how could there be a
> reliable in-band mechanism to tell you? Anything you might put in the
Well, if MicroXML hadn't ruled out the use of most of the available
indicators, then certainly something like a PI would be possible.
> document, has to be legal XML 1.0, so it can't be a reliable indicator that
> it's MicroXML rather than XML 1.0. Similar problem with telling how to use
> MicroXML rather than HTML. I don't think this is any different from
> problems we have today. How do you choose between an HTML, XML or an SGML
<html>? Not in a namespace? It's HTML. In the XHTML namespace?
XHTML. Not <html>? XML. There's potential confusion for XML vs SGML
if there is no XML declaration and there is a doctype declaration
containing at least a system ID. Hmmm. Well, the available BNF
suggests that the SGML declaration is not optional, either.
http://xml.coverpages.org/sgmlsyn/index.htm, and especially sgmlsyn.htm
there.
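The root-element rule above can be sketched in a few lines of Python. This is only an illustration, not something from the proposal: the function name classify_root is made up, and it assumes the input is close enough to well-formed XML for the standard library's parser to reach the root start tag. Real-world HTML tag soup would need an HTML-aware sniffer instead, and the XML-vs-SGML question is not addressed at all.

```python
import io
import xml.etree.ElementTree as ET

# The XHTML namespace name, as defined by the XHTML 1.0 Recommendation.
XHTML_NS = "http://www.w3.org/1999/xhtml"


def classify_root(data: bytes) -> str:
    """Hypothetical helper: classify a document by its root element only.

    <html> in no namespace   -> "HTML"
    <html> in XHTML_NS       -> "XHTML"
    any other root element   -> "XML"
    no root element reached  -> "unknown"
    """
    try:
        # Only the first "start" event (the root element) is examined.
        for _event, elem in ET.iterparse(io.BytesIO(data), events=("start",)):
            if elem.tag == "html":
                return "HTML"
            if elem.tag == "{%s}html" % XHTML_NS:
                return "XHTML"
            return "XML"
    except ET.ParseError:
        # The parser gave up before any element was seen.
        pass
    return "unknown"
```

Usage is a single call on the document bytes, e.g. classify_root(b"<doc/>"). Note that this looks only at the root element; it says nothing about whether the rest of the document stays inside any particular subset.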
> parser? There's no reliable in-band mechanism. In the end, you have to rely
> on out-of-band information.
Perhaps. I've been involved (somewhat peripherally) in SGML-related
code, for a proprietary parser/validator capable of handling XML, SGML,
and some other things. For performance, we used standard XML
processors; even in 2000/2001 (when it was a live product) instances of
XML outnumbered instances of SGML encountered by a significant factor.
(In particular environments the reverse was true, but they didn't mind
adding XML parsing to SGML--whereas adding SGML support to XML lost the
value of XML, most thought.)
I don't think MicroXML reaches that standard.
I'm not concerned about distinguishing (Micro)XML from HTML--or from
images, or from other easily recognizable file types. The question,
which I think is important, is how to safely use a small, fast MicroXML
parser--rather than starting to use it, throwing away the results, and
falling back to an XML parser.
> Nonetheless I can think of some heuristics. I suspect it is very unusual in
> XML to have a DOCTYPE declaration with neither an internal nor an external
> subset. Thus such a DOCTYPE declaration (regardless of the DOCTYPE name)
> could be a good indicator.
All right. This means that MicroXML cannot be embedded, unless the
doctype declaration is stripped (or the Root Element Type validity
constraint is to be ignored
(http://www.w3.org/TR/REC-xml/#vc-roottype)). The only potential for
confusion is for HTML5 polyglot markup; almost any other use case for
XML is going to include either a system id with URI or a public id
(with FPI and URI).
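For concreteness, here is a rough Python sketch of the DOCTYPE heuristic discussed above: a declaration with a name but neither an external ID nor an internal subset is treated as a hint (no more than that) that the document might be MicroXML. The function name doctype_kind and the four labels it returns are invented for this sketch. It is a regular-expression sniff over the head of the document, not a parser, so a commented-out DOCTYPE or an odd system literal can fool it, and it deliberately matches the uppercase keyword only, as XML 1.0 requires.

```python
import re

# Matches "<!DOCTYPE name", then anything up to the first "[" or ">",
# then an optional "[" that would open an internal subset.
_DOCTYPE = re.compile(
    r"<!DOCTYPE\s+(?P<name>[^\s\[>]+)\s*(?P<rest>[^\[>]*)(?P<subset>\[)?"
)


def doctype_kind(head: str) -> str:
    """Hypothetical helper: describe the DOCTYPE found in `head`.

    "none"            no DOCTYPE declaration found
    "internal-subset" the declaration opens an internal subset
    "external-id"     the declaration has a SYSTEM or PUBLIC identifier
    "bare"            name only: the possible MicroXML indicator
    """
    m = _DOCTYPE.search(head)
    if m is None:
        return "none"
    if m.group("subset"):
        return "internal-subset"
    if m.group("rest").strip().startswith(("SYSTEM", "PUBLIC")):
        return "external-id"
    return "bare"
```

A caller would treat only "bare" as a hint toward the small parser; "none" tells you nothing either way, which is the case where the out-of-band question remains.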
> I think the general policy has to be that if you don't have out of band
> information, then use the more liberal format (ie XML or HTML5 rather than
> MicroXML).
Oops. Is MicroXML actually attractive enough to see significant takeup
if the recommendation is that safe parsing in the absence of an
out-of-band indicator is to use something else?
Ah, well. It appears that this proposal is targeted
primarily-nearly-exclusively toward bridging XML with HTML5? Is that a
fair characterization? If so, I'll slide off and stop being annoying
(I don't have any interest worth mentioning in the behavior of
browsers).
I'd like to see a 'next generation'. I'm starting to wonder if we
haven't got at least a couple or three different use cases:
a) the confluence-with-the-browser case, where JSON and HTML5 are going
to be mentioned and targeted, where removing namespaces is accepted as
a near-given, but extensibility doesn't seem particularly important;
b) the xml-over-the-network case (including exempli gratia SOAP, but
also less RPC-ish document/resource interactions), where the doctype
decl has long been forbidden, and namespace improvements would be grand
but no one can afford to throw out the distributed-authority baby with
the prefix-mapping bathwater, and 'typing' (fsvo 'typing') is liable to
be an issue;
c) the document/store case (likely including databases), where the
entire prolog causes discomfort, and namespace simplification is
regarded as unattainable utopianism, again with at least a part of the
crowd concerned with 'typing'; this case may also include those doing
extensive pipeline processing (or perhaps that's yet another use case).
I'm probably not outlining the groups well. What all have in common, I
think: elements good, attributes good; things that aren't elements or
attributes bad (comments seem to be tolerated better than PIs or
anything from the prolog). I don't know that the browser-case
antipathy to extensibility via a distributed authority can be
reconciled with the much less drastic desire to address the various
(and variously interpreted or understood) shortcomings of the
namespaces specification for establishing a distributed authority for
vocabularies.
Hmmmmm. *shrug*
Amy!
--
Amelia A. Lewis amyzing {at} talsever.com
Yankees are compelled by some mysterious force to imitate Southern
accents and they're so damn dumb they don't know the difference between
a Tennessee drawl and a Charleston clip.
-- Rita Mae Brown, "Rubyfruit Jungle"