Re: [xml-dev] How to represent mixed content in JSON and JSON Schema?
- From: "Norman Gray" <norman@astro.gla.ac.uk>
- To: "Amelia A Lewis" <amyzing@talsever.com>
- Date: Sat, 14 Jul 2018 00:29:04 +0100
Amy, hello!
On 13 Jul 2018, at 2:29, Amelia A Lewis wrote:
Hmmmm.
On Fri, 13 Jul 2018 01:01:17 +0100, Norman Gray wrote:
On 12 Jul 2018, at 15:58, Liam R. E. Quin wrote:
Yes. We saw this also back in Perl days, with some XML libraries using
a mix of an array for contents and a hash for attributes,
[snip]
Entertainingly, the XML spec does not in fact explicitly require that
elements be presented to the application in document order. But it
omits that requirement on the grounds (and I think I can cite chapter
and verse on this) that such a requirement is so screamingly obvious
that it would be bloody silly to spell it out.
Section 3.2.1? The specification of the contents of a document type
definition provides productions for 'choice' and for 'seq', and note:
"Any content particle in a choice list may appear in the element
content at the location where the choice list appears in the grammar;
content particles occurring in a sequence list must each appear in the
element content in the order given in the list."
Ah no, you're not getting away that easily. That specifies that order
is constrained _in the source document_, but it is magisterially silent
on the order in which those elements are presented to the processing
application. If the parser wants to save up all of the <strong>
elements and deliver them in a big reveal at the end, with trumpets,
then who are we to deny it its god-given right to do so?! The quality
of syntax is not strained. It droppeth as the gentle rain from
tag-salad, upon the application beneath.
It is, I suppose, merely an implication that a document validated by
DTD would be presented to the application in the same order that it had
to be presented to the validator or validating parser, but since the
validator can be conceived as an application, I think it's a fairly
strong implication.
But is that written down?! (plus a couple more !! for good luck)
Of course, at this point I start to feel like the rather desperate
lawyer saying 'but your honour, the legislation doesn't actually _say_
that drink-driving remains illegal if my... if a driver were at the same
time wearing a clown wig and playing the national anthem on the kazoo
(and whilst I appreciate my client's latest address to the court may
have been too slurred to be readily intelligible, your honour should
infer no animadversions from the admittedly unfortunate gestures which
accompanied it)'
A somewhat stronger doubt might be thrown by suggesting that since DTD
validation is optional, non-validating parsers need not present content
in order, but here the mere existence of section 3.2.2, and the concept
of mixed content, pretty much mandates that children (elements and text
nodes) have to be presented in the order they are encountered, or
significant information is lost.
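To make the point about mixed content concrete, here is a minimal sketch in Python of one way to carry it into JSON without losing document order: an array in which text runs and child elements alternate as they were encountered. The function name ordered_content, the "name"/"content" keys, and the sample <p> document are illustrative assumptions, not anything proposed in this thread or defined by a specification.

```python
import json
import xml.etree.ElementTree as ET

def ordered_content(elem):
    """Return text runs and child elements of elem in document order.

    Hypothetical helper: text becomes a plain string, each child element
    becomes a {"name": ..., "content": [...]} object.
    """
    items = []
    if elem.text:
        items.append(elem.text)          # text before the first child
    for child in elem:
        items.append({"name": child.tag, "content": ordered_content(child)})
        if child.tail:
            items.append(child.tail)     # text following this child
    return items

doc = ET.fromstring("<p>Do <strong>not</strong> press <em>that</em> button.</p>")
content = ordered_content(doc)
print(json.dumps(content))
```

Because the container is an array rather than an object keyed by element name, the interleaving of text and elements survives serialisation.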
I think that the 'problem' here is that the XML spec is almost entirely
concerned with the syntax of the source document, and says remarkably
little about its semantics. That's because the semantics of a parse are
so obvious, and the semantics of a pointy-bracket parse so immediately
in the SGML background of most folk reading the XML spec at that time,
that it would be merely obfuscatory to rehearse them. I think XML was
trying to hum a different practical tune from the high-minded legalism
of ISO 8879 and friends.
[...] and who insisted
that the Namespaces in XML 1.0 specification had simply misspoken when
it distinguished in namespace handling for attributes and elements, and
therefore implemented his code to treat attributes identically to
elements). I can *hear* your colleague insisting "But it doesn't *say*
that, does it? Ha!"
What can one say? Except: loading-bay -- bare-knuckles -- now!
I can't imagine how the Perl code you describe handled mixed content,
though. Did it just not support it? Concatenate all the text nodes (or
better: throw away all the text nodes after the first, or replace the
m_text member's value with each new text node, effectively discarding
all but the last) and set them as an m_text member, separate from the
m_children hash, which contained only elements? And how did it
distinguish between replacing a child versus multiple children of the
same name? Oh, well ... long ago, in a different country, and the code
is dead, I suppose.
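For illustration only, one of the guesses above (keep only the last text node in m_text, and key m_children by element name) can be sketched in Python as follows. This is not a reconstruction of the Perl library in question, whose behaviour is not described in this thread; the function name lossy_hash and the sample document are assumptions, and only the m_text and m_children names come from the paragraph above.

```python
import xml.etree.ElementTree as ET

def lossy_hash(elem):
    """Hypothetical hash-style representation that discards document order."""
    m_text = elem.text if elem.text else None
    m_children = {}
    for child in elem:
        # A later sibling with the same name silently replaces the earlier one.
        m_children[child.tag] = lossy_hash(child)
        if child.tail:
            # Each new text node replaces the previous one: the last wins.
            m_text = child.tail
    return {"m_text": m_text, "m_children": m_children}

doc = ET.fromstring("<p>one <b>two</b> three <b>four</b> five</p>")
result = lossy_hash(doc)
print(result)
```

With this input, only the final text run and the second <b> survive, which is harmless for a flat configuration file and fatal for mixed content.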
I think I have cleansed my head of the memory of what it actually did,
but I suspect that it was originally intended for parsing XML
configuration files (let's not start...), and thus used in contexts
where document order didn't matter.
As so often happens, exasperation found expression in a wall of text,
which resulted in <http://text.nxg.me.uk/2010/1yfs>. That, as so often
happens, "fell deadborn from the press, without reaching such
distinction as even to excite a murmur among the zealots."
Best wishes,
Norman
--
Norman Gray : https://nxg.me.uk