Re: [xml-dev] How to represent mixed content in JSON and JSON Schema?

From: "Norman Gray" <norman@astro.gla.ac.uk>
To: "Amelia A Lewis" <amyzing@talsever.com>
Date: Sat, 14 Jul 2018 00:29:04 +0100

Amy, hello!

On 13 Jul 2018, at 2:29, Amelia A Lewis wrote:

Hmmmm.

On Fri, 13 Jul 2018 01:01:17 +0100, Norman Gray wrote:

On 12 Jul 2018, at 15:58, Liam R. E. Quin wrote:
Yes. We saw this also back in Perl days, with some XML libraries using
a mix of an array for contents and a hash for attributes,

[snip]

Entertainingly, the XML spec does not in fact explicitly require that
elements be presented to the application in document order.  But it
omits that requirement on the grounds (and I think I can cite chapter
and verse on this) that such a requirement is so screamingly obvious
that it would be bloody silly to spell it out.

Section 3.2.1? The specification of the contents of a document type
definition provides productions for 'choice' and for 'seq', and note:
"Any content particle in a choice list may appear in the element
content at the location where the choice list appears in the grammar;
content particles occurring in a sequence list must each appear in the
element content in the order given in the list."

Ah no, you're not getting away that easily. That specifies that order is constrained _in the source document_, but it is magisterially silent on the order in which those elements are presented to the processing application. If the parser wants to save up all of the <strong> elements and deliver them in a big reveal at the end, with trumpets, then who are we to deny it its god-given right to do so?! The quality of syntax is not strained. It droppeth as the gentle rain from tag-salad, upon the application beneath.

It is, I suppose, merely an implication that a document validated by
a DTD would be presented to the application in the same order that it had
to be presented to the validator or validating parser, but since the
validator can be conceived as an application, I think it's a fairly
strong implication.

But is that written down?! (plus a couple more !! for good luck)

Of course, at this point I start to feel like the rather desperate lawyer saying 'but your honour, the legislation doesn't actually _say_ that drink-driving remains illegal if my... if a driver were at the same time wearing a clown wig and playing the national anthem on the kazoo (and whilst I appreciate my client's latest address to the court may have been too slurred to be readily intelligible, your honour should infer no animadversions from the admittedly unfortunate gestures which accompanied it)'.

A somewhat stronger doubt might be thrown by suggesting that since DTD
validation is optional, non-validating parsers need not present content
in order, but here the mere existence of section 3.2.2, and the concept
of mixed content, pretty much mandates that children (elements and text
nodes) have to be presented in the order they are encountered, or
significant information is lost.
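
To make the point about mixed content concrete, here is a minimal sketch (not from the thread) using Python's standard `xml.dom.minidom`. The helper name `children_in_order`, the sample document, and the JSON shape are all assumptions chosen for illustration; the only claim is that an ordered list of children, with text and elements interleaved, keeps the information that mixed content carries.

```python
import json
from xml.dom import minidom


def children_in_order(xml_text):
    """Return the root element's content as an ordered list.

    Text nodes become plain strings; elements become
    {"name": ..., "children": [...]} dicts. Attributes, comments and
    CDATA sections are ignored to keep the sketch short.
    """
    def convert(node):
        items = []
        for child in node.childNodes:  # childNodes is in document order
            if child.nodeType == child.TEXT_NODE:
                items.append(child.data)
            elif child.nodeType == child.ELEMENT_NODE:
                items.append({"name": child.tagName,
                              "children": convert(child)})
        return items

    return convert(minidom.parseString(xml_text).documentElement)


# Hypothetical sample: text and elements interleaved.
SAMPLE = "<p>Do <strong>not</strong> press <em>that</em> button.</p>"
print(json.dumps(children_in_order(SAMPLE), indent=2))
```

Because the result is a JSON array rather than an object keyed by element name, the sentence can be reassembled exactly; reorder the entries and it no longer reads the same.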

I think that the 'problem' here is that the XML spec is almost entirely concerned with the syntax of the source document, and says remarkably little about its semantics. That's because the semantics of a parse are so obvious, and the semantics of a pointy-bracket parse so immediately in the SGML background of most folk reading the XML spec at that time, that it would be merely obfuscatory to rehearse them. I think XML was trying to hum a different practical tune from the high-minded legalism of ISO 8879 and friends.

[...] and who insisted
that the Namespaces in XML 1.0 specification had simply misspoken when
it distinguished between namespace handling for attributes and for
elements, and therefore implemented his code to treat attributes
identically to elements). I can *hear* your colleague insisting "But it
doesn't *say* that, does it? Ha!"

What can one say?  Except: loading-bay -- bare-knuckles -- now!

I can't imagine how the Perl code you describe handled mixed content,
though. Did it just not support it? Concatenate all the text nodes (or
better: throw away all the text nodes after the first, or replace the
m_text member's value with each new text node, effectively discarding
all but the last) and set them as an m_text member, separate from the
m_children hash, which contained only elements? And how did it
distinguish between replacing a child versus multiple children of the
same name? Oh, well ... long ago, in a different country, and the code
is dead, I suppose.
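
For illustration only, the following Python sketch mimics one of the behaviours guessed at above: a single text slot overwritten by each new text node, plus a hash of children keyed by element name. It is not a reconstruction of the Perl library in question, whose actual behaviour is not recorded in this thread; `lossy_fold` and the sample document are hypothetical, and the names `m_text` and `m_children` are borrowed from the paragraph above.

```python
from xml.dom import minidom


def lossy_fold(xml_text):
    """Fold the root element's content into one text slot and one hash.

    Deliberately lossy: each text node overwrites the previous one, and
    a repeated child name overwrites the earlier child of that name.
    """
    root = minidom.parseString(xml_text).documentElement
    m_text = None
    m_children = {}
    for child in root.childNodes:
        if child.nodeType == child.TEXT_NODE:
            m_text = child.data  # only the last text node survives
        elif child.nodeType == child.ELEMENT_NODE:
            # Keep just the element's direct text, keyed by its name.
            m_children[child.tagName] = "".join(
                n.data for n in child.childNodes
                if n.nodeType == n.TEXT_NODE)
    return {"m_text": m_text, "m_children": m_children}


folded = lossy_fold("<p>Do <b>not</b> press <b>that</b> button.</p>")
print(folded)
```

With this input, two of the three text runs and the first `<b>` element are discarded, and nothing in the result says where the surviving child sat relative to the surviving text. That loss is harmless for a flat configuration file and fatal for prose.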

I think I have cleansed my head of the memory of what it actually did, but I suspect that it was originally intended for parsing XML configuration files (let's not start...), and thus used in contexts where document order didn't matter.

As so often happens, exasperation found expression in a wall of text, which resulted in <http://text.nxg.me.uk/2010/1yfs>. That, as so often happens, "fell deadborn from the press, without reaching such distinction as even to excite a murmur among the zealots."

Best wishes,

Norman

--
Norman Gray : https://nxg.me.uk
