Re: [xml-dev] "Introducing MicroXML, Part 1: Explore the basic principles of ...
- From: James Clark <jjc@jclark.com>
- To: John Cowan <cowan@mercury.ccil.org>
- Date: Sun, 15 Jul 2012 11:24:24 +0700
On Fri, Jul 13, 2012 at 10:38 PM, John Cowan <cowan@mercury.ccil.org> wrote:
> I ... compromised:
>
> 1) PIs are in the syntax, but not in the data model. In MicroLark,
> if you want them, you have to use the pull or the push parser, because
> the tree parser ignores them. There's even a switch to turn them into
> fatal errors, if you decide you really don't want to deal with them.
>
> 2) Syntactically they have to look like start-tags except for the
> <? and ?>. That was pragmatic: there are a lot of widely used PIs that
> already look like that, and it made parsing them trivial. They are
> reported with the pseudo-attributes already nicely parsed.
This is the one part of your current editor's draft that I strongly
disagree with.
Your draft says:
"PIs are not part of the MicroXML data model, but processors SHOULD
make them available to the application."
But surely the data model should be the canonical way that processors
make information available to the application.
If MicroXML users can't rely on getting PI information out from
processors, then they can't rely on PIs to encode significant
information, which makes PIs of little use.
But if you put arbitrary PIs in the data model, you are, of course,
significantly complicating things (going from 2 kinds of content to
3).
The start-tag syntax restriction also means you can't encode arbitrary
XML infosets.
> I think you need support for the xml-stylesheet PI if you are going to
> support MicroXML on the Web other than MicroXML + HTML5.
I find that a much more compelling use case.
The compromise I suggest is this:
- allow PIs only before the root element (and perhaps only before the
DOCTYPE if there is one), probably with the start-tag syntax
restriction
- put them in the data model, so your formal data model represents
documents by a <pi-list, element> pair (I would similarly describe an
element as a <name, attributes, content> triple), where the pi-list is
a list of elements
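
To make the shape of that data model concrete, here is a minimal sketch in
Python. The class and field names (Element, Document, pi_list, root) are
hypothetical and chosen only for illustration; nothing here is taken from
the editor's draft or from MicroLark. It assumes the start-tag syntax
restriction holds, so that a PI can be modelled as an element with
attributes and no content.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Element:
    """An element as a <name, attributes, content> triple."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    # Content is a sequence of child elements and text: the "2 kinds of content".
    content: List[Union["Element", str]] = field(default_factory=list)


@dataclass
class Document:
    """A document as a <pi-list, element> pair."""
    # Assumption: each PI before the root is modelled as an Element whose
    # attributes are the PI's pseudo-attributes and whose content is empty.
    pi_list: List[Element]
    root: Element


# <?xml-stylesheet type="text/xsl" href="style.xsl"?><doc>hello</doc>
stylesheet = Element("xml-stylesheet", {"type": "text/xsl", "href": "style.xsl"})
document = Document(pi_list=[stylesheet], root=Element("doc", {}, ["hello"]))
```

Because PIs only appear in pi_list, element content still has just two
kinds of item, and an application that does not care about PIs can read
document.root and ignore the rest.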
In terms of the JSON encoding, I would suggest the following:
- encode elements differently: an element is represented by a JSON
object, with each attribute represented as a property of the object;
the element name and content would be represented by properties whose
name starts with "$", so that they are not legal XML names but are legal
JavaScript identifiers, e.g. $name/$content or something shorter.
- encode PIs before the root element as a "$" prefixed property of the
root element, e.g. $pi
- consider adding DOCTYPE to the data model as well using similar
techniques (not sure about this, but it's going to be difficult for
serializers to know whether to output a DOCTYPE otherwise)
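
As a rough sketch of how the first two points might look, the Python below
encodes an element given as a (name, attributes, content) tuple. The
function names (encode_element, encode_document, strip_pis) are
hypothetical, the property names $name, $content and $pi are just the
examples mentioned above, and the choice to give each PI the same object
shape as an element is an assumption, not something settled here. DOCTYPE
is left out.

```python
import json


def encode_element(element):
    """Encode a (name, attributes, content) tuple as a JSON-style object."""
    name, attributes, content = element
    # Attribute names are XML names, so they can never collide with "$" keys.
    obj = dict(attributes)
    obj["$name"] = name
    # Child elements are encoded recursively; text stays a plain string.
    obj["$content"] = [
        encode_element(item) if isinstance(item, tuple) else item
        for item in content
    ]
    return obj


def encode_document(pi_list, root):
    """Encode a <pi-list, element> pair, hanging the PIs off the root as $pi."""
    obj = encode_element(root)
    if pi_list:
        # Assumption: each PI is encoded with the same shape as an element.
        obj["$pi"] = [encode_element(pi) for pi in pi_list]
    return obj


def strip_pis(encoded_root):
    """Ignoring PIs only requires dropping one property of the root object."""
    return {key: value for key, value in encoded_root.items() if key != "$pi"}


encoded = encode_document(
    [("xml-stylesheet", {"type": "text/xsl", "href": "style.xsl"}, [])],
    ("doc", {"id": "a"}, ["hello ", ("b", {}, ["world"])]),
)
print(json.dumps(encoded, indent=2))
```

A consumer that only wants the element tree can read the root object
directly or call strip_pis on it; nothing inside $content ever contains a
PI.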
I think this gives an encoding which is quite natural and makes it
easy to ignore PIs.
James