Re: [xml-dev] Random Access XML
- From: John Cowan <cowan@mercury.ccil.org>
- To: rjelliffe <rjelliffe@allette.com.au>
- Date: Wed, 23 Feb 2011 17:17:11 -0500
rjelliffe scripsit:
> Yes, if people are happy to keep comments and PIs after the prolog, I
> don't mind. (But I thought James' idea was to reduce the number of
> different node types in the parse tree, because multiple node types
> apparently freak programmers out?)
Well, you know the zero-one-infinite rule. Without PIs, you need only
element nodes; with PIs, you need document nodes, element nodes, and
PI nodes. That's triple the number of node types.
MicroLark normally reports an error if a PI appears. If you set the PI
feature to true, its push and pull parsers will report PIs, though the
tree builder will still ignore them. Only PIs that look like well-formed
start-tags (except for the question marks) are allowed, which covers
things like xml-stylesheet and xml-model. XML declarations are still
disallowed.
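The "looks like a start-tag" rule can be sketched roughly as below. This is
not MicroLark's code: the function name and the regular expression are
hypothetical, names are restricted to ASCII, and attribute values may not
contain "&", all for brevity.

```python
import re

# Hypothetical approximation, not MicroLark's actual rule: a PI is accepted
# if, with "<?" and "?>" read as "<" and ">", it has the shape of a start-tag,
# i.e. a name followed by zero or more attr="value" pairs.
_NAME = r"[A-Za-z_][A-Za-z0-9_.-]*"   # simplified, ASCII-only name
_ATTR = r"\s+" + _NAME + r"""\s*=\s*(?:"[^<&"]*"|'[^<&']*')"""
_PI_RE = re.compile(r"<\?(" + _NAME + r")((?:" + _ATTR + r")*)\s*\?>")

def pi_looks_like_start_tag(pi: str) -> bool:
    """Return True if the PI text has start-tag shape and is not an XML declaration."""
    m = _PI_RE.fullmatch(pi)
    if m is None:
        return False
    # A target of exactly "xml" (any case) would be an XML declaration: reject it.
    return m.group(1).lower() != "xml"

if __name__ == "__main__":
    for pi in ('<?xml-stylesheet href="a.xsl" type="text/xsl"?>',
               '<?xml version="1.0"?>',
               '<?php echo 1; ?>'):
        print(pi, pi_looks_like_start_tag(pi))
```

Under these assumptions the xml-stylesheet PI passes, while the XML
declaration and the free-form PHP-style PI are both rejected.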
>> The only reason [MicroXML] doesn't ban > in attribute values is that
>> they are required for compatibility with Canonical XML.
>
> Oh, is that a requirement?
No, but it's convenient because it means that XML->MicroXML converters
already exist in the form of XML canonicalizers.
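As an illustration of that point, here is a small sketch using the
canonicalizer in Python's standard library (xml.etree.ElementTree.canonicalize,
Python 3.8 or later, which implements C14N 2.0). The wrapper name to_canonical
is made up for this example, and the output is only a candidate for MicroXML:
a canonicalizer keeps PIs and namespace declarations, so it is not a complete
converter on its own.

```python
from xml.etree.ElementTree import canonicalize

def to_canonical(xml_text: str) -> str:
    """Canonicalize an XML string (hypothetical helper for this example).

    By default canonicalize() drops the XML declaration and comments, and
    writes attribute values with a literal ">" rather than "&gt;".
    """
    return canonicalize(xml_text)

if __name__ == "__main__":
    source = '<?xml version="1.0"?><!-- note --><doc b="x&gt;y"><e/></doc>'
    # The attribute value comes out containing a bare ">" character.
    print(to_canonical(source))
```

The bare ">" in the canonical attribute value is exactly what a MicroXML
parser has to accept if canonicalizer output is to be usable directly.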
> (I think using non-ASCII characters for token separators won't
> get any traction, unless encodings are restricted to UTF-*. [...])
MicroXML limits the encoding to UTF-8, with ASCII as a degenerate case.
> BTW, the idea of using paths in names to allow random access is not new
> or mine. IIRC the Dynatext readers indexed their SGML into a one element
> per line format, with a long path name at the beginning of each line.
> This allowed fast contextual searches using normal line-oriented text
> matching. I think Steve deRose had the patent on this, but I'd think it
> would be expired by now.
A good thing, since I have such a script not for indexing but for
pipelining: it produces lines of the form "path\tvalue" for every element
path in a document, where "value" is the XPath value of the element.
There's a switch to allow paths ending in "/@foo" as well.
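A minimal sketch of that kind of script follows. It is not the script
described above: the function names, the unindexed path syntax (no [n]
positions), and the backslash escaping of tabs and newlines inside values
are all assumptions made for the example.

```python
import sys
import xml.etree.ElementTree as ET

def _escape(value: str) -> str:
    # Assumed convention: keep each record on one line by escaping
    # backslash, tab, and newline inside the value.
    return (value.replace("\\", "\\\\")
                 .replace("\t", "\\t")
                 .replace("\n", "\\n"))

def path_lines(xml_text: str, with_attributes: bool = False):
    """Yield "path<TAB>value" lines, one per element, in document order."""
    root = ET.fromstring(xml_text)

    def walk(elem, path):
        # XPath string value of an element: all descendant text, concatenated.
        yield path + "\t" + _escape("".join(elem.itertext()))
        if with_attributes:
            for name, value in elem.attrib.items():
                yield path + "/@" + name + "\t" + _escape(value)
        for child in elem:
            yield from walk(child, path + "/" + child.tag)

    yield from walk(root, "/" + root.tag)

if __name__ == "__main__":
    # Usage: python pathlines.py [-a] < input.xml
    for line in path_lines(sys.stdin.read(), with_attributes="-a" in sys.argv[1:]):
        print(line)
```

Because every record is a single line beginning with its full path, the
output can be fed straight to grep, sort, cut, or awk in a pipeline.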
--
John Cowan    http://ccil.org/~cowan    cowan@ccil.org
All Gaul is divided into three parts: the part that cooks with lard and
goose fat, the part that cooks with olive oil, and the part that cooks
with butter. --David Chessler