OASIS Mailing List Archives

RE: [xml-dev] "Introducing MicroXML, Part 1: Explore the basic principles of ...

=====================  Uche says
Sure, without a separator you would simply have a closing document tag switch the serial docs parser to a state of looking for a new start tag, DTDecl, PI or end of stream, but I think an explicit separator would reduce the cases where what we would think of now as malformedness from user error winds up looking like an intentional sequence of two or more documents.

See my cross posted reference to 

I had an "Aha" moment last week when I realized that the UTF-8 BOM could serve as such a separator.
(I haven't updated the above page to reflect this.)

I stumbled on this because I had a concatenation of, of all things, a bunch of JSON documents in UTF-8
(in this case Twitter output). Each document had a UTF-8 BOM at its beginning, but they were all in the same file.
I opened it in my favorite JSON reader app and voilà! It opened just fine, but only showed the first JSON document.
Then I realized that a use case I wanted for XDM Serialize is that a sequence of 1 takes the same format as just 1.
Thus a single XML document (or any XDM value) would have the same serialized form as a sequence of 1 document.
This is somewhat tricky in conjunction with some other use cases, such as: the concatenation of 2 documents should produce
a sequence of 2 documents.
Then I realized that if I used the BOM as a separator it might actually work, and plain XML parsers could read the degenerate case of 1 document.
If every document started like
BOM <data>
BOM <data>

Then by themselves they are valid XML documents.
If you concatenate them they become
BOM <data> BOM <data>

which an XDM Serialize-capable parser could parse, and in some cases "dumb" parsers might just see this as 1 document and stop.
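
As a rough illustration of the idea (a sketch only, not part of any XDM Serialize specification), here is how the BOM-prefixed form might look in Python. The helper name with_bom and the sample <data> contents are made up for the example.

```python
import codecs
import xml.etree.ElementTree as ET

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def with_bom(xml_text: str) -> bytes:
    """Serialize one document as UTF-8 with a leading BOM (hypothetical helper)."""
    return BOM + xml_text.encode("utf-8")


doc_a = with_bom("<data>1</data>")
doc_b = with_bom("<data>2</data>")

# On its own, each document is still readable by an ordinary XML parser,
# which treats the leading BOM as an encoding signature.
root = ET.fromstring(doc_a)

# A "sequence of 2" is plain byte concatenation: nothing is inspected
# and no extra bytes are added between the documents.
stream = doc_a + doc_b
```

Here stream has the shape BOM <data> BOM <data> described above. Whether a given off-the-shelf parser stops after the first document or rejects the stream will depend on the parser.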

This also means you can concatenate arbitrary serialized sequences of 0 or more documents without inspecting them and without adding extra bytes.
And splitting or counting documents in a sequence requires only knowing how to scan for BOM sequences.
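
A minimal sketch of that splitting and counting step, assuming the whole stream is available as bytes. The function name split_documents is hypothetical. Note one limitation of this naive scan: a U+FEFF character inside a document encodes to the same three bytes and would be mistaken for a separator.

```python
import codecs
from typing import List

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def split_documents(stream: bytes) -> List[bytes]:
    """Split a BOM-separated byte stream into documents, each keeping its own BOM."""
    if not stream:
        return []  # a sequence of 0 documents
    parts = stream.split(BOM)
    # parts[0] is whatever precedes the first BOM (empty if the stream starts with one).
    docs = [BOM + part for part in parts[1:]]
    if parts[0]:
        docs.insert(0, parts[0])  # a leading document that had no BOM
    return docs


stream = BOM + b"<data>1</data>" + BOM + b"<data>2</data>"
docs = split_documents(stream)
count = len(docs)  # counting only needs the BOM scan, not an XML parse
```

Because each piece keeps its BOM, joining the pieces back together reproduces the original stream byte for byte.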

It's a bit of a misuse, though, but still I am intrigued.

David A. Lee
