xml-dev - Re: [xml-dev] An approach to let XML 2.n resources hold multiple entitie

To: <xml-dev@lists.xml.org>
Subject: Re: [xml-dev] An approach to let XML 2.n resources hold multiple entities
From: "Rick Jelliffe" <ricko@allette.com.au>
Date: Wed, 19 Mar 2003 23:51:14 +1100
References: <B885BEDCB3664E4AB1C72F1D85CB29F805C1F72D@RED-MSG-10.redmond.corp.microsoft.com> <002a01c2ede6$60df40a0$4bc8a8c0@AlletteSystems.com> <115943299723.20030319102056@jenitennison.com>

From: "Jeni Tennison" <jeni@jenitennison.com>

> Here's a third possibility: in XPath 2.0, the collection() function
> returns a sequence of nodes from a particular URL. In the case of a
> file that contains multiple documents, collection("first example")
> could return multiple document nodes, so you would use
> collection("first example")[2]/* to get the <y> element.
...
> Such a document could also act as the input to a transformation. In
> XSLT 2.0, the input() function returns a sequence of nodes as an
> input, and in the case of a "resource"/"collection document" in the
> format you suggest, this could be a sequence of document nodes. So if
> your document was acting as the input to a transformation then
> input()[2]/* would return the <y> element.

Great!

Also, I am not suggesting files containing multiple documents, but
resources (the result of dereferencing a URL) containing multiple XML
entities.

"Bill de hÓra" wrote:

> This proposal will probably result in encoding weirdness unless it 
> offers some guidance in that area.

There would be a WF constraint that only the first entity has an
encoding declaration; the subsequent ones would have an XML declaration
only. The whole file/stream would be in a single encoding.
(I would prefer "approach" to "proposal": much too early!)
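
A minimal sketch of how a receiver might check that constraint, in
Python. It assumes that every entity begins with an XML declaration
and that a plain regular-expression scan is good enough to find those
declarations; encoding_violations is a hypothetical helper name, not
part of any parser API.

```python
import re

# Assumption: each entity in the resource starts with an XML declaration.
# A regex scan can be fooled by "<?xml ...?>" inside a comment or CDATA section.
DECL = re.compile(r"<\?xml\s[^?]*\?>")
ENCODING = re.compile(r"\bencoding\s*=")

def encoding_violations(resource: str) -> list[int]:
    """Return 0-based indexes of non-first entities that declare an encoding."""
    bad = []
    for index, match in enumerate(DECL.finditer(resource)):
        if index > 0 and ENCODING.search(match.group(0)):
            bad.append(index)
    return bad

ok = '<?xml version="1.0" encoding="UTF-8"?><x/><?xml version="1.0"?><y/>'
broken = ('<?xml version="1.0" encoding="UTF-8"?><x/>'
          '<?xml version="1.0" encoding="ISO-8859-1"?><y/>')

print(encoding_violations(ok))      # []
print(encoding_violations(broken))  # [1]
```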

From: <AndrewWatt2000@aol.com>

> Would you care to lay out the use case for this suggested change?

1) Log files, or documents where you want to continually append
fragments, without wanting to keep the context to end them,
for example in a stream.  Using that XPath2 input() function Jeni
mentioned (without trying to get a proper XPath2 path):

<?xml ...?>
<!DOCTYPE logs [
  <!ENTITY contents SYSTEM "#input()/*[position()!=1]">
]>
<logs>&contents;</logs>
<?xml ...?>
log 1
<?xml ...?>
log 2
...
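
For illustration, here is a rough Python sketch of what a receiver could
do with such a log resource. It assumes a new entity starts at every XML
declaration, and it wraps each log entry in a hypothetical <entry>
element so that each fragment parses on its own; neither assumption
comes from a spec, and split_entities is an invented helper.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: a new entity starts wherever an XML declaration begins.
# "\s" after "xml" keeps "<?xml-stylesheet ...?>" from matching.
DECL = re.compile(r"<\?xml\s[^?]*\?>")

def split_entities(resource: str) -> list[str]:
    """Return the text of each entity with its XML declaration stripped."""
    parts = DECL.split(resource)
    # parts[0] is whatever precedes the first declaration (normally empty).
    return [p.strip() for p in parts[1:]]

resource = (
    '<?xml version="1.0"?>\n<logs/>\n'
    '<?xml version="1.0"?>\n<entry>log 1</entry>\n'
    '<?xml version="1.0"?>\n<entry>log 2</entry>\n'
)

first, *rest = split_entities(resource)
logs = ET.fromstring(first)
for fragment in rest:
    # The sender only ever appended; the receiver does the nesting.
    logs.append(ET.fromstring(fragment))

print(ET.tostring(logs, encoding="unicode"))
```

The point of the sketch is that the writer never has to emit a closing
</logs> tag after the last entry.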

2) Incremental or lazy parsing of documents. The parser reads the first document
(e.g. into a DOM). When the user agent requests elements in a subsequent
entity, the parser continues parsing (or fast scans) to that entity then
parses it. 

(This shows the downside of using text rather than a control character:
super fast skipping over entities is not really available--you need to have
some simple delimiter-aware/element-stack-aware skipping.)
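
A sketch of that kind of lazy reading in Python, assuming (purely for
simplicity) that every XML declaration starts on its own line.
iter_entities is a hypothetical generator: it scans for delimiters but
parses nothing, so the caller decides which entities to parse.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator, TextIO

def iter_entities(stream: TextIO) -> Iterator[str]:
    """Lazily yield each entity's text, without its XML declaration.

    Assumption: every XML declaration begins a line of the stream.
    """
    chunk: list[str] = []
    seen_decl = False
    for line in stream:
        if line.startswith("<?xml "):
            if seen_decl:
                yield "".join(chunk)   # previous entity is complete
            chunk = []
            seen_decl = True
        elif seen_decl:
            chunk.append(line)
    if seen_decl:
        yield "".join(chunk)           # last entity ends at end of stream

stream = io.StringIO(
    '<?xml version="1.0"?>\n<x/>\n'
    '<?xml version="1.0"?>\n<y/>\n'
    '<?xml version="1.0"?>\n<z/>\n'
)

entities = iter_entities(stream)
first = ET.fromstring(next(entities))  # parse the first entity straight away
next(entities)                         # scan past the second without parsing it
third = ET.fromstring(next(entities))  # parse only when it is asked for
print(first.tag, third.tag)
```

Skipping still means reading every line of the skipped entity, which is
the cost noted above of using a textual delimiter.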

3) Transmitting a Post Schema Validation Infoset without altering the
original document: the PSVI augmentations are added as extra
entities to the same resource, thereby not altering the original document's
XPaths at all.

Or any document where we want to have out-of-line annotations to an
existing document, preserve the original document intact, and transmit
the whole thing as one resource.

3a) Transmit a RELAX NG, Schematron or XSD schema or XSLT
stylesheet along with the document. 

3b) A RDDL document, together with the XML resources it references.
Or any time we want to bundle together different "documents" which
each use a different standard schema but act as a whole, and where
we want to name and access them as a single resource.

4)  To suit transmission of documents over the web, where we want
to be able to start rendering the document as soon as we receive it.
This is problematic in XML, because if a WF error is found, the
document is supposed to fail.   Using this multi-entity XML, the
user agent does not need to wait till the whole document is received
before rendering that top-level chunk.  If datacoms are bursty, then
starting to render the document does not need to wait until the end.

Consider how Acrobat makes pages available as soon as they are received,
or HTML's progressive rendering. Any time we have a sequence of pages 
and once the user has started at the first one, we want to have subsequent
ones pre-fetched and available ASAP.   

Note that the use of the term "entity" here does not imply that they
are in any way tied to XML entity declarations (syntax) or XML entity
reference syntax. They could be undeclared, but referenced using XInclude.
Indeed this kind of XInclude use might be more appropriate for this
use case, to avoid WF controversies.

5) To provide a way out of the signing problem, to make it trivial for
a document to be sent along with metadata about itself such as
checksums. 

Or where the metadata is not part of the document but application
specific: such as where the document does not have Dublin Core
elements in its schema, but we want to ship along a Dublin Core
metadata file with the document.  
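
As a rough illustration of the checksum case, the Python sketch below
appends a metadata entity carrying a SHA-256 digest of the document
entity, and the receiver recomputes it. The <checksum> element, the
bundle and verify helpers, and the rule that the last XML declaration
starts the metadata entity are all assumptions made for the example.

```python
import hashlib
import xml.etree.ElementTree as ET

def bundle(document: str) -> str:
    """Append a hypothetical <checksum> entity after the untouched document."""
    digest = hashlib.sha256(document.encode("utf-8")).hexdigest()
    meta = ('<?xml version="1.0"?>\n'
            f'<checksum algorithm="sha256">{digest}</checksum>\n')
    return document + meta

def verify(resource: str) -> bool:
    """Check the first entity against the digest in the trailing entity."""
    # Assumption: the last XML declaration starts the metadata entity.
    cut = resource.rindex("<?xml ")
    document, meta = resource[:cut], resource[cut:]
    expected = ET.fromstring(meta).text
    actual = hashlib.sha256(document.encode("utf-8")).hexdigest()
    return expected == actual

document = '<?xml version="1.0" encoding="UTF-8"?>\n<doc>hello</doc>\n'
resource = bundle(document)

print(verify(resource))                            # True
print(verify(resource.replace("hello", "hullo")))  # False
```

The document bytes are hashed exactly as sent, so nothing inside the
document has to be canonicalized or modified to carry the checksum.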

6) To decrease the amount of buffering required at the server
side.  This is the table of contents (TOC) problem.  If we want to progressively
accumulate elements when transforming a document, then place
these first in our document, we have to suspend transmitting our document
at that point until we have harvested all the information that is supposed
to go there.

Instead, with this we can transmit the document first with a reference
at the TOC point, then transmit the TOC as a subsequent entity in
the same resource.   That they are in the same resource means we
don't need to make the TOC available at some temporary URI, and
the user agent does not need to make a second request nor open another
connection.  The recipient puts the information in sequence, not the 
sender.
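
A small Python sketch of the recipient side, assuming the two entities
have already been separated and parsed. The <toc-ref/> placeholder
element and the splice helper are invented for the example; the
reference could equally be an entity reference or an XInclude.

```python
import xml.etree.ElementTree as ET

# First entity: the document, sent immediately, with a placeholder at the TOC point.
document = ET.fromstring(
    "<book><toc-ref/><chapter title='One'/><chapter title='Two'/></book>"
)
# Later entity in the same resource: the TOC, sent once it has been harvested.
toc = ET.fromstring("<toc><item>One</item><item>Two</item></toc>")

def splice(doc: ET.Element, placeholder_tag: str, replacement: ET.Element) -> bool:
    """Replace the first element named placeholder_tag with replacement."""
    for parent in doc.iter():
        for i, child in enumerate(list(parent)):
            if child.tag == placeholder_tag:
                replacement.tail = child.tail  # keep any text after the placeholder
                parent[i] = replacement
                return True
    return False

splice(document, "toc-ref", toc)
print(ET.tostring(document, encoding="unicode"))
```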

7) Because entities (reading this to mean "fragments" and avoiding knee-jerk
reactions based on XML markup declarations) are a form of modularity which
gives programmers more flexibility.  A Good Thing.

8) Where we want to add information as part of a document, but
we don't want to check for ID clashes (the other entities can
act as SUBDOCs and have their own ID scope if they are not
referenced with & as entities.)

9) So that application settings can be saved along with a
document. For example editor settings: rather than pollute
the document with extraneous PIs or elements, the information
is tacked on to the end.

10) To allow the storage of deltas, out-of-band with a document.
The first entity is the document, and some subsequent unreferenced
entity gives deltas. This is for editing and version control. 

Probably there are more, and I don't know whether any of these
are particularly compelling. 

Cheers
Rick

References:
- RE: [xml-dev] XML too hard for programmers?
  - From: "Dare Obasanjo" <dareo@microsoft.com>
- An approach to let XML 2.n resources hold multiple entities
  - From: "Rick Jelliffe" <ricko@allette.com.au>
- Re: [xml-dev] An approach to let XML 2.n resources hold multiple entities
  - From: Jeni Tennison <jeni@jenitennison.com>
