OASIS Mailing List Archives





   An approach to let XML 2.n resources hold multiple entities


A couple of times people have suggested that XML should allow multiple top-level 
elements. Thinking about it, here is one possible approach that might fit in with existing
systems with fairly minimal changes.

The idea is that every top-level occurrence of <?xml\w (where \w means word end)
in an XML resource signals the end of any previous entity and the start of a
new one.  So the following would be valid

<?xml version="1.2"?>
<?xml version="1.2"?>
<?xml version="1.2"?>

but not

<?xml version="1.2"?>
<?xml version="1.2"?>

because the second declaration is not at the top level. Only the first entity
in the resource can have a DOCTYPE declaration; this avoids several complications.
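
The splitting rule above can be sketched in Python. This is only an illustration
of the idea, not an implementation of any existing parser: the function name
split_entities, the tokenizing regular expression, and the element names in the
usage lines are all assumptions, and the sketch does not attempt full DOCTYPE
handling. It tracks element depth and starts a new entity at each <?xml
declaration seen at depth zero; a declaration inside an element is reported as
an error.

```python
import re

# Markup tokens. Comments, CDATA sections and PIs are matched whole so that
# their contents are never mistaken for tags. The tag pattern allows '>'
# inside quoted attribute values. (Assumed, simplified tokenizer.)
TOKEN = re.compile(
    r"<!--.*?-->"
    r"|<!\[CDATA\[.*?\]\]>"
    r"|<\?.*?\?>"
    r"|<(?:[^>\"']|\"[^\"]*\"|'[^']*')*>",
    re.DOTALL,
)

# "<?xml" followed by the end of the word, so "<?xml-stylesheet" is excluded.
XML_DECL = re.compile(r"<\?xml(?=[\s?])")


def split_entities(resource: str) -> list[str]:
    """Split a resource into entities at each top-level <?xml declaration."""
    starts = []
    depth = 0
    for m in TOKEN.finditer(resource):
        tok = m.group(0)
        if tok.startswith("<?"):
            if XML_DECL.match(tok):
                if depth != 0:
                    raise ValueError("<?xml declaration is not at the top level")
                starts.append(m.start())
        elif tok.startswith("<!"):
            continue                      # comment, CDATA, or declaration
        elif tok.startswith("</"):
            depth -= 1                    # end tag
        elif not tok.endswith("/>"):
            depth += 1                    # start tag (not an empty-element tag)
    # Content before the first declaration forms an implicit first entity.
    if not starts or resource[:starts[0]].strip():
        starts.insert(0, 0)
    bounds = starts + [len(resource)]
    return [resource[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical two-entity resource; the element names x, y, z are made up.
parts = split_entities(
    '<?xml version="1.2"?>\n<x/>\n<?xml version="1.2"?>\n<y><z/></y>\n'
)
print(len(parts))
```
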

How does this fit in with XPath?  

At the moment, count(/*) is always 1.  I am suggesting redefining /
away from being the "document" to being the "resource", and then
using indexing to get other entities. Two ways for this spring to mind:
1) Use existing XPaths, so that in the first example above the address of
the y element is   document("first example")/*[2]   
The XPath of the document element is document("first example")/*[1]   

This has the advantage of not requiring syntax changes to XPath. (The
only disadvantage I see is that XPath cannot express which entity
leading and trailing comments and PIs come from: I don't think this is
a biggy.)
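
As a rough illustration of option 1, here is a Python sketch that treats the
document elements of all entities as the children of the resource-level "/".
The helper names resource_roots and resource_child are hypothetical, the
entities are assumed to be already separated into strings, and the element
names x and y are placeholders. The XML declaration is stripped before parsing
because an existing parser may not accept an unfamiliar version number.

```python
import re
import xml.etree.ElementTree as ET

# A leading XML declaration, removed before handing the text to the parser.
_XML_DECL = re.compile(r"^\s*<\?xml\s.*?\?>", re.DOTALL)


def resource_roots(entities):
    """Return the document element of each entity, in resource order."""
    roots = []
    for text in entities:
        body = _XML_DECL.sub("", text, count=1)
        roots.append(ET.fromstring(body))
    return roots


def resource_child(entities, index):
    """Emulate /*[index] (1-based) under the proposed resource-level '/'."""
    return resource_roots(entities)[index - 1]


# Hypothetical resource, already split into its two entities.
entities = [
    '<?xml version="1.2"?>\n<x/>\n',
    '<?xml version="1.2"?>\n<y/>\n',
]
print(len(resource_roots(entities)))      # the proposed count(/*)
print(resource_child(entities, 2).tag)    # the proposed /*[2]
```
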

<?xml version="2.0"?>
  <!ENTITY next SYSTEM "#xpointer(/*[2])">

<?xml version="2.0"?>

2) Use a new axis on XPath, for example
   /entity::*[2]  is the y element
   /entity::*[1] is the document element,
  /x is shorthand for /entity::*[1]/x  and 
  //x is shorthand for /entity::*[1]//x

This has the advantage of introducing parseable entities as first-class components
of a document, which may also be useable by XInclude.
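
The two shorthand rules in option 2 amount to a simple rewrite of absolute
paths. A minimal Python sketch, assuming the proposed entity:: axis (which is
not part of XPath as it stands) and a hypothetical function name
expand_shorthand:

```python
def expand_shorthand(path: str) -> str:
    """Rewrite an absolute path into the explicit entity:: form.

    /x  becomes /entity::*[1]/x, and //x becomes /entity::*[1]//x.
    Paths that already name the entity axis, the bare "/" (the resource
    itself), and relative paths are returned unchanged.
    """
    if path == "/" or path.startswith("/entity::"):
        return path
    if path.startswith("/"):
        return "/entity::*[1]" + path
    return path


for p in ("/x", "//x", "/entity::*[2]", "x"):
    print(p, "->", expand_shorthand(p))
```
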

<?xml version="2.0"?>
  <!ENTITY next SYSTEM "#xpointer(/entity::*[2])">

<?xml version="2.0"?>

I am not sure which one I prefer.  

How does this fit in with SGML?

The top-level production of SGML is 

[1] SGML document =
  SGML document entity,
 (SGML subdocument entity |
   SGML text entity |
  character data entity |
  specific character data entity |
  non-SGML data entity )*

which models the document as a single stream of data broken into entities,
each entity being terminated and separated with an Entity End signal
(to the parser)  

SGML specifically says in a note on that production that "This International Standard
does not constrain the physical organization of the document within
the data stream, message handling protocol, file system etc that contains
it. In particular, separate entities could occur in the same physical object,
 a single entity could be divided between multiple objects, and the objects
could occur in any order."

Of course, at this top level the use of productions is just a formalism,
not something an SGML parser needs to implement.  XML makes the
simplification that an entity is addressed by a single URL, which effectively
precludes the need for an XML entity manager to handle elements that
start in one entity but end in another.

But there is nothing I see in SGML that prevents a change in XML to
disconnect resource and entity, so that a resource can contain
multiple parseable XML entities.  

The textual nature of an XML resource is maintained and an existing
tag that is already swallowed as part of entity handling (i.e. <?xml?>)
is reused.  The use of explicit text is, I think, better than using an invisible
control character, such as ^L form feed. 

Rick Jelliffe



Copyright 2001 XML.org. This site is hosted by OASIS