xml-dev - RE: Streams, protocols, documents and fragments

From: <Marc.McDonald@Design-Intelligence.com>
To: <xml-dev@ic.ac.uk>, <Mark.Birbeck@iedigital.net>
Date: Wed, 24 Feb 1999 17:09:17 -0800
Instead of separation characters, I would just label the fragments 
(borrowed from XML linking):
	<xf:fragment location="ROOT() CHILD(1,stocks) (1700,stockPrice)">
	  <stockPrice timestamp="19992402141500">
	    <ticker>MSFT</ticker>
	    <price>1000</price>
	  </stockPrice>
	</xf:fragment>
	<xf:fragment location="ROOT() CHILD(1,stocks) (1327,stockPrice)">
	  <stockPrice timestamp="19992402132540">
	    <ticker>ICI</ticker>
	    <price>1010</price>
	  </stockPrice>
	</xf:fragment>

The xf:fragment element identifies each fragment in terms of its 
location in the entire document. In this case it assumes a document 
structured as <stocks> <stockPrice>..</stockPrice>* </stocks> where 
the prices desired are the 1700th and 1327th price entries. It may be 
a bit verbose, but allows a document tree to be transmitted piecemeal 
and reassembled. Access to any node not yet downloaded could be 
requested (for instance if stockPrice 1553 were desired). The 
reassembled tree caches the subset of elements that an application is 
interested in, but has all of the holes to access additional 
elements.
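
As a rough illustration of the reassembly described above, here is a
minimal Python sketch. It is not a proposed implementation: the
namespace URI bound to xf, the wrapper element, the SparseDocument class
and the parse_location helper are all assumptions made for the example,
and the location syntax is read loosely as a series of "(index,name)"
steps.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the message above never binds the xf prefix.
XF_NS = "urn:example:xf"

# Matches one "(index,name)" step, with or without the CHILD keyword in front.
STEP = re.compile(r"\((\d+),\s*([\w.-]+)\)")


def parse_location(location):
    """Turn 'ROOT() CHILD(1,stocks) (1700,stockPrice)' into a tuple of steps."""
    return tuple((int(index), name) for index, name in STEP.findall(location))


class SparseDocument:
    """Holds only the nodes received so far, keyed by their location path."""

    def __init__(self):
        self.nodes = {}

    def add_fragment(self, fragment):
        # The payload is the single child element of xf:fragment.
        self.nodes[parse_location(fragment.get("location"))] = fragment[0]

    def get(self, *steps):
        # None marks a hole: a node that would still have to be requested.
        return self.nodes.get(tuple(steps))


# The wrapper element exists only so the sample parses as one document.
STREAM = f"""<wrapper xmlns:xf="{XF_NS}">
<xf:fragment location="ROOT() CHILD(1,stocks) (1700,stockPrice)">
  <stockPrice timestamp="19992402141500"><ticker>MSFT</ticker><price>1000</price></stockPrice>
</xf:fragment>
<xf:fragment location="ROOT() CHILD(1,stocks) (1327,stockPrice)">
  <stockPrice timestamp="19992402132540"><ticker>ICI</ticker><price>1010</price></stockPrice>
</xf:fragment>
</wrapper>"""

document = SparseDocument()
for fragment in ET.fromstring(STREAM).iter(f"{{{XF_NS}}}fragment"):
    document.add_fragment(fragment)

cached = document.get((1, "stocks"), (1327, "stockPrice"))
print(cached.findtext("ticker"), cached.findtext("price"))
print(document.get((1, "stocks"), (1553, "stockPrice")))
```

Because each fragment carries its own location, the arrival order does
not matter, and a lookup that returns None is the signal to request
that node.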

Internet protocols are supposed to ensure error-free transmission. 
Any ordering problems are resolved by the location description.

Marc B McDonald
Principal Software Scientist
Design Intelligence, Inc
www.design-intelligence.com


----------
From:  Mark Birbeck [SMTP:Mark.Birbeck@iedigital.net]
Sent:  Wednesday, February 24, 1999 7:18 AM
To:  xml-dev list
Subject:  RE: Streams, protocols, documents and fragments

> From:	Borden, Jonathan [SMTP:jborden@mediaone.net]
> My sole purpose in discussing 'document fragments' was because the
> thread had gotten stuck on the notion that a continuous XML stream
> would contain a single long document (perhaps w/o a closing tag) and
> the actual PDUs consist of document fragments ... the point is that
> if we create a protocol on a stream which transmits multiple
> documents, there is no loss of functionality over a solution
> employing 'document fragments'
>
	I agree with this. And the point I was trying to get to was that
therefore we don't need to introduce loads of terms on top of XML 1.0
to understand the concepts.

	I still think all of this is being over-complicated - but then
maybe I'm the one who's missing something, so let's see.

	I don't follow why so many suggestions for resolving this problem
involve stepping 'outside of' XML 1.0. We have suggestions for sync
characters like ^C and ^L, we have the proposal that XML 1.0 should be
fundamentally altered to allow the concept of a 'not well-formed'
document (or one that may *become* well-formed at some point in the
future), we have proposals for documents that contain subsets of
validity. All of these suggestions seem to go against the grain of
what XML is about.

	XML 1.0 already copes with streams and files. A physical XML
document is a linear sequence of characters conforming to certain
rules. You can't tell whether those rules have been met until you have
received the entire sequence of characters. You know when you've
reached the end by the closing tag. That's it! There's not much else
you can do about it, because that's what XML is all about -
well-formed, possibly validated documents conforming to certain rules.

	Now, the fact that the beginning and end of this sequence of
characters may be presented to the parser eight hours apart is to me an
application problem. If someone has a document that takes eight hours
to arrive then maybe they should re-think how they're setting the
system up. If it's a massive document that can only be processed in its
entirety, and if any part fails to arrive the whole document fails,
then sure, you have to go ahead and send it over eight hours. But the
stock ticker example is not like this. If I miss the stock price for
Microsoft at 11am, then I can still make use of the stock price for
Microsoft at 11.20am. It will affect my historical archives, but at
least I have something to display. It is not an 'all or nothing'
situation.

	So, accepting for a moment that we should transmit many
documents throughout the day, rather than one big one, it leaves the
question of demarcation. And here I'm surprised that people want to
step outside of XML to find a solution. Say we send the following:

	^L
	<stockPrice timestamp="19992402141500">
	    <ticker>MSFT</ticker>
	    <price>1000</price>
	</stockPrice>
	^L
	<stockPrice timestamp="19992402132540">
	    <ticker>ICI</ticker>
	    <price>1010</price>
	</stockPrice>
	^L

	If the data link is 100% reliable then we have encoded redundant
information because the document name - the element for stockPrice -
already tells us where one starts and ends. So, we don't need the ^L.
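
	To make that concrete, here is a minimal Python sketch of a
receiver that finds the document boundaries from the element structure
alone. It is only an illustration: the split_documents function and the
'stream' wrapper element are assumptions for the example (the wrapper is
fed to the parser locally and never sent), and the chunk boundaries are
arbitrary.

```python
import xml.etree.ElementTree as ET


def split_documents(chunks):
    """Yield each top-level element as soon as its closing tag arrives."""
    parser = ET.XMLPullParser(events=("start", "end"))
    # Hypothetical wrapper root so one parser can accept many documents
    # back to back; it is supplied locally, not by the sender.
    parser.feed("<stream>")
    depth = 0
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
            else:
                depth -= 1
                if depth == 1:  # a direct child of the wrapper just closed
                    yield elem


# The chunks deliberately split in the middle of tags, and carry no ^L.
CHUNKS = [
    '<stockPrice timestamp="19992402141500"><ticker>MSFT</ticker><pri',
    'ce>1000</price></stockPrice><stockPrice timestamp="19992402132540">',
    '<ticker>ICI</ticker><price>1010</price></stockPrice>',
]

prices = [(doc.findtext("ticker"), doc.findtext("price"))
          for doc in split_documents(CHUNKS)]
print(prices)
```

	Each stockPrice is handed over the moment its end tag has been
seen, which is all the separator character was meant to achieve.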

	But if the data link *isn't* reliable then adding a few ^L
characters doesn't help a lot, because if we lose the following
sequence we have no way of knowing:

	    <price>1000</price>
	</stockPrice>
	^L
	<stockPrice timestamp="19992402132540">
	    <ticker>ICI</ticker>

	If this sequence is taken out of the above two documents then
you now have the wrong price for Microsoft and nothing for ICI, and
your application is none the wiser.
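
	A short Python sketch of that failure, assuming ^L is the form
feed character and that the two documents are sent without extra
whitespace (both are simplifications made for the example, as are the
variable names):

```python
import xml.etree.ElementTree as ET

FF = "\f"  # ^L, the form feed separator

DOC1 = ('<stockPrice timestamp="19992402141500">'
        '<ticker>MSFT</ticker><price>1000</price></stockPrice>')
DOC2 = ('<stockPrice timestamp="19992402132540">'
        '<ticker>ICI</ticker><price>1010</price></stockPrice>')

sent = FF + DOC1 + FF + DOC2 + FF

# The span that goes missing: the end of the first document, the
# separator, and the start of the second document.
lost = ('<price>1000</price></stockPrice>' + FF +
        '<stockPrice timestamp="19992402132540"><ticker>ICI</ticker>')
received = sent.replace(lost, "")

# Split on the separator and parse whatever sits between the ^L marks.
documents = [ET.fromstring(part) for part in received.split(FF) if part]
for doc in documents:
    print(doc.findtext("ticker"), doc.findtext("price"))
```

	What is left parses without complaint as a single well-formed
stockPrice, pairing the MSFT ticker with the ICI price, and the
surviving ^L characters give no hint that anything was lost.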

	I think if 100% data reliability is required then we need a few
streaming-related attributes that we can add to our documents, such as:

	<stockPrice timestamp="19992402141500" streamns:packetID="55">
	    <ticker streamns:packetID="55">MSFT</ticker>
	    <price streamns:packetID="55">1000</price>
	</stockPrice>
	<stockPrice timestamp="19992402132540" streamns:packetID="56">
	    <ticker streamns:packetID="56">ICI</ticker>
	    <price streamns:packetID="56">1010</price>
	</stockPrice>

	These would be added by a 'sending' application as a separate
layer to the original document generation, and would allow the
receiving application to process all the 'streamns' packets before
actually processing the nodes - say, storing or displaying the stock
prices. You could remove 'invalid' nodes from the tree (well-formed at
the XML level, but with the wrong packet ID), and then while your main
application is getting on and acting on the stock data, the receiving
process could be re-requesting the lost data. In the illustration
above, after losing the packet, we would now have:

	<stockPrice timestamp="19992402141500" streamns:packetID="55">
	    <ticker streamns:packetID="55">MSFT</ticker>
	    <price streamns:packetID="56">1010</price>       <--- error here
	</stockPrice>

	and the 'streamns' processing would spot and re-request the
missing data easily (both packet 55 and packet 56).
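
	A minimal Python sketch of such a check follows. The namespace
URI bound to streamns and the packets_to_rerequest function are
assumptions made for the example; the only rule it applies is that every
element inside one document must carry the same packet ID.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the examples above never bind the prefix.
NS = "urn:example:streamns"
PACKET_ID = f"{{{NS}}}packetID"


def packets_to_rerequest(doc):
    """Return the packet IDs to ask for again, or [] if the IDs agree."""
    ids = {element.get(PACKET_ID) for element in doc.iter()}
    if len(ids) <= 1:
        return []
    return sorted(pid for pid in ids if pid is not None)


INTACT = ET.fromstring(
    f'<stockPrice xmlns:streamns="{NS}" timestamp="19992402141500"'
    ' streamns:packetID="55">'
    '<ticker streamns:packetID="55">MSFT</ticker>'
    '<price streamns:packetID="55">1000</price>'
    '</stockPrice>')

DAMAGED = ET.fromstring(
    f'<stockPrice xmlns:streamns="{NS}" timestamp="19992402141500"'
    ' streamns:packetID="55">'
    '<ticker streamns:packetID="55">MSFT</ticker>'
    '<price streamns:packetID="56">1010</price>'
    '</stockPrice>')

print(packets_to_rerequest(INTACT))
print(packets_to_rerequest(DAMAGED))
```

	The damaged document names both packets involved, so the
receiving process can drop the node and ask for each of them again
while the main application carries on.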

	To be honest, I'm not suggesting what I've said here as some new
standard. There are lots of ways what I've described could be achieved,
for example:

	<stockPrice timestamp="19992402141500" streamns:packetID="55"
	            streamns:checksum="556543">
	    <ticker>MSFT</ticker>
	    <price>1000</price>
	</stockPrice>
	<stockPrice timestamp="19992402132540" streamns:packetID="56"
	            streamns:checksum="771239">
	    <ticker>ICI</ticker>
	    <price>1010</price>
	</stockPrice>

	takes up less space, and would still spot the same errors. I'm
just trying to illustrate how solutions can be found that don't involve
smashing XML 1.0 to bits. At the end of the day this is an application
problem, not an XML one.
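
	For what it's worth, the checksum variant could be sketched in
Python along these lines. Nothing above says how the checksum would be
computed, so the CRC-32 over each child's tag and text is purely an
assumption, as are the namespace URI and the stamp, content_checksum and
is_intact helpers.

```python
import xml.etree.ElementTree as ET
import zlib

# Hypothetical namespace URI and attribute names for the streaming layer.
NS = "urn:example:streamns"
PACKET_ID = f"{{{NS}}}packetID"
CHECKSUM = f"{{{NS}}}checksum"


def content_checksum(doc):
    """CRC-32 over the tag and text of each child (an assumed scheme)."""
    payload = "|".join(f"{child.tag}={child.text}" for child in doc)
    return str(zlib.crc32(payload.encode("utf-8")))


def stamp(doc, packet_id):
    """Sending side: add the streaming attributes to a finished document."""
    doc.set(PACKET_ID, str(packet_id))
    doc.set(CHECKSUM, content_checksum(doc))
    return doc


def is_intact(doc):
    """Receiving side: recompute the checksum and compare."""
    return doc.get(CHECKSUM) == content_checksum(doc)


sent = stamp(ET.fromstring(
    '<stockPrice timestamp="19992402141500">'
    '<ticker>MSFT</ticker><price>1000</price></stockPrice>'), 55)
wire = ET.tostring(sent, encoding="unicode")

received = ET.fromstring(wire)
# Simulate the earlier loss: the MSFT header ends up with the ICI price.
damaged = ET.fromstring(
    wire.replace("<price>1000</price>", "<price>1010</price>"))

print(is_intact(received), is_intact(damaged))
```

	The sending layer stamps the document after it has been
generated, and the receiving layer checks it before the application
ever sees the nodes, so the document itself stays plain XML 1.0.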

	Regards,

	Mark


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)


