OASIS Mailing List Archives





   RE: [xml-dev] Fast text output from SAX?


Rick Marshall wrote:
> if you use a binary format how do you know
> the end of the data is the end of the data?
	Depends on the format. There are several different ways to
do this:
	1. Use "tag-length-value" encodings. That is, each element is
preceded by a tag and a length. Since you know the length, you know
where the value ends. Nesting is no problem either: an outer element's
length simply includes the full encoded size of the elements inside it.
This is really nice for fast parsing, since you never have to scan for
an "end tag." You always know the size of the element you are reading.
	2. Use start/stop tags. That is, define a reserved value that
serves the way the null byte does in null-terminated strings. You
insert that value into the binary and then scan for it. (This is less
efficient, and the reserved value must never appear inside the data
itself unless it is escaped, but it works well when you don't know an
element's length at the moment you start writing it, e.g. when you are
streaming.)
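	3. Use schema data to figure it out. That is, if I have a
structure of n integers and each integer takes four bytes, then assume
that the structure ends after n*4 bytes. (This can be fragile: if the
writer and reader disagree about the schema, the reader silently
misinterprets everything after that point.)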
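The three framing strategies above can be sketched side by side in Python. This is a minimal illustration under stated assumptions, not any real binary XML format: the one-byte tag, the four-byte big-endian length, the tag numbers `TAG_TEXT` and `TAG_LIST`, the NUL terminator, and all helper names are hypothetical choices made for the example.

```python
import struct

# Hypothetical tag numbers, chosen only for this illustration.
TAG_TEXT = 1
TAG_LIST = 2

# Assumed terminator byte for the sentinel approach.
END = b"\x00"


# 1. Tag-length-value: 1-byte tag, 4-byte big-endian length, then the value.
def tlv_encode(tag: int, value: bytes) -> bytes:
    return struct.pack(">BI", tag, len(value)) + value


def tlv_decode(buf: bytes, pos: int = 0) -> tuple[int, bytes, int]:
    """Read one element at pos; return (tag, value, next_pos) without scanning."""
    tag, length = struct.unpack_from(">BI", buf, pos)
    start = pos + struct.calcsize(">BI")
    return tag, buf[start:start + length], start + length


def tlv_children(value: bytes) -> list[tuple[int, bytes]]:
    """Split an outer value into the nested TLV elements packed inside it."""
    pos, children = 0, []
    while pos < len(value):
        tag, child, pos = tlv_decode(value, pos)
        children.append((tag, child))
    return children


# 2. Sentinel: append a terminator and scan for it when reading.
def sentinel_encode(value: bytes) -> bytes:
    if END in value:  # no escaping in this sketch, so reject the reserved byte
        raise ValueError("payload contains the terminator byte")
    return value + END


def sentinel_decode(buf: bytes, pos: int = 0) -> tuple[bytes, int]:
    end = buf.index(END, pos)  # must scan forward to find where the value ends
    return buf[pos:end], end + 1


# 3. Schema-driven: the schema says "n big-endian 4-byte ints", so no markers at all.
def schema_decode_ints(buf: bytes, n: int, pos: int = 0) -> tuple[list[int], int]:
    end = pos + 4 * n
    if end > len(buf):
        raise ValueError("buffer is shorter than the schema claims")
    return list(struct.unpack_from(f">{n}i", buf, pos)), end


if __name__ == "__main__":
    inner = tlv_encode(TAG_TEXT, b"hello") + tlv_encode(TAG_TEXT, b"world")
    outer = tlv_encode(TAG_LIST, inner)  # outer length covers both children
    tag, value, _ = tlv_decode(outer)
    print(tag, tlv_children(value))  # 2 [(1, b'hello'), (1, b'world')]

    framed = sentinel_encode(b"abc") + sentinel_encode(b"de")
    first, nxt = sentinel_decode(framed)
    print(first, sentinel_decode(framed, nxt))  # b'abc' (b'de', 7)

    ints = struct.pack(">3i", 1, -2, 3)
    print(schema_decode_ints(ints, 3))  # ([1, -2, 3], 12)
    print(schema_decode_ints(ints, 2))  # ([1, -2], 8): wrong n still "works"
```

Note that `tlv_decode` jumps straight to the end of each element, while `sentinel_decode` has to call `bytes.index` to scan for the terminator. `schema_decode_ints` trusts `n` completely: reading the same buffer with the wrong `n` still "succeeds", which is the fragility mentioned in item 3.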

On determining well-formedness:
	It all depends. The range of solutions is probably too large to
enumerate in anything short of a textbook.

> it wasn't until agreement was reached on 8 bit
> bytes that a lot of processor design could take off
	I think you have just alienated all the PDP-8, 10, and 20
programmers who thought that 6 bits was just fine. You can still run
across ex-PDP-8 assembly programmers who will wax poetic about the
beauty of that instruction set...

		bob wyman



Copyright 2001 XML.org. This site is hosted by OASIS