OASIS Mailing List Archives
Re: [xml-dev] I think XML tools should handle XML files up to 2^64bytes in size

On Tue, 2018-11-13 at 13:32 +0000, Costello, Roger L. wrote:
> Hi Folks,
> 
> I think XML tools (e.g., XML parsers, schema validators, XSLT
> processors) should handle XML files up to 2^64 bytes in size.

Hmm. My own text retrieval system [1] limited files to 2^23 blocks of
64 bytes each (32 bits) -- it needed bytes rather than characters so it
could seek directly to any part of the file to extract a snippet for
showing results.  But that was on 32-bit systems; on a 64-bit system it
might be able to address 2^59 blocks. Many other systems at the time
duplicated the text internally, a strategy which lets you guarantee
some sort of integrity but massively increases the index size.
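The block addressing described above can be sketched roughly as follows. This is only an illustration, not lq-text's actual code: the names BLOCK_SIZE, block_of and read_snippet are hypothetical, and the only detail taken from the message is the 64-byte block size.

```python
import io

BLOCK_SIZE = 64  # bytes per block, as in the message; everything else is assumed


def block_of(byte_offset: int) -> int:
    """Return the number of the block that contains the given byte offset."""
    return byte_offset // BLOCK_SIZE


def read_snippet(f, block: int, n_blocks: int = 2) -> bytes:
    """Seek directly to a block boundary and read n_blocks blocks of raw bytes."""
    f.seek(block * BLOCK_SIZE)
    return f.read(n_blocks * BLOCK_SIZE)


# Usage: an in-memory "file" stands in for an indexed document.
data = bytes(range(256))
f = io.BytesIO(data)
snippet = read_snippet(f, block_of(130), n_blocks=1)
```

Storing a block number instead of a byte offset saves 6 bits per location, at the cost of only knowing the position to within 64 bytes.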

Since 64 bits lets you address more storage than most people can buy,
and vastly more than can be parsed linearly by most XML tools in any
reasonable amount of time, it's not a useful limit and not easily
testable.
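
To give a feel for the scale, here is a back-of-envelope sketch of how long one linear pass over 2^64 bytes would take. The throughput of 1 GB/s is purely an assumption chosen for illustration; real XML parsers vary widely.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_scan(n_bytes: int, bytes_per_second: float) -> float:
    """Time, in years, to read n_bytes once at a constant throughput."""
    return n_bytes / bytes_per_second / SECONDS_PER_YEAR


# Assumed sustained parse rate of 1 GB/s (hypothetical figure).
years = years_to_scan(2**64, 1e9)
print(f"{years:.0f} years")
```

At that assumed rate a single pass takes roughly 585 years, which is why a conformance test for the full limit is not practical.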

> Why that number? Here's why:
> 
> The number 2^64 is:
[list of magical correspondences deleted]

You could choose any number and find lots of reasons to choose it.

Using 63 bits lets you use negative numbers as an offset from the end
of the file. The reason i used fewer bits for the text retrieval system
was first that storing only approximate locations in files meant
storing less information - a smaller index, faster to process - and
secondly that it let me use some of the bits in the address for flags,
again saving space in the index. Some systems store garbage collection
information inside address pointers. 2^47 bytes would still permit very
large files and would give 8 bits for another purpose and still not use
the sign bit, allowing negative offsets.
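
A minimal sketch of that kind of packing, assuming a 47-bit byte offset with 8 flag bits stored directly above it. The layout and the names pack, unpack and resolve are hypothetical and chosen only to illustrate the idea; they do not describe any particular system.

```python
OFFSET_BITS = 47  # 2^47 bytes addressable, as suggested in the message
FLAG_BITS = 8     # bits reserved for another purpose (assumed layout)
OFFSET_MASK = (1 << OFFSET_BITS) - 1
FLAG_MASK = (1 << FLAG_BITS) - 1


def pack(offset: int, flags: int) -> int:
    """Combine a non-negative byte offset and flag bits into one word."""
    if not 0 <= offset <= OFFSET_MASK:
        raise ValueError("offset does not fit in 47 bits")
    if not 0 <= flags <= FLAG_MASK:
        raise ValueError("flags do not fit in 8 bits")
    return (flags << OFFSET_BITS) | offset


def unpack(word: int) -> tuple[int, int]:
    """Split a packed word back into (offset, flags)."""
    return word & OFFSET_MASK, (word >> OFFSET_BITS) & FLAG_MASK


def resolve(offset: int, file_size: int) -> int:
    """Interpret a signed offset: negative values count back from the end."""
    return offset if offset >= 0 else file_size + offset


word = pack(123, 0b101)
```

Because the packed word never reaches 2^63, it stays non-negative in a signed 64-bit integer, so the sign remains available for offsets measured from the end of the file.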

[...]

> The total number of IPv6 addresses generally given to a single LAN or
> subnet.

There are 39 books in the Old Testament.
There are 3 * 9 = 27 books in the New Testament.
There are 2 * 7 = 14 books in the Apocrypha.
There are 1 * 4 = 4 Gospels.
So XML systems should support at least 4^39 bytes.


Liam

[1] lq-text was (is) the open source version of nx-text, a commercial
package i wrote but that we never sold. Michael Sperberg-McQueen has
suggested a backronym of "Liquid Text" which i shall use if i ever do
another release.  https://www.holoweb.net/liam/lq-text

-- 
Liam Quin, https://www.holoweb.net/liam/cv/
Web slave for vintage clipart http://www.fromoldbooks.org/
Available for XML/Document/Information Architecture/
XSL/XQuery/Web/Text Processing/A11Y work & consulting.


