Re: [xml-dev] Granularity

From: Michael Sokolov <sokolov@ifactory.com>
To: Len Bullard <cbullard@hiwaay.net>
Date: Thu, 05 Jan 2012 21:21:47 -0500

Good question!

It depends on the content.

Most human-readable texts are already broken down into conceptual units: articles, sections, chapters, entries, and so on. We try to pick one that will at least fill the screen with text, and then impose maximum size constraints based on the delivery channel's capacity. It's not just viewing, though, that informs the choice; search often figures into it as well. Ideally, search results map one-to-one onto viewable chunks; this leads to a natural, easily grasped interface and makes the search implementation straightforward.
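
A minimal Python sketch of that selection rule, assuming the text is XML whose conceptual units are elements with the hypothetical names in `UNIT_TAGS`, and that "size" is measured in characters (a real system would more likely measure bytes or rendered screen area). It works top-down, so it keeps the largest unit that fits under `max_chars` and only descends when a unit is too big. Each returned chunk could then be indexed as one search document, which keeps search hits one-to-one with viewable chunks.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names treated as conceptual units.
UNIT_TAGS = {"chapter", "section", "article", "entry"}


def text_length(elem):
    """Number of characters of text content inside elem."""
    return len("".join(elem.itertext()))


def chunk(elem, max_chars):
    """Return the elements to serve as viewable and searchable chunks.

    Top-down: a unit that fits within max_chars is kept whole; otherwise
    we descend into its children. Text sitting directly in an oversized
    unit, outside any child unit, is not covered by this sketch.
    """
    if elem.tag in UNIT_TAGS and text_length(elem) <= max_chars:
        return [elem]
    result = []
    for child in elem:
        result.extend(chunk(child, max_chars))
    if not result and elem.tag in UNIT_TAGS:
        # Oversized unit with no smaller units inside: keep it whole and
        # leave it to pagination.
        return [elem]
    return result
```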

Sometimes texts (like novels) don't have natural breaks; in these cases search matters less and reading matters more, so we just paginate according to the user's viewport size.
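
For texts without natural breaks, viewport pagination can be sketched roughly like this, assuming (hypothetically) a viewport measured in character columns and rows rather than pixels. `paginate` is an illustrative name; `textwrap` is Python's standard line-wrapping module.

```python
import textwrap


def paginate(text, cols, rows):
    """Split text into pages of at most `rows` lines, each at most `cols` wide.

    Paragraphs are assumed to be separated by blank lines; each one is
    wrapped independently, then the wrapped lines are cut into pages.
    """
    lines = []
    for para in text.split("\n\n"):
        lines.extend(textwrap.wrap(para, width=cols))
    return [lines[i:i + rows] for i in range(0, len(lines), rows)]
```

Resizing the viewport just means calling it again with new `cols` and `rows`; no page boundaries are stored in the document itself.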

Other texts impose their own specific chunking requirements that fight against the simple rules: enormous court documents, for example, or dictionaries in which you can search entries, senses (within an entry), or quotations (within a sense). In these cases we try to recast the problem in more familiar terms, sometimes chunking at several levels at once for search, but displaying with anchors or pagination within a larger chunk.
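
One way to sketch that multi-level approach for a dictionary, assuming hypothetical `entry`, `sense` and `quote` elements that carry `id` attributes: every level gets its own search record, but each record points back at its enclosing entry as the display unit, plus an anchor to scroll to.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names for the searchable levels.
SEARCH_LEVELS = ("entry", "sense", "quote")


def index_records(root):
    """Build one search record per entry, sense and quotation.

    Each record names its enclosing entry as the display chunk and its own
    id as the anchor, so a hit on a sense or quotation opens the whole
    entry positioned at the matching spot.
    """
    records = []
    for entry in root.iter("entry"):
        entry_id = entry.get("id")
        # entry.iter() yields the entry itself first, then its descendants.
        for elem in entry.iter():
            if elem.tag in SEARCH_LEVELS:
                records.append({
                    "level": elem.tag,
                    "text": "".join(elem.itertext()),
                    "display": entry_id,
                    "anchor": elem.get("id"),
                })
    return records
```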

Machine to machine, I think, is informed by a different set of considerations: transaction boundaries, channel capacity again, the ability to roll back and retry, etc. Basically it's a compromise between performance (large messages tend to perform better, up to memory limits) and robustness (small messages make a smaller crater when they fail).
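
The performance/robustness trade-off can be sketched as a batching function, assuming (hypothetically) that each transaction has already been serialized to bytes and that a transaction must never be split across messages. `batch` and `max_bytes` are illustrative names.

```python
def batch(transactions, max_bytes):
    """Pack whole transactions into messages of at most max_bytes.

    transactions: list of bytes objects, one per transaction.
    A transaction is never split; one larger than max_bytes is sent alone.
    """
    messages, current, size = [], [], 0
    for tx in transactions:
        # Start a new message if this transaction would overflow the current one.
        if current and size + len(tx) > max_bytes:
            messages.append(current)
            current, size = [], 0
        current.append(tx)
        size += len(tx)
    if current:
        messages.append(current)
    return messages
```

Raising `max_bytes` gives fewer, larger messages; lowering it means a failed message takes fewer transactions down with it and is cheaper to roll back and retry.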

As for human to machine, it also depends to a certain extent on the software. Word can handle much larger documents than in-browser editors, and features like autosave can mitigate a failure to save a large document, but generally speaking I'd say the chunk size here is similar to the human-to-human case. I do sometimes end up poking around in 50MB XML documents in emacs, sometimes even changing something, and it works fine, but I don't think that's a typical use case. I find that 100MB is pretty much the limit for that sort of thing.

-Mike

On 1/5/2012 7:14 PM, Len Bullard wrote:

When building XML systems, how do you choose the best granularity for storing and retrieving fragments?

Machine to machine

Human to machine

Human to human

Part of the art is interpreting what branch and leaf combinations best give a role/user the most copacetic view. How do you choose? Does the user choose?

The proportion of XML consumed and emitted by machines or humans is not interesting, IME. The cost and type of the value-add of the humans consuming and emitting XML is. In documents, this is obvious. Granularity.

len
