Tim asked me to send this e-mail to the list as well:
----- Original Message -----
From: "Tim Bray" <tbray@textuality.com>
To: <xml-dev@lists.xml.org>
Sent: Tuesday, January 20, 2004 1:49 AM
Subject: [xml-dev] Genx
> At one point during the discussions about Atom and well-formedness, I
> offered to cook up some libraries for safely and efficiently writing
> guaranteed-well-formed XML; it seems that the world is well-provided
> with these for Java, but several people wrote me saying such a thing
> would be really handy at the C level; libxml2 being OK but too big and
> complicated for most people's purposes in this particular regard and
> also suspected of a memory leak.
>
> So I sketched out a design, see the write-up at
> http://www.tbray.org/ongoing/When/200x/2004/01/19/HeresGenx and have at
> it. -Tim
Since I have dealt with a similar issue (on another platform),
I checked your API. My comments are not meant as criticism of
the API; I am just highlighting things I did differently or
don't understand (in no particular order).
(I left out anything I happened to see discussed already.)
- About genxSetAllocator:
I would think a Free and Realloc pointer should be provided too.
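
To make the allocator point concrete, here is a minimal sketch of a bundle that carries all three callbacks. Every name in it (SketchAllocator, allocFn, reallocFn, freeFn, SketchBuffer) is hypothetical and not taken from genx; the counting callbacks exist only to show that the buffer never calls malloc/realloc/free directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator bundle; none of these names exist in genx. */
typedef struct {
    void *(*allocFn)(void *userData, size_t bytes);
    void *(*reallocFn)(void *userData, void *ptr, size_t bytes);
    void  (*freeFn)(void *userData, void *ptr);
    void  *userData;
} SketchAllocator;

/* Example callbacks: wrap the C library and count live blocks in *userData. */
static void *sketchAlloc(void *ud, size_t n)
{
    void *p = malloc(n);
    if (p) ++*(int *)ud;
    return p;
}

static void *sketchRealloc(void *ud, void *old, size_t n)
{
    void *p = realloc(old, n);
    if (p && !old) ++*(int *)ud;   /* growing from NULL creates a new block */
    return p;
}

static void sketchFree(void *ud, void *p)
{
    if (p) { --*(int *)ud; free(p); }
}

static SketchAllocator sketchCountingAllocator(int *liveBlocks)
{
    SketchAllocator a = { sketchAlloc, sketchRealloc, sketchFree, liveBlocks };
    return a;
}

/* A growable output buffer that only ever uses the supplied callbacks. */
typedef struct { SketchAllocator *a; char *data; size_t len, cap; } SketchBuffer;

static int sketchAppend(SketchBuffer *b, const char *s)
{
    size_t n;
    assert(b != NULL && b->a != NULL && s != NULL);
    n = strlen(s);
    if (b->len + n + 1 > b->cap) {
        size_t newCap = b->cap ? b->cap * 2 : 16;
        char *p;
        while (newCap < b->len + n + 1) newCap *= 2;
        p = b->a->reallocFn(b->a->userData, b->data, newCap);
        if (!p) return -1;           /* old block is still valid on failure */
        b->data = p;
        b->cap = newCap;
    }
    memcpy(b->data + b->len, s, n + 1);
    b->len += n;
    return 0;
}

static void sketchRelease(SketchBuffer *b)
{
    b->a->freeFn(b->a->userData, b->data);
    b->data = NULL;
    b->len = b->cap = 0;
}
```

With only an allocation callback, a buffer like this could neither grow in place nor hand its memory back to a custom pool, which is the reason for asking for all three.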
- Does the API require that all input is checked for
UTF-8 (or UTF-32) encoding correctness (and Name correctness)
even if that is already known to the application?
- Might it be enough to provide these checks as separate functions
that the application calls only when it needs them?
- About genxPI:
Why not have two arguments, let's call them target and data?
The first one - target - is a name, the other one isn't, so
different checking rules apply.
- Why not provide for escaping (of illegal input) by default?
If you want to write a comment or attribute, the assumption should
be that you want to write it correctly - so why not do the
necessary escaping for the programmer?
Different rules apply for comments, attributes, character data
(and entity values - but there is no API for them).
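
As an illustration of context-dependent escaping, here is a minimal sketch covering character data and double-quoted attribute values only. The names (SketchContext, sketchEscape) are hypothetical and not part of genx. Comments are left out on purpose: XML has no escape mechanism inside comments, so a "--" there can only be rejected or altered.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical contexts; attribute values are assumed to be double-quoted. */
typedef enum { SKETCH_TEXT, SKETCH_ATTR } SketchContext;

/* Writes the escaped form of `in` to `out`.
   Returns the number of bytes written (without the NUL), or -1 if `out`
   is too small. */
static long sketchEscape(SketchContext ctx, const char *in,
                         char *out, size_t outSize)
{
    size_t used = 0;
    assert(in != NULL && out != NULL);
    for (; *in; ++in) {
        char one[2];
        const char *rep = NULL;
        size_t n;
        switch (*in) {
        case '&':  rep = "&amp;"; break;
        case '<':  rep = "&lt;";  break;
        case '>':  if (ctx == SKETCH_TEXT) rep = "&gt;";   break; /* guards "]]>" */
        case '"':  if (ctx == SKETCH_ATTR) rep = "&quot;"; break;
        case '\t': if (ctx == SKETCH_ATTR) rep = "&#x9;";  break; /* survive normalization */
        case '\n': if (ctx == SKETCH_ATTR) rep = "&#xA;";  break;
        case '\r': rep = "&#xD;"; break;  /* a literal CR would be normalized away */
        default:   break;
        }
        if (rep == NULL) {               /* ordinary character: copy as-is */
            one[0] = *in;
            one[1] = '\0';
            rep = one;
        }
        n = strlen(rep);
        if (used + n + 1 > outSize) return -1;
        memcpy(out + used, rep, n);
        used += n;
    }
    if (used + 1 > outSize) return -1;
    out[used] = '\0';
    return (long)used;
}
```

The same input therefore comes out differently depending on where it is written: a double quote is left alone in character data but becomes `&quot;` in an attribute value.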
- Allow choice of quote character for attributes, allow (three)
options for ampersand escaping: none, numeric char ref, amp entity
- genxEndStartTag is not really necessary, as this can be
derived from internal state tracking.
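
A minimal sketch of what I mean by internal state tracking: the writer remembers whether a start tag is still open and closes it by itself as soon as content or another element arrives. All names (SketchWriter, sketchStartElement and so on) are hypothetical, the output goes to a small fixed buffer, and no escaping or name checking is done, to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_DEPTH 32

/* Hypothetical writer state; not the genx writer. */
typedef struct {
    char out[256];
    size_t len;
    const char *stack[SKETCH_MAX_DEPTH];  /* names of the open elements */
    int depth;
    int startTagOpen;                      /* 1 while "<name ..." lacks its ">" */
} SketchWriter;

static void sketchInit(SketchWriter *w)
{
    memset(w, 0, sizeof *w);
}

static void sketchPut(SketchWriter *w, const char *s)
{
    size_t n = strlen(s);
    assert(w->len + n < sizeof w->out);    /* sketch only: fixed-size output */
    memcpy(w->out + w->len, s, n + 1);
    w->len += n;
}

/* The step an explicit "end start tag" call would otherwise perform. */
static void sketchCloseStartTag(SketchWriter *w)
{
    if (w->startTagOpen) {
        sketchPut(w, ">");
        w->startTagOpen = 0;
    }
}

static void sketchStartElement(SketchWriter *w, const char *name)
{
    sketchCloseStartTag(w);
    assert(w->depth < SKETCH_MAX_DEPTH);
    w->stack[w->depth++] = name;
    sketchPut(w, "<");
    sketchPut(w, name);
    w->startTagOpen = 1;
}

/* Attributes are only legal while the start tag is still open. */
static int sketchAddAttribute(SketchWriter *w, const char *name, const char *value)
{
    if (!w->startTagOpen) return -1;
    sketchPut(w, " ");
    sketchPut(w, name);
    sketchPut(w, "=\"");
    sketchPut(w, value);
    sketchPut(w, "\"");
    return 0;
}

static void sketchAddText(SketchWriter *w, const char *text)
{
    sketchCloseStartTag(w);
    sketchPut(w, text);
}

static void sketchEndElement(SketchWriter *w)
{
    assert(w->depth > 0);
    sketchCloseStartTag(w);
    sketchPut(w, "</");
    sketchPut(w, w->stack[--w->depth]);
    sketchPut(w, ">");
}
```

The caller never signals the end of the attributes; an attribute call that arrives too late simply fails.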
- Add Formatting features:
- Style: none or indented. Can be chosen for each nesting level
dynamically. E.g. simple bottom elements with character data
might have no indentation, but their parent elements might have it.
- Set indentation amount
- Set indentation (starting) level (e.g. for writing heavily indented fragments)
- genxNewLineIndented function - to add a new line and whitespace
according to level
- For this purpose one might need some genxStartContent(formattingStyle)
function, to be called whenever the attributes are finished (if any),
so that formatting is determined at this level (downwards) until
overridden at a lower level
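
For the indentation part, a minimal sketch of what a genxNewLineIndented-style helper could compute. The struct and function names are hypothetical, spaces are assumed as the indentation character, and the per-level style selection is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical formatting settings: spaces per level and a starting level. */
typedef struct {
    int amount;      /* indentation amount per nesting level */
    int startLevel;  /* extra levels, e.g. for heavily indented fragments */
} SketchIndent;

/* Writes "\n" followed by the indentation for `depth` into `out`.
   Returns the number of bytes written (without the NUL), or -1 if `out`
   is too small. */
static long sketchNewLineIndented(const SketchIndent *fmt, int depth,
                                  char *out, size_t outSize)
{
    size_t spaces;
    assert(fmt != NULL && out != NULL && depth >= 0);
    spaces = (size_t)(fmt->startLevel + depth) * (size_t)fmt->amount;
    if (spaces + 2 > outSize) return -1;   /* newline + spaces + NUL */
    out[0] = '\n';
    memset(out + 1, ' ', spaces);
    out[spaces + 1] = '\0';
    return (long)(spaces + 1);
}
```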
- genxEndTag: add shortForm (boolean) argument, so that the element,
if empty, can be written as <element/>
- Add genxStart/EndCDATA functions, and allow for un-escaped output
in between
- About genxCharacter: does this write a character reference?
- About genxCheckName: should differentiate if namespaces are
turned on or not, as a colon is legal in one case, but not the other.
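
A minimal sketch of such a namespace-aware name check. It is ASCII-only (real XML names allow many non-ASCII characters, which this sketch rejects), the function names are hypothetical, and it treats the namespace-aware case as an unprefixed-name check that forbids the colon entirely.

```c
#include <assert.h>
#include <stddef.h>

/* ASCII-only approximation of the XML name-start rule. */
static int sketchIsNameStart(unsigned char c, int namespacesOn)
{
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_')
        return 1;
    return c == ':' && !namespacesOn;   /* colon only without namespaces */
}

/* ASCII-only approximation of the XML name-character rule. */
static int sketchIsNameChar(unsigned char c, int namespacesOn)
{
    if (sketchIsNameStart(c, namespacesOn))
        return 1;
    return (c >= '0' && c <= '9') || c == '-' || c == '.';
}

/* Returns 1 if `name` is acceptable, 0 otherwise. */
static int sketchCheckName(const char *name, int namespacesOn)
{
    const unsigned char *p = (const unsigned char *)name;
    assert(name != NULL);
    if (!sketchIsNameStart(*p, namespacesOn))
        return 0;                        /* also rejects the empty string */
    for (++p; *p; ++p)
        if (!sketchIsNameChar(*p, namespacesOn))
            return 0;
    return 1;
}
```

So "a:b" passes when namespaces are off and fails when they are on, while "_x-1.y" passes in both modes.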
- Should one support DTD output (declarations)?
For instance for writing an internal subset.
- What about a genxXML function that writes the XML declaration?
- How does one declare a default namespace? One cannot pass an empty
prefix, as that means a prefix will be generated.
- Are namespace declarations tracked internally?
What about an API for:
- find most recently declared prefix currently active for a given URI
- find all prefixes active for a given URI
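
A minimal sketch of the two lookups, assuming declarations are kept on a stack in the order they were made. All names are hypothetical and not part of genx. A prefix counts as active for a URI only if no later declaration has rebound the same prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_DECLS 32

typedef struct { const char *prefix; const char *uri; } SketchNsDecl;

/* Hypothetical declaration stack: the newest declaration is last. */
typedef struct { SketchNsDecl decls[SKETCH_MAX_DECLS]; int count; } SketchNsStack;

/* Records a declaration; returns its index, or -1 if the stack is full. */
static int sketchDeclare(SketchNsStack *s, const char *prefix, const char *uri)
{
    assert(s != NULL && prefix != NULL && uri != NULL);
    if (s->count >= SKETCH_MAX_DECLS) return -1;
    s->decls[s->count].prefix = prefix;
    s->decls[s->count].uri = uri;
    return s->count++;
}

/* Leaving an element drops the declarations made on it. */
static void sketchPopTo(SketchNsStack *s, int count)
{
    assert(count >= 0 && count <= s->count);
    s->count = count;
}

/* A declaration is shadowed if a later one reuses its prefix. */
static int sketchIsActive(const SketchNsStack *s, int i)
{
    int j;
    for (j = i + 1; j < s->count; ++j)
        if (strcmp(s->decls[j].prefix, s->decls[i].prefix) == 0)
            return 0;
    return 1;
}

/* Most recently declared prefix that is currently active for `uri`. */
static const char *sketchFindPrefix(const SketchNsStack *s, const char *uri)
{
    int i;
    for (i = s->count - 1; i >= 0; --i)
        if (strcmp(s->decls[i].uri, uri) == 0 && sketchIsActive(s, i))
            return s->decls[i].prefix;
    return NULL;
}

/* All active prefixes for `uri`, newest first; returns how many were stored. */
static int sketchFindAllPrefixes(const SketchNsStack *s, const char *uri,
                                 const char **out, int max)
{
    int i, n = 0;
    for (i = s->count - 1; i >= 0 && n < max; --i)
        if (strcmp(s->decls[i].uri, uri) == 0 && sketchIsActive(s, i))
            out[n++] = s->decls[i].prefix;
    return n;
}
```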
Regards,
Karl