Recent criticisms of some Eclipse-based XML editors (including mine),
made in part because they use a lot of memory relative to file size,
underline the fairly obvious fact that XML files are often much larger than
programming language files. When the techniques used successfully for
programming languages are applied to XML, they can break down.
The first person I ever saw address this issue directly was Bryan Ford,
in his packrat parsing paper. Packrat parsing requires an O(n) data
structure, where n is the document size, with a rather large constant
factor. Ford observes: "For example, for
parsing XML streams, which have a fairly simple structure but often
encode large amounts of relatively flat, machine-generated data, the
power and flexibility of packrat parsing is not needed and its storage
cost would not be justified."
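To make the O(n) storage cost concrete, here is a minimal sketch of packrat-style memoization. It is not Ford's code and not an XML parser: the toy grammar (a sequence of `<name>` tags or bare lowercase words) and every name in it, such as `make_parser`, are assumptions made for illustration. It memoizes lazily, so it stores fewer entries than a full packrat table would, yet the table still grows in step with the input.

```python
def make_parser(text):
    """Build a memoizing recognizer for a toy grammar (illustrative only):
         Text <- Item*      Item <- Tag / Name
         Tag  <- '<' Name '>'      Name <- [a-z]+
    """
    memo = {}  # (rule name, position) -> end position, or None on failure

    def memoize(rule):
        def wrapped(pos):
            key = (rule.__name__, pos)
            if key not in memo:          # each (rule, position) is computed once
                memo[key] = rule(pos)
            return memo[key]
        return wrapped

    @memoize
    def name(pos):
        end = pos
        while end < len(text) and text[end].islower():
            end += 1
        return end if end > pos else None

    @memoize
    def tag(pos):
        if pos < len(text) and text[pos] == "<":
            end = name(pos + 1)
            if end is not None and end < len(text) and text[end] == ">":
                return end + 1
        return None

    @memoize
    def item(pos):
        end = tag(pos)
        return end if end is not None else name(pos)

    def parse():
        pos = 0
        while pos < len(text):           # Text <- Item*
            end = item(pos)
            if end is None:
                return None              # input not matched by the grammar
            pos = end
        return pos

    return parse, memo
```

Calling `parse, memo = make_parser("<a>" * 1000)` and then `parse()` leaves `len(memo)` proportional to the input length. Doubling the input doubles the table, and each entry carries the overhead of a key tuple and a dictionary slot, which is where the large constant factor comes from.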
However, the expectations of a modern XML editor are set by the features
of modern programming language editors:
1) Syntax coloring. Coloring implies context (the string 'abc' is
colored differently if it is an attribute name vs. attribute value vs.
element name vs. PI name, etc.); context implies parsing. Coloring is
particularly demanding in that it must be done in real time, in the
foreground, while the user is editing: after each user action and before
characters are echoed to the display.
2) Outline view. Every practical XML editor offers both a text and an
outline view; some allow editing of both views and most allow the views
to be seen simultaneously, which in practice means one view must catch
up to the other after a relatively brief delay. For XML, the outline
view is essentially a DOM view with some node types possibly elided.
3) Content assist. Most commercial-quality XML editors derive content
assist for element names, attribute names, element and attribute
contents, entities, etc. from DTDs and/or schemas. This means that a)
the DTD or schema must be parsed before any assistance is available, and
b) the DTD or schema must be resolved to an in-memory data structure
that drives assistance. This data structure is inherently O(g) where g
is the grammar size; I have seen a number of them and I have yet to see
one designed to be compact.
4) Validation. Much the same considerations apply as for content assist,
with the additional constraint that validation is expected to be of very
high quality. It is easy to come up with a data structure that could
drive both validation and content assist, but it is very hard to write a
decent validator (esp. for XML Schema), and it is a different kind of
problem to re-use, for content assist, the data structures of existing
decent validators, most of which were not designed for external use.
5) Graphical view. If the document under edit is a DTD or schema, a
graphical view is often provided that shows the logical structure of the
grammar (as opposed to that of the document). Editing the graphical view
is often allowed, resulting in the need to update other open views (text
or outline) of the same document. (Though, in fact, the graphical view
is inherently a multi-document editor.)
6) Open definition, show references, refactor/rename. These are actions
applied to a document, e.g., to an element name or definition, that
suggest the need for a multi-document data structure that, at a minimum,
exposes the knowable dependency relationships between documents (though
one could brute-force search all known documents on demand, performance
is likely to suffer). These relationships are often not manifest in a
document under edit.
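As a rough sketch of the multi-document structure that item 6 calls for, the following keeps a forward and a reverse map of references between documents. The class name `DependencyIndex`, its methods, and the file names in the usage note are all hypothetical. How references are discovered (schema locations, DOCTYPE declarations, includes/imports) is assumed to happen elsewhere.

```python
from collections import defaultdict


class DependencyIndex:
    """Tracks which documents reference which (hypothetical sketch)."""

    def __init__(self):
        self._uses = defaultdict(set)     # document -> documents it references
        self._used_by = defaultdict(set)  # document -> documents referencing it

    def set_references(self, doc, targets):
        """Replace doc's outgoing references, e.g. after it is re-parsed."""
        for old in self._uses.pop(doc, set()):
            self._used_by[old].discard(doc)
        for target in targets:
            self._uses[doc].add(target)
            self._used_by[target].add(doc)

    def references_to(self, doc):
        """Direct referrers: answers 'show references' without a full search."""
        return set(self._used_by.get(doc, ()))

    def affected_by(self, doc):
        """All documents that depend on doc, directly or transitively."""
        seen, stack = set(), [doc]
        while stack:
            for referrer in self._used_by.get(stack.pop(), ()):
                if referrer not in seen:
                    seen.add(referrer)
                    stack.append(referrer)
        return seen
```

For example, after `idx.set_references("order.xml", ["order.xsd"])` and `idx.set_references("order.xsd", ["types.xsd"])`, calling `idx.affected_by("types.xsd")` returns both `order.xsd` and `order.xml`. Note that the index costs memory proportional to the number of references across all known documents, open or not, which is exactly the kind of overhead under discussion.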
Each of these requirements can be addressed by a data structure and each
of the data structures has an analog used by programming language
editors. But if you poke under the covers of programming language
editors you often find that memory overhead was not a major design
factor, because most programming language files are fairly small.
Consequently an XML editor that uses the same techniques to address the
requirements above will be judged 'not ready for prime time' when it is
applied to extra-large (or exceptionally squirrely) documents, DTDs or
schemas.
If you think addressing these needs with no memory overhead is a trivial
weekend project, feel free to show us your editor. In the meantime, I'd
be happy to discuss implementation techniques that might make some or
all of this faster/smaller all day long, on or off the list.