OASIS Mailing List Archives



   XML-appropriate editing data structures


Recent criticisms of some Eclipse-based XML editors (including mine),
made in part because they use a lot of memory relative to file size,
underline the fairly obvious fact that XML files are often much larger
than programming language files. When techniques used successfully for
programming languages are applied to XML, they can break down.

The first person I ever saw address this issue directly was Bryan Ford, 
in his packrat parsing paper 
(http://www.brynosaurus.com/pub/lang/packrat-icfp02.pdf). Packrat
parsing requires a data structure of size O(n), where n is the document
size, with a rather large constant factor. Ford observes: "For example, for
parsing XML streams, which have a fairly simple structure but often 
encode large amounts of relatively flat, machine-generated data, the 
power and flexibility of packrat parsing is not needed and its storage 
cost would not be justified."
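
To make that storage cost concrete, here is a toy packrat recognizer in Python. It is not Ford's code (his paper works in Haskell), and the grammar is a small arithmetic grammar rather than XML. Every (rule, position) result is memoized, so the memo table can hold up to one entry per rule per input position: O(n) in the input, with a constant that grows with the number of rules. The class and method names are made up for illustration.

```python
from typing import Callable, Dict, Optional, Tuple


class PackratParser:
    """Packrat recognizer for a toy grammar (illustration only, not XML):

        Sum     <- Product ('+' Product)*
        Product <- Value ('*' Value)*
        Value   <- [0-9]+ / '(' Sum ')'
    """

    def __init__(self, text: str) -> None:
        self.text = text
        # Memo table: (rule, start position) -> end position, or None on failure.
        # At most one entry per rule per input position.
        self.memo: Dict[Tuple[str, int], Optional[int]] = {}

    def _apply(self, rule: str, body: Callable[[int], Optional[int]], pos: int) -> Optional[int]:
        key = (rule, pos)
        if key not in self.memo:
            self.memo[key] = body(pos)
        return self.memo[key]

    def _repeat(self, pos: int, op: str, item: Callable[[int], Optional[int]]) -> Optional[int]:
        """Match item (op item)*, stopping before an op whose item fails (PEG semantics)."""
        end = item(pos)
        while end is not None and end < len(self.text) and self.text[end] == op:
            nxt = item(end + 1)
            if nxt is None:
                break
            end = nxt
        return end

    def sum(self, pos: int) -> Optional[int]:
        return self._apply("Sum", lambda p: self._repeat(p, "+", self.product), pos)

    def product(self, pos: int) -> Optional[int]:
        return self._apply("Product", lambda p: self._repeat(p, "*", self.value), pos)

    def value(self, pos: int) -> Optional[int]:
        return self._apply("Value", self._value, pos)

    def _value(self, pos: int) -> Optional[int]:
        end = pos
        while end < len(self.text) and "0" <= self.text[end] <= "9":
            end += 1
        if end > pos:
            return end
        if pos < len(self.text) and self.text[pos] == "(":
            inner = self.sum(pos + 1)
            if inner is not None and inner < len(self.text) and self.text[inner] == ")":
                return inner + 1
        return None

    def parse(self) -> bool:
        """True if the whole input is a Sum."""
        return self.sum(0) == len(self.text)
```

After `PackratParser("(1+2)*3").parse()`, `len(p.memo)` is bounded by three rules times eight positions. For a multi-megabyte XML file, each entry costs a dictionary slot plus a key tuple, and that overhead is why Ford calls the storage cost unjustified for flat, machine-generated data.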

However, the expectations of a modern XML editor are set by the features 
of modern programming language editors:

1) Syntax coloring. Coloring implies context (the string 'abc' is 
colored differently if it is an attribute name vs. attribute value vs. 
element name vs. PI name, etc.); context implies parsing. Coloring is
particularly demanding because it must be done in real time, in the
foreground, after each user action and before the typed characters are
echoed to the display.

2) Outline view. Every practical XML editor offers both a text and an 
outline view; some allow editing of both views and most allow the views 
to be seen simultaneously, which in practice means one view must catch 
up to the other after a relatively brief delay. For XML, the outline 
view is essentially a DOM view with some node types possibly elided.

3) Content assist. Most commercial-quality XML editors derive content 
assist for element names, attribute names, element and attribute 
contents, entities, etc. from DTDs and/or schemas. This means that a) 
the DTD or schema must be parsed before any assistance is available, and 
b) the DTD or schema must be resolved to an in-memory data structure 
that drives assistance. This data structure is inherently O(g) where g 
is the grammar size; I have seen a number of them and I have yet to see 
one designed to be compact.

4) Validation. Much the same considerations apply as for content assist, 
with the additional constraint that validation is expected to be of very 
high quality. It is easy to come up with a data structure that could
drive both validation and content assist. However, it is very hard to
write a decent validator (especially for XML Schema). Reusing the data
structures of existing decent validators for content assist is a
different kind of problem, because most of them were not designed for
external use.

5) Graphical view. If the document under edit is a DTD or schema, a 
graphical view is often provided that shows the logical structure of the 
grammar (as opposed to that of the document). Editing the graphical view 
is often allowed, resulting in the need to update other open views (text 
or outline) of the same document. (In fact, the graphical view is
inherently a multi-document editor.)

6) Open definition, show references, refactor/rename. These actions are
applied to part of a document, such as an element name or definition.
They suggest the need for a multi-document data structure that, at a
minimum, exposes the knowable dependency relationships between
documents. (One could brute-force search all known documents on demand,
but performance is likely to suffer.) These relationships are often not
manifest in the document under edit.

Each of these requirements can be addressed by a data structure, and each
of those data structures has an analog in programming language editors.
But if you poke under the covers of programming language editors, you
often find that memory overhead was not a major design factor, because
most programming language files are fairly small. Consequently, an XML
editor that uses the same techniques to meet the requirements above will
be judged 'not ready for prime time' when it is applied to extra-large
(or exceptionally squirrely) documents, DTDs or schemas.
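
One implementation technique for reducing per-node overhead is to store outline nodes in parallel integer arrays rather than as one object per node. The class below is hypothetical and uses only the standard library's expat binding (`xml.parsers.expat`) and `array`. For each element it records the byte offset of the start tag, the parent's index and an interned name id. That comes to a few machine integers per element (about 16 bytes on common platforms), plus one shared copy of each distinct name.

```python
import xml.parsers.expat
from array import array
from typing import Dict, List


class CompactOutline:
    """Outline nodes kept in parallel int arrays instead of per-node objects."""

    def __init__(self, data: bytes) -> None:
        self.start = array("q")    # byte offset of each element's start tag
        self.parent = array("i")   # index of the parent node, -1 for the root
        self.name_id = array("i")  # index into self.names
        self.names: List[str] = []
        self._name_ids: Dict[str, int] = {}
        self._stack: List[int] = []
        self._parser = xml.parsers.expat.ParserCreate()
        self._parser.StartElementHandler = self._start
        self._parser.EndElementHandler = self._end
        self._parser.Parse(data, True)

    def _start(self, name: str, attrs: Dict[str, str]) -> None:
        nid = self._name_ids.setdefault(name, len(self.names))
        if nid == len(self.names):  # first occurrence: intern the name
            self.names.append(name)
        self.start.append(self._parser.CurrentByteIndex)
        self.parent.append(self._stack[-1] if self._stack else -1)
        self.name_id.append(nid)
        self._stack.append(len(self.start) - 1)

    def _end(self, name: str) -> None:
        self._stack.pop()

    def name(self, node: int) -> str:
        return self.names[self.name_id[node]]
```

Children, depth and end offsets can be derived or added as further arrays when needed. The trade-off is that inserting or deleting nodes means shifting array contents instead of relinking objects.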

If you think addressing these needs with no memory overhead is a trivial 
weekend project, feel free to show us your editor. In the meantime, I'd 
be happy to discuss implementation techniques that might make some or 
all of this faster/smaller all day long, on or off the list.

Bob Foster



Copyright 2001 XML.org. This site is hosted by OASIS