Lists Home |
Date Index |
On 3/15/06, Peter Hunsberger <email@example.com> wrote:
> On 3/14/06, Oleg A. Paraschenko <firstname.lastname@example.org> wrote:
> Well, yes, but XML isn't always a tree. If you're going to manage the
> XML by considering only simple element name, then yes, you may have a
> tree. However, if you've got to consider things like name spaces then
> you've may actually have a graph.
> I haven't given it much thought, but it seems that even the general
> case of merging non-structurally identical trees is NP complete.
> Consider the fact that identically named nodes may lie on different
> branches. The only way to evaluate whether two such nodes can in fact
> be merged is to look at the relationship of such nodes to every other
> node on the tree and decide if they match (whatever "match" might mean
> in this case). As such, this looks to be a classic travelling
> salesman traversal...
On the way into work this morning I was thinking about refactoring a
couple of tree parsers I had built. For some reason this discussion
sprang to mind and I realized that my answer might be misleading: for
many simple cases this is a solvable problem. Specifically, as long
as you are matching _only_ on simple element names then you should
never have any cycles. Top down decomposition into lists of lists
(breadth first) can then be compared in finite time. Essentially,
it's the same problem as canonical XML compare. This is limited since
if there is no root level structural similarity then the merge fails
completely, but (not giving this a whole lot of thought) I believe
should work for all cases where the two instances share the same
schema (though many schema would produce rather useless results, eg.
So, the question comes back to what to do when there are no root
level structural similarities (ignoring simple aliases for the root
element) but there are element name matches at other points in the two
trees? Are you attempting to handle such cases?
As a side note: this relates back to the thread on XML schema design
and the use of generic element names with attribute distinguishers vs.
unique element names. It's one more data point that suggests to me
that completely generic Schema (such as RDF) are not the way to go.