Lists Home |
Date Index |
On 3/18/06, Oleg A. Paraschenko <firstname.lastname@example.org> wrote:
> > I haven't given it much thought, but it seems that even the general
> > case of merging non-structurally identical trees is NP complete.
> > Consider the fact that identically named nodes may lie on different
> > branches. The only way to evaluate whether two such nodes can in fact
> > be merged is to look at the relationship of such nodes to every other
> > node on the tree and decide if they match (whatever "match" might mean
> > in this case). As such, this looks to be a classic travelling
> > salesman traversal...
> Either we refer to different meanings of "join markup", either our
> approaches are very different.
> My algorithm is quite straightforward. I have two trees. One is "stable"
> (order and hierarchy of the elements isn't changes), another one is "weak"
> (the elements can be fused).
> I traverse the first tree, visiting the character data chunks which are
> continuous (only text, no tags) in the both trees. Each chunk is wrapped
> by its context from the second tree.
> An example.
> first tree: <x><b>some dummy</b> text</x>
> second tree: <x>some <i>dummy text</i></x>
> There are three chunks:
> "some "
> " text"
> The corresponding contexts:
> So the expected result is:
> <x><b><x>some </x><x><i>dummy</i></x></b><x><i> text</i></x></x>
> Fortunately, in many situations, it's very easy to make the code smart to
> simplify contexts. In this examples, simplified contexts are <empty>, "i",
> and "i", and the better result is:
> <x><b>some <i>dummy</i></b><i> text</i></x>
If I understand correctly you are matching only on text nodes and the
basic textual content is always the same between the two trees? If
so, then in general this cannot be completely defined, there could be
ambiguity in the resultant element matching.
Eg, (somewhat artificial):
This is a <ol><li><ul><li>problem.</li></ul></li></ol>
This is a <li class="bad">problem.</li>
However, I'd guess that in practice this works just fine most of the
time. Unfortunately, I don't see any utility for us...