OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

 


 

   Re: [xml-dev] joining parallel markup

[ Lists Home | Date Index | Thread Index ]

Hello Peter,

sorry for delay.

On Wed, 15 Mar 2006 08:03:12 -0600
"Peter Hunsberger" <peter.hunsberger@gmail.com> wrote:

...
> > > >
> > > > <x><b>some <i>dummy</i></b><i> text</i></x>,
> > > >
> > > >  or something like it.
> > >
> > > If you look at what you are asking for in the most general case, I
> > > believe the answer is not really: you want to to do generalized
> > > merging of graphs (which is believed to be a  NP complete problem).
> >
> >  I didn't know it. But trees are much simplier than graphs, so the
> > problem also should be much simplier.
> 
> Well, yes, but XML isn't always a tree.  If you're going to manage the
> XML by  considering only simple element name, then yes, you may have a
> tree.  However, if you've got to consider things like name spaces then
> you've may actually have a graph.
> 
> I haven't given it much thought,  but it seems that even the general
> case of merging non-structurally identical trees is NP complete. 
> Consider the fact that identically named nodes may lie on different
> branches.  The only way to evaluate whether two such nodes can in fact
> be merged is to look at the relationship of such nodes to every other
> node on the tree and decide if they match (whatever "match" might mean
> in this case).  As such, this looks to be a classic travelling
> salesman traversal...

Either we refer to different meanings of "join markup", either our
approaches are very different.

My algorithm is quite straightforward. I have two trees. One is "stable"
(order and hierarchy of the elements isn't changes), another one is "weak"
(the elements can be fused).

I traverse the first tree, visiting the character data chunks which are
continuous (only text, no tags) in the both trees. Each chunk is wrapped
by its context from the second tree.

An example.

first tree: <x><b>some dummy</b> text</x>
second tree: <x>some <i>dummy text</i></x>

There are three chunks:

"some "
"dummy"
" text"

The corresponding contexts:

/x
/x/i
/x/i

So the expected result is:

<x><b><x>some </x><x><i>dummy</i></x></b><x><i> text</i></x></x>

Fortunately, in many situations, it's very easy to make the code smart to
simplify contexts. In this examples, simplified contexts are <empty>, "i",
and "i", and the better result is:

<x><b>some <i>dummy</i></b><i> text</i></x>

> 
> > > If your domain is limited and common (eg. XHTML ) then you might get
> > > lucky and find some existing tools.  Otherwise, I think you've got a
> > > good use case for rolling your own with XSLT...
> >
> > In fact, I've already draftly implemented it to support my upcoming
> > XTech paper on XSieve. Now I'm looking for other tools to compare
> > them. The best I found are LMNL and just-in-time trees, but they
> > provide another functionality.
> >
> 
> Can you tell us a bit about the implementation?  Is it for XHTML or
> more generalized? Do you handle name spaces? Do the trees have to have
> identical structures?

My code is used to join DocBook and a sort of XHTML, but the code is
generic. Namespaces are supported and not supported: the code just uses
full element names in the "prefix:name" form, but in 99.999% the result is
correct. The trees don't have to have the identical structure, the only
requirement is that the text content is the same. I'm going to release the
code in the nearest future.

> 
> The reason I ask is because we do some of this over several well
> defined cases (non-identical structure, but there are ways of
> identifying unique nodes) and as we move forward we're evolving more
> and more generic requirements that may eventually require us to try
> and evaluate structural equivalence.
> 
> --
> Peter Hunsberger
> 


-- 
Oleg Parashchenko  olpa@ http://xmlhack.ru/  XML news in Russian
http://uucode.com/blog/  Generative Programming, XML, TeX, Scheme




 

News | XML in Industry | Calendar | XML Registry
Marketplace | Resources | MyXML.org | Sponsors | Privacy Statement

Copyright 2001 XML.org. This site is hosted by OASIS