RE: [xml-dev] Schema based XML compare

Interesting ideas.
What I have in mind is probably closer to deep-equal ... but it could run on a non-schema-aware processor.
I happen to have a StAX-based compare program that runs in streaming mode and treats whitespace as significant or not via a global option.

I was thinking of "simply" keeping track of the XSTypeDefinition for each node as the program encounters it, and replacing the current string compare
with a data type compare.
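
A rough sketch of that idea, in Python rather than StAX purely for brevity. It is not the actual compare program: the ELEMENT_TYPES table is a hypothetical stand-in for the per-node type definition a schema-aware parser would supply, the function names are invented for illustration, and it assumes data-oriented documents (leaf text only, whitespace always ignored, no mixed content). Python's float() is also only an approximation of the xs:double lexical space.

```python
import io
import xml.etree.ElementTree as ET
from itertools import zip_longest

# Hypothetical stand-in for schema type information: element name -> parser
# for its declared simple type. Here <number> is treated as xs:double.
ELEMENT_TYPES = {"number": float}


def events(xml_text):
    """Yield (kind, tag, payload) tuples for one document, in document order."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            # Attributes are available as soon as the start tag is seen.
            yield ("start", elem.tag, dict(elem.attrib))
        else:
            # Leading text is complete at the end tag; surrounding
            # whitespace is dropped (the "ignore whitespace" choice).
            yield ("end", elem.tag, (elem.text or "").strip())


def text_equal(tag, a, b):
    """Compare two text values by declared type, else as plain strings."""
    parse = ELEMENT_TYPES.get(tag)
    if parse is None:
        return a == b
    try:
        return parse(a) == parse(b)
    except ValueError:
        # Not valid for the declared type: fall back to string compare.
        return a == b


def streams_equal(doc_a, doc_b):
    """Walk both documents in lockstep and stop at the first difference."""
    for ev_a, ev_b in zip_longest(events(doc_a), events(doc_b)):
        if ev_a is None or ev_b is None:
            return False  # one document has extra events
        kind_a, tag_a, payload_a = ev_a
        kind_b, tag_b, payload_b = ev_b
        if kind_a != kind_b or tag_a != tag_b:
            return False
        if kind_a == "start":
            if payload_a != payload_b:  # attributes compared as strings
                return False
        elif not text_equal(tag_a, payload_a, payload_b):
            return False
    return True
```

The structure is still compared event by event, so a different number of elements is always a difference; only the text comparison consults the type table.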

I did not consider something like

================================
the element sequence
<x>hello</x>

would be schema equivalent to

<x>hello</x>
<x>hello</x>
<x>hello</x>
============================

Which actually I don’t think follows ...
If I went that far, all you would have to do is validate the two documents against a schema and not bother comparing them.

For my use cases I would NOT consider the above to be equivalent.
By 'schema equivalent' I mean the document *instances* are equivalent, but the comparison of text values uses schema type information,
so for, say, an xs:double, "6.0" == "6", but not if it were an xs:string.
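
To make the "6.0" vs "6" case concrete, here is a minimal sketch of a type-directed value comparison. The PARSERS table and the function names are illustrative assumptions, not part of any existing tool, and the Python parsers only approximate the XSD lexical spaces (the murky cases such as NaN, INF and date/times are deliberately left out).

```python
from decimal import Decimal, InvalidOperation


def parse_boolean(lexical):
    """Map the four xs:boolean lexical forms to a Python bool."""
    if lexical in ("true", "1"):
        return True
    if lexical in ("false", "0"):
        return False
    raise ValueError("not an xs:boolean: %r" % lexical)


# Illustrative mapping from type name to a parser into a comparable value.
PARSERS = {
    "xs:double": float,
    "xs:decimal": Decimal,
    "xs:integer": int,
    "xs:boolean": parse_boolean,
}


def values_equal(type_name, a, b):
    """Compare two lexical values according to their declared type."""
    parse = PARSERS.get(type_name)
    if parse is None:
        # xs:string and any type not listed: exact lexical comparison.
        return a == b
    try:
        return parse(a.strip()) == parse(b.strip())
    except (ValueError, InvalidOperation):
        # A value that is invalid for its type falls back to string compare.
        return a == b
```

For example, values_equal("xs:double", "6.0", "6") is true while values_equal("xs:string", "6.0", "6") is false.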





----------------------------------------
David A. Lee
dlee@calldei.com
http://www.xmlsh.org

-----Original Message-----
From: Mukul Gandhi [mailto:gandhi.mukul@gmail.com] 
Sent: Thursday, December 23, 2010 11:59 PM
To: David Lee
Cc: xml-dev@lists.xml.org
Subject: Re: [xml-dev] Schema based XML compare


Hi David,
    I believe Mike's idea of using the XPath 2.x deep-equal function would
be useful if your notion of XML document equality is true deep equality
of the XML trees (i.e. the same number of child nodes, siblings etc. at
equivalent locations in the XML documents that you're comparing).

It also seems that the deep-equal function doesn't offer a way to
discount the effect of whitespace on XML document equality. But I
believe this concern has different repercussions for data-oriented and
document-oriented XML documents. It seems whitespace would be
significant for document-oriented XML but not for data-oriented XML.

None of the above is to say that the deep-equal function is not useful.
It's very useful for lots of use cases.

If your notion of XML document equality is purely XML Schema aware
(the schema example below [1] illustrates this) (and not
strictly equivalent in the XPath 2.x deep-equal sense) then you could
explore using something like the JAXP Schema validation API to derive
this equivalence.

[1]
For this particular schema fragment,
<xs:element name="x" type="xs:string" maxOccurs="unbounded" />

the element sequence
<x>hello</x>

would be schema equivalent to

<x>hello</x>
<x>hello</x>
<x>hello</x>

And I'm not sure if your use-case considers the above two XML
fragments equivalent given the above XML schema element declaration
[1].

On Thu, Dec 23, 2010 at 5:59 PM, David Lee <dlee@calldei.com> wrote:
> I've run into an age-old issue but I don’t see any off-the-shelf solutions
> for it.
>
>
>
> Suppose I have 2 XML documents I want to compare (not diff, just give me
> yes/no are they equivalent).
>
> This is pretty simple to do even with things like ignoring whitespace
> options etc.  Many tools out there, including one I wrote
>
> ( http://www.xmlsh.org/CommandXcmp)
>
>
>
> Now here's the twist …
>
>
>
> Suppose I want to compare for XSD  data model equivalence, not XDM
>  equivalence ?
>
>
>
> Example.
>
>
>
> <number>1.0</number>
>
> vs.
>
> <number>1</number>
>
>
>
> Without type annotation these are different.
> But if I declare the type for number to  be xs:double
>
> they should compare equal.
>
>
>
> Thus a compare tool should be able to be given a schema and do a comparison
> and report that these 2 documents are equivalent at the XSD data model
> level.
>
>
>
> Has anyone seen anything like this ?
>
> Would anyone have a use for it ? (I may end up writing it for my own uses).
>
>
>
> Not sure how far one can take this before entering murky waters …
>
> Even in the numeric cases there are edge cases where comparisons are not
> well defined (rounding/precision issues on floating point numbers).
>
> Then add in things like date/times …
>
> But suppose I'm willing to avoid the murky edges and just stick to the
> obvious cases … shouldn’t be too hard right ?
> In fact I suspect it's so obvious it's been done but I can't find one
> anywhere.
>
>
>
> -David

>
>
>
>
> ----------------------------------------
>
> David A. Lee
>
> dlee@calldei.com
>
> http://www.xmlsh.org




-- 
Regards,
Mukul Gandhi


