Re: [xml-dev] Differentiating two xml files of size each around 500MB.


In reply to your query, you will find some details of how DeltaXML 
can deal with large files at
http://www.deltaxml.com/comparing-large-files.html
The file sizes tested are up to 60 MB. You should be able to process
larger files on a larger system, but you would need to evaluate this on your

If the DeltaXML output is not what you need, then you can easily 
convert it with XSL to what you do need (some XSL processors will 
deal with large files, but the delta file should be small anyway, 
because DeltaXML does not include unchanged data in the delta file 
unless you specifically ask for this).

We are not aware of any other system that will deal with such large 
files, though the following reference may also be useful to you:

"Detecting Changes in XML documents" - academic paper:

This is for presentation at ICDE 2002: Feb 26 - Mar 01, San Jose, 
California - http://www.research.telcordia.com/society/icde2002/

This is research work, though, so it will not solve your problem
today. It also needs mapping files to relate the delta file to the
originals, so you would not get the output you need.

Best regards,

At 10:40 am +0530 23/3/02, Santoshwt wrote:
>We need to differentiate two xml files for their contents.
>The structure of both files will be something similar like -
>	<element1>content</element1>
>	<element2>content</element2>
>	<element3>content</element3>
>	<element1>content</element1>
>	<element2>content</element2>
>	<element3>content</element3>
>We want the differences reported as segments that are changed,
>or added in the new file, or deleted from the old file.

DeltaXML will give you exactly what you need here.
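
To make the requested output concrete (segments reported as changed,
added or deleted), here is a minimal Python sketch. It is not DeltaXML
and does not produce DeltaXML's delta format. It assumes, which the
sample above does not show, that the segments are the direct children
of a single root element and that their order matters. The function
names segment_digests and diff_segments are hypothetical. Each file is
streamed, so only one digest per segment stays in memory, but the
comparison step can still be slow on files with millions of segments.

```python
import hashlib
import io
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def segment_digests(source):
    """Stream an XML file; return one (tag, digest) pair per child of the root.

    Assumption: every "segment" is a direct child of the root element.
    """
    digests = []
    depth = 0
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem  # the first element seen is the root
            depth += 1
        else:
            depth -= 1
            if depth == 1:  # a direct child of the root has just closed
                elem.tail = None  # ignore whitespace between segments
                data = ET.tostring(elem, encoding="utf-8")
                digests.append((elem.tag, hashlib.sha1(data).hexdigest()))
                root.clear()  # drop processed segments to keep memory flat
    return digests


def diff_segments(old_source, new_source):
    """Return (label, old_range, new_range) for every run of differing segments.

    Ranges are half-open index pairs into each file's list of segments.
    """
    old = segment_digests(old_source)
    new = segment_digests(new_source)
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    labels = {"replace": "changed", "delete": "deleted", "insert": "added"}
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            changes.append((labels[op], (i1, i2), (j1, j2)))
    return changes


if __name__ == "__main__":
    # Tiny in-memory documents standing in for the two large files.
    old_doc = io.BytesIO(b"<root><element1>a</element1><element2>b</element2></root>")
    new_doc = io.BytesIO(b"<root><element1>a</element1><element2>B</element2></root>")
    for change in diff_segments(old_doc, new_doc):
        print(change)
```

File paths can be passed instead of the in-memory buffers. Note that
the digest covers the serialized segment, so whitespace changes inside
a segment count as changes.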

>The file sizes are going to be very huge, say around 500MB or so per file.
>If anyone could throw some light on how to go about it, whether any
>tools are available,
>what the memory requirements will be, or any other information,
>kindly reply to my Id please.
>I've seen the deltaxml tool, but our file sizes are massive, and also
>the output of deltaxml is in its own format.
>It's highly urgent.

-- -----------------------------------------------------------------
Robin La Fontaine, Director, Monsell EDM Ltd
DeltaXML: "Change control for XML in XML"
Tel: +44 1684 592 144 Fax: +44 1684 594 504
Email: robin.lafontaine@deltaxml.com      http://www.deltaxml.com