OASIS Mailing List Archives



Re: [xml-dev] Profiling, diff and change tracking best practices?

On Thu, Oct 1, 2009 at 4:40 PM, michael odling-smee
<mike.odlingsmee@gmail.com> wrote:
> Funnily enough I have just started thinking about this for my own project
> with a similar use-case - i.e. understanding the changes between two
> different baselines of an XML document or XML document set.

Great to hear that, I was expecting just that. It is a common
failing in the computer world that developers reinvent the wheel,
when all you need is a bit of google-fu and creative
> My high-level thoughts so far are:
> 1.] Add suitable meta-data attributes (e.g. version/create and modify
> date/author) to fairly coarse grained components within the XML data model.

On a slightly lower level, have you already thought about what would
be a complete-enough set of metadata for your requirements? I have
tried to follow the Dublin Core model, but it might be overly complex
for your purposes...
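
To make the metadata question a bit more concrete, here is a minimal
Python sketch of stamping a coarse-grained component with a very small
subset of Dublin Core. The two namespace URIs are the standard Dublin
Core ones; the element name "section", the plain "version" attribute,
the helper name stamp() and the sample values are assumptions made up
for illustration only.

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core namespaces (elements 1.1 and terms).
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)


def stamp(element, creator, modified, version):
    """Attach author, modification date and version to one component.

    Hypothetical helper: the choice of attributes is an assumption,
    not a recommendation from the Dublin Core specification.
    """
    element.set(f"{{{DC}}}creator", creator)
    element.set(f"{{{DCTERMS}}}modified", modified)
    element.set("version", version)  # plain attribute, not part of Dublin Core
    return element


# Made-up sample component and values.
section = stamp(ET.Element("section", {"id": "intro"}), "jdoe", "2009-10-01", "3")
print(ET.tostring(section, encoding="unicode"))
```

Whether attributes or child elements suit the metadata better probably
depends on how much of it there is and who edits the raw XML.
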
> 2.] Create a baseline of the document or set of XML documents by:
> 2.1] Creating a fairly light weight XML file (perhaps using XSLT) that only
> contains this meta-data. Save this to disk (i.e. create a memento of the
> meta-data)
> 2.2] Saving a copy of the original XML in a version control system/file
> system where it will not be edited further.
> 3.] Later on when trying to do a diff. between the original baseline and
> current:
> 3.1] Using the same mechanism as in step 2.1 create a new memento of the
> current XML document or set of XML documents
> 4.] Compare the two mementos reporting on changes - if required the baseline
> copy of the XML can be used to compute exactly what content has changed (I
> think you need add/delete and update) between the two versions.
> I am still undecided whether both the memento and document copy are required
> - logically the memento is not actually required. However the lightweight
> memento may prove useful if:
> - The XML document or set of documents is very large such that it would not
>   be desirable to store a complete copy of the document(s).
> - To aid with deep differencing optimisation (especially relevant where there
>   is a set of XML documents that you are comparing so you only have to parse
>   files where differences occur).
> - The diff. report is only meant to identify where differences are, not what
>   they are.
> Anyway I have only had early thoughts on the subject so would gladly listen
> to any other suggestions that the community has to offer.

Sounds like a neat approach, but just like you, my initial feeling is
that separating the metadata is an awkward thing to do and might make
processing a bit too complex. After all, to create a simple delta
document, you would need to compare the two mementos, then go back to
the original files and locate the changes. I agree that it might be
necessary when dealing with large documents, but in such cases, I
suppose you could apply stream processing like SAX instead,
especially for comparing things...
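
As a rough illustration of the streaming idea, here is a minimal
Python sketch that builds a memento with the standard-library xml.sax
parser and then compares two mementos for added, deleted and updated
components. It assumes every tracked component carries an "id"
attribute plus "version" and "modified" attributes; those attribute
names and the helper names (MementoHandler, build_memento,
diff_mementos) are assumptions for this example, not part of the
proposal quoted above.

```python
import xml.sax


class MementoHandler(xml.sax.ContentHandler):
    """Collects the metadata attributes of every element that has an id."""

    def __init__(self, meta_attrs=("version", "modified")):
        super().__init__()
        self.meta_attrs = meta_attrs
        self.memento = {}  # component id -> tuple of metadata values

    def startElement(self, name, attrs):
        # Only elements with the (assumed) "id" attribute count as components.
        if "id" in attrs:
            self.memento[attrs["id"]] = tuple(
                attrs.get(a) for a in self.meta_attrs
            )


def build_memento(xml_text):
    """Stream through the document once and keep only the metadata."""
    handler = MementoHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.memento


def diff_mementos(old, new):
    """Report where the two baselines differ, not what the differences are."""
    return {
        "added": sorted(k for k in new if k not in old),
        "deleted": sorted(k for k in old if k not in new),
        "updated": sorted(k for k in new if k in old and old[k] != new[k]),
    }


# Made-up sample documents.
baseline = '<doc><sec id="a" version="1"/><sec id="b" version="1"/></doc>'
current = '<doc><sec id="a" version="2"/><sec id="c" version="1"/></doc>'
report = diff_mementos(build_memento(baseline), build_memento(current))
print(report)  # {'added': ['c'], 'deleted': ['b'], 'updated': ['a']}
```

Note that this only detects an update when the metadata itself was
changed, so it relies on whoever edits the XML to bump the version or
modification date. Finding out what exactly changed still needs a
second pass over the original files.
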

I don't know if that's the case in your environment, but in my
scenario, the raw XML is going to be maintained by people, so I am
striving for simplicity. Separating the metadata, as you propose,
might mean slightly more complex processing, but the XML that people
see could in effect be more manageable, so I'll certainly have a think
about it...




Copyright 1993-2007 XML.org. This site is hosted by OASIS