OASIS Mailing List Archives

Doing large scale XML processing/transformation?

I have a large amount (120 GB) of data stored in various homemade SGML formats
and a few non-SGML formats. I need a way of transforming these into multiple
XML documents conforming to different DTDs.
These transformations take quite some time, so I'm investigating various
methods of doing them while still keeping the investment at a reasonable
level. I have an idea for a way to solve the problem:

I imagine doing it in a distributed fashion along the lines of what SETI@home
and distributed.net are doing, but without the community bit :-) I would then
set up a "cluster" of cheap off-the-shelf PCs, probably running Linux, and let
them work through my data. Another very important thing is some method of
reusing code in order to keep the development time and expenses minimal. I
am thinking of splitting the conversion job into multiple steps and letting
each step be an individual program. Unix-style pipes are the best analogy
to what I have in mind. This would allow me to tailor a line
of conversion steps suitable for the individual source and target formats.
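
To make the pipe analogy concrete, here is a rough sketch in Python of what
chaining small conversion steps might look like. The step names
(normalize_tag_case, close_empty_elements, add_xml_declaration) and the
element names "br" and "hr" are purely hypothetical placeholders, not my real
formats, and regular expressions stand in for a proper SGML parser.

```python
import re
from functools import reduce
from typing import Callable

# A step takes the text of one document and returns the converted text.
Step = Callable[[str], str]


def pipeline(*steps: Step) -> Step:
    """Chain steps so each one's output feeds the next, like a Unix pipe."""
    def run(document: str) -> str:
        return reduce(lambda text, step: step(text), steps, document)
    return run


# Hypothetical step: lowercase every element name in start and end tags.
def normalize_tag_case(text: str) -> str:
    return re.sub(
        r"<(/?)([A-Za-z][\w.-]*)",
        lambda m: f"<{m.group(1)}{m.group(2).lower()}",
        text,
    )


# Hypothetical step: turn SGML-style empty elements into XML empty-element tags.
# The element names here are assumed for illustration only.
def close_empty_elements(text: str) -> str:
    return re.sub(r"<(br|hr)>", r"<\1/>", text)


# Hypothetical step: prepend an XML declaration.
def add_xml_declaration(text: str) -> str:
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + text


# One "line" of steps tailored to a particular source and target format.
sgml_to_xml = pipeline(normalize_tag_case, close_empty_elements, add_xml_declaration)

if __name__ == "__main__":
    print(sgml_to_xml("<DOC><P>Hello<BR></P></DOC>"))
```

A different source or target format would then reuse the same steps in another
order, or swap one step for another, without touching the rest.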

Martin Skøtt