OASIS Mailing List Archives

Re: Doing large scale XML processing/transformation?

On Sun, Aug 26, 2001 at 12:58:39AM +0200,  wrote:
> I have a large amount (120GB) of data stored in various homemade SGML formats
> and a few non-SGML formats. I need a way of transforming these into multiple
> XML documents conforming to different DTDs.
> Doing these transformations takes quite some time, so I'm investigating various
> methods of doing this while still keeping the investment at a reasonable
> level. I have this idea of a way to solve the problem:
> I imagine doing it in a distributed fashion along the lines of what SETI@home
> and distributed.net are doing, but without the community bit :-) I would then
> set up a "cluster" of cheap off-the-shelf PCs, probably running Linux, and let
> them work through my data. Another very important thing is some method of
> reusing code in order to keep the development time and expenses minimal. I
> am thinking of splitting the conversion job into multiple steps and then letting
> each step be an individual program. I think the idea of Unix-style pipes is
> the best analogy to what I'm thinking of. This would allow me to tailor a line
> of conversion steps suitable for the individual source and target formats.

Oops, I missed part of the message. Here is the rest, sorry :-)

<rest of message>
This would allow me to tailor a line of conversion steps suitable for the
individual source and target formats, reusing the most common bits of code.
The language is pretty much irrelevant, although I would prefer something more
like Perl or Python than C or C++, but that's just for ease of development :-)
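
To make the pipe analogy concrete, here is a minimal sketch in Python of what
I mean by chaining small, reusable conversion steps. The step names
(strip_blank, lowercase_tags, wrap_root) and the pipeline() helper are made up
for illustration; they are not from any existing tool, and a real step would
need a proper SGML parser rather than a regular expression.

```python
import re
from functools import reduce
from typing import Callable, Iterable, Iterator

# A step takes a stream of lines and yields a stream of lines,
# much like a Unix filter reading stdin and writing stdout.
Step = Callable[[Iterable[str]], Iterator[str]]


def strip_blank(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical cleanup step: drop empty lines."""
    for line in lines:
        if line.strip():
            yield line


def lowercase_tags(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical step: lowercase tag names, e.g. <TITLE> becomes <title>.

    Assumes tags are simple enough for a regex; real SGML is not.
    """
    tag_start = re.compile(r"</?[A-Za-z][A-Za-z0-9]*")
    for line in lines:
        yield tag_start.sub(lambda m: m.group(0).lower(), line)


def wrap_root(lines: Iterable[str], root: str = "doc") -> Iterator[str]:
    """Hypothetical step: wrap the stream in a single root element."""
    yield f"<{root}>"
    yield from lines
    yield f"</{root}>"


def pipeline(*steps: Step) -> Step:
    """Compose steps left to right, like `a | b | c` in a shell."""
    def run(lines: Iterable[str]) -> Iterator[str]:
        return reduce(lambda stream, step: step(stream), steps, lines)
    return run


if __name__ == "__main__":
    convert = pipeline(strip_blank, lowercase_tags, wrap_root)
    for out in convert(["<TITLE>Hello</TITLE>", "", "<P>Text</P>"]):
        print(out)
```

Because every step has the same shape, a different source or target format
would only need a different list of steps passed to pipeline(). Each step
could just as well be a separate script reading stdin and writing stdout.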

Does anyone know a tool like this? I could start doing this myself, but I would
prefer an already finished product so I could get working on the
conversions right away.
Most of the sites I have been looking at seem pretty keen on XSLT, but I kind
of have this feeling that I need more than just XSLT for this job ;-)
</rest of message>

Martin Skøtt