Re: Doing large scale XML processing/transformation?
- From: Martin Skøtt <firstname.lastname@example.org>
- To: email@example.com
- Date: Sun, 26 Aug 2001 01:07:03 +0200
On Sun, Aug 26, 2001 at 12:58:39AM +0200, wrote:
> I have a large amount (120GB) of data stored in various home-made SGML formats
> and a few non-SGML formats. I need a way of transforming these into multiple
> XML documents conforming to different DTDs.
> Doing these transformations takes quite some time, so I'm investigating various
> methods of doing this while still keeping the investment at a reasonable
> level. I have this idea of a way to solve the problem:
> I imagine doing it in a distributed fashion along the lines of what SETI@home
> and distributed.net are doing, but without the community bit :-) I would then
> set up a "cluster" of cheap off-the-shelf PCs, probably running Linux, and let
> them work through my data. Another very important thing is some method of
> reusing code in order to keep the development time and expenses minimal. I
> am thinking of splitting the conversion job into multiple steps and letting
> each step be an individual program. I think the idea of Unix-style pipes is
> the best analogy to what I'm thinking of. This would allow me to tailor a line
> of conversion steps suitable for the individual source and target formats.
Oops, I missed part of the message. Here is the rest, sorry :-)
<rest of message>
This would allow me to tailor a line of conversion steps suitable for the
individual source and target formats reusing the most common bits of code.
The language is pretty much irrelevant, although I would prefer something more
like Perl or Python than C or C++, but that's just for ease of development :-)
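To make the idea a bit more concrete, here is a rough sketch in Python of what
I mean by a line of conversion steps. It is only an illustration: the step
names (strip_comments, lowercase_tags, close_empty_br), the pipeline() helper
and the two format names are made up for this example, and the regular
expressions are far too naive for real SGML.

```python
import re
from functools import reduce

# Hypothetical conversion steps. Each one takes a whole document as a string
# and returns the transformed document, so steps can be chained freely.

def strip_comments(text):
    """Remove comments of the simple form <!-- ... -->."""
    return re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)

def lowercase_tags(text):
    """Lower-case element names in start and end tags (attributes untouched)."""
    return re.sub(r"<(/?)([A-Za-z][\w.-]*)",
                  lambda m: "<" + m.group(1) + m.group(2).lower(),
                  text)

def close_empty_br(text):
    """Rewrite <br> as <br/> so the empty element is well-formed XML."""
    return re.sub(r"<br\s*>", "<br/>", text)

def pipeline(*steps):
    """Chain steps left to right, like cmd1 | cmd2 | cmd3 in a shell."""
    def run(text):
        return reduce(lambda doc, step: step(doc), steps, text)
    return run

# One line of steps per source/target pair, reusing the common steps.
format_a_to_xml = pipeline(strip_comments, lowercase_tags, close_empty_br)
format_b_to_xml = pipeline(lowercase_tags, close_empty_br)

if __name__ == "__main__":
    sample = "<DOC><!-- note --><P>Hello<BR>world</P></DOC>"
    print(format_a_to_xml(sample))
```

Since each pipeline is just a function applied to one document at a time, the
documents themselves could be handed out to separate machines without the
steps needing to know about it.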
Does anyone know of a tool like this? I could start building it myself, but I
would prefer an already finished product so I could get to work on the
conversions right away.
Most of the sites I have been looking at seem pretty keen on XSLT, but I kind
of have this feeling that I need more than just XSLT for this job ;-)
</rest of message>