Hi, Daniel.
There are probably many ways to do this.
One way, perhaps a little crude, would be to use a text/macro editor to
process the files in batch mode first. I've often used Vedit for these
sorts of things. (http://www.vedit.com) The program doesn't do these
kinds of batch things out of the box. You'd have to write a macro, but
the macro language is easy to work with. Of course, you could also do
the same thing in Perl or another scripting language.
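For what it's worth, here is a rough sketch of the scripting route, in Python rather than Perl. It assumes the documents are plain-text files and that paragraphs are separated by blank lines; the function names, the "*.txt" pattern, and the case-insensitive matching are my own choices for illustration, not anything from Daniel's setup.

```python
import re
from pathlib import Path


def find_paragraphs(text, keywords):
    """Return the paragraphs of `text` that contain any of `keywords`.

    Assumption: a paragraph is a run of lines separated from the next
    one by at least one blank line. Matching is case-insensitive.
    """
    lowered = [k.lower() for k in keywords]
    hits = []
    for para in re.split(r"\n\s*\n", text):
        # Collapse line breaks and extra spaces inside the paragraph.
        cleaned = " ".join(para.split())
        if cleaned and any(k in cleaned.lower() for k in lowered):
            hits.append(cleaned)
    return hits


def search_files(folder, keywords, pattern="*.txt"):
    """Map each matching file name in `folder` to its matching paragraphs."""
    results = {}
    for path in sorted(Path(folder).glob(pattern)):
        text = path.read_text(encoding="utf-8", errors="replace")
        hits = find_paragraphs(text, keywords)
        if hits:
            results[path.name] = hits
    return results
```

Calling search_files("docs", ["laser", "XML"]) would give back a dictionary of file names and the paragraphs that matched. Word processor or PDF files would need to be converted to plain text first.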
Once you've extracted the text you want from each file, the conversion
to XML is another matter. That depends on *which* XML you mean, i.e.,
which DTD, what sort of text, what mapping rules you want to use, and
how you want to tag the resulting XML output. You could continue
to use the text editor for this sort of thing, or if you want a more
"official" method, use XSLT to do the transform.
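If no particular DTD is required, the simplest mapping is to wrap each extracted paragraph in an element. The sketch below does that with Python's standard xml.etree.ElementTree module. The element names (extracts, document, para) and the source attribute are invented for the example; a real target vocabulary would replace them.

```python
import xml.etree.ElementTree as ET


def paragraphs_to_xml(results):
    """Serialize {file name: [paragraph, ...]} as a small XML document.

    The tag and attribute names here are placeholders, not taken from
    any DTD. ElementTree escapes characters such as < and & itself.
    """
    root = ET.Element("extracts")
    for filename, paragraphs in results.items():
        doc = ET.SubElement(root, "document", source=filename)
        for para in paragraphs:
            ET.SubElement(doc, "para").text = para
    return ET.tostring(root, encoding="unicode")
```

Building the tree with a library instead of pasting strings together means the output stays well-formed even when the source text contains markup characters.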
Hope this helps.
I'm not sure of your level of programming expertise. If you need any
more info
(and nobody else on the list comes up with any better answers), I'd be
glad to help with any small scripts/macros. I don't know Perl very well,
but I probably have some Vedit and/or VBScripts floating around
somewhere that could do the job.
- Mark Novembrino
-----Original Message-----
From: Daniel Gresh [mailto:dgresh@lle.rochester.edu]
Sent: Thursday, July 13, 2006 1:12 PM
To: xml-dev@lists.xml.org
Subject: [xml-dev] Copying text from a source, then converting to XML
I have a question about this. Some of the question may not
pertain to XML, but if anyone knows a method, that'd be great.
So, I basically want to automatically search a large number
of documents for certain keywords. When I find that keyword,
I want the paragraph the keyword is in, not the page, to be
copied and pasted somewhere. After that, I want to convert
the pasted text to XML.
Does anyone know a method for doing either of these tasks?
Copying certain paragraphs or substrings of text that have
certain phrases in them, then converting to XML? Perhaps
there is a script of some sort? Or a free program?
Any help would be appreciated.
-----------------------------------------------------------------
The xml-dev list is sponsored by XML.org
<http://www.xml.org>, an initiative of OASIS
<http://www.oasis-open.org>
The list archives are at http://lists.xml.org/archives/xml-dev/
To subscribe or unsubscribe from this list use the subscription
manager: <http://www.oasis-open.org/mlmanage/index.php>