Re: [xml-dev] text to xml conversion
- From: Rick Jelliffe <rjelliffe@allette.com.au>
- To: ycao5@scs.carleton.ca
- Date: Tue, 02 Jun 2009 15:12:10 +1000
ycao5@scs.carleton.ca wrote:
>
> Hello everyone,
>
> I want to ask one question about converting text to an XML file. Is
> there any way to attach a schema to a text document and parse it into
> XML according to the rules defined in the schema? Can I find such a
> tool? Otherwise I plan to write one myself. Please give me some
> references. Thanks.
There is one called SP, which is open source, from James Clark.
It parses data files using SGML configuration files and a schema, and
is suitable when the file contains Wiki kinds of markup or CSV or
other formats with explicit delimiters, but not so much for more
free-form data. It is probably only worth using if you will have to do
this kind of thing many times.
See http://www.xml.com/lpt/a/1377 for an overview of this approach. SP
is industrial strength.
You could convert your XML Schema to an XML DTD, then decorate it with
information to make it an SGML DTD to say:
1) Which delimiters in your text should be substituted for which tags
2) In which contexts this recognition takes place
3) Which tags won't have corresponding delimiters in your file and are
allowed to be implied
The output is XML. SGML has many gotchas for new players, but if you
already know HTML and XML and DTDs or XSD, then they will be much easier
to cope with. (SGML, XML's precursor, got a bad rep because people needed
to learn the equivalent of XML + HTML bits + this kind of text parsing
system all at the same time.)
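
To make the three kinds of decoration listed above a bit more concrete,
here is a toy sketch in Python. It is not SP and it is not SGML syntax:
the RULES table, the context names ("table", "cell", "quoted") and the
function text_to_xml are all made up for illustration, and the input is
assumed to be simple CSV-like text. It only shows the idea that
(1) delimiters map to tags, (2) a delimiter is recognised only in certain
contexts, and (3) tags with no delimiter of their own are implied.

```python
from xml.sax.saxutils import escape

# Hypothetical rule table, for illustration only (not SP or SGML syntax).
# Key:   (context, delimiter)  -> point 2: where the delimiter is recognised
# Value: (tags to emit, next context) -> point 1: which tags replace it
RULES = {
    ("cell", ","): ("</cell><cell>", "cell"),
    ("cell", "\n"): ("</cell></row>", "table"),
    ("cell", '"'): ("", "quoted"),    # opening quote: switch context
    ("quoted", '"'): ("", "cell"),    # closing quote: switch back
    # No ("quoted", ",") rule: inside quotes a comma is just data.
}


def text_to_xml(text: str) -> str:
    """Convert CSV-like text to XML using the RULES table above."""
    out = ["<table>"]      # point 3: the root has no delimiter, so imply it
    context = "table"
    for ch in text:
        if context == "table":
            if ch == "\n":
                continue   # blank line between rows: nothing to imply
            # point 3: data seen outside any row, so imply <row><cell>
            out.append("<row><cell>")
            context = "cell"
        rule = RULES.get((context, ch))
        if rule is not None:
            tags, context = rule
            out.append(tags)
        else:
            out.append(escape(ch))   # ordinary data, XML-escaped
    if context in ("cell", "quoted"):
        out.append("</cell></row>")  # imply end tags at end of input
    out.append("</table>")
    return "".join(out)


if __name__ == "__main__":
    print(text_to_xml('a,b\nx,"y,z"\n'))
```

In SGML the equivalent information lives in the DTD rather than in
program code, which is the point of using a tool like SP if you have to
do this often.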
I also made some software for this task that wasn't based on grammars:
it was called Psyche, written in Java, and Micah Dubinko also made an
implementation of it (for .NET?), but we never released them. If there is
interest I could drag it out again: it also requires delimiters.
Cheers
Rick Jelliffe