Re: [xml-dev] text to xml conversion

ycao5@scs.carleton.ca wrote:
>
> Hello everyone,
>
>     I want to ask one question about converting text to an XML file. Is
> there any way to attach a schema to a text document and parse it into
> XML according to the rules defined in the schema? Can I find such a
> tool? Otherwise I plan to write one myself. Please give me some
> references. Thanks.
There is one: SP, an open-source SGML parser from James Clark.

It parses data files using SGML configuration files and a schema, and
is suitable when the file contains wiki-style markup, CSV, or other
formats with explicit delimiters, but less so for free-form data. It is
probably only worth using if you will have to do this kind of thing
many times.

See http://www.xml.com/lpt/a/1377 for an overview of this approach. SP
is industrial-strength.
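
SP itself is driven by an SGML DTD, not by program code. As a rough
illustration of the simplest delimited case mentioned above, CSV, here
is a minimal Python sketch using only the standard library (`csv` and
`xml.etree.ElementTree`). The `SCHEMA` dictionary and the element names
in it are hypothetical stand-ins for the rules a real schema would
supply; nothing here validates against an actual XML Schema or DTD.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical "schema": element names for the document, for each record
# (one per line), and for each comma-separated field, in order.
SCHEMA = {
    "root": "orders",
    "record": "order",
    "fields": ["id", "customer", "total"],
}


def csv_to_xml(text: str, schema: dict) -> str:
    """Convert comma-delimited text to XML, one child element per field."""
    root = ET.Element(schema["root"])
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # blank lines produce empty rows; skip them
        if len(row) != len(schema["fields"]):
            raise ValueError(
                f"expected {len(schema['fields'])} fields, got {len(row)}: {row!r}"
            )
        record = ET.SubElement(root, schema["record"])
        for name, value in zip(schema["fields"], row):
            ET.SubElement(record, name).text = value
    return ET.tostring(root, encoding="unicode")


orders_text = "1,Alice,9.50\n2,Bob,12.00\n"
print(csv_to_xml(orders_text, SCHEMA))
```

The `csv` module also copes with quoted fields such as `"Smith, J."`,
which a naive `split(",")` would break apart.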

You could convert your XML Schema to an XML DTD, then add the extra
information that turns it into an SGML DTD, specifying:

 1) Which delimiters in your text should be substituted for which tags
 2) In which contexts this recognition takes place
 3) Which tags won't have corresponding delimiters in your file and are 
allowed to be implied
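
The three kinds of information listed above can be imitated in a few
lines of Python, which may help if you decide to write a tool yourself.
This is a minimal sketch under assumed conventions, not how SP works
internally: the element names (`doc`, `p`, `list`, `item`, `em`) and the
delimiters (`* ` at the start of a line for a list item, paired `*` for
emphasis, a blank line to end a block) are all hypothetical. It shows
(1) delimiters mapped to tags, (2) recognition that depends on context,
since `*` means one thing at the start of a line and another inside
running text, and (3) implied tags, since `<p>` and `<list>` have no
delimiters of their own. In SGML these roles are played, loosely, by
short reference maps (SHORTREF/USEMAP) and omitted-tag minimization
(OMITTAG).

```python
import xml.etree.ElementTree as ET

# Hypothetical delimiters, one set per context (loosely like SGML short
# reference maps attached to different elements):
#   block context:  "* " at the start of a line opens an <item>
#   inline context: "*" toggles <em> on and off
ITEM_DELIM = "* "
EMPH_DELIM = "*"


def add_inline(parent: ET.Element, text: str) -> None:
    """Inline context: alternate between plain text and <em> at each '*'."""
    parts = text.split(EMPH_DELIM)
    if len(parts) % 2 == 0:
        raise ValueError(f"unbalanced {EMPH_DELIM!r} in {text!r}")
    parent.text = parts[0]
    last = None
    for i, part in enumerate(parts[1:], start=1):
        if i % 2 == 1:  # odd pieces sit between a pair of '*'
            last = ET.SubElement(parent, "em")
            last.text = part
        else:           # even pieces follow a closing '*'
            last.tail = part


def convert(text: str) -> str:
    """Block context: build <doc> from paragraphs and bulleted lists."""
    doc = ET.Element("doc")
    current_list = None  # open <list>; its start and end tags are implied
    para_lines = []      # lines of the open <p>; its tags are implied too

    def close_para() -> None:
        if para_lines:
            p = ET.SubElement(doc, "p")
            add_inline(p, " ".join(para_lines))
            para_lines.clear()

    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(ITEM_DELIM):
            close_para()              # an item implies </p>
            if current_list is None:  # the first item implies <list>
                current_list = ET.SubElement(doc, "list")
            item = ET.SubElement(current_list, "item")
            add_inline(item, line[len(ITEM_DELIM):])
        elif not line:
            close_para()              # a blank line ends the open block
            current_list = None       # ... and implies </list>
        else:
            current_list = None       # running text implies </list>
            para_lines.append(line)
    close_para()
    return ET.tostring(doc, encoding="unicode")


wiki_text = """Orders are shipped *daily*.

* first item
* second *important* item
Back to prose."""

print(convert(wiki_text))
```

Checking for the block-level delimiter before handing the rest of the
line to `add_inline` is what keeps a leading `* ` from being read as
emphasis.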

The output is XML. SGML has many gotchas for newcomers, but if you
already know HTML, XML, and DTDs or XSD, they will be much easier to
cope with. (SGML, XML's precursor, got a bad reputation because people
had to learn the equivalent of XML, bits of HTML, and this kind of
text-parsing system all at the same time.)

I also made some software for this task that wasn't based on grammars:
it was called Psyche, in Java, and Micah Dubinko also made an
implementation of it (for .NET?), but we never released them. If there
is interest I could drag it out again; it also requires delimiters.

Cheers
Rick Jelliffe



Copyright 1993-2007 XML.org. This site is hosted by OASIS