xml-dev - Re: [xml-dev] Preserving the structure of the XML file

To: "Simon St.Laurent" <simonstl@simonstl.com>
Subject: Re: [xml-dev] Preserving the structure of the XML file
From: Oleg Dulin <oleg.dulin@opence.net>
Date: Mon, 13 Oct 2003 12:26:12 -0400
Cc: "'xml-dev@lists.xml.org '" <xml-dev@lists.xml.org>
In-reply-to: <5.2.0.9.2.20031010122452.02a07ec8@serrano.hesketh.net>
References: <52CD2DDFC7DBF440BB9F3AC7A0F7181E05BF56@www.bdgsa.net> <52CD2DDFC7DBF440BB9F3AC7A0F7181E05BF56@www.bdgsa.net> <5.2.0.9.2.20031010122452.02a07ec8@serrano.hesketh.net>
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.5) Gecko/20030916

Simon:

Thank you for your response.

I've experimented with your Ripper a bit, and it appears to handle what we 
need. I did notice a bug, though: it seems to stop parsing when it 
encounters a processing instruction (PI) without any data. For instance:

<?foo ?> breaks Ripper, while <?foo bar?> is ok
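
To make the expected behaviour concrete, here is a minimal sketch of how a PI can be split into target and data so that an empty data part is accepted. PiSplitter is a hypothetical helper written for this illustration; it is not taken from Ripper or Gorille, and it assumes the caller has already isolated the complete PI text.

```java
// Hypothetical helper for illustration only; not part of Ripper or Gorille.
final class PiSplitter {

    /** Returns {target, data}; data is "" when the PI carries none. */
    static String[] split(String pi) {
        // The shortest legal PI is "<?x?>", i.e. five characters.
        if (pi.length() < 5 || !pi.startsWith("<?") || !pi.endsWith("?>")) {
            throw new IllegalArgumentException("not a processing instruction: " + pi);
        }
        String body = pi.substring(2, pi.length() - 2);

        // The target runs up to the first XML whitespace character.
        int i = 0;
        while (i < body.length() && !isXmlSpace(body.charAt(i))) {
            i++;
        }
        if (i == 0) {
            throw new IllegalArgumentException("missing PI target: " + pi);
        }
        String target = body.substring(0, i);

        // Skip the separator; whatever follows (possibly nothing) is the data.
        int j = i;
        while (j < body.length() && isXmlSpace(body.charAt(j))) {
            j++;
        }
        return new String[] { target, body.substring(j) };
    }

    private static boolean isXmlSpace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    public static void main(String[] args) {
        String[] empty = split("<?foo ?>");
        String[] full = split("<?foo bar?>");
        System.out.println(empty[0] + "|" + empty[1]); // prints "foo|"
        System.out.println(full[0] + "|" + full[1]);   // prints "foo|bar"
    }
}
```

Note that this sketch discards the whitespace between target and data, so a parser that aims to reproduce every character would have to record that separator as well.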

Do you know of any other outstanding issues?

Ideally, what we need is a parser like Ripper that can capture the 
events into a tree-like structure. I looked at MOE, but it appears to be a 
lot older than Ripper itself. Is there any active work being done on MOE?
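
As a rough sketch of what I mean by a tree-like structure: each node keeps the raw text of the event that produced it, so writing the tree back out reproduces the input character for character. The class names and the three event methods below (startElement, endElement, leaf) are assumptions made up for this illustration; they are not Ripper's or MOE's actual API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical node type: stores markup exactly as it appeared in the file.
final class RawNode {
    final String open;   // raw start tag, or raw text/PI/comment for a leaf
    String close = "";   // raw end tag; stays empty for leaves
    final List<RawNode> children = new ArrayList<>();

    RawNode(String open) {
        this.open = open;
    }

    /** Appends this subtree to out in document order. */
    void write(StringBuilder out) {
        out.append(open);
        for (RawNode child : children) {
            child.write(out);
        }
        out.append(close);
    }
}

// Hypothetical builder fed by a parser that reports every character.
final class RawTreeBuilder {
    private final RawNode root = new RawNode("");
    private final Deque<RawNode> stack = new ArrayDeque<>();

    RawTreeBuilder() {
        stack.push(root);
    }

    void startElement(String raw) {
        RawNode node = new RawNode(raw);
        stack.peek().children.add(node);
        stack.push(node);
    }

    void endElement(String raw) {
        if (stack.size() == 1) {
            throw new IllegalStateException("end tag without open element: " + raw);
        }
        stack.pop().close = raw;
    }

    /** Text, whitespace, PIs, comments and empty-element tags are all leaves. */
    void leaf(String raw) {
        stack.peek().children.add(new RawNode(raw));
    }

    RawNode root() {
        return root;
    }
}
```

An editor could then change one node and leave the rest alone, so that unusual spacing inside tags, attribute order, and indentation elsewhere in the file survive untouched.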

Another XML parsing technique appears to preserve a lot more information 
than SAX: XNI in Xerces. Of course, it is not nearly as complete as 
Ripper, but it is more detailed than SAX and is actively used by Xerces. 
Have you ever evaluated the XNI API for the purpose of "half-parsing"? 
What is your opinion?

Regards,
Oleg Dulin
Opence, Inc.





Simon St.Laurent wrote:

> At 12:01 PM 10/10/2003 -0400, Oleg Dulin wrote:
>
>> Does anyone know if there are less-lossy XML parsers and serializers 
>> that can capture and reproduce the structure of the input XML file 
>> including tabulation, whitespace, etc.? We would love to know about
>> experiences with parse/serialization approaches that have a greater 
>> infoset than that provided by SAX and DOM, especially related to 
>> ignorable whitespace and attribute ordering/whitespace.  We are 
>> editing XML and want to preserve the file as much as possible.
>
>
> I've written what I call a half-parser, available (in Java) as part of 
> my Gorille project. It reports every character in the document and 
> stays away (for now) from entity expansion, attribute defaulting, and 
> other infoset excitement. It also has a context object which makes it 
> easier to handle issues like entity values and namespaces.
>
> Gorille is at:
> http://simonstl.com/projects/gorille/
>
> Details on Ripper's API, which should give you a good idea what's 
> included, are at:
> http://simonstl.com/projects/gorille/docs/com/simonstl/gorille/DocProcI.html
> http://simonstl.com/projects/gorille/docs/com/simonstl/gorille/ContextI.html
> http://simonstl.com/projects/gorille/docs/com/simonstl/gorille/Ripper.html
>
> A paper explaining this more thoroughly is at:
> http://www.mulberrytech.com/Extreme/Proceedings/html/2003/StLaurent01/EML2003StLaurent01.html 
>
>
> A presentation on it in English is at:
> http://simonstl.com/articles/halfparse/
>
> A presentation on it in Playmobil (requires SMIL, in RealPlayer One) 
> is at:
> http://simonstl.com/articles/halfparse-smil/
>
> I'm planning a lot more work surrounding this parser, but have a 
> painfully serious shortage of time at the moment.  There should be a 
> lot more in 2004.
