xml-dev - Re: [xml-dev] Preserving the structure of the XML file

To: Oleg Dulin <oleg.dulin@opence.net>, "'xml-dev@lists.xml.org '" <xml-dev@lists.xml.org>
Subject: Re: [xml-dev] Preserving the structure of the XML file
From: "Simon St.Laurent" <simonstl@simonstl.com>
Date: Fri, 10 Oct 2003 12:33:41 -0400
In-reply-to: <3F86D7F5.5000002@opence.net>
References: <52CD2DDFC7DBF440BB9F3AC7A0F7181E05BF56@www.bdgsa.net>

At 12:01 PM 10/10/2003 -0400, Oleg Dulin wrote:
>Does anyone know if there are less-lossy XML parsers and serializers that 
>can capture and reproduce the structure of the input XML file, including 
>tabulation, whitespace, etc.? We would love to know about 
>experiences with parse/serialization approaches that have a greater 
>infoset than that provided by SAX and DOM, especially related to ignorable 
>whitespace and attribute ordering/whitespace.  We are editing XML and 
>want to preserve the file as much as possible.

I've written what I call a half-parser, available (in Java) as part of my 
Gorille project. It reports every character in the document and stays away 
(for now) from entity expansion, attribute defaulting, and other infoset 
excitement. It also has a context object which makes it easier to handle 
issues like entity values and namespaces.
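
To make "reports every character" concrete, here is a minimal sketch in 
Java. It is not Gorille's API: the class LosslessSplitter and its split and 
join methods are hypothetical names invented for this illustration. It only 
shows the general idea of cutting a document into markup and text chunks 
without expanding entities, normalizing whitespace, or reordering 
attributes, so that joining the chunks reproduces the input exactly. It 
assumes the whole document is in a String and does not treat a DOCTYPE 
internal subset as a single chunk.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT Gorille's Ripper API.
final class LosslessSplitter {

    // Cuts xml into alternating markup and text chunks.
    // No character is dropped, expanded, or normalized.
    static List<String> split(String xml) {
        List<String> chunks = new ArrayList<>();
        int i = 0;
        while (i < xml.length()) {
            int end = (xml.charAt(i) == '<') ? markupEnd(xml, i) : textEnd(xml, i);
            chunks.add(xml.substring(i, end));
            i = end;
        }
        return chunks;
    }

    // Text runs up to the next '<' (or end of input); entities stay unexpanded.
    private static int textEnd(String xml, int start) {
        int lt = xml.indexOf('<', start);
        return lt < 0 ? xml.length() : lt;
    }

    // Returns the index just past the markup construct starting at start.
    private static int markupEnd(String xml, int start) {
        if (xml.startsWith("<!--", start)) return after(xml, "-->", start + 4);
        if (xml.startsWith("<![CDATA[", start)) return after(xml, "]]>", start + 9);
        if (xml.startsWith("<?", start)) return after(xml, "?>", start + 2);
        // Ordinary tag: scan to '>' while skipping quoted attribute values,
        // which may legally contain '>'.
        char quote = 0;
        for (int i = start + 1; i < xml.length(); i++) {
            char c = xml.charAt(i);
            if (quote != 0) {
                if (c == quote) quote = 0;
            } else if (c == '"' || c == '\'') {
                quote = c;
            } else if (c == '>') {
                return i + 1;
            }
        }
        return xml.length(); // unterminated markup: keep the rest as one chunk
    }

    private static int after(String xml, String terminator, int from) {
        int at = xml.indexOf(terminator, from);
        return at < 0 ? xml.length() : at + terminator.length();
    }

    // Serializing is plain concatenation, which is what makes the round trip exact.
    static String join(List<String> chunks) {
        return String.join("", chunks);
    }

    public static void main(String[] args) {
        String in = "<a  b='1'\tc=\"x>y\">\n\t&amp; text<!-- c > --><e/>\n</a>\n";
        List<String> chunks = split(in);
        System.out.println("chunks: " + chunks.size());
        System.out.println("round trip exact: " + join(chunks).equals(in));
    }
}
```

An editing tool built this way would change only the chunks it needs to 
touch and write every other chunk back untouched, which is why whitespace 
inside tags and attribute order survive.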

Gorille is at:
http://simonstl.com/projects/gorille/

Details on the API of Ripper (the half-parser), which should give you a 
good idea of what's included, are at:
http://simonstl.com/projects/gorille/docs/com/simonstl/gorille/DocProcI.html
http://simonstl.com/projects/gorille/docs/com/simonstl/gorille/ContextI.html
http://simonstl.com/projects/gorille/docs/com/simonstl/gorille/Ripper.html

A paper explaining this more thoroughly is at:
http://www.mulberrytech.com/Extreme/Proceedings/html/2003/StLaurent01/EML2003StLaurent01.html

A presentation on it in English is at:
http://simonstl.com/articles/halfparse/

A presentation on it in Playmobil (requires SMIL, in RealPlayer One) is at:
http://simonstl.com/articles/halfparse-smil/

I'm planning a lot more work surrounding this parser, but have a painfully 
serious shortage of time at the moment.  There should be a lot more in 2004.

