RE: CDATA conversion issue

From: Michael Brennan <Michael_Brennan@allegis.com>

To: 'Lynda VanVleet' <lyndavv@earthlink.net>, xml-dev@lists.xml.org

Date: Wed, 15 Aug 2001 15:59:19 -0700

I haven't used Oracle's parser, and the last time I used the Forte 4GL parser was about 2 years ago, so I can't comment specifically on either of these. However, one thing you can investigate is whether either of them supports the LexicalHandler interface (a SAX2 standard extension). If so, you can use SAX to parse the document and build the DOM yourself in response to SAX events. This is more work, but the LexicalHandler interface lets your application be notified where CDATA sections start and end. If neither parser supports the LexicalHandler interface, you may want to explore using another parser. Sun's Crimson (included in their JAXP distribution), Apache Xerces, and Aelfred all support this interface. Microsoft's XML SDK, version 3.0 and higher, also supports it; if you are running your code on the Microsoft platform, this may be an option.
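
To make the idea concrete, here is a rough Java sketch of building the DOM from SAX events with a LexicalHandler. It assumes a JAXP-style setup in which the SAX parser recognizes the standard lexical-handler property; the class name CDataPreservingBuilder and its parse helper are made up for illustration. It is not a complete builder: namespaces, comments, DTDs, and processing instructions are ignored, and an empty CDATA section would be dropped.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: builds a DOM from SAX events and keeps
// CDATA sections as CDATASection nodes instead of plain Text nodes.
class CDataPreservingBuilder extends DefaultHandler implements LexicalHandler {
    private final Document doc;
    private Node current;                                  // node receiving new children
    private final StringBuilder text = new StringBuilder(); // pending character data
    private boolean inCData = false;

    CDataPreservingBuilder(Document doc) {
        this.doc = doc;
        this.current = doc;
    }

    // Append any buffered characters as a Text or CDATASection node.
    private void flush() {
        if (text.length() == 0) {
            return;
        }
        current.appendChild(inCData
                ? doc.createCDATASection(text.toString())
                : doc.createTextNode(text.toString()));
        text.setLength(0);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flush();
        Element element = doc.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            element.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        current.appendChild(element);
        current = element;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush();
        current = current.getParentNode();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);   // may be called several times per text run
    }

    // LexicalHandler callbacks: only the CDATA boundaries matter here.
    @Override public void startCDATA() { flush(); inCData = true; }
    @Override public void endCDATA() { flush(); inCData = false; }
    @Override public void comment(char[] ch, int start, int length) { }
    @Override public void startDTD(String name, String publicId, String systemId) { }
    @Override public void endDTD() { }
    @Override public void startEntity(String name) { }
    @Override public void endEntity(String name) { }

    // Parse an XML string into a DOM that still contains its CDATA sections.
    static Document parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            CDataPreservingBuilder handler = new CDataPreservingBuilder(doc);
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            // Standard SAX2 property id for registering a LexicalHandler.
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return doc;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```

With your sample document, CDataPreservingBuilder.parse(...) should give an aDoc element whose only child is a CDATA section node rather than a text node, so exporting the DOM can write the section back out as CDATA.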

I would advise against trying to write your own XML parser. There are a number of hidden nuances that are not evident up front; writing an XML parser is not as trivial as it may at first appear.

-----Original Message-----
From: Lynda VanVleet [mailto:lyndavv@earthlink.net]
Sent: Wednesday, August 15, 2001 3:11 PM
To: xml-dev@lists.xml.org
Subject: CDATA conversion issue

We need to allow people to view/edit XML messages in our application. When we import a document that has a CDATA section into an XML parser, the values are converted using the escape characters and created in the DOM as a text node. (We have tried this with 2 parsers, Sun Forte 4GL and Oracle's Java Parser.) Thus, if the document is exported from the DOM, the CDATA section is gone, and the data now contains the escape characters.

Example Starting Doc

<aDoc><![CDATA[<aTag>Hello</aTag>]]></aDoc>

Example Ending Doc

<aDoc>&lt;aTag&gt;Hello&lt;/aTag&gt;</aDoc>

I am sure this is the intended DOM behavior, but it clearly does not satisfy our needs.

Note: If we add a CDATA section programmatically using the DOM, the export is correct. This leads us to our only proposed solution so far, which is to write our own code that parses the document and creates a DOM representation of the data without converting the CDATA sections. Hopefully this is not the only solution!

Here is the code we are using (This is using the Sun Forte 4GL but you should be able to read this for the logic.):

//Open a Test File
aFile:File=New;
aFile.SetLocalName('c:\\temp\\ainfile.xml');
aFile.open(sp_am_read);

//Import the document
aDocument:Document=New;
aDocument.ImportDocument(aFile);

//Create an output file
aOutFile:File=New;
aOutFile.SetLocalName('c:\\temp\\aoutfile.xml');
aOutFile.open(sp_am_write);

//Export the document
aDocument.ExportDocument(aOutFile);

//Close the files.
aFile.close();
aOutFile.close();

I remembered this sort of question occurring on the list in June and looked in the archives, but the solution was encoding as base64, and that won't work for me. What really scares me is that in the archive thread Tim Bray says "I'm always happy to avoid using CDATA sections if I can."

Lynda VanVleet

Lynda Van Vleet
Software Design Engineer
lvanvleet@classiq.com
http://www.classiq.com/