xml-dev - RE: [xml-dev] embedded html

To: julian.reschke@gmx.de, xml-dev@lists.xml.org
Subject: RE: [xml-dev] embedded html
From: Bart.Boogaerts@ASQ.BE
Date: Fri, 21 Dec 2001 16:32:09 +0100

Hey,

I tried to do what you said, but it doesn't seem to work. What am I still doing wrong?

The code only picks up the first tags; the rest of the HTML markup in the text is not shown, it is omitted.

So when parsing the string "<root>tested before link <SITELINK><a href='test.html'>tests</a></SITELINK>after link</root>"

we get <root>tested before link after link</root>

What should I do to get the whole thing parsed?

Source code:

// Java imports
import java.io.ByteArrayInputStream;
import java.io.IOException;

// JDOM imports
import org.jdom.Document;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;

public class SAXtoJDOM {

    // Parses a hard-coded XML string into a JDOM Document.
    public Document convert() throws JDOMException, IOException {

        String input = "<root>tested before link <SITELINK><a href='test.html'>tests</a></SITELINK>after link</root>";
        // Wrap the string's bytes in a stream so that SAXBuilder can read it.
        byte[] bytes = input.getBytes();
        ByteArrayInputStream bais = new ByteArrayInputStream(bytes);

        SAXBuilder builder = new SAXBuilder();
        Document doc = builder.build(bais);

        return doc;
    }

    public static void main(String[] args) {

        try {
            SAXtoJDOM tester = new SAXtoJDOM();
            Document doc = tester.convert();

            // Output the document to System.out
            XMLOutputter outputter = new XMLOutputter();
            outputter.output(doc, System.out);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
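
One way to narrow this down is to check whether the nested elements survive parsing at all, independently of how the document is printed. The sketch below is only a cross-check under stated assumptions: it uses the JDK's built-in DOM parser rather than JDOM, and the names ElementLister and elementNames are made up for illustration. It lists every element name found in the parsed string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: uses the JDK DOM parser, not JDOM.
public class ElementLister {

    // Returns the names of all elements in the parsed string, in document order.
    public static List<String> elementNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // "*" matches every element, including the root.
            NodeList all = doc.getElementsByTagName("*");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < all.getLength(); i++) {
                names.add(all.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("input could not be parsed as XML", e);
        }
    }

    public static void main(String[] args) {
        String input = "<root>tested before link <SITELINK><a href='test.html'>tests</a>"
                + "</SITELINK>after link</root>";
        System.out.println(elementNames(input));
    }
}
```

If SITELINK and a both appear in the list, the markup was parsed as elements, and any loss would have to happen later, for example when the tree is printed or displayed.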

regards, bart

-----Original Message-----
From: Julian Reschke [mailto:julian.reschke@gmx.de]
Sent: Thursday, December 20, 2001 10:48 AM
To: Bart.Boogaerts@ASQ.BE; xml-dev@lists.xml.org
Subject: RE: [xml-dev] embedded html

Bart,

If you want your XML processor to treat the HTML as markup, not as text, you'll have to *tell* it that it's markup. So,

a) wrap the text in a container element such as <p>

b) parse the string into a JDOM document

c) move all child nodes to your BODYTEXT element.
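
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the exact JDOM code: it assumes the JDK's built-in DOM API so that it runs without extra libraries, it copies the parsed nodes into the target document instead of detaching them, and the names EmbedMarkup, appendAsMarkup and buildBodyText are invented for the example. The same wrap, parse and move sequence applies when working with JDOM.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: uses the JDK DOM API as a stand-in for JDOM.
public class EmbedMarkup {

    // Parses 'fragment' as markup and appends the resulting nodes to 'target'.
    public static void appendAsMarkup(Element target, String fragment) {
        try {
            // a) wrap the text in a container element so it forms one well-formed document
            String wrapped = "<p>" + fragment + "</p>";
            // b) parse the wrapped string into its own tree
            Document parsed = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wrapped)));
            // c) copy every child of the container into the target element
            Document owner = target.getOwnerDocument();
            for (Node child = parsed.getDocumentElement().getFirstChild();
                    child != null; child = child.getNextSibling()) {
                target.appendChild(owner.importNode(child, true));
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse fragment as XML", e);
        }
    }

    // Builds a BODYTEXT element whose content is the parsed database text.
    public static Element buildBodyText(String dbText) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element body = doc.createElement("BODYTEXT");
        doc.appendChild(body);
        appendAsMarkup(body, dbText);
        return body;
    }
}
```

Note that this only works when the database text is itself well-formed XML; HTML with unclosed tags would be rejected in step b).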

-----Original Message-----
From: Bart.Boogaerts@ASQ.BE [mailto:Bart.Boogaerts@ASQ.BE]
Sent: Thursday, December 20, 2001 11:36 AM
To: xml-dev@lists.xml.org
Subject: RE: [xml-dev] embedded html

I forgot to mention that I am using XMLOutputter from org.jdom

and that these <a> elements are declared in our DTD, so no validation errors are reported...

We add all elements to the document structure ourselves, except this one, which is included in the text but is a valid XML tag...

thanks, bart

-----Original Message-----
From: Bart.Boogaerts@ASQ.BE [mailto:Bart.Boogaerts@ASQ.BE]
Sent: Thursday, December 20, 2001 10:14 AM
To: xml-dev@lists.xml.org
Subject: [xml-dev] embedded html

Hey,

On one of my projects, I'm working with JDOM (an old version, Jan. 2000) to parse XML (when validating).
If I want to use HTML tags between XML tags, the "<" and ">" characters are replaced by "&lt;" and "&gt;"...

example:

<BODYTEXT>this is just text with a link<a href="link.html">text for link</a>text goes on</BODYTEXT>

The text inside BODYTEXT is just text, coming straight out of the database. When we get the text from the database, everything is fine, but when we add this text to the document (while constructing the doc) and parse it, the "<" signs are replaced by "&lt;" etc.

How do I avoid the replacing of these tags without having to extract the link data from the text and put it between XML tags?
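
The escaping described above can be reproduced with a short sketch. It assumes the JDK's built-in DOM and serializer instead of JDOM, and the names EscapeDemo and serializeAsText are hypothetical. The point it illustrates is general: a string added as character data is a text node, so a serializer has to escape its markup characters to keep the output well-formed.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo: uses the JDK DOM API as a stand-in for JDOM.
public class EscapeDemo {

    // Adds the database string to BODYTEXT as character data and serializes the result.
    public static String serializeAsText(String dbText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element body = doc.createElement("BODYTEXT");
            doc.appendChild(body);
            // The string becomes a single text node; no <a> element is created.
            body.setTextContent(dbText);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The "<" of the embedded link is written as "&lt;" in the output.
        System.out.println(serializeAsText(
                "this is just text with a link<a href=\"link.html\">text for link</a>text goes on"));
    }
}
```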

regards,bart
