Title: embedded html
hey,
I tried to do what you said, but it doesn't seem to work... What am I still doing wrong?
The code just takes the first tags; the rest of the HTML code in the text is not shown, it is omitted.
So when parsing the string
"<root>tested before link <SITELINK><a href='test.html'>tests</a></SITELINK>after link</root>"
we get
<root>tested before link after link</root>
What should I do to get the whole thing parsed?
source code:

// Java imports
import java.io.ByteArrayInputStream;
import java.io.IOException;

// JDOM imports
import org.jdom.Document;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;

public class SAXtoJDOM {

    // Parses the hard-coded test string into a JDOM document
    public Document convert() throws JDOMException, IOException {
        String input = "<root>tested before link <SITELINK><a href='javascript:void(0);'>tests</a></SITELINK>after link</root>";
        byte[] bytes = input.getBytes();
        ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
        SAXBuilder builder = new SAXBuilder();
        Document doc = builder.build(bais);
        return doc;
    }

    public static void main(String[] args) {
        try {
            SAXtoJDOM tester = new SAXtoJDOM();
            Document doc = tester.convert();
            // Output the document to System.out
            XMLOutputter outputter = new XMLOutputter();
            outputter.output(doc, System.out);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Bart,
if you want your XML processor to treat the HTML as markup, not as text, you'll have to *tell* it that it's markup. So,
a) wrap the text into a container tag like <p>
b) parse the string into a JDOM document
c) move all child nodes to your BODYTEXT element.
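
The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the W3C DOM classes that ship with the JDK instead of JDOM, so that it runs without extra jars, and the class and method names (EmbedMarkup, buildBodyText, etc.) are made up for the example. The same wrap / parse / move idea should carry over to JDOM, but the calls will differ depending on the JDOM version.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of JDOM or of the original code.
class EmbedMarkup {

    // Parses a well-formed XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(xml)));
    }

    // Serializes a node to a string, without the XML declaration.
    static String serialize(Node node) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    // Builds a BODYTEXT element in `target` whose children are the parsed
    // nodes (text and elements) of the given fragment.
    static Element buildBodyText(Document target, String fragment) throws Exception {
        // a) wrap the text in a container tag, b) parse the wrapped string
        Element wrapper = parse("<p>" + fragment + "</p>").getDocumentElement();
        // c) copy every child node of the wrapper into BODYTEXT
        Element body = target.createElement("BODYTEXT");
        for (Node child = wrapper.getFirstChild(); child != null; child = child.getNextSibling()) {
            body.appendChild(target.importNode(child, true));
        }
        return body;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<root/>");
        String text = "this is just text with a link<a href=\"link.html\">text for link</a>text goes on";
        doc.getDocumentElement().appendChild(buildBodyText(doc, text));
        // The <a> element should now appear as markup, not as escaped text.
        System.out.println(serialize(doc));
    }
}
```

Note that this only works if the text from the database is well-formed XML; HTML with unclosed tags such as <br> would make the parse step fail.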
Forgot to mention that I was using the XMLOutputter from org.jdom, and that in our DTD these <a> elements are declared, so no validation errors are reported...
We just add all elements to the document structure, except this one, which is included in the text but is a valid XML tag...
thanks, bart
hey,
on one of my projects I'm working with JDOM (an old version, Jan. 2000) to parse XML (with validation). If I want to use HTML tags between XML tags, the "<" and ">" characters are replaced by "&lt;" and "&gt;"...
example:
<BODYTEXT>this is just text with a link<a href="link.html">text for link</a>text goes on</BODYTEXT>
The text inside BODYTEXT is just text, coming straight out of the database. When getting the text from the database, everything is fine, but when adding this text to the document (while constructing the doc) and parsing it, the "<" signs are replaced by "&lt;" etc.
How do I avoid this replacement without having to extract the link data from the text and put it between XML tags?
regards, bart
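
For illustration, the escaping described in this message can be reproduced with a few lines. This is a minimal sketch that again assumes the JDK's built-in W3C DOM rather than JDOM, and the class name EscapingDemo is hypothetical. It shows that a string set as the text of an element is treated as character data, so the serializer has to escape the "<" signs to keep the output well-formed.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class, not taken from the original project.
class EscapingDemo {

    // Adds the fragment as plain text to a BODYTEXT element and serializes it.
    static String addAsText(String fragment) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element body = doc.createElement("BODYTEXT");
        // The whole string becomes one text node; no parsing takes place here.
        body.setTextContent(fragment);
        doc.appendChild(body);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // The "<" of the <a> tag is written out as an entity reference.
        System.out.println(addAsText("link<a href=\"link.html\">text for link</a>text goes on"));
    }
}
```

So the replacement is not a parser bug as such: the markup has to be parsed into element nodes before it is added, otherwise it stays text.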