
   Re: [xml-dev] Truncation parsing <


hi,

i guess you are using SAX and recording the input from characters()?

SAX parsers (such as Xerces) when parsing the input "Shipping Address (&lt; 31)"
will actually deliver 3 characters() events (add System.out.println()
to characters() to check it) and it _is_ the application's responsibility to gather
the whole input - typical code in a content handler that handles it looks like this:

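  // these members live inside a class extending org.xml.sax.helpers.DefaultHandler

  // accumulates the text of the current element across characters() calls
  StringBuffer buffer = new StringBuffer();

  public void startElement(String namespaceURI,
     String localName, String qualifiedName, Attributes atts) {
        // StringBuffer has no clear(); setLength(0) empties it
        buffer.setLength(0);
    }

  public void characters(char[] text, int start, int length)
     throws SAXException {
        // may be called several times per element, so append, never assign
        buffer.append(text, start, length);
    }

  public void endElement(String namespaceURI, String localName,
     String qualifiedName) {
        // only now is the element text complete
        System.out.println("got="+buffer);
    }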

(you can find more about it here: http://www.ibiblio.org/xml/books/xmljava/chapters/ch06s07.html)
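
to try this out end to end, here is a minimal self-contained sketch of the same idea using the JAXP parser that ships with the JDK. the class and method names (GatherTextDemo, GatheringHandler, parse) are made up for illustration, and the chunks counter is only there so you can see how many characters() calls your parser makes (the number depends on the parser):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: gathers element text across characters() calls.
class GatherTextDemo {

    static class GatheringHandler extends DefaultHandler {
        final List<String> values = new ArrayList<>(); // one entry per element
        int chunks = 0;                                // number of characters() calls seen
        private final StringBuilder buffer = new StringBuilder();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            buffer.setLength(0); // start collecting text for the new element
        }

        @Override
        public void characters(char[] text, int start, int length) {
            chunks++;
            buffer.append(text, start, length); // append, never overwrite
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            values.add(buffer.toString()); // text is complete only here
        }
    }

    // Parses the given XML string and returns the handler holding the results.
    static GatheringHandler parse(String xml) throws Exception {
        GatheringHandler handler = new GatheringHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        GatheringHandler h = parse("<element>Shipping Address (&lt; 31)</element>");
        System.out.println("got=" + h.values.get(0)
            + " in " + h.chunks + " characters() call(s)");
    }
}
```

note that resetting the buffer in startElement() is only good enough for elements that contain text and no child elements; mixed content would need a stack of buffers or similar.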


alternatively you can use a parser that will do this gathering transparently
(is there such a beast?) or a different API altogether like the XmlPull API (http://www.xmlpull.org)
that guarantees that a TEXT event contains all element content, see also:
http://www.ibiblio.org/xml/books/xmljava/chapters/ch05s09.html

alek
ps. also you can write your own SAX content handler that will make sure to
call your content handler with the gathered element content (or somebody has such
a handler already written?)
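
a rough sketch of what such a gathering handler could look like, assuming it simply wraps another ContentHandler and passes on one characters() call per run of text. the names CoalescingHandler, LastChunkHandler and lastTextSeen are invented for this example, and only the element, document and text events are forwarded (a complete version would forward the remaining ContentHandler callbacks too):

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical wrapper: buffers text and hands it to the target in one piece.
class CoalescingHandler extends DefaultHandler {
    private final ContentHandler target;
    private final StringBuilder pending = new StringBuilder();

    CoalescingHandler(ContentHandler target) {
        this.target = target;
    }

    // Delivers all buffered text to the target as a single characters() call.
    private void flush() throws SAXException {
        if (pending.length() > 0) {
            char[] chars = pending.toString().toCharArray();
            pending.setLength(0);
            target.characters(chars, 0, chars.length);
        }
    }

    @Override
    public void startDocument() throws SAXException {
        target.startDocument();
    }

    @Override
    public void endDocument() throws SAXException {
        target.endDocument();
    }

    @Override
    public void characters(char[] text, int start, int length) {
        pending.append(text, start, length); // hold back until the next tag
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        flush(); // text before a child element goes out first
        target.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        flush(); // text before the end tag goes out first
        target.endElement(uri, localName, qName);
    }

    // Naive handler that keeps only the last chunk it saw (it overwrites).
    static class LastChunkHandler extends DefaultHandler {
        String last;

        @Override
        public void characters(char[] text, int start, int length) {
            last = new String(text, start, length);
        }
    }

    // Runs the naive handler behind the wrapper and returns what it saw last.
    static String lastTextSeen(String xml) throws Exception {
        LastChunkHandler naive = new LastChunkHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), new CoalescingHandler(naive));
        return naive.last;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("got="
            + lastTextSeen("<element>Shipping Address (&lt; 31)</element>"));
    }
}
```

the point of the design is that the wrapped handler can stay as naive as it likes: even one that overwrites its value on every characters() call sees the whole text, because the wrapper only passes text on when it reaches the next start or end tag.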


Craig H Fry wrote:

> I am having a problem with truncation when trying to parse an element that
> contains &lt;
> I can recreate the problem with &amp; and &gt; also
>
> Here is a sample of what I am parsing
>
> <element>Shipping Address (&lt; 31)</element>
>
> When parsed, all I get is 31)
>
> When I do println to see what is happening, it is actually parsing in what
> looks like 2 steps. Step one parses up to the &lt;.  It then parses the
> remaining portion of the element, overwriting the value it just parsed.
> When parsing a normal element, it only does one step and everything turns
> out fine.
> Any input would be appreciated,
>
> Craig Fry
> Programmer/QC
> Zenplex
>
> http://tambora.zenplex.org
>
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
>
> The list archives are at http://lists.xml.org/archives/xml-dev/
>
> To subscribe or unsubscribe from this list use the subscription
> manager: <http://lists.xml.org/ob/adm.pl>





 


Copyright 2001 XML.org. This site is hosted by OASIS