xml-dev - Re: [xml-dev] Truncation parsing <


To: craig@zenplex.com
Subject: Re: [xml-dev] Truncation parsing <
From: Aleksander Slominski <aslom@cs.indiana.edu>
Date: Fri, 02 Aug 2002 14:01:39 -0400
Cc: xml-dev@lists.xml.org
References: <33501.172.16.0.181.1028300737.squirrel@mail.zenplex.com>

hi,

i guess that you use SAX and record the input from characters()?

SAX parsers (such as Xerces), when parsing the input "Shipping Address (&lt; 31)",
will actually deliver 3 characters() events (add System.out.println()
to characters() to check it), and it _is_ the application's responsibility to gather
the whole input - typical code in a content handler that handles it looks like this:

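  // gathers the text of the current element across characters() calls
  StringBuffer buffer = new StringBuffer();

  public void startElement(String namespaceURI,
     String localName, String qualifiedName, Attributes atts) {
        buffer.setLength(0);  // StringBuffer has no clear(); this empties it
    }

  public void characters(char[] text, int start, int length)
     throws SAXException {
        // may be called several times for one element: append, never overwrite
        buffer.append(text, start, length);
    }

  public void endElement(String namespaceURI, String localName,
     String qualifiedName) {
        // the text is only guaranteed to be complete here
        System.out.println("got="+buffer);
    }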

(you can find more about it here: http://www.ibiblio.org/xml/books/xmljava/chapters/ch06s07.html)
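
to check the chunking yourself, here is a minimal self-contained sketch of the handler above, assuming the SAX parser bundled with the JDK (the class name ChunkDemo, its fields and the run() helper are made up for illustration). it prints every characters() call as it arrives and the gathered text at endElement(); how many chunks you see depends on the parser:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; names are illustrative, not from any library.
class ChunkDemo extends DefaultHandler {
    final List<String> chunks = new ArrayList<>();       // each characters() call, as delivered
    final List<String> elementText = new ArrayList<>();  // gathered text, one entry per element
    private final StringBuilder buffer = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        buffer.setLength(0);  // start gathering afresh for this element
    }

    @Override
    public void characters(char[] text, int start, int length) {
        String chunk = new String(text, start, length);
        chunks.add(chunk);
        System.out.println("chunk=[" + chunk + "]");
        buffer.append(chunk);  // append, never overwrite
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        elementText.add(buffer.toString());
        System.out.println("got=" + buffer);
    }

    // Parses the given XML string with the default JDK SAX parser.
    static ChunkDemo run(String xml) throws Exception {
        ChunkDemo handler = new ChunkDemo();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }
}
```

calling ChunkDemo.run("<element>Shipping Address (&lt; 31)</element>") should end with the whole string in elementText, however many chunks the parser chose to deliver. note that this simple buffer is reset on every startElement(), so it only suits elements without child elements.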


alternatively you can use a parser that will do this gathering transparently
(is there such a beast?) or a different API altogether, like the XmlPull API
(http://www.xmlpull.org), which guarantees that a TEXT event contains all element
content, see also:
http://www.ibiblio.org/xml/books/xmljava/chapters/ch05s09.html

alek
ps. you can also write your own SAX content handler that makes sure to
call your content handler with the gathered element content (or does somebody
have such a handler already written?)
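
one way such a gathering handler could look, as a rough sketch only: a SAX filter built on org.xml.sax.helpers.XMLFilterImpl that holds text back and passes it on as a single characters() call when the next tag or processing instruction arrives. the class name CoalescingFilter and the chunks() helper are hypothetical, and the sketch assumes the JDK's bundled parser:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: merges adjacent characters() events into one.
class CoalescingFilter extends XMLFilterImpl {
    private final StringBuilder pending = new StringBuilder();

    CoalescingFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        pending.append(ch, start, length);  // hold the text back for now
    }

    // Passes all held-back text downstream as one characters() call.
    private void flush() throws SAXException {
        if (pending.length() > 0) {
            char[] all = pending.toString().toCharArray();
            pending.setLength(0);
            super.characters(all, 0, all.length);
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        flush();
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        flush();
        super.endElement(uri, localName, qName);
    }

    @Override
    public void processingInstruction(String target, String data)
            throws SAXException {
        flush();
        super.processingInstruction(target, data);
    }

    // Demo helper: returns every characters() chunk the downstream handler sees.
    static List<String> chunks(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        CoalescingFilter filter = new CoalescingFilter(reader);
        List<String> seen = new ArrayList<>();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                seen.add(new String(ch, start, length));
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return seen;
    }
}
```

with this in front, an unchanged content handler that simply overwrites its value in characters() would receive "Shipping Address (< 31)" in one piece. since a ContentHandler never sees comments, text on both sides of a comment gets merged too, which may or may not be what you want.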


Craig H Fry wrote:

> I am having a problem with truncation when trying to parse an element that
> contains &lt;
> I can recreate the problem with &amp; and &gt; also
>
> Here is a sample of what I am parsing
>
> <element>Shipping Address (&lt; 31)</element>
>
> When parsed, all I get is 31)
>
> When I do println to see what is happening, it is actually parsing in what
> looks like 2 steps. Step one parses up to the &lt;.  It then parses the
> remaining portion of the element, overwriting the value it just parsed.
> When parsing a normal element, it only does one step and everything turns
> out fine.
> Any input would be appreciated,
>
> Craig Fry
> Programmer/QC
> Zenplex
>
> http://tambora.zenplex.org
>
