Re: [xml-dev] How to handle "newline" characters in an XML parser.
- From: Liam Quin <liam@w3.org>
- To: Redefined Horizons <redefined.horizons@gmail.com>
- Date: Tue, 5 Dec 2006 14:41:02 -0500
On Tue, Dec 05, 2006 at 11:24:55AM -0800, Redefined Horizons wrote:
> I'm nearing the completion of an open source XML parser in Java. (It's
> an event-based, pull parser.)
Why? Do we need more parsers? :-)
[...]
> I'm having some trouble figuring out how to handle "newline"
> characters in XML text files on different platforms. I typically
> ignore all whitespace in the parser, but I wanted to count newline
> characters to aid in error reporting.
You can't ignore whitespace; you have to return it to the application,
except when it's explicitly ignorable because a DTD says so, or when
it's part of the markup itself, e.g. whitespace inside a tag that
matches the S production.
> I've taken a look at the XML specs, but didn't completely understand
> what they had to say about newline characters.
Can you ask a more specific question? Are you asking when normalization
happens? By newline do you mean the character at Unicode code point 10?
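If the question is about normalization: XML 1.0 section 2.11 says the processor translates each two-character CR LF sequence, and each CR not followed by LF, into a single LF before parsing, so the application only ever sees code point 10. A minimal sketch of that step follows; the class and method names (LineEndings, normalize) are made up for illustration, and a real parser would more likely do this on the input stream than on a whole String.

```java
public final class LineEndings {
    private LineEndings() {}

    /**
     * Translates CR LF pairs and lone CRs to a single LF,
     * following the end-of-line rule in XML 1.0 section 2.11.
     * (Hypothetical helper, not part of any existing API.)
     */
    public static String normalize(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\r') {
                out.append('\n');
                // Swallow the LF of a CR LF pair so it yields one LF, not two.
                if (i + 1 < input.length() && input.charAt(i + 1) == '\n') {
                    i++;
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```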
Remember that the spaces inside the desc element in:
<desc>his socks were <em>very</em> <pattern>argyle</pattern>.</desc>
are all important, including the one between </em> and <pattern>.
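To see what a conforming parser reports for that example, here is a small sketch using the StAX API that ships with the JDK (javax.xml.stream). The class and method names (MixedContentText, textContent) are invented for illustration. It concatenates every text event, including whitespace-only ones, so the result does not depend on how the parser happens to split the text into chunks.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public final class MixedContentText {
    private MixedContentText() {}

    /** Returns all character data in the document, whitespace included. */
    public static String textContent(String xml) {
        StringBuilder text = new StringBuilder();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                // Keep whitespace-only text too: in mixed content it is data.
                if (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.SPACE) {
                    text.append(reader.getText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("not well-formed: " + e.getMessage(), e);
        }
        return text.toString();
    }
}
```

A parser that dropped whitespace-only text would run "very" and "argyle" together, which is exactly the kind of loss the application cannot repair afterwards.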
For error reporting, line counting depends on the platform, and
should probably correspond to using a native text editor on that
platform -- as that's what users will have to use when they
get an error.
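One possible policy, sketched below, is to count CR, LF, and CR LF each as a single line break, which is an assumption about how most editors number lines rather than anything the XML spec requires. The class name LineCounter and its methods are hypothetical. The counter has to see the raw characters, before any end-of-line normalization, if the numbers are to match the file on disk.

```java
public final class LineCounter {
    private int line = 1;       // 1-based line of the next character
    private int column = 1;     // 1-based column of the next character
    private boolean lastWasCr;  // true if the previous character was CR

    /** Feeds one raw input character to the counter. */
    public void accept(char c) {
        if (c == '\r') {
            line++;
            column = 1;
            lastWasCr = true;
        } else if (c == '\n') {
            // An LF right after a CR belongs to the same line break.
            if (!lastWasCr) {
                line++;
                column = 1;
            }
            lastWasCr = false;
        } else {
            column++;
            lastWasCr = false;
        }
    }

    public int line() {
        return line;
    }

    public int column() {
        return column;
    }
}
```

The parser would call accept for every character it reads and copy line() and column() into each error it reports.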
Liam
--
Liam Quin, W3C XML Activity Lead, http://www.w3.org/People/Quin/
http://www.holoweb.net/~liam/ * http://www.fromoldbooks.org/