OASIS Mailing List Archives


Re: [xml-dev] How to handle "newline" characters in an XML parser.

On Tue, Dec 05, 2006 at 11:24:55AM -0800, Redefined Horizons wrote:
> I'm nearing the completion of an open source XML parser in Java. (It's
> an event-based, pull parser.)

Why? Do we need more parsers? :-)

> I'm having some trouble figuring out how to handle "newline"
> characters in XML text files on different platforms. I typically
> ignore all whitespace in the parser, but I wanted to count newline
> characters to aid in error reporting.

You can't ignore whitespace. You have to return it to the application,
except when it's explicitly ignorable because a DTD says so, or when
it's inside a tag, e.g. where it matches the S production.
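
As a rough illustration of the S production mentioned above: in XML 1.0 it
is one or more of space, tab, carriage return and line feed. The class and
method names below are made up for this sketch, not taken from any parser.
A parser could use such a check to flag a text event as whitespace-only
while still returning it to the application.

```java
final class XmlWhitespace {
    private XmlWhitespace() {}

    /** True if c is one of the four characters allowed by the S production. */
    static boolean isS(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    /** True if text is non-empty and consists only of S characters. */
    static boolean matchesS(CharSequence text) {
        if (text.length() == 0) {
            return false; // S requires at least one character
        }
        for (int i = 0; i < text.length(); i++) {
            if (!isS(text.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

Note that other Unicode spaces, such as the no-break space U+00A0, are
ordinary character data as far as this production is concerned.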

> I've taken a look at the XML specs, but didn't completely understand
> what they had to say about newline characters.

Can you ask a more specific question?  Are you asking when normalization
happens?  By newline do you mean the character at Unicode code point 10?

Remember that the spaces inside the desc element in:
    <desc>his socks were <em>very</em> <pattern>argyle</pattern>.</desc>
are all important, including the one between </em> and <pattern>.
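
To see this with an existing pull parser, here is a small sketch using the
StAX API (javax.xml.stream) that ships with the JDK. The class name
WhitespaceDemo and the helper textEvents are invented for the example. It
collects every text event, whether the parser reports it as CHARACTERS or
as SPACE, so the single space between </em> and <pattern> should show up
as a chunk of its own.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class WhitespaceDemo {
    private WhitespaceDemo() {}

    /** Returns the text chunks of the document in document order. */
    static List<String> textEvents(String xml) {
        List<String> chunks = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Ask the parser to merge adjacent character data into one event.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.SPACE) {
                    chunks.add(reader.getText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("not well-formed: " + e.getMessage(), e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        String xml = "<desc>his socks were <em>very</em> "
                + "<pattern>argyle</pattern>.</desc>";
        for (String chunk : textEvents(xml)) {
            System.out.println("[" + chunk + "]");
        }
    }
}
```

If the parser dropped whitespace-only text, the application would
reassemble the sentence as "veryargyle".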

For error reporting, line counting depends on the platform, and
should probably correspond to using a native text editor on that
platform -- as that's what users will have to use when they
get an error.
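
One possible policy, sketched below under my own assumptions, is to count
CR LF, a lone CR and a lone LF each as a single line break. That covers
the common platform conventions and is the same set of sequences that
XML 1.0 end-of-line handling normalizes to a single line feed. The class
LineCounter and its methods are hypothetical; the counter has to look at
the raw input, before any normalization, to agree with a text editor.

```java
final class LineCounter {
    private int line = 1;
    private int column = 1;
    private boolean lastWasCr = false;

    /** Feed each raw input character, in order, before normalization. */
    void accept(char c) {
        if (c == '\n') {
            if (!lastWasCr) {
                // A lone LF ends the line.
                line++;
                column = 1;
            }
            // Otherwise this LF completes a CR LF pair that was
            // already counted when the CR was seen.
            lastWasCr = false;
        } else if (c == '\r') {
            // A CR ends the line whether or not an LF follows.
            line++;
            column = 1;
            lastWasCr = true;
        } else {
            column++;
            lastWasCr = false;
        }
    }

    int line() {
        return line;
    }

    int column() {
        return column;
    }

    /** Convenience helper: the 1-based line number reached after reading text. */
    static int lineAfter(String text) {
        LineCounter counter = new LineCounter();
        for (int i = 0; i < text.length(); i++) {
            counter.accept(text.charAt(i));
        }
        return counter.line();
    }
}
```

With this policy, the input "a\r\nb\rc\nd" ends on line 4, because each of
the three line-break styles is counted once.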


Liam Quin, W3C XML Activity Lead, http://www.w3.org/People/Quin/
http://www.holoweb.net/~liam/ * http://www.fromoldbooks.org/
