- From: MURATA Makoto <firstname.lastname@example.org>
- To: DML Development List <email@example.com>
- Date: Mon, 03 Aug 1998 11:15:19 +0900
On Sun, 2 Aug 1998, Peter Murray-Rust wrote:
> XML (unlike HTML) does not normalise character content
> and all characters that are not markup are passed to the application.
> Ignorable whitespace is a device that SAX provides to help the application
> decide what action it may be able to take. If you are writing a SAX-based
> application you will need to understand this concept.
I think that line ends are an exception: CR and CR+LF are always normalized to LF.
Eric Prud'hommeaux wrote:
> In that regard, it would seem that text is handled differently from
> system identifiers and attribute values.
As for attribute values, we do have a different normalization. As
for system identifiers, I do not understand your point.
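To make the difference concrete, here is a minimal sketch using Python's standard xml.sax module (my choice for illustration; the thread itself is about the Java SAX API). It assumes the bundled expat-based parser and an attribute with no DTD declaration, so the value is treated as CDATA. The class name AttrCollector and the attribute name "a" are made up for the example.

```python
import xml.sax


class AttrCollector(xml.sax.ContentHandler):
    """Hypothetical handler: records one attribute value and the element text."""

    def __init__(self):
        super().__init__()
        self.attr_value = None
        self.chunks = []

    def startElement(self, name, attrs):
        if name == "tag":
            self.attr_value = attrs.getValue("a")

    def characters(self, content):
        # Text may arrive in several chunks, so collect and join later.
        self.chunks.append(content)


handler = AttrCollector()
# The same tab and line feed appear in the attribute value and in the content.
xml.sax.parseString(b'<tag a="x\ty\nz">x\ty\nz</tag>', handler)

attr_value = handler.attr_value
text = "".join(handler.chunks)
print(repr(attr_value))  # literal tab and LF replaced by spaces
print(repr(text))        # tab and LF preserved in character content
```

In the attribute value each literal whitespace character is replaced by a space, while in element content the tab and line feed reach the application unchanged.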
> How about leading and trailing whitespace, or tags with just
> whitespace? For example, is "<tag>some text\r\n\t</tag>" reported
> completely as characters and not split into characters("some text")
> and ignorable("\r\n\t")?
Right. They are not split.
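A small sketch of this case, again with Python's xml.sax as a stand-in for a Java SAX parser (an assumption on my part; the handler name TextCollector is hypothetical). It feeds the example document with a real CR+LF and tab, and records what arrives through characters() and through ignorableWhitespace().

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler: separates character data from ignorable whitespace."""

    def __init__(self):
        super().__init__()
        self.chars = []
        self.ignorable = []

    def characters(self, content):
        # A parser may deliver one text run in several calls.
        self.chars.append(content)

    def ignorableWhitespace(self, whitespace):
        self.ignorable.append(whitespace)


handler = TextCollector()
xml.sax.parseString(b"<tag>some text\r\n\t</tag>", handler)

text = "".join(handler.chars)
ignorable = "".join(handler.ignorable)
print(repr(text))       # CR+LF arrives as a single LF, trailing tab included
print(repr(ignorable))  # nothing is reported as ignorable here
```

The trailing whitespace stays part of the character data, and the only change is the line-end normalization.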
> Is the whitespace in "<t1>\n <t2/>\n</t1>"
If 1) the DTD is available, 2) the element type t1 has element content, and
3) the XML processor uses the DTD to distinguish element content from mixed content,
then the whitespace in <t1> is ignorable.
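The converse is easy to check: when condition 1 fails because no DTD is supplied, the processor cannot know that t1 has element content, so the whitespace is reported as ordinary characters. The sketch below assumes Python's xml.sax with its bundled non-validating parser; the handler name WhitespaceProbe is hypothetical.

```python
import xml.sax


class WhitespaceProbe(xml.sax.ContentHandler):
    """Hypothetical handler: records how whitespace between elements is reported."""

    def __init__(self):
        super().__init__()
        self.chars = []
        self.ignorable = []

    def characters(self, content):
        self.chars.append(content)

    def ignorableWhitespace(self, whitespace):
        self.ignorable.append(whitespace)


handler = WhitespaceProbe()
# No DTD at all, so the parser cannot classify t1 as having element content.
xml.sax.parseString(b"<t1>\n <t2/>\n</t1>", handler)

chars = "".join(handler.chars)
ignorable = "".join(handler.ignorable)
print(repr(chars))      # whitespace before and after <t2/> arrives as characters
print(repr(ignorable))  # nothing is classified as ignorable without a DTD
```

An application that wants to drop such whitespace in this situation has to decide for itself, since the parser has no declaration to go on.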
> I also assume from the XML spec that SAX is acting in the
> role of XML processor and must translate \r's so it would really be
> characters(" some text\n\t").
Fuji Xerox Information Systems
Tel: +81-44-812-7230 Fax: +81-44-812-7231
xml-dev: A list for W3C XML Developers. To post, mailto:firstname.lastname@example.org
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/