
   RE: Character classification

  • From: Istvan Cseri <istvanc@microsoft.com>
  • To: xml-dev@ic.ac.uk, 'Chris Olds' <colds@nwlink.com>
  • Date: Thu, 4 Sep 1997 07:50:21 -0700

The Java parser stores text in Java chars and Strings, so it is Unicode
throughout. The GIs are 'atomized' to save memory (each distinct name is
stored once) and are returned to the caller in that form. PCDATA is
stored in String chunks. Entities are preserved in special nodes, but
can be made transparent to the reader (user) of the parsed tree.
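
To make the description above a bit more concrete, here is a minimal
sketch in present-day Java. It is not the parser's actual code: the
names AtomSketch, AtomTable, Node, atomize and flatten are all
hypothetical, and it assumes that "atomizing" means keeping one shared
String instance per distinct GI, and that an entity node carries its
replacement text alongside its name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AtomSketch {

    /** Hypothetical atom table: one shared String instance per distinct name. */
    static final class AtomTable {
        private final Map<String, String> atoms = new HashMap<>();

        /** Returns the canonical instance for this name, registering it if new. */
        String atomize(String name) {
            String existing = atoms.putIfAbsent(name, name);
            return existing != null ? existing : name;
        }

        int size() {
            return atoms.size();
        }
    }

    /** Hypothetical tree node: a PCDATA chunk, or an entity reference. */
    static final class Node {
        final String entityName; // null for a plain PCDATA chunk
        final String text;       // the chunk itself, or the entity's replacement text

        Node(String entityName, String text) {
            this.entityName = entityName;
            this.text = text;
        }

        boolean isEntity() {
            return entityName != null;
        }
    }

    /** "Transparent" view: the caller sees only text, not the entity nodes. */
    static String flatten(List<Node> nodes) {
        StringBuilder sb = new StringBuilder();
        for (Node n : nodes) {
            sb.append(n.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        AtomTable table = new AtomTable();
        // Two separately allocated Strings holding the same GI...
        String first = table.atomize(new String("para"));
        String second = table.atomize(new String("para"));
        // ...come back as the very same object, so == comparison is enough.
        System.out.println(first == second);

        List<Node> content = new ArrayList<>();
        content.add(new Node(null, "AT"));
        content.add(new Node("amp", "&")); // entity kept as its own node
        content.add(new Node(null, "T"));
        System.out.println(flatten(content));
    }
}
```

With names atomized this way, an application can compare GIs by
reference rather than character by character, and a caller that does not
care about entities can just ask for the flattened text.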

Istvan

> ----------
> From: 	Chris Olds[SMTP:colds@nwlink.com]
> Reply To: 	Chris Olds
> Sent: 	Wednesday, September 03, 1997 4:54 PM
> To: 	xml-dev@ic.ac.uk
> Cc: 	'Tim Bray'; Istvan Cseri
> Subject: 	Re: Character classification
> 
> How are people dealing with UTF-8 vs. unicode vs. Latin-1?  I have been
> working on a lexer (using Flex) that assumes the input stream is either
> Latin-1 or UTF-8 and returns byte strings to the caller.  Since Java
> chars are Unicode, I assume that the Java XML parsers are doing the
> opposite, right?  Is there any consensus on what form PCDATA or GI names
> should take when they are returned to the application?  On a related
> note, when do character entities get replaced - in the lexer or later
> on?  My reading of the draft is that the scanner must do the replacement
> if the examples of rescanning are to work.
> 
> 	/cco
> 
> Chris Olds	colds@nwlink.com
> 
> xml-dev: A list for W3C XML Developers
> Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
> To unsubscribe, send to majordomo@ic.ac.uk the following message;
> unsubscribe xml-dev
> List coordinator, Henry Rzepa (rzepa@ic.ac.uk)
> 
