   Re: Character classification

  • From: Chris Olds <colds@nwlink.com>
  • To: xml-dev@ic.ac.uk
  • Date: Wed, 03 Sep 1997 16:54:54 -0700

How are people dealing with UTF-8 vs. Unicode vs. Latin-1?  I have been
working on a lexer (using Flex) that assumes the input stream is either
Latin-1 or UTF-8 and returns byte strings to the caller.  Since Java
chars are Unicode, I assume that the Java XML parsers are doing the
opposite, right?  Is there any consensus on what form PCDATA or GI names
should take when they are returned to the application?  On a related
note, when do character entities get replaced - in the lexer or later
on?  My reading of the draft is that the scanner must do the replacement
if the examples of rescanning are to work.
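
As a rough illustration of the two questions above (which form strings take
when handed to the application, and when numeric character references get
replaced), here is a minimal sketch in Python. It is not taken from any
parser mentioned in this thread. The function names decode_input and
replace_char_refs are hypothetical, and the "try UTF-8, fall back to
Latin-1" rule is only an assumption that mirrors the lexer described above;
a real XML processor would normally go by the encoding declaration rather
than guessing.

```python
import re

# Hypothetical helpers for illustration only; not from any real XML parser.
# Matches numeric character references: &#NNN; (decimal) or &#xHHH; (hex).
_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def decode_input(raw: bytes) -> str:
    """Turn an input byte string into a Unicode string.

    Assumption: the input is either UTF-8 or Latin-1. UTF-8 is tried
    first; bytes that are not valid UTF-8 are reinterpreted as Latin-1,
    where every byte maps directly to the code point of the same value.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


def replace_char_refs(text: str) -> str:
    """Replace numeric character references with the characters they name.

    Named references such as &amp; are left alone. chr() raises
    ValueError for values beyond U+10FFFF; a real scanner would report
    that as an error in the document.
    """
    def _substitute(match: re.Match) -> str:
        hex_digits, dec_digits = match.group(1), match.group(2)
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        return chr(code_point)

    return _CHAR_REF.sub(_substitute, text)


if __name__ == "__main__":
    # Decode first, then replace references, so the application only
    # ever sees Unicode strings.
    print(replace_char_refs(decode_input(b"A&#66;&#x43;")))  # prints "ABC"
```

Doing the decoding before the replacement means the caller always receives
Unicode strings, whatever the input encoding was. A lexer that returns raw
bytes would instead have to re-encode each replaced character into the
input's encoding, which is not always possible for Latin-1.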


Chris Olds	colds@nwlink.com

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@ic.ac.uk)

