OASIS Mailing List Archives

RE: How to specify a Processing Instruction? (better: how to control encoding on saving)

> Well, this is not what I read.  I read that the encoding part 
> of the processing instruction is something that will be used 
> eventually by the server I'm sending it to.  

The XML declaration is not a Processing Instruction, it only resembles a Processing Instruction.

> Right now, I'm 
> just creating the document from scratch so there really is 
> "no" encoding.  I want to set the encoding so that the 
> eventual server will be able to understand the XML 
> document....it expects ISO-8859-1.  This seems a bit like a 
> chicken and the egg....

The spec requires XML processors to support UTF-8 and UTF-16; support for every other encoding (including ISO-8859-1) is optional.  If your "server" only recognizes ISO-8859-1, then it is nonconformant with the XML 1.0 specification.  Forcing the encoding to ISO-8859-1 will cause the serialization to fail if the document contains any character with a Unicode code point above 255, since ISO-8859-1 cannot represent those.
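
To make the range limit concrete, here is a small sketch in Python (my choice for illustration, it is not from this thread, and the helper name fits_latin1 is made up). It uses only the built-in str.encode to check whether a piece of text can be written as ISO-8859-1 bytes at all.

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character in text has an ISO-8859-1 byte."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        # Raised for any code point above U+00FF, e.g. the euro sign U+20AC.
        return False


print(fits_latin1("café"))      # é is U+00E9, inside the Latin-1 range
print(fits_latin1("price: €"))  # € is U+20AC, outside the Latin-1 range
```

Some serializers sidestep the failure by writing such characters as numeric character references (for example &#8364;) instead of raw bytes, so the exact behavior depends on the tool.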

The encoding is set (where the API allows it) when the DOM tree is serialized, that is, written out as a stream of bytes.
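
As a rough illustration of "encoding is chosen at serialization time", the sketch below uses Python's standard xml.dom.minidom rather than MSXML, so it shows the idea and not the MSXML API. The element names and the helper build_order are invented for the example.

```python
from xml.dom.minidom import Document


def build_order(name: str) -> Document:
    """Build a tiny DOM tree from scratch; no encoding is involved yet."""
    doc = Document()
    root = doc.createElement("order")
    doc.appendChild(root)
    child = doc.createElement("name")
    child.appendChild(doc.createTextNode(name))
    root.appendChild(child)
    return doc


doc = build_order("café")

# The encoding is chosen here, at the moment the tree becomes bytes.
latin1_bytes = doc.toxml(encoding="ISO-8859-1")
utf8_bytes = doc.toxml(encoding="UTF-8")

print(latin1_bytes)
print(utf8_bytes)
```

The same tree yields two different byte streams, each with a matching XML declaration; nothing in the tree itself had to change.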

> I basically want to start with an xml document that has the 
> basic "framework" and then I want to add and change some of 
> the nodes so that they have what the server expects and then 
> I want to send this off.  If I try to read the document in 
> completely blank as only: <?xml version="1.0" 
> encoding="ISO-8859-1"?> It still gives me an error saying it 
> can't read it in.  Of course, at that point, it's not only not 
> encoded, there's nothing there at all.

That is not an XML document, since an XML document requires one and only one document element.

<?xml version="1.0" encoding="ISO-8859-1"?>

followed by a single document element, even an empty one, is an XML document.
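
The document-element rule is easy to check with any conforming parser. The sketch below uses Python's standard xml.dom.minidom (not MSXML); the helper is_well_formed and the element name doc are only illustrative.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError


def is_well_formed(data: bytes) -> bool:
    """Return True if the parser accepts data as an XML document."""
    try:
        parseString(data)
        return True
    except ExpatError:
        return False


declaration = b'<?xml version="1.0" encoding="ISO-8859-1"?>'

print(is_well_formed(declaration))              # declaration only
print(is_well_formed(declaration + b"<doc/>"))  # declaration plus one element
```

So a "blank" starting framework needs at least an empty root element before a parser will load it.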

> This whole issue of how encoding in XML works is the weakest 
> thing.....I went to Borders books yesterday and I looked at 
> every single book on the shelf - a total of more than 
> 40-50....it took me hours and there wasn't more than a page 
> on encoding in any one book.  
> And, even then, it simply 
> explained what encoding was and nothing practical.  How do 
> you create a document from scratch with a particular 
> encoding?  How do you change the encoding of an existing 
> document?  I can't really find any decent documentation on 
> this anywhere.  To top it all off it's the heart and soul of 
> how XML works....without it, XML is nothing but a dream.  I 
> should be able to read in an XML document and then there 
> should be a basic MSXML method that will allow me to convert 
> the document or a single node from one encoding style to 
> another....

Again, encoding is a property of an XML document when it is written out as a stream of bytes; it has no meaning while the document is in a DOM tree.  MSXML's unusual use of a ProcessingInstruction node to represent the XML declaration only describes the former state of the document: at one time it was encoded using whatever encoding is named there.
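
One way to see this, and to "convert" an existing document, is to load it and write it back out with a different encoding. This is a sketch using Python's standard xml.dom.minidom, not MSXML; the function name transcode and the sample document are assumptions made for the example.

```python
from xml.dom.minidom import parseString


def transcode(data: bytes, target_encoding: str) -> bytes:
    """Parse XML bytes, then write the same tree out in another encoding."""
    doc = parseString(data)  # the parser decodes using the declared encoding
    return doc.toxml(encoding=target_encoding)


source = '<?xml version="1.0" encoding="ISO-8859-1"?><name>café</name>'.encode("iso-8859-1")
converted = transcode(source, "UTF-8")

# The text in the tree is plain Unicode whichever bytes it was loaded from.
print(parseString(source).documentElement.firstChild.data)
print(parseString(converted).documentElement.firstChild.data)
print(converted)
```

Both parses produce the same text node; only the bytes on the wire and the declaration differ.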

> it doesn't exist....how can that not exist?  Is 
> there some other method in VB that lets me change from 
> one encoding to another?
> I'm sure the problem is that I really don't understand 
> something very basic but I admit it --- it has to be that 
> way, otherwise, there are some major fundamental operations 
> that seem to be critically missing on how to manipulate XML 
> documents for the different encoding styles of the world.

The XML recommendation addressed this by basing XML on Unicode and stating that the only required encodings are UTF-8 and UTF-16.  Any other encoding is allowed, but processors are not required to support it, so if you want your documents to be universally readable, encode them in either UTF-8 or UTF-16.

The DOM 3 effort is formulating a specification for DOM saving (http://www.w3.org/TR/DOM-Level-3-ASLS/).  Its current draft does allow you to specify the encoding to be used, but as a property of the OutputStream, not as anything in the document.

MSXML appears to attempt to preserve the encoding used on load, but will use UTF-8 if the tree was built from scratch.