RE: How to specify a Processing Instruction? (better: how to control encoding on saving)
- From: "Arnold, Curt" <Curt.Arnold@hyprotech.com>
- To: "'xml-dev@lists.xml.org'" <xml-dev@lists.xml.org>
- Date: Wed, 29 Aug 2001 13:36:13 -0600
> > The XML declaration is not a Processing Instruction, it only resembles a
> > Processing Instruction.
> >
> Microsoft calls it the PI and has a specific method with
> these words...so I was just using what they seem to indicate
> in their documentation.
Treating the XML declaration as a processing instruction in the DOM is an MSXML-ism. It is not expressly forbidden in the DOM spec, but it definitely stretches the intention. The XML declaration and
processing instructions are distinct constructs in the XML specification. The XML declaration can only appear at the very start of an XML document; processing instructions can appear elsewhere in the
document and are expressly forbidden from using "xml" as their target (the part after the <?).
>
> > It is not an XML document, since an XML document requires one and only one
> > document element.
> >
> > <?xml version="1.0" encoding="ISO-8859-1"?>
> > <foo/>
> >
> > Is an XML document
>
> Well, it still won't read it in. It will only read it in if
> the encoding string is not present. I assume that means it's
> not in ISO-8859-1.
It is ISO-8859-1, and also US-ASCII and UTF-8, since these encodings agree on code points up to 127; the same bytes would cease to be valid in all three if there were any code points > 127.
The failure could mean that your parser does not recognize ISO-8859-1 as a known encoding. The only mandated encodings are UTF-8 and UTF-16. Leaving off the encoding declaration will cause the parser
to interpret the document as UTF-8 (absent a byte order mark indicating UTF-16).
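
As a small illustration of the point about code points > 127, here is a sketch in Python (not MSXML; the variable names are made up for this example). It encodes the two-line document from above in all three encodings and then does the same for a string containing one accented character.

```python
# The ASCII-only example document from this thread.
doc = '<?xml version="1.0" encoding="ISO-8859-1"?>\n<foo/>'

ascii_bytes = doc.encode("ascii")
latin1_bytes = doc.encode("iso-8859-1")
utf8_bytes = doc.encode("utf-8")

# With only code points <= 127 the three byte streams are identical.
same_for_ascii = ascii_bytes == latin1_bytes == utf8_bytes

# One code point above 127 (U+00E9) makes the encodings diverge.
accented = "<foo>\u00e9</foo>"
latin1_accented = accented.encode("iso-8859-1")  # U+00E9 becomes the single byte 0xE9
utf8_accented = accented.encode("utf-8")         # U+00E9 becomes the two bytes 0xC3 0xA9
```

The same string cannot be encoded as US-ASCII at all, which is why the declared encoding starts to matter as soon as such characters appear.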
> I think I might be starting to understand
> it. If I do, then I think what everyone is saying is that I
> cannot start with an XML template, rather, I have to create
> the XML document completely over again, with the ISO-8859-1
> encoding standard from the start and then hopefully the UPS
> text encoding will stay preserved when the node is added.
I don't think anyone was saying that.
> So, I can't take an existing DOM in whatever encoding it was
> originally in, even UTF-8, modify it and save it in a
> differently encoded byte stream?
Sure you can; MSXML just doesn't provide a method to do it for you.
> > The XML recommendation addressed this by basing XML on Unicode and
> > stating the only required encodings are UTF-8 and UTF-16. Use of any
> > other encodings is allowed but not required, so if you want
> > your documents to be universally readable, you will encode them in either
> > UTF-8 or UTF-16.
>
> Unfortunately, UPS requires the XML document be in a specific
> format with a certain text element being very specific. I'm
> not in control of their parser or their requirements.
If UPS really requires that all XML documents be encoded in ISO-8859-1, then you could take the string returned by MSXML's document.xml property, drop any XML declaration in it, write your own <?xml
version="1.0" encoding="ISO-8859-1"?> to your output file, and walk through the string: if a character's value is <= 255, write it directly to the file; if it is greater than 255, write &#xHHHH; or
&#DDDDD;, where HHHH is the hexadecimal representation of the Unicode code point and DDDDD is the decimal representation.
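
A rough sketch of that procedure, written in Python rather than against MSXML. The function name to_latin1_xml and the regular expression are hypothetical, and the input is assumed to be the serialized document held as a Unicode string. Note that character references are only legal in character data and attribute values, so this assumes no characters above 255 occur in element names, comments, CDATA sections, or processing instructions.

```python
import re

# Matches a leading XML declaration such as <?xml version="1.0" encoding="UTF-16"?>.
# The \s after "xml" keeps it from matching targets like <?xml-stylesheet ...?>.
_XML_DECL = re.compile(r'^\s*<\?xml\s[^?]*\?>\s*')


def to_latin1_xml(xml_text: str) -> bytes:
    """Re-serialize XML text as ISO-8859-1 bytes (hypothetical helper)."""
    # Drop any existing XML declaration, then write our own.
    body = _XML_DECL.sub("", xml_text, count=1)
    out = ['<?xml version="1.0" encoding="ISO-8859-1"?>\n']
    for ch in body:
        if ord(ch) <= 255:
            out.append(ch)                    # representable in ISO-8859-1 as-is
        else:
            out.append("&#x%X;" % ord(ch))    # hexadecimal character reference
    return "".join(out).encode("iso-8859-1")
```

For example, to_latin1_xml('<?xml version="1.0"?>\n<foo>caf\u00e9 \u20ac</foo>') keeps U+00E9 as the single byte 0xE9 and writes the euro sign as &#x20AC;. In Python specifically, body.encode("iso-8859-1", errors="xmlcharrefreplace") gives the decimal-reference variant of the same idea.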
Questions involving the Microsoft XML parser in particular are probably best asked on microsoft.public.xml instead of xml-dev.