Re: [xml-dev] unicode characters within XML documents

I used to see this all the time back in the SGML days. The popular Arbortext editor tended to store everything in ASCII, thus huge files full of character references, really puzzling if you opened them in anything but an SGML-native tool.
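
As a rough illustration of how an ASCII-only serialization ends up full of character references, here is a minimal sketch using Python's standard xml.etree.ElementTree (chosen purely for illustration; it has nothing to do with Arbortext or SGML tooling). The element name "doc" and attribute "value" are borrowed from the example quoted further down the thread, and the two characters are arbitrary picks from the Arabic block.

```python
import xml.etree.ElementTree as ET

# Two characters from the Arabic block (U+0627, U+0628), written as escapes
# so this snippet itself stays pure ASCII.
doc = ET.Element("doc", {"value": "\u0627\u0628"})

# Serializing to US-ASCII forces every non-ASCII character to be written
# as a numeric character reference (ElementTree emits decimal ones).
ascii_bytes = ET.tostring(doc, encoding="us-ascii")
print(ascii_bytes)

# Parsing the ASCII bytes gives back the original characters.
round_tripped = ET.fromstring(ascii_bytes).get("value")
print(round_tripped == "\u0627\u0628")
```

The serialized bytes are hard to read in a plain text editor, but nothing is lost: any XML parser expands the references again.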

On Sun, Jan 16, 2022 at 9:58 AM Michael Kay <mike@saxonica.com> wrote:
My guess would be that Microsoft chose an ASCII encoding for this file rather than a UTF-8 encoding because, at the time, CVS repositories could be very temperamental about file encodings.

End user applications, provided they use a real XML parser, are going to see exactly the same data as if it had all been encoded in UTF-8.
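
A minimal sketch of that point, assuming Python's standard xml.etree.ElementTree as the "real XML parser" (the thread does not name one) and using a three-character sample rather than the full test-suite document:

```python
import xml.etree.ElementTree as ET

# The same three Arabic-block characters, written two ways.
# 1) ASCII-only file using hexadecimal character references.
with_refs = (
    b'<?xml version="1.0" encoding="US-ASCII"?>'
    b'<doc value="&#x0627;&#x0628;&#x062A;"/>'
)
# 2) The characters themselves, encoded as UTF-8.
as_utf8 = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<doc value="\u0627\u0628\u062A"/>'
).encode("utf-8")

value_from_refs = ET.fromstring(with_refs).get("value")
value_from_utf8 = ET.fromstring(as_utf8).get("value")

# The application sees identical strings either way.
same = value_from_refs == value_from_utf8
code_points = [f"U+{ord(ch):04X}" for ch in value_from_refs]
print(same, code_points)
```

Once parsed, the attribute value is an ordinary string of three characters, so rendering or string comparison works the same regardless of how the file was encoded on disk.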

Michael Kay
Saxonica

> On 16 Jan 2022, at 07:03, Mukul Gandhi <mukulg@softwarebytes.org> wrote:
>
> Hi all,
>    I came across the following XML instance document, provided with the W3C XML Schema test suite:
>
> <doc value="&#x0600;&#x0601;&#x0602;&#x0603;&#x0604;&#x0605;&#x0606;&#x0607;&#x0608;&#x0609;&#x060A;&#x060B;&#x060C;&#x060D;&#x060E;&#x060F;&#x0610;&#x0611;&#x0612;&#x0613;&#x0614;&#x0615;&#x0616;&#x0617;&#x0618;&#x0619;&#x061A;&#x061B;&#x061C;&#x061D;&#x061E;&#x061F;&#x0620;&#x0621;&#x0622;&#x0623;&#x0624;&#x0625;&#x0626;&#x0627;&#x0628;&#x0629;&#x062A;&#x062B;&#x062C;&#x062D;&#x062E;&#x062F;&#x0630;&#x0631;&#x0632;&#x0633;&#x0634;&#x0635;&#x0636;&#x0637;&#x0638;&#x0639;&#x063A;&#x063B;&#x063C;&#x063D;&#x063E;&#x063F;&#x0640;&#x0641;&#x0642;&#x0643;&#x0644;&#x0645;&#x0646;&#x0647;&#x0648;&#x0649;&#x064A;&#x064B;&#x064C;&#x064D;&#x064E;&#x064F;&#x0650;&#x0651;&#x0652;&#x0653;&#x0654;&#x0655;&#x0656;&#x0657;&#x0658;&#x0659;&#x065A;&#x065B;&#x065C;&#x065D;&#x065E;&#x065F;&#x0660;&#x0661;&#x0662;&#x0663;&#x0664;&#x0665;&#x0666;&#x0667;&#x0668;&#x0669;&#x066A;&#x066B;&#x066C;&#x066D;&#x066E;&#x066F;&#x0670;&#x0671;&#x0672;&#x0673;&#x0674;&#x0675;&#x0676;&#x0677;&#x0678;&#x0679;&#x067A;&#x067B;&#x067C;&#x067D;&#x067E;&#x067F;&#x0680;&#x0681;&#x0682;&#x0683;&#x0684;&#x0685;&#x0686;&#x0687;&#x0688;&#x0689;&#x068A;&#x068B;&#x068C;&#x068D;&#x068E;&#x068F;&#x0690;&#x0691;&#x0692;&#x0693;&#x0694;&#x0695;&#x0696;&#x0697;&#x0698;&#x0699;&#x069A;&#x069B;&#x069C;&#x069D;&#x069E;&#x069F;&#x06A0;&#x06A1;&#x06A2;&#x06A3;&#x06A4;&#x06A5;&#x06A6;&#x06A7;&#x06A8;&#x06A9;&#x06AA;&#x06AB;&#x06AC;&#x06AD;&#x06AE;&#x06AF;&#x06B0;&#x06B1;&#x06B2;&#x06B3;&#x06B4;&#x06B5;&#x06B6;&#x06B7;&#x06B8;&#x06B9;&#x06BA;&#x06BB;&#x06BC;&#x06BD;&#x06BE;&#x06BF;&#x06C0;&#x06C1;&#x06C2;&#x06C3;&#x06C4;&#x06C5;&#x06C6;&#x06C7;&#x06C8;&#x06C9;&#x06CA;&#x06CB;&#x06CC;&#x06CD;&#x06CE;&#x06CF;&#x06D0;&#x06D1;&#x06D2;&#x06D3;&#x06D4;&#x06D5;&#x06D6;&#x06D7;&#x06D8;&#x06D9;&#x06DA;&#x06DB;&#x06DC;&#x06DD;&#x06DE;&#x06DF;&#x06E0;&#x06E1;&#x06E2;&#x06E3;&#x06E4;&#x06E5;&#x06E6;&#x06E7;&#x06E8;&#x06E9;&#x06EA;&#x06EB;&#x06EC;&#x06ED;&#x06EE;&#x06EF;&#x06F0;&#x06F1;&#x06F2;&#x06F3;&#x06F4;&#x06F5;&#x06F6;&#x06F7;&#x06F8;&#x06F9;&#x06FA;&#x06FB;&#x06FC;&#x06FD;&#x06FE;&#x06FF;"/>
>
> Within the above-mentioned XML document, the content of the attribute "value" is Arabic characters (specified by their Unicode code points). I guess that specifying Unicode characters with the &#x.... notation (as in the example cited above) is a preferred way to write and transport such XML documents across software application systems.
>
> My questions please,
> What would end-user applications do with such XML documents? I guess they'll most likely render them within a UI (relevant fonts would then also be needed) or extract text content from the XML documents for specific computations (like string comparison, etc.). Am I right on these points?
>
> Any thoughts on this topic would be great.
>
>
> --
> Regards,
> Mukul Gandhi


_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.

[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
subscribe: xml-dev-subscribe@lists.xml.org
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
