Re: [xml-dev] unicode characters within XML documents
- From: Michael Kay <mike@saxonica.com>
- To: Mukul Gandhi <mukulg@softwarebytes.org>
- Date: Sun, 16 Jan 2022 17:58:08 +0000
My guess would be that Microsoft chose an ASCII encoding for this file rather than a UTF-8 encoding because, at the time, CVS repositories could be very temperamental about file encodings.
End-user applications, provided they use a real XML parser, are going to see exactly the same data as if it had all been encoded in UTF-8.
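
As a rough illustration of that point (a minimal sketch, not taken from the test suite): the two documents below are made up for this example, one ASCII-only with numeric character references and one with the same three Arabic letters written literally in UTF-8. Python's standard xml.etree.ElementTree parser is used here only as an example of "a real XML parser"; the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: the same attribute value written two ways.
# 1) ASCII-only file, characters given as numeric character references.
ascii_doc = (
    b'<?xml version="1.0" encoding="US-ASCII"?>'
    b'<doc value="&#x627;&#x628;&#x62A;"/>'
)
# 2) UTF-8 file, the same characters written literally.
utf8_doc = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<doc value="\u0627\u0628\u062A"/>'
).encode("utf-8")

# The parser resolves character references, so the application only ever
# sees a string of Unicode characters in both cases.
value_from_ascii = ET.fromstring(ascii_doc).get("value")
value_from_utf8 = ET.fromstring(utf8_doc).get("value")

same = value_from_ascii == value_from_utf8
code_points = [f"U+{ord(ch):04X}" for ch in value_from_ascii]
print(same, code_points)
```

Once parsed, ordinary string operations (comparison, length, rendering) work on the attribute value without any knowledge of how the file was encoded on disk.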
Michael Kay
Saxonica
> On 16 Jan 2022, at 07:03, Mukul Gandhi <mukulg@softwarebytes.org> wrote:
>
> Hi all,
> I came across the following XML instance document, provided with the W3C XML Schema test suite:
>
> <doc value="؀؁؂؃؄؅؆؇؈؉؊؋،؍؎؏ؘؙؚؐؑؒؓؔؕؖؗ؛؜؝؞؟ؠءآأؤإئابةتثجحخدذرزسشصضطظعغػؼؽؾؿـفقكلمنهوىيًٌٍَُِّْٕٖٜٟٓٔٗ٘ٙٚٛٝٞ٠١٢٣٤٥٦٧٨٩٪٫٬٭ٮٯٰٱٲٳٴٵٶٷٸٹٺٻټٽپٿڀځڂڃڄڅچڇڈډڊڋڌڍڎڏڐڑڒړڔڕږڗژڙښڛڜڝڞڟڠڡڢڣڤڥڦڧڨکڪګڬڭڮگڰڱڲڳڴڵڶڷڸڹںڻڼڽھڿۀہۂۃۄۅۆۇۈۉۊۋیۍێۏېۑےۓ۔ەۖۗۘۙۚۛۜ۝۞ۣ۟۠ۡۢۤۥۦۧۨ۩۪ۭ۫۬ۮۯ۰۱۲۳۴۵۶۷۸۹ۺۻۼ۽۾ۿ"/>
>
> In the XML document above, the value of the attribute "value" consists of Arabic characters (specified by their Unicode code points). I guess that specifying Unicode characters with the &#x.... notation (as in the example cited above) is a preferred way to write such XML documents and transport them between software systems.
>
> My questions, please:
> What would end-user applications do with such XML documents? I guess that most likely they would either render them in a UI (in which case the relevant fonts would also be needed) or extract text content from the XML documents for specific computations (such as string comparison). Am I right on these points?
>
> Any thoughts on this topic would be great.
>
>
> --
> Regards,
> Mukul Gandhi