Re: [xml-dev] Here's how to process XML documents written in German
- From: Hermann Stamm-Wilbrandt <STAMMW@de.ibm.com>
- To: Chris Maloney <voldrani@gmail.com>
- Date: Thu, 31 Jan 2013 15:24:37 +0100
Hi Chris,
> The real lesson here is that you should never make contracts with
> Germans! Problem solved.
> ;-)
>
This calls for a response from a German ;-)
Please take a look at this BMP table (from [1]):
http://stamm-wilbrandt.de/en/blog/BMP.xsl.html
There are a LOT more Korean, Chinese, Japanese, ... characters
than the handful of special German characters.
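To put rough numbers on that comparison, here is a small Python sketch (not derived from BMP.xsl) that counts BMP code points by Unicode character-name prefix using the standard `unicodedata` module. The chosen prefixes and the seven-letter list of German specials are assumptions for illustration, and the exact counts depend on the Unicode version bundled with your Python.

```python
import unicodedata
from collections import Counter

# Name prefixes chosen for illustration (an assumption, not an official grouping).
PREFIXES = (
    "CJK UNIFIED IDEOGRAPH",  # Chinese/Japanese/Korean ideographs
    "HANGUL SYLLABLE",        # precomposed Korean syllables
    "HIRAGANA LETTER",
    "KATAKANA LETTER",
)

# German letters outside ASCII: a/o/u with umlaut (both cases) and sharp s.
GERMAN_SPECIALS = "\u00e4\u00c4\u00f6\u00d6\u00fc\u00dc\u00df"

def count_bmp_by_prefix(prefixes=PREFIXES) -> Counter:
    """Count BMP code points whose Unicode name starts with each prefix."""
    counts = Counter()
    for cp in range(0x10000):
        name = unicodedata.name(chr(cp), "")  # "" for unnamed/unassigned code points
        for prefix in prefixes:
            if name.startswith(prefix):
                counts[prefix] += 1
                break
    return counts

if __name__ == "__main__":
    for prefix, n in count_bmp_by_prefix().items():
        print(f"{prefix:25} {n:6}")
    print(f"{'German specials':25} {len(GERMAN_SPECIALS):6}")
```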
If this (Japanese) XML sample does not display correctly, see [2]:
$ xsltproc identity.xsl interesting.xml
<?xml version="1.0"?>
<面白い>素子</面白い>
$
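The identity.xsl stylesheet itself is not shown. As a rough analogue in Python (the standard-library `xml.etree.ElementTree`, not xsltproc), the sketch below parses the same sample and serializes it again, showing that a Japanese element name is an ordinary XML name that survives the round trip. Unlike the xsltproc run, this serialization omits the XML declaration.

```python
import xml.etree.ElementTree as ET

SAMPLE = "<面白い>素子</面白い>"

def round_trip(xml_text: str) -> str:
    """Parse xml_text and serialize it back to a str (no XML declaration)."""
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(root.tag, root.text)  # element name and content, both non-ASCII
    print(round_trip(SAMPLE))
```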
[1]
https://www.ibm.com/developerworks/mydeveloperworks/blogs/HermannSW/entry/bmp_xsl_html_basic_multilingual_plane20
[2] http://stamm-wilbrandt.de/en/xsl-list/interesting.xml
Mit besten Gruessen / Best wishes,
Hermann Stamm-Wilbrandt
Level 3 support for XML Compiler team and Fixpack team lead
WebSphere DataPower SOA Appliances
https://www.ibm.com/developerworks/mydeveloperworks/blogs/HermannSW/
https://twitter.com/HermannSW/
----------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Chair of the Supervisory Board: Martina Koederitz
Management: Dirk Wittkopp
Registered office: Boeblingen
Register court: Amtsgericht Stuttgart, HRB 243294
From: Chris Maloney <voldrani@gmail.com>
To: Tony Graham <tgraham@mentea.net>
Cc: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
Date: 01/30/2013 10:55 PM
Subject: Re: [xml-dev] Here's how to process XML documents written in German
The real lesson here is that you should never make contracts with
Germans! Problem solved.
;-)
On Wed, Jan 30, 2013 at 4:31 PM, Tony Graham <tgraham@mentea.net> wrote:
> On Wed, January 30, 2013 6:47 pm, Costello, Roger L. wrote:
> ...
>> This XPath expression does the job:
>>
>> sum(//Posten[@*[normalize-unicode(name(.)) eq
>> normalize-unicode('währung')][. eq 'EUR']])
>>
>> The normalize-unicode() function converts an attribute name into a
>> standard, canonical form.
>>
>> Lesson Learned:
>>
>> When processing markup with diacritical marks, beware that two characters
>> may visually appear the same but inside the computer they are represented
>> very differently. Design XPath expressions accordingly -- use
>> normalize-unicode() to convert markup into canonical form.
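The same idea can be sketched outside XPath. The Python snippet below is an analogue of the `sum(//Posten[...])` expression built on `unicodedata.normalize`, not a description of any XSLT processor's behavior. The root element `Buchung`, the amounts, and the mix of NFC and NFD attribute names are hypothetical test data.

```python
import unicodedata
import xml.etree.ElementTree as ET

CURRENCY_ATTR = "w\u00e4hrung"  # 'währung' with precomposed U+00E4 (NFC form)

def nfc(s: str) -> str:
    """Convert s to Unicode Normalization Form C (composed)."""
    return unicodedata.normalize("NFC", s)

# Hypothetical input: attribute names that look identical but are encoded differently.
DOC = (
    "<Buchung>"
    f'<Posten {CURRENCY_ATTR}="EUR">10</Posten>'  # precomposed a-umlaut
    '<Posten wa\u0308hrung="EUR">20</Posten>'      # 'a' + U+0308 combining diaeresis
    f'<Posten {CURRENCY_ATTR}="USD">40</Posten>'
    "</Buchung>"
)

def sum_eur(root: ET.Element, normalize: bool) -> int:
    """Sum <Posten> amounts that carry a currency attribute equal to 'EUR'.

    With normalize=True, attribute names are compared after NFC, mirroring
    normalize-unicode(name(.)) eq normalize-unicode('währung').
    """
    target = nfc(CURRENCY_ATTR) if normalize else CURRENCY_ATTR
    total = 0
    for posten in root.iter("Posten"):
        # Like the XPath predicate: does any attribute match name and value?
        if any((nfc(name) if normalize else name) == target and value == "EUR"
               for name, value in posten.attrib.items()):
            total += int(posten.text)
    return total

if __name__ == "__main__":
    root = ET.fromstring(DOC)
    print("naive:", sum_eur(root, normalize=False))      # → 10
    print("normalized:", sum_eur(root, normalize=True))  # → 30
```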
>
> The truism "validate at trust boundaries" comes to mind: if you can't
> trust the encoding or normalization form of the XML that you receive, then
> normalise it as soon as you receive it so all of your XML is consistent
> and you don't have to make your XPaths unreadable.
>
> Your example is much like the example in Section 3.1.1, "Why do we need
> character normalization?" [1] of "Character Model for the World Wide Web
> 1.0: Normalization". That document discusses the advantages of early or
> late normalization as well as more aspects of normalization than most of
> us could think of on our own. Unfortunately its recommendations are in
> flux (and have been since May last year), but your scenario would best be
> handled by 'late normalization' where you normalize the data after it's
> transmitted to you.
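A minimal Python sketch of normalizing once, on receipt, so that later lookups can use plain names. It works on the parsed tree rather than the raw text, because NFC-normalizing unparsed markup can alter it: for example, a `>` followed by a U+0338 combining character composes into U+226F. The function name `normalize_on_receipt` and the sample data are hypothetical, and comments and processing instructions are not handled (`ET.fromstring` drops them by default).

```python
import unicodedata
import xml.etree.ElementTree as ET

def nfc(s: str) -> str:
    """Convert s to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", s)

def normalize_on_receipt(xml_text: str) -> ET.Element:
    """Parse xml_text, then NFC-normalize every element name, attribute
    name and value, text and tail in the resulting tree."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        elem.tag = nfc(elem.tag)
        attrs = list(elem.attrib.items())
        elem.attrib.clear()
        for name, value in attrs:
            # If two names become equal after NFC, the later one wins.
            elem.set(nfc(name), nfc(value))
        if elem.text is not None:
            elem.text = nfc(elem.text)
        if elem.tail is not None:
            elem.tail = nfc(elem.tail)
    return root

if __name__ == "__main__":
    received = '<Buchung><Posten wa\u0308hrung="EUR">20</Posten></Buchung>'
    root = normalize_on_receipt(received)
    # After normalization, a plain lookup with the precomposed name works.
    print(root.find("Posten").get("w\u00e4hrung"))
```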
>
> Regards,
>
>
> Tony Graham tgraham@mentea.net
> Consultant http://www.mentea.net
> Mentea 13 Kelly's Bay Beach, Skerries, Co. Dublin, Ireland
> -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
> XML, XSL-FO and XSLT consulting, training and programming
>
> [1] http://www.w3.org/TR/charmod-norm/#sec-WhyNormalization
>
> _______________________________________________________________________
>
> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> to support XML implementation and development. To minimize
> spam in the archives, you must subscribe before posting.
>
> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
> subscribe: xml-dev-subscribe@lists.xml.org
> List archive: http://lists.xml.org/archives/xml-dev/
> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
>