XML.org: OASIS Mailing List Archives

Re: [xml-dev] Why the double escape for lt? (that is <!ENTITY lt "&#38;#60;">)

Ken,

That’s a good observation, and I like your example as a “my head hurts” case. :-)

However, I’m not sure I understand your point about the escaping of the %’s driving this bit of the architecture. As I see it, that need doesn’t make the double escaping necessary. If the double escaping didn’t exist, I could still keep the string from being expanded as an entity reference, since a single escape would be enough. To make sure we’re on the same page, though, I offer this little example:

example.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo SYSTEM "example.dtd">
<foo>
&refEntity;
&refEnityWithCharRef;
&refEntityWithEscapedCharRef;
</foo>


example.dtd
<!-- Define the parameter entity that will be referenced in embedded uses -->
<!ENTITY % anEntity "value">

<!-- do an ordinary parameter entity reference of the same -->
<!ENTITY % refEntityPE "<!ENTITY refEntity  '%anEntity;'>"> %refEntityPE;

<!-- use a character reference to escape the % -->
<!ENTITY % refEnityWithCharRefPE "<!ENTITY refEnityWithCharRef  '&#37;anEntity;'>"> %refEnityWithCharRefPE;

<!-- use an escaped character reference to escape the % -->
<!ENTITY % refEntityWithEscapedCharRefPE "<!ENTITY refEntityWithEscapedCharRef  '&#38;#37;anEntity;'>"> %refEntityWithEscapedCharRefPE;


The output is:
~/> xmllint --noent --loaddtd example.xml 
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo SYSTEM "example.dtd">
<foo>
value
value
%anEntity;
</foo>

Is this what you are thinking?

If so, it seems to me that if the double escaping didn’t exist, then refEnityWithCharRefPE would be sufficient to get the string “%anEntity;”.
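
To double-check that reading, here is a rough Python model of the two passes involved. It is only a sketch, not a parser: the helper names `replacement_text` and `declared_value` are made up for illustration, and the regular expression only approximates XML name syntax. Pass one builds the replacement text of the outer parameter entity from its literal value; pass two happens when that parameter entity is referenced and the literal value of the inner declaration is processed in turn.

```python
import re

# Character references (&#37; or &#x25;) and parameter entity references (%name;).
# The name pattern is a simplification of the XML Name production.
TOKEN = re.compile(r"&#x([0-9A-Fa-f]+);|&#([0-9]+);|%([A-Za-z_:][\w.:-]*);")


def replacement_text(literal, params):
    """Model one pass over a literal entity value.

    Character references become the character they name and are not
    rescanned; a parameter entity reference is replaced by that entity's
    text, which is itself scanned. General entity references are left alone.
    """
    def expand(match):
        hex_digits, dec_digits, pe_name = match.groups()
        if hex_digits is not None:
            return chr(int(hex_digits, 16))
        if dec_digits is not None:
            return chr(int(dec_digits))
        return replacement_text(params[pe_name], params)

    return TOKEN.sub(expand, literal)


def declared_value(outer_literal, params):
    """Build the outer PE's replacement text, then process the quoted
    value of the <!ENTITY ...> declaration that it contains."""
    declaration = replacement_text(outer_literal, params)   # pass one
    inner_literal = re.search(r"'([^']*)'", declaration).group(1)
    return replacement_text(inner_literal, params)          # pass two


params = {"anEntity": "value"}
cases = [
    "<!ENTITY refEntity  '%anEntity;'>",
    "<!ENTITY refEnityWithCharRef  '&#37;anEntity;'>",
    "<!ENTITY refEntityWithEscapedCharRef  '&#38;#37;anEntity;'>",
]
results = [declared_value(case, params) for case in cases]
print(results)  # ['value', 'value', '%anEntity;']
```

Under this model the single character reference is already turned into a bare % in pass one, so pass two sees an ordinary parameter entity reference; only the doubly escaped form survives both passes.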

I do see that the double escaping allows things like this:
<!ENTITY % refEntityWithEmbeddedCharRefPE "<!ENTITY refEntityWithEmbeddedCharRef  '&#37;&#97;nEntity;'>">
that is, being able to construct an entity name using other entities and character references. But even without the double escaping that would still be doable; it would just require another entity.

Thank you for contributing to the discussion of this puzzling bit of the standard and helping me think through weird cases more.

david




On Nov 4, 2016, at 5:52 PM, G. Ken Holman <gkholman@CraneSoftwrights.com> wrote:

At 2016-11-04 17:19 -0700, David John Burrowes wrote:
Your comment was useful, and helped pull my mind out of the rut it was in. Still, it doesn’t really seem to offer a definitive reason why internal and external entities are processed differently.

Because the replacement text of an internal entity is determined only after the entity's literal value has been parsed for any references to parameter entities that need to be expanded as part of the replacement text. Since the content is already being parsed for parameter entity references, that same parsing pass resolves the character references (so as not to inadvertently reference an undesired parameter entity). The end result of that parsing step becomes the replacement text.

The replacement text of an external entity is simply what is in the file, and so it does not need to be parsed for references to parameter entities. Without the need to parse the content, the content can be used as is as the replacement text.

The replacement text is then parsed in the context of where it is placed in the stream by the entity reference.

In my XML work for my training material I compose numerous entities using the values of multiple parameter entities.  It was a practice I followed in the SGML days.  It ain't pretty but it does what I need it to do.  Here is an excerpt:

<!ENTITY % imgdir "images/">
<!ENTITY % b "bmp"><!--bitmap extension "gif" or "bmp"-->
<!ENTITY % v "wmf"><!--vector extension "cgm" or "wmf"-->
<!ENTITY % areas    "<!ENTITY areas    SYSTEM '%imgdir;areas.%v;'     NDATA %v;>">%areas;
<!ENTITY % axes     "<!ENTITY axes     SYSTEM '%imgdir;axes.%v;'      NDATA %v;>">%axes;
<!ENTITY % book1    "<!ENTITY book1    SYSTEM '%imgdir;book1.%b;'     NDATA %b;>">%book1;
<!ENTITY % bookalt  "<!ENTITY bookalt  SYSTEM '%imgdir;bookalt.%b;'   NDATA %b;>">%bookalt;

So ... in the general entity replacement for "&areas;" I have the sequence "%imgdir;" which gets expanded because it is a parameter entity reference.  But what if I wanted the string "%imgdir;" instead of the parameter entity reference?  I need to escape the "%", so I need to use &#x25; or &#37; in order to encode the "%" to be a simple "%" and not the reference.  So, internal entity replacement text processing needs to do numeric character reference processing which is part and parcel of entity processing.

Therefore, if you want an entity reference injected into your stream, one found in an external entity is coded as you would think, but one found in an internal entity has to be the result of entity reference processing, thus requiring the double escaping.  You have to compose the replacement string you then want processed.
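
A quick way to see this with a real parser is the sketch below. It assumes Python's standard `xml.etree.ElementTree` (built on expat) and uses a made-up entity name `e` in an internal subset rather than redefining `lt`; the helper `parse_with_entity` is hypothetical. The doubly escaped value should come through as a literal `<` character, while the singly escaped one leaves a bare `<` in the replacement text, which is then parsed as the start of markup and rejected.

```python
import xml.etree.ElementTree as ET


def parse_with_entity(literal):
    """Declare an internal general entity 'e' with the given literal
    value, reference it in content, and return the resulting text."""
    doc = f'<!DOCTYPE foo [<!ENTITY e "{literal}">]><foo>a &e; b</foo>'
    return ET.fromstring(doc).text


# Literal "&#38;#60;" -> replacement text "&#60;" -> parsed in content as "<" data.
print(parse_with_entity("&#38;#60;"))  # a < b

# Literal "&#60;" -> replacement text "<" -> parsed in content as markup.
try:
    parse_with_entity("&#60;")
except ET.ParseError as err:
    print("not well-formed:", err)
```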

There is a reason.  A bit arcane, but it wasn't done frivolously.

I hope this helps.

. . . . . . . Ken


--
Check our site for free XML, XSLT, XSL-FO and UBL developer resources |
Streaming hands-on XSLT/XPath 2 training @US$45: http://goo.gl/Dd9qBK |
Crane Softwrights Ltd. _ _ _ _ _ _ http://www.CraneSoftwrights.com/x/ |
G Ken Holman _ _ _ _ _ _ _ _ _ _ mailto:gkholman@CraneSoftwrights.com |
Google+ blog _ _ _ _ _ http://plus.google.com/+GKenHolman-Crane/posts |
Legal business disclaimers: _ _ http://www.CraneSoftwrights.com/legal |



