xml-dev - Re: [xml-dev] Internal entities removed from XML?


To: <xml-dev@lists.xml.org>
Subject: Re: [xml-dev] Internal entities removed from XML?
From: "Rick Jelliffe" <ricko@allette.com.au>
Date: Fri, 20 Dec 2002 17:44:57 +1100
References: <Pine.LNX.4.44L0.0212192051350.14684-100000@smtp.datapower.com>

From: "Rich Salz" <rsalz@datapower.com>
 
> > Well, assuming SAX-style parsing that is: just deliver entity expansions
> > as a separate characters() callback ... no copies or writes needed at
> > all.
> 
> The intent was to show in-place expansion can be way efficient.

Here is a version of Rich's C code that is exactly as fast when there
are no entity references, and no less space-efficient when there are.
If we find a reference that is not built in, we replace its
& delimiter with the Unicode Object Replacement Character (U+FFFC).

Afterwards, "&" in text is just a regular character, and U+FFFC stands for
the delimiter "entity reference open".

Entity expansion would happen lazily, by dereferencing the name
when it is needed: no tree structures are actually built. We defer
merging buffers until later: if "later" is a stream, then we never incur
the space cost of merging buffers or building trees. (If you are not using
wchar_t but, say, UTF-8, then you would substitute 0x1A or some other
appropriate unused control character, such as a flow-control character.)
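
To make the lazy step concrete, here is a rough sketch of how a consumer might walk a buffer that has already been marked with U+FFFC and hand out text runs and entity replacement text as separate SAX-style callbacks. The names stream_text_node, emit_fn, lookup_fn, ENTITY_OPEN and the demo_ helpers are all made up for illustration, and the sketch assumes replacement text contains no further references or markup.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

#define ENTITY_OPEN 0xFFFC  /* marker left behind by the in-place pass */

/* Hypothetical callback types, named here for illustration only. */
typedef void (*emit_fn)(const wchar_t *chars, size_t n, void *ctx);
typedef const wchar_t *(*lookup_fn)(const wchar_t *name, size_t n, void *ctx);

/* Walk a marked buffer, emitting text runs and entity replacement text as
   separate callbacks; nothing is copied or merged here.
   Returns 0 on success, -1 on a malformed or undeclared reference. */
int stream_text_node(const wchar_t *buff, size_t len,
                     emit_fn emit, lookup_fn lookup, void *ctx)
{
    size_t run = 0, i = 0;
    assert(buff != NULL && emit != NULL && lookup != NULL);
    while (i < len) {
        if (buff[i] != ENTITY_OPEN) { i++; continue; }
        if (i > run) emit(buff + run, i - run, ctx);  /* text before the reference */
        size_t name = i + 1, end = name;
        while (end < len && buff[end] != L';') end++;
        if (end == len || end == name) return -1;     /* no ';' or empty name */
        const wchar_t *value = lookup(buff + name, end - name, ctx);
        if (value == NULL) return -1;                 /* undeclared entity */
        emit(value, wcslen(value), ctx);              /* expansion, delivered on its own */
        i = run = end + 1;
    }
    if (len > run) emit(buff + run, len - run, ctx);  /* trailing text */
    return 0;
}

/* Demo sink: counts callbacks and appends into a small fixed buffer.
   A real consumer would write to its stream instead. */
struct demo_sink { wchar_t text[64]; size_t len; int calls; };

void demo_emit(const wchar_t *chars, size_t n, void *ctx)
{
    struct demo_sink *s = ctx;
    s->calls++;
    for (size_t k = 0; k < n && s->len < 63; k++) s->text[s->len++] = chars[k];
    s->text[s->len] = L'\0';
}

/* Demo lookup: knows exactly one entity, "co". */
const wchar_t *demo_lookup(const wchar_t *name, size_t n, void *ctx)
{
    (void)ctx;
    return (n == 2 && wcsncmp(name, L"co", 2) == 0) ? L"Example Corp" : NULL;
}
```

With the demo helpers, a buffer holding "a", U+FFFC, "co;", "b" produces three callbacks: the text before the reference, the replacement text, and the text after it.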

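#include <stddef.h>  /* wchar_t */

/* Expand &lt; and &amp; in place and flag every other '&' with U+FFFC.
   Returns the length of the text after expansion. */
int expand_entities_in_text_node(wchar_t *buff, int size)
{
     wchar_t *start, *src;
     for (start = src = buff; --size >= 0; )
     {
         if ((*buff++ = *src++) == L'&')
         {
             /* size now counts the characters that follow the '&' */
             if (size >= 3
             && src[0] == L'l' && src[1] == L't' && src[2] == L';')
                 buff[-1] = L'<', src += 3, size -= 3;
             else if (size >= 4
                  && src[0] == L'a' && src[1] == L'm' && src[2] == L'p'
                  && src[3] == L';')
                 src += 4, size -= 4;  /* the copied '&' is already the expansion */
             else
                 buff[-1] = 0xFFFC;  /* flag this as an entity reference */
         }
     }
     return (int)(buff - start);  /* new length of the text */
}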

(As Tim mentioned, real code would also need to cope with the
other built-in references and with numeric character references, and there
is no error handling here either.)
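
As a rough idea of what that extra decoding might look like, here is a sketch of a helper that recognises the five built-in references and decimal or hexadecimal character references. The name decode_reference and its signature are assumptions for illustration, not part of Rich's code. It returns a code point instead of writing a wchar_t, because wchar_t may be only 16 bits wide, and it checks only the upper bound U+10FFFF, not the full XML Char production.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* The five references predefined by XML, written without the leading '&'. */
static const struct { const wchar_t *name; size_t len; wchar_t ch; } builtins[] = {
    { L"lt;", 3, L'<' }, { L"gt;", 3, L'>' }, { L"amp;", 4, L'&' },
    { L"apos;", 5, L'\'' }, { L"quot;", 5, L'"' },
};

/* Try to decode a built-in or numeric reference whose body starts at src,
   just past the '&'. On success, store the code point in *out and return
   the number of characters consumed, including the ';'. Return 0 if the
   text is not such a reference (for example, a general entity name). */
size_t decode_reference(const wchar_t *src, size_t remaining, unsigned long *out)
{
    size_t i, k;
    assert(src != NULL && out != NULL);
    for (k = 0; k < sizeof builtins / sizeof builtins[0]; k++) {
        if (remaining >= builtins[k].len
            && wmemcmp(src, builtins[k].name, builtins[k].len) == 0) {
            *out = (unsigned long)builtins[k].ch;
            return builtins[k].len;
        }
    }
    if (remaining < 3 || src[0] != L'#') return 0;  /* shortest body is "#9;" */
    int hex = (src[1] == L'x');
    unsigned long value = 0;
    size_t digits = 0;
    for (i = hex ? 2 : 1; i < remaining && src[i] != L';'; i++, digits++) {
        unsigned d;
        wchar_t c = src[i];
        if (c >= L'0' && c <= L'9') d = (unsigned)(c - L'0');
        else if (hex && c >= L'a' && c <= L'f') d = (unsigned)(c - L'a') + 10;
        else if (hex && c >= L'A' && c <= L'F') d = (unsigned)(c - L'A') + 10;
        else return 0;                               /* not a digit */
        value = value * (hex ? 16 : 10) + d;
        if (value > 0x10FFFF) return 0;              /* beyond Unicode */
    }
    if (i == remaining || digits == 0) return 0;     /* no ';' or no digits */
    *out = value;
    return i + 1;
}
```

Every reference handled here is at least four characters long including the '&', and its expansion is at most two wchar_t units, so the in-place approach still never needs to grow the buffer.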


Cheers
Rick Jelliffe
