Re: [xml-dev] newline/form feed valid as attribute value?
- From: Dan Shelton <dan.f.shelton@gmail.com>
- To: Michael Kay <mike@saxonica.com>
- Date: Mon, 2 Jul 2012 22:43:43 +0200
On 2 July 2012 22:17, Michael Kay <mike@saxonica.com> wrote:
> It's theoretically impossible to write an XML parser using regular
> expressions alone, because XML is not a regular language.
So what's wrong with the following regex pattern? Roland Mainz passed it
around on David Korn's ksh93 mailing list a few weeks ago. It is used as
the *core* of an XML fragment parser: there is more pre- and
postprocessing code, but the parsing itself is done by repeatedly
applying the regex to a character stream. Parentheses not followed by ?:
are capturing groups, and their matches are stored in the 2D array
.sh.match:
---------------
dummy="${xmltext//~(Ex-p)(?:
(<!--.+-->)+?| # xml comments
(<[:_[:alnum:]-]+
(?: # attributes
[[:space:]]+
(?: # four different types of name=value syntax
(?:[:_[:alnum:]-]+=[^\"\'[:space:]]+?)| #x='foo=bar huz=123'
(?:[:_[:alnum:]-]+=\"[^\"]*?\")| #x='foo="ba=r o" huz=123'
(?:[:_[:alnum:]-]+=\'[^\']*?\')| #x="foox huz=123"
(?:[:_[:alnum:]-]+) #x="foox huz=123"
)
)*
[[:space:]]*
\/? # start tags which are end tags, too (like <foo\/>)
>)+?| # xml start tags
(<\/[:_[:alnum:]-]+>)+?| # xml end tags
([^><]+) # xml text
)/D}"
---------------
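To make the split between "regex core" and "more prep and postprocess code" concrete, here is a rough Python sketch. It is not a translation of the ksh93 pattern: the pattern below is a simplified stand-in, and the names TOKEN, tokenize and is_balanced are made up for illustration. It ignores CDATA, processing instructions, DOCTYPE and entities. The regex only splits the stream into tokens; checking that start and end tags nest properly needs a stack, which is the part a regular expression alone cannot do.

```python
import re

# Simplified token pattern (comment | end tag | start tag | text).
# Hypothetical stand-in for the ksh93 pattern, not a port of it.
TOKEN = re.compile(
    r"""
    (?P<comment><!--.*?-->)
  | (?P<end></(?P<end_name>[:\w-]+)\s*>)
  | (?P<start><(?P<start_name>[:\w-]+)
        (?:\s+[:\w-]+\s*=\s*(?:"[^"<]*"|'[^'<]*'))*
        \s*(?P<empty>/)?>)
  | (?P<text>[^<]+)
    """,
    re.VERBOSE | re.DOTALL,
)


def tokenize(xml):
    """Repeatedly apply the regex to the character stream."""
    pos = 0
    while pos < len(xml):
        m = TOKEN.match(xml, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at offset {pos}")
        yield m
        pos = m.end()


def is_balanced(xml):
    """Postprocessing step: match start and end tags with a stack."""
    stack = []
    for m in tokenize(xml):
        if m.group("start"):
            if not m.group("empty"):      # <foo/> opens and closes at once
                stack.append(m.group("start_name"))
        elif m.group("end"):
            if not stack or stack.pop() != m.group("end_name"):
                return False
    return not stack


print(is_balanced('<a x="1"><b/>text</a>'))   # True
print(is_balanced('<a><b></a></b>'))          # False
```

The tokenizer accepts both inputs; only the stack notices that the second one is badly nested.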
> Form feed and vertical tab are valid characters in XML 1.1 attributes, but
> they must be written as numeric character references. They are not allowed
> in XML 1.0.
OK
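A quick way to check the XML 1.0 half of this is a small Python sketch using the standard library's xml.etree.ElementTree, which sits on top of expat and, as far as I know, only implements XML 1.0 (so the XML 1.1 behaviour cannot be shown this way). The helper name parses is made up for illustration.

```python
import xml.etree.ElementTree as ET


def parses(doc):
    """Return True if the XML 1.0 parser accepts doc, False otherwise."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# Plain space: accepted.
print(parses('<mytag myattribute="hello world">sometext</mytag>'))       # True
# Literal form feed and vertical tab: rejected in XML 1.0.
print(parses('<mytag myattribute="hello \f world">sometext</mytag>'))    # False
print(parses('<mytag myattribute="hello \v world">sometext</mytag>'))    # False
# A numeric character reference does not help in XML 1.0 either.
print(parses('<mytag myattribute="hello &#12; world">sometext</mytag>')) # False
```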
>
> Newline characters (xA) may appear in XML 1.0 attributes and the parser is
> required to normalize them to space characters, unless they are written as
> numeric character references.
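The normalization described here can be observed with a short Python sketch, again assuming the standard library's xml.etree.ElementTree (expat) as the parser. The helper name attr is made up for illustration.

```python
import xml.etree.ElementTree as ET


def attr(doc, name="myattribute"):
    """Parse doc and return the value of the named attribute on the root."""
    return ET.fromstring(doc).get(name)


# Literal newline inside the attribute value: normalized to a space.
literal = attr('<mytag myattribute="hello\nworld">sometext</mytag>')
# Newline written as a numeric character reference: preserved.
escaped = attr('<mytag myattribute="hello&#10;world">sometext</mytag>')

print(repr(literal))  # 'hello world'
print(repr(escaped))  # 'hello\nworld'
```

So a regex-based parser that wants to agree with a conforming one has to accept a literal newline between the quotes and replace it with a space afterwards.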
>
> Michael Kay
> Saxonica
>
> On 02/07/2012 19:48, Dan Shelton wrote:
>>
>> Does anyone here know whether newlines, form feeds or vertical tabs are
>> valid within an attribute value (I hope not, otherwise it would make
>> it a lot harder to write a fast xml parser using regex alone)?
>>
>> ---------- Forwarded message ----------
>> From: Dan Shelton <dan.f.shelton@gmail.com>
>> Date: 1 July 2012 17:35
>> Subject: Re: newline/form feed valid as attribute value?
>> To: xml@gnome.org
>>
>>
>> On 1 July 2012 17:06, Dan Shelton <dan.f.shelton@gmail.com> wrote:
>>>
>>> In which context are newlines valid values, i.e. is a newline
>>> character or a form feed a valid character in an attribute value?
>>
>> Clarifying the question:
>> Is this a valid xml fragment:
>> -----cut-----
>> <mytag myattribute="hello
>> world">sometext</mytag>
>> -----cut-----
>> Or as UNIX(TM) shell code:
>> /usr/bin/printf '/<mytag myattribute="hello \n world">sometext</mytag>'
>>
>> Same with form feed:
>> /usr/bin/printf '/<mytag myattribute="hello \f world">sometext</mytag>'
>> Is this valid?
>>
>> Or with vertical tab:
>> /usr/bin/printf '/<mytag myattribute="hello \v world">sometext</mytag>'
>> Is this valid?
>>
>> _______________________________________________________________________
>>
>> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
>> to support XML implementation and development. To minimize
>> spam in the archives, you must subscribe before posting.
>>
>> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
>> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
>> subscribe: xml-dev-subscribe@lists.xml.org
>> List archive: http://lists.xml.org/archives/xml-dev/
>> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
>>
>>