Re: [xml-dev] newline/form feed valid as attribute value?

On 2 July 2012 22:17, Michael Kay <mike@saxonica.com> wrote:
> It's theoretically impossible to write an XML parser using regular
> expressions alone, because XML is not a regular language.

So what's wrong with the following regex pattern? It was passed around
by Roland Mainz on David Korn's ksh93 mailing list a few weeks ago and
is the *core* of an XML fragment parser: there is more pre- and
post-processing code, but the parsing itself is done by repeatedly
applying the regex to a character stream. Parentheses that are not
followed by ?: capture data, and the captures are stored in the 2D
array .sh.match:
---------------
dummy="${xmltext//~(Ex-p)(?:
	(<!--.+-->)+?|	# xml comments
	(<[:_[:alnum:]-]+
		(?: # attributes
			[[:space:]]+
			(?: # four different types of name=value syntax
				(?:[:_[:alnum:]-]+=[^\"\'[:space:]]+?)|	#x='foo=bar huz=123'
				(?:[:_[:alnum:]-]+=\"[^\"]*?\")|		#x='foo="ba=r o" huz=123'
				(?:[:_[:alnum:]-]+=\'[^\']*?\')|		#x="foox huz=123"
				(?:[:_[:alnum:]-]+)				#x="foox huz=123"
			)
		)*
		[[:space:]]*
		\/?	# start tags which are end tags, too (like <foo\/>)
	>)+?|				# xml start tags
	(<\/[:_[:alnum:]-]+>)+?|	# xml end tags
	([^><]+)			# xml text
	)/D}"
---------------
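
As a rough illustration of where the line falls: a regex like the one
above can recognise individual tokens (comments, tags, text), which is a
regular problem, but checking that start and end tags nest correctly
needs a stack on top of it. The Python sketch below is a hypothetical,
much-simplified analogue and not a port of the ksh93 pattern. The names
TOKEN and check_nesting are made up for illustration, and it only
handles quoted attribute values.

```python
import re

# Simplified token pattern, loosely modelled on the alternatives in the
# regex above. Assumption: only quoted attribute values are supported.
TOKEN = re.compile(r"""
    (?P<comment><!--.*?-->)
  | (?P<end></(?P<end_name>[:\w-]+)\s*>)
  | (?P<start><(?P<start_name>[:\w-]+)
        (?:\s+[:\w-]+\s*=\s*(?:"[^"<]*"|'[^'<]*'))*
        \s*(?P<empty>/?)>)
  | (?P<text>[^<>]+)
""", re.VERBOSE | re.DOTALL)


def check_nesting(fragment):
    """Return True if every start tag is closed by a matching end tag."""
    stack = []  # the part a regular expression alone cannot provide
    pos = 0
    while pos < len(fragment):
        m = TOKEN.match(fragment, pos)
        if m is None:
            return False  # input is not tokenizable by this simplified pattern
        pos = m.end()
        if m.group("start") and not m.group("empty"):
            stack.append(m.group("start_name"))
        elif m.group("end"):
            if not stack or stack.pop() != m.group("end_name"):
                return False
    return not stack


print(check_nesting('<a><b x="1">t</b></a>'))  # properly nested
print(check_nesting('<a><b></a></b>'))         # tokens are fine, nesting is not
```

Every token in the second call matches the pattern on its own; only the
stack notices that the end tags arrive in the wrong order.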

> Form feed and vertical tab are valid characters in XML 1.1 attributes, but
> they must be written as numeric character references. They are not allowed
> in XML 1.0.

OK
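
For anyone who wants to check the XML 1.0 half of that statement, here
is a small Python sketch. It assumes the parser behind
xml.etree.ElementTree (expat in CPython) behaves as a plain XML 1.0
parser; is_well_formed is a helper name made up for this example. It
does not exercise XML 1.1.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document):
    """Return True if the XML 1.0 parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Same fragments as in the original question, written as Python strings.
with_newline = '<mytag myattribute="hello \n world">sometext</mytag>'
with_form_feed = '<mytag myattribute="hello \f world">sometext</mytag>'
with_vertical_tab = '<mytag myattribute="hello \v world">sometext</mytag>'
# Form feed written as a numeric character reference instead of literally.
with_ff_charref = '<mytag myattribute="hello &#12; world">sometext</mytag>'

for name, doc in [
    ("newline", with_newline),
    ("form feed", with_form_feed),
    ("vertical tab", with_vertical_tab),
    ("form feed as &#12;", with_ff_charref),
]:
    print(name, "->", "accepted" if is_well_formed(doc) else "rejected")
```

Under those assumptions only the newline variant should be accepted; the
other three are rejected because neither character is a legal XML 1.0
character, whether literal or as a reference.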

>
> Newline characters (#xA) may appear in XML 1.0 attributes and the parser is
> required to normalize them to space characters, unless they are written as
> numeric character references.
>
> Michael Kay
> Saxonica
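
The normalization described above can be observed with a short Python
sketch. It assumes xml.etree.ElementTree (backed by expat in CPython)
and no DTD, so the attribute is treated as CDATA; attribute_value is a
helper name invented for this example.

```python
import xml.etree.ElementTree as ET


def attribute_value(document, name="myattribute"):
    """Parse the document and return the named attribute of the root element."""
    return ET.fromstring(document).get(name)


# A literal newline inside the attribute value ...
literal = attribute_value('<mytag myattribute="hello\n  world">sometext</mytag>')
# ... versus the same newline written as a numeric character reference.
charref = attribute_value('<mytag myattribute="hello&#10;  world">sometext</mytag>')

print(repr(literal))  # the literal newline is reported as a space
print(repr(charref))  # the referenced newline survives as "\n"
```

So a regex-based parser that wants to match what a conforming parser
reports has to accept literal newlines inside quoted attribute values
and replace them with spaces afterwards.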
>
>
>
>
> On 02/07/2012 19:48, Dan Shelton wrote:
>>
>> Does anyone here know whether newlines, form feeds or vertical tabs are
>> valid within an attribute value (I hope not, otherwise it would make
>> it a lot harder to write a fast XML parser using regex alone)?
>>
>> ---------- Forwarded message ----------
>> From: Dan Shelton<dan.f.shelton@gmail.com>
>> Date: 1 July 2012 17:35
>> Subject: Re: newline/form feed valid as attribute value?
>> To: xml@gnome.org
>>
>>
>> On 1 July 2012 17:06, Dan Shelton<dan.f.shelton@gmail.com>  wrote:
>>>
>>> In which context are newlines valid values, i.e. is a newline
>>> character or a form feed a valid character in an attribute value?
>>
>> Clarifying the question:
>> Is this a valid xml fragment:
>> -----cut-----
>> <mytag myattribute="hello
>>   world">sometext</mytag>
>> -----cut-----
>> Or as UNIX(TM) shell code:
>> /usr/bin/printf '/<mytag myattribute="hello \n world">sometext</mytag>'
>>
>> Same with form feed:
>> /usr/bin/printf '/<mytag myattribute="hello \f world">sometext</mytag>'
>> Is this valid?
>>
>> Or with vertical tab:
>> /usr/bin/printf '/<mytag myattribute="hello \v world">sometext</mytag>'
>> Is this valid?
>>
>> _______________________________________________________________________
>>
>> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
>> to support XML implementation and development. To minimize
>> spam in the archives, you must subscribe before posting.
>>
>> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
>> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
>> subscribe: xml-dev-subscribe@lists.xml.org
>> List archive: http://lists.xml.org/archives/xml-dev/
>> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
>>
>>
>



Copyright 1993-2007 XML.org. This site is hosted by OASIS