Re: [xml-dev] XML start tags are wicked complicated

Wicked complicated? That seems a low bar, if anything that is not minimally simple is regarded as complex: the fallacy of the excluded middle. And, of course, the simplest way of expressing a grammar or regular expression may not always be the best (for performance, for tools, for human explanation).

The more you try to do in a single production, the more you need either a more expressive grammar or more complex rules. So if you treat XML tags as two levels, where the first says that whitespace delimits tokens and the second forms those tokens into tag names and attributes with no consideration for whitespace, each grammar is super "simple".
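
A minimal sketch of that two-level idea in Python, under stated assumptions: the token names, the ASCII-only name pattern, and the functions `lex` and `parse_start_tag` are all hypothetical and made up for illustration. Because level 1 throws whitespace away, this sketch is more permissive than XML itself (it would accept `<a b="1"c="2">`, which XML does not).

```python
import re

# Level 1: whitespace only separates tokens and is then dropped.
# Assumption: names are ASCII-only here, which real XML names are not.
TOKEN = re.compile(r"""
    \s+                                   # whitespace: matched, then discarded
  | (?P<OPEN><)
  | (?P<EMPTY_CLOSE>/>)
  | (?P<CLOSE>>)
  | (?P<EQ>=)
  | (?P<VALUE>"[^<"]*"|'[^<']*')          # a quoted value is a single token
  | (?P<NAME>[A-Za-z_:][-A-Za-z0-9_:.]*)
""", re.VERBOSE)


def lex(text):
    """Return (kind, text) pairs; whitespace between tokens is not kept."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        if m.lastgroup is not None:       # None means the whitespace branch
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse_start_tag(text):
    """Level 2: OPEN NAME (NAME EQ VALUE)* (CLOSE | EMPTY_CLOSE)."""
    tokens = lex(text)
    i = 0

    def expect(kind):
        nonlocal i
        if i >= len(tokens) or tokens[i][0] != kind:
            raise ValueError(f"expected {kind} at token {i}")
        value = tokens[i][1]
        i += 1
        return value

    expect("OPEN")
    name = expect("NAME")
    attrs = {}
    while i < len(tokens) and tokens[i][0] == "NAME":
        key = expect("NAME")
        expect("EQ")
        attrs[key] = expect("VALUE")[1:-1]   # strip the surrounding quotes
    empty = i < len(tokens) and tokens[i][0] == "EMPTY_CLOSE"
    expect("EMPTY_CLOSE" if empty else "CLOSE")
    if i != len(tokens):
        raise ValueError("trailing tokens after the end of the tag")
    return name, attrs, empty
```

For example, `parse_start_tag('<a  b = "1" c="x y" />')` yields the tag name, a dictionary of attributes, and a flag for the empty-element form; neither level needs a rule about where whitespace may appear.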

Rick

On Thu, 20 Jan. 2022, 10:58 Roger L Costello, <costello@mitre.org> wrote:
Hi Folks,

XML start tags have a simple structure, right?

Wrong!

Here are some of the permutations of a start tag:

'<' tag-name '>'
'<' tag-name "/>"
'<' tag-name WSP '>'
'<' tag-name WSP "/>"
'<' tag-name WSP attribute-name '=' "value" '>'
'<' tag-name WSP attribute-name WSP '=' "value" '>'
'<' tag-name WSP attribute-name '=' WSP "value" '>'
'<' tag-name WSP attribute-name WSP '=' WSP "value" '>'
'<' tag-name WSP attribute-name WSP '=' WSP "value" WSP '>'
... a lot more ...
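
For reference, the XML 1.0 specification covers all of these permutations with a handful of productions: `STag ::= '<' Name (S Attribute)* S? '>'`, `EmptyElemTag ::= '<' Name (S Attribute)* S? '/>'`, `Attribute ::= Name Eq AttValue`, and `Eq ::= S? '=' S?`. A rough Python sketch of the same shape as a regular expression follows; the ASCII-only `NAME` pattern, the simplified `ATT_VALUE` (no checking of `&` references), and the helper `is_start_tag` are assumptions made for illustration.

```python
import re

# Simplified building blocks (assumptions: ASCII names, no reference checking).
NAME = r"[A-Za-z_:][-A-Za-z0-9_:.]*"
S = r"[ \t\r\n]+"
EQ = rf"(?:{S})?=(?:{S})?"
ATT_VALUE = r"""(?:"[^<"]*"|'[^<']*')"""
ATTRIBUTE = rf"{NAME}{EQ}{ATT_VALUE}"

# '<' Name (S Attribute)* S? ('>' | '/>')
START_TAG = re.compile(rf"<{NAME}(?:{S}{ATTRIBUTE})*(?:{S})?/?>")


def is_start_tag(text):
    """True if text matches the simplified start-tag or empty-element shape."""
    return START_TAG.fullmatch(text) is not None


permutations = [
    '<a>', '<a/>', '<a >', '<a />',
    '<a b="v">', '<a b ="v">', '<a b= "v">', '<a b = "v">', '<a b = "v" >',
]
for tag in permutations:
    print(tag, is_start_tag(tag))
```

Each listed permutation corresponds to a different choice of which optional `S` is present, not to a separate rule.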

Now, let's play parser: We are scanning and encounter these items
...    '<'
   ...     tag-name
      ...      WSP
         ...       attribute/value pair
            ...       WSP 

Trouble!

What does the WSP (WSP = whitespace) signify? Does it signify:

(a) Space between the first attribute and a second attribute? E.g. WSP attribute-name '=' "value"
(b) Space just prior to the end angle bracket? I.e., WSP '>'

The only way to know the answer is to look ahead beyond the WSP to see what token comes next. But two-token lookahead requires a more powerful parser than one with only one-token lookahead.
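
To make the choice between (a) and (b) concrete, here is a hedged Python sketch in which WSP is kept as an explicit token. It takes one possible route, not necessarily the one intended above: the rule is left-factored so that the parser consumes the WSP first and only then branches on the single token that follows. The token names, the ASCII-only name pattern, and the `Parser` class are hypothetical.

```python
import re

# Assumption: simplified ASCII-only tokens; WSP is kept, not discarded.
TOKEN = re.compile(r"""
    (?P<WSP>[ \t\r\n]+)
  | (?P<OPEN><)
  | (?P<EMPTY_CLOSE>/>)
  | (?P<CLOSE>>)
  | (?P<EQ>=)
  | (?P<VALUE>"[^<"]*"|'[^<']*')
  | (?P<NAME>[A-Za-z_:][-A-Za-z0-9_:.]*)
""", re.VERBOSE)


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    tokens.append(("END", ""))            # sentinel so peek() never runs off
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.i = 0

    def peek(self):
        """Kind of the current token: the only lookahead this parser uses."""
        return self.tokens[self.i][0]

    def take(self, kind):
        found, value = self.tokens[self.i]
        if found != kind:
            raise ValueError(f"expected {kind}, found {found}")
        self.i += 1
        return value

    def skip_optional_wsp(self):
        if self.peek() == "WSP":
            self.take("WSP")

    def start_tag(self):
        self.take("OPEN")
        name = self.take("NAME")
        attrs = {}
        while True:
            if self.peek() == "WSP":
                self.take("WSP")              # consume first, decide afterwards
                if self.peek() == "NAME":     # case (a): another attribute
                    key = self.take("NAME")
                    self.skip_optional_wsp()
                    self.take("EQ")
                    self.skip_optional_wsp()
                    attrs[key] = self.take("VALUE")[1:-1]
                    continue
            # case (b), or no whitespace at all: the tag has to end here
            kind = self.peek()
            if kind not in ("CLOSE", "EMPTY_CLOSE"):
                raise ValueError(f"expected end of tag, found {kind}")
            self.take(kind)
            self.take("END")
            return name, attrs, kind == "EMPTY_CLOSE"
```

Calling `Parser('<a b = "1" >').start_tag()` walks through exactly the situation described above: after the attribute/value pair the parser sees WSP, consumes it, and then the next token (`NAME` or the closing bracket) settles the question.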

So the next time someone tells you that the structure of an XML start tag is simple, tell 'em it ain't so!

/Roger

_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.

[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
subscribe: xml-dev-subscribe@lists.xml.org
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php