XML start tags are wicked complicated

From: Roger L Costello <costello@mitre.org>
To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
Date: Wed, 19 Jan 2022 23:58:37 +0000

Hi Folks,

XML start tags have a simple structure, right?

Wrong!

Here are some of the forms a start tag can take (strictly speaking, the ones ending in "/>" are empty-element tags):

'<' tag-name '>'
'<' tag-name "/>"
'<' tag-name WSP '>'
'<' tag-name WSP "/>"
'<' tag-name WSP attribute-name '=' "value" '>'
'<' tag-name WSP attribute-name WSP '=' "value" '>'
'<' tag-name WSP attribute-name '=' WSP "value" '>'
'<' tag-name WSP attribute-name WSP '=' WSP "value" '>'
'<' tag-name WSP attribute-name WSP '=' WSP "value" WSP '>'
... a lot more ...
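
As a rough illustration, the forms listed above can be recognized with one pattern that has an optional-whitespace slot on each side of the '=' and before the closing bracket. This is only a sketch under simplifying assumptions: NAME, WSP, VALUE, ATTRIBUTE, TAG, and is_tag are made-up names, and the name and value patterns are far narrower than what real XML allows.

```python
import re

# Simplified pieces; real XML names and attribute values allow more than this.
NAME = r"[A-Za-z_:][A-Za-z0-9_:.\-]*"
WSP = r"[ \t\r\n]+"
VALUE = r"""(?:"[^<"]*"|'[^<']*')"""

# attribute-name WSP? '=' WSP? "value"
ATTRIBUTE = rf"{NAME}(?:{WSP})?=(?:{WSP})?{VALUE}"

# '<' tag-name (WSP attribute)* WSP? ('>' | "/>")
TAG = re.compile(rf"<{NAME}(?:{WSP}{ATTRIBUTE})*(?:{WSP})?/?>")


def is_tag(text):
    """Return True if text is one of the simplified tag forms."""
    return TAG.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ['<a>', '<a />', '<a b = "1" >', '<a b>']:
        print(repr(sample), is_tag(sample))
```

A backtracking regex engine hides the whitespace question here, because it is free to try one reading of a WSP and then retry with the other.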

Now, let's play parser. We are scanning and encounter these items, in this order:
...    '<' 
   ...     tag-name 
      ...      WSP 
         ...       attribute/value pair 
            ...       WSP  

Trouble!

What does the WSP (WSP = whitespace) signify? Does it signify:

(a) Space between the first attribute and a second attribute? E.g. WSP attribute-name '=' "value"
(b) Space just prior to the end angle bracket? I.e., WSP '>'

The only way to know the answer is to look ahead past the WSP and see what token comes next. But two-token lookahead requires a more powerful parser than one-token lookahead does.
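
The dilemma can be sketched in a few lines. This is a minimal illustration, not a real XML tokenizer: the token names (LT, GT, EMPTY_CLOSE, EQ, WSP, VALUE, NAME) and the helpers tokenize and meaning_of_wsp are hypothetical, and the name and value patterns are simplified. It shows that a WSP token cannot be classified on its own; the answer depends on the token that follows it.

```python
import re

# Hypothetical token names; the patterns are simplified, not the XML spec's.
TOKEN_RE = re.compile(r"""
    (?P<LT><)
  | (?P<EMPTY_CLOSE>/>)
  | (?P<GT>>)
  | (?P<EQ>=)
  | (?P<WSP>[ \t\r\n]+)
  | (?P<VALUE>"[^"<]*"|'[^'<]*')
  | (?P<NAME>[A-Za-z_:][A-Za-z0-9_:.\-]*)
""", re.VERBOSE)


def tokenize(text):
    """Split text into (kind, lexeme) pairs, keeping WSP as its own token."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def meaning_of_wsp(tokens, i):
    """Classify the WSP token at index i by peeking at the token after it."""
    if tokens[i][0] != "WSP":
        raise ValueError("token at index i is not WSP")
    following = tokens[i + 1][0] if i + 1 < len(tokens) else None
    if following == "NAME":
        return "separator before another attribute"   # case (a)
    if following in ("GT", "EMPTY_CLOSE"):
        return "padding before the end of the tag"    # case (b)
    return "error"


if __name__ == "__main__":
    for sample in ['<a b="1" c="2">', '<a b="1" >']:
        toks = tokenize(sample)
        # Index 6 is the WSP that follows the first attribute/value pair.
        print(sample, "->", meaning_of_wsp(toks, 6))
```

In both samples the first seven tokens have the same kinds; only the eighth token tells the two cases apart.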

So the next time someone tells you that the structure of an XML start tag is simple, tell 'em it ain't so!

/Roger

Follow-Ups:
- Re: [xml-dev] XML start tags are wicked complicated
  - From: Rick Jelliffe <rjelliffe@allette.com.au>
- Re: [xml-dev] XML start tags are wicked complicated
  - From: Michael Kay <mike@saxonica.com>
