XML start tags are wicked complicated
- From: Roger L Costello <costello@mitre.org>
- To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
- Date: Wed, 19 Jan 2022 23:58:37 +0000
Hi Folks,
XML start tags have a simple structure, right?
Wrong!
Here are some of the permutations of a start tag (counting the empty-element form that ends in "/>"):
'<' tag-name '>'
'<' tag-name "/>"
'<' tag-name WSP '>'
'<' tag-name WSP "/>"
'<' tag-name WSP attribute-name '=' "value" '>'
'<' tag-name WSP attribute-name WSP '=' "value" '>'
'<' tag-name WSP attribute-name '=' WSP "value" '>'
'<' tag-name WSP attribute-name WSP '=' WSP "value" '>'
'<' tag-name WSP attribute-name WSP '=' WSP "value" WSP '>'
... a lot more ...
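To get a feel for how many shapes that list covers, here is a rough sketch of a single regular expression that accepts them. It is only an illustration under simplifying assumptions: NAME allows just a small ASCII subset of what XML permits in names, attribute values are not checked for '&', and the identifiers (NAME, WSP, VALUE, ATTR, TAG, is_start_tag) are made up for this example.

```python
import re

# Simplified name pattern (assumption): real XML names allow many more characters.
NAME = r"[A-Za-z_:][A-Za-z0-9_:.\-]*"
# WSP = one or more of space, tab, carriage return, line feed.
WSP = r"[ \t\r\n]+"
# An attribute value is quoted with either double or single quotes.
VALUE = r"(?:\"[^<\"]*\"|'[^<']*')"
# attribute-name, optional WSP, '=', optional WSP, value.
ATTR = rf"{NAME}(?:{WSP})?=(?:{WSP})?{VALUE}"
# '<' tag-name, any number of (WSP attribute), optional WSP, then '>' or "/>".
TAG = re.compile(rf"<{NAME}(?:{WSP}{ATTR})*(?:{WSP})?/?>")


def is_start_tag(text: str) -> bool:
    """Return True if text is one complete start tag (or empty-element tag)."""
    return TAG.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ['<a>', '<a/>', '<a >', '<a b = "c" >', '<a b="c"d="e">']:
        print(repr(sample), is_start_tag(sample))
```

Notice that the optional WSP appears in three different places in the pattern. That is exactly where the permutations come from.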
Now, let's play parser: We are scanning and encounter these items, in this order:
... '<'
... tag-name
... WSP
... attribute/value pair
... WSP
Trouble!
What does that last WSP (WSP = whitespace) signify? Does it signify:
(a) Space between the first attribute and a second attribute? E.g. WSP attribute-name '=' "value"
(b) Space just prior to the end angle bracket? I.e., WSP '>'
The only way to know the answer is to look ahead, past the WSP, to see which token comes next. That means two tokens of lookahead (the WSP itself and whatever follows it), and two-token lookahead requires a more powerful parser than a one-token lookahead parser.
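Here is a small hand-written sketch of that decision point. It is not how any particular XML parser does it. It works on characters rather than tokens, it consumes the WSP first and then peeks at what follows to choose between case (a) and case (b), and it makes simplifying assumptions: ASCII-only names, no check for duplicate attributes, and no validation of the characters inside a value. The function and helper names are invented for the example.

```python
import re

# Simplified name pattern (assumption): real XML names allow many more characters.
NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")
WSP_CHARS = " \t\r\n"


def parse_start_tag(text):
    """Return (tag_name, attributes, is_empty_element); raise ValueError if malformed."""
    pos = 0

    def skip_wsp():
        # Consume any run of whitespace; report whether there was any.
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] in WSP_CHARS:
            pos += 1
        return pos > start

    def expect(literal):
        nonlocal pos
        if not text.startswith(literal, pos):
            raise ValueError(f"expected {literal!r} at offset {pos}")
        pos += len(literal)

    def name():
        nonlocal pos
        match = NAME_RE.match(text, pos)
        if match is None:
            raise ValueError(f"expected a name at offset {pos}")
        pos = match.end()
        return match.group()

    expect("<")
    tag = name()
    attrs = {}
    while True:
        saw_wsp = skip_wsp()
        # Decision point: only what comes AFTER the WSP tells us what it meant.
        if text.startswith("/>", pos):      # case (b), empty-element form
            pos += 2
            empty = True
            break
        if text.startswith(">", pos):       # case (b), WSP just before '>'
            pos += 1
            empty = False
            break
        if not saw_wsp:                     # case (a) requires WSP before an attribute
            raise ValueError(f"expected whitespace at offset {pos}")
        attr = name()
        skip_wsp()
        expect("=")
        skip_wsp()
        if pos >= len(text) or text[pos] not in "\"'":
            raise ValueError(f"expected a quoted value at offset {pos}")
        quote = text[pos]
        end = text.find(quote, pos + 1)
        if end == -1:
            raise ValueError("unterminated attribute value")
        attrs[attr] = text[pos + 1:end]
        pos = end + 1
    if pos != len(text):
        raise ValueError(f"unexpected text after the tag at offset {pos}")
    return tag, attrs, empty


if __name__ == "__main__":
    print(parse_start_tag('<a b = "c" >'))
    print(parse_start_tag('<a />'))
```

The saw_wsp flag is the price of consuming the whitespace before knowing what it means: the parser has to remember that it saw WSP so it can still reject two attributes jammed together with no space between them.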
So the next time someone tells you that the structure of an XML start tag is simple, tell 'em it ain't so!
/Roger