Hi Folks,
XML start tags have a simple structure, right?
Wrong!
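Here are some of the permutations of a start tag (strictly speaking, the forms ending in "/>" are empty-element tags, but they scan the same way up to the closing delimiter):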
'<' tag-name '>'
'<' tag-name "/>"
'<' tag-name WSP '>'
'<' tag-name WSP "/>"
'<' tag-name WSP attribute-name '=' "value" '>'
'<' tag-name WSP attribute-name WSP '=' "value" '>'
'<' tag-name WSP attribute-name '=' WSP "value" '>'
'<' tag-name WSP attribute-name WSP '=' WSP "value" '>'
'<' tag-name WSP attribute-name WSP '=' WSP "value" WSP '>'
... a lot more ...
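For anyone who wants to poke at those permutations, here is a minimal Python sketch that folds them into one pattern. It follows the shape of the XML 1.0 start-tag production, '<' Name (S Attribute)* S? '>', but everything in it is a simplifying assumption for illustration: NAME, WSP, VALUE, ATTRIBUTE, TAG and is_tag are hypothetical names, names are restricted to ASCII, and only double-quoted values are accepted (as in the list above).

```python
import re

# Simplified building blocks (assumptions for this sketch only).
# Real XML names allow many more characters than this.
NAME = r"[A-Za-z_][A-Za-z0-9_.-]*"
WSP = r"[ \t\r\n]+"
VALUE = r'"[^<"]*"'  # double-quoted values only
ATTRIBUTE = rf"{NAME}(?:{WSP})?=(?:{WSP})?{VALUE}"

# '<' tag-name (WSP attribute)* WSP? ('>' | "/>")
TAG = re.compile(rf"<{NAME}(?:{WSP}{ATTRIBUTE})*(?:{WSP})?/?>")


def is_tag(text: str) -> bool:
    """Return True if text is exactly one tag in the simplified grammar."""
    return TAG.fullmatch(text) is not None
```

With this sketch, is_tag('<a b = "c" >') is true, while is_tag('<a b="c"d="e">') is false because the WSP between the two attributes is missing.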
Now, let's play parser. We are scanning and encounter these items, in this order:
... '<'
... tag-name
... WSP
... attribute/value pair
... WSP
Trouble!
What does that last WSP (WSP = whitespace) signify? Does it signify:
(a) Space between the first attribute and a second attribute? E.g. WSP attribute-name '=' "value"
(b) Space just prior to the end angle bracket? I.e., WSP '>'
The only way to know the answer is to look ahead beyond the WSP and see what token comes next. That takes two tokens of lookahead (the WSP itself plus the token after it), and a parser with two-token lookahead is more powerful than a parser with one-token lookahead.
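The decision point can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any real XML parser does it: WSP is kept as a real token, a whole attribute/value pair (including any WSP around the '=') is folded into a single ATTR token, and the names TOKEN_RE, tokenize and parse_tag are hypothetical.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text)

# Hypothetical token kinds for this sketch. ATTR is a whole
# attribute/value pair, with any WSP around '=' folded in.
TOKEN_RE = re.compile(r"""
    (?P<ATTR>[A-Za-z_][\w.-]*\s*=\s*"[^<"]*")
  | (?P<NAME>[A-Za-z_][\w.-]*)
  | (?P<WSP>\s+)
  | (?P<LT><)
  | (?P<SLASHGT>/>)
  | (?P<GT>>)
""", re.VERBOSE)


def tokenize(text: str) -> List[Token]:
    """Split text into (kind, text) tokens, keeping WSP as a token."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse_tag(tokens: List[Token]) -> Tuple[str, List[str], bool]:
    """Parse '<' NAME (WSP ATTR)* WSP? ('>' | '/>').

    Returns (tag name, attribute texts, True if the tag ended with '/>').
    """
    def kind(i: int) -> str:
        return tokens[i][0] if i < len(tokens) else "EOF"

    if kind(0) != "LT" or kind(1) != "NAME":
        raise ValueError("expected '<' followed by a tag name")
    name, attrs, i = tokens[1][1], [], 2
    while kind(i) == "WSP":
        # The first lookahead token says only "WSP". To tell case (a)
        # from case (b) we must inspect the token after it as well.
        if kind(i + 1) == "ATTR":   # case (a): another attribute follows
            attrs.append(tokens[i + 1][1])
            i += 2
        else:                       # case (b): WSP just before the close
            i += 1
            break
    if kind(i) not in ("GT", "SLASHGT") or i != len(tokens) - 1:
        raise ValueError("expected '>' or '/>' to close the tag")
    return name, attrs, kind(i) == "SLASHGT"
```

For example, parse_tag(tokenize('<a b="c" >')) returns ('a', ['b="c"'], False). The line that reads kind(i + 1) is the second token of lookahead: at that moment the parser has not consumed the WSP yet, so it is peeking past it.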
So the next time someone tells you that the structure of an XML start tag is simple, tell 'em it ain't so!
/Roger
_______________________________________________________________________
XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.