Re: [xml-dev] A question for parsing experts: How to recognize that '<' denotes the beginning of a start tag?

On Tue, 2021-02-16 at 17:52 +0000, Roger L Costello wrote:
> 
> 
> In the scanning process, you encounter a less than ( '<' ) symbol
> You must determine if it denotes the beginning of a start tag.

Well, I did badly on the last parsing question, so let's see if I can do
badly here too :) Again, before coffee!

> 
> Let c = the character currently being examined.
> Let nextchar = the character following c
> 
> if c == '<' and nextchar != '/' and nextchar != '!' and nextchar !=
> '?' then we are at the beginning of a start tag
> 
> Do you agree? Am I missing any checks?

You need to apply the test in the right place: you're not going to see
a start tag inside an attribute value, a comment, a CDATA section, or in
the internal subset outside of an entity replacement value (< is
not allowed unescaped in system or public identifiers).

If you do encounter a < in those other contexts, the input is not well-
formed. In places (e.g. public identifiers) the grammar enforces this;
elsewhere (e.g. system identifiers) it's made explicit in the prose.

In entity replacement texts, you don't want to tokenize until the
entity is actually used.
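
The point about applying the test in the right place can be sketched
as a small scanner that only looks for start tags while it is in
element content. This is a rough illustration, not a conforming parser:
the names find_start_tags and skip_tag are made up for this example, the
name-start test is a crude stand-in for the spec's NameStartChar
production, a DOCTYPE with an internal subset is not handled, and
nothing is reported as a well-formedness error.

```python
def skip_tag(text, i):
    """Return the offset just past the '>' that closes the tag starting
    before i, ignoring any '>' or '<' inside quoted attribute values."""
    quote = None
    n = len(text)
    while i < n:
        ch = text[i]
        if quote:
            if ch == quote:          # closing quote of the attribute value
                quote = None
        elif ch in "\"'":            # opening quote of an attribute value
            quote = ch
        elif ch == ">":
            return i + 1
        i += 1
    return n                         # unterminated tag: stop at end of input


def find_start_tags(text):
    """Return offsets of every '<' that begins a start tag or an
    empty-element tag, skipping comments, CDATA sections, processing
    instructions, end tags and attribute values."""
    offsets = []
    i, n = 0, len(text)
    while i < n:
        if text[i] != "<":
            i += 1
        elif text.startswith("<!--", i):            # comment
            end = text.find("-->", i + 4)
            i = n if end == -1 else end + 3
        elif text.startswith("<![CDATA[", i):       # CDATA section
            end = text.find("]]>", i + 9)
            i = n if end == -1 else end + 3
        elif text.startswith("<?", i):              # processing instruction
            end = text.find("?>", i + 2)
            i = n if end == -1 else end + 2
        elif text.startswith(("</", "<!"), i):      # end tag or declaration
            # Assumption: no internal subset, so the first '>' ends it.
            end = text.find(">", i + 2)
            i = n if end == -1 else end + 1
        else:
            nxt = text[i + 1:i + 2]
            # Simplified name-start test (assumption); the spec's
            # NameStartChar production is the real rule.
            if nxt and (nxt.isalpha() or nxt in "_:"):
                offsets.append(i)
            i = skip_tag(text, i + 1)
    return offsets
```

Called on a string containing a comment, a CDATA section and a
processing instruction, find_start_tags returns only the offsets of the
'<' characters that open real elements, not the ones inside those
constructs.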

Also, you only have a start-tag (as the spec calls them) if nextchar is
a name start character. For example,

    <
    boy
    >

is not allowed (white space may not follow the '<'), but

    <girl
    >

is fine, as is

    <enby>
Liam



-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Barefoot Web-slave, antique illustrations:  http://www.fromoldbooks.org


